Langid detect language showing wrong result in #Python

Problem description

I need to detect the language of the strings stored below.

0                                                          nice
1                                                       Insane3
2                                                           NaN
3                                                @bertelsen1986
4                       20 or 30 mm rise on the Renthal Fatbar?
Name: Comments, dtype: object

I am using the langid module to detect the language of each comment stored in df['Comments']:

import langid
for row in df['Comments']:
  lang, log_prob = langid.classify(row)
  TM['Detected_Language']=lang

Below is the result, which is wrong:

    Comments                                          Detected_Language
0   nice                                                      zh
1   Insane3                                                   zh
2   ❤️                                                   zh
3   @bertelsen1986                                            zh
4   20 or 30 mm rise on the Renthal Fatbar?                   zh

These comments should be detected as 'en' instead. (The dataset also contains comments in other languages.)

Tags: python, pandas, nlp

Solution
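
Judging only from the snippet in the question, one likely cause is the line TM['Detected_Language']=lang. It assigns a single value to the whole column on every pass through the loop, so once the loop finishes every row holds the language detected for the last comment processed (apparently 'zh' here). Below is a minimal sketch of a per-row version. detect_languages is a hypothetical helper, not part of langid. It takes the classifier as an argument, on the assumption that it returns a (language, score) pair as langid.classify does in the question, and it skips values that are not usable strings, such as NaN.

```python
def detect_languages(comments, classify):
    """Return one language code per comment, in order.

    `classify` is any callable returning a (language, score) pair,
    such as langid.classify. Values that are not non-empty strings
    (NaN, None, whitespace only) map to None instead of being classified.
    """
    languages = []
    for text in comments:
        if not isinstance(text, str) or not text.strip():
            languages.append(None)  # nothing to classify for this row
            continue
        lang, _score = classify(text)
        languages.append(lang)
    return languages


# Usage with the question's objects (assumes langid and pandas are installed):
#   import langid
#   df['Detected_Language'] = detect_languages(df['Comments'], langid.classify)
```

Because the helper returns one result per row, assigning the list to the column gives each comment its own label. Very short strings such as "nice" or "@bertelsen1986" carry little signal, so some rows may still be labelled incorrectly even after this fix.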

