Extract letters up to the comma with pandas and regex

Problem description

I have this toy dataset, with locations in France:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'location': ['Noisy-le-Sec, Seine-Saint-Denis, Île-de-France, France',
                                'France', 'Paris, Paris, Île-de-France, France',
                                'Orléans, Loiret, Centre-Val de Loire, France',
                                'Dunkirk, Nord, Hauts-de-France, France',
                                'Paris, France']})
df
    id  location
0   1   Noisy-le-Sec, Seine-Saint-Denis, Île-de-France...
1   2   France
2   3   Paris, Paris, Île-de-France, France
3   4   Orléans, Loiret, Centre-Val de Loire, France
4   5   Dunkirk, Nord, Hauts-de-France, France
5   6   Paris, France

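I want to create an additional column, redux, that takes the text before the first comma (the exception is when there is no comma, in which case I capture nothing). I tried this with a regular expression, but I get NaN: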

df['redux'] = df['location'].str.extract(r'(^w,)')

    id  location                                            redux
0   1   Noisy-le-Sec, Seine-Saint-Denis, Île-de-France...   NaN
1   2   France                                              NaN
2   3   Paris, Paris, Île-de-France, France                 NaN
3   4   Orléans, Loiret, Centre-Val de Loire, France        NaN
4   5   Dunkirk, Nord, Hauts-de-France, France              NaN
5   6   Paris, France                                       NaN

The expected result is:

    id  location                                            redux
0   1   Noisy-le-Sec, Seine-Saint-Denis, Île-de-France...   Noisy-le-Sec
1   2   France                                              
2   3   Paris, Paris, Île-de-France, France                 Paris
3   4   Orléans, Loiret, Centre-Val de Loire, France        Orléans
4   5   Dunkirk, Nord, Hauts-de-France, France              Dunkirk
5   6   Paris, France                                       Paris

Any help or advice would be greatly appreciated.

Tags: python, regex, pandas

Solution

df['redux'] = df['location'].str.extract('(.+?),')

which yields

   id                                           location         redux
0   1  Noisy-le-Sec, Seine-Saint-Denis, Île-de-France...  Noisy-le-Sec
1   2                                             France           NaN
2   3                Paris, Paris, Île-de-France, France         Paris
3   4       Orléans, Loiret, Centre-Val de Loire, France       Orléans
4   5             Dunkirk, Nord, Hauts-de-France, France       Dunkirk
5   6                                      Paris, France         Paris
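
The original attempt returns NaN everywhere because `(^w,)` only matches a literal `w` immediately followed by a comma at the start of the string. As a possible refinement of the answer above, here is a minimal sketch that anchors the pattern at the start of the string and replaces NaN with an empty string, so that rows without a comma come out blank as in the expected result. The `[^,]+` character class, `expand=False`, and `fillna('')` are choices made here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'location': [
        'Noisy-le-Sec, Seine-Saint-Denis, Île-de-France, France',
        'France',
        'Paris, Paris, Île-de-France, France',
        'Orléans, Loiret, Centre-Val de Loire, France',
        'Dunkirk, Nord, Hauts-de-France, France',
        'Paris, France',
    ],
})

# ^([^,]+), captures everything from the start of the string up to
# (but not including) the first comma. Strings without a comma do not match.
# expand=False returns a Series instead of a one-column DataFrame.
first_part = df['location'].str.extract(r'^([^,]+),', expand=False)

# Rows without a comma are NaN; replace them with an empty string
# so the column matches the expected result in the question.
df['redux'] = first_part.fillna('')

print(df['redux'].tolist())
```

Skip the `fillna('')` step if NaN is acceptable for rows that contain no comma.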
