Extract the first word after split() in a pandas column

Problem description

I have a DataFrame column of sentences (df.sentence) that looks like this:

sentence 
His name is Paul. He's in jail.
Her name is Allison. She's a doctor.
He is named Steve. He's an engineer.

and so on.

Currently, I have a loop set up like this to extract the names:

import re

for i in range(len(df.sentence)):
  if 'name is' in df['sentence'][i]:
    name = re.findall(r'(?<=name is\s)[a-z]+',str(df['sentence'][i]),re.I)

However, this doesn't work, or perhaps I need help setting up the regex correctly.
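For reference, here is a minimal sketch of the loop approach that handles both phrasings, assuming the three sample sentences above. It compiles one pattern with an alternation, iterates over the column values instead of positional indexes, and collects each result in a list rather than overwriting name on every pass. The variable names (pattern, names) are illustrative only.

```python
import re

import pandas as pd

df = pd.DataFrame({"sentence": [
    "His name is Paul. He's in jail.",
    "Her name is Allison. She's a doctor.",
    "He is named Steve. He's an engineer.",
]})

# (?:...|...) matches either phrase; only the capture group (the name) is kept.
pattern = re.compile(r"(?:name is|is named)\s+([a-z]+)", re.I)

names = []
for sentence in df["sentence"]:
    match = pattern.search(str(sentence))
    # Append None when neither phrase occurs, so positions stay aligned with df.
    names.append(match.group(1) if match else None)

print(names)  # ['Paul', 'Allison', 'Steve']
```

Iterating over df["sentence"] directly also avoids df['sentence'][i], which looks up by index label and raises a KeyError once the index is no longer 0..n-1.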

Update (incorrect output):

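for i in range(len(df)):
  # note: 'in' is a plain substring test, so this looks for the literal text '[name is|named]'
  if '[name is|named]' in df['sentence'][i]:
    # note: df.sentence.i is not a valid lookup, and [...] in a regex is a character class
    name = df.sentence.i.str.extract('[name is|named]\s(.*?)(?=\.|\s)')
  else:
    pass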
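Part of why the update gives wrong output is the regex itself: square brackets create a character class, which matches a single character from the set, not one of the phrases. A small sketch on one sample sentence contrasts that with a non-capturing group using alternation; the tail of the pattern is the one from the update.

```python
import re

text = "He is named Steve."

# [...] is a character class: it matches ONE character from the set
# {n, a, m, e, space, i, s, |, d}, so the "e" in "He" already satisfies it.
wrong = re.search(r"[name is|named]\s(.*?)(?=\.|\s)", text).group(1)

# (?:...|...) is a non-capturing group with alternation: it matches a whole phrase.
right = re.search(r"(?:name is|is named)\s(.*?)(?=\.|\s)", text).group(1)

print(wrong, right)  # is Steve
```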

Tags: string, pandas

Solution


Use a lookbehind assertion:

df['sentence'].str.extract(r'(?<= name is |is named )(\w+)')

Output:

         0
0     Paul
1  Allison
2    Steve
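A self-contained version of the answer is sketched below, assuming the three sample sentences from the question. Python's re module only accepts fixed-width lookbehinds. The pattern above compiles because both alternatives, " name is " and "is named ", are exactly 9 characters long. The second pattern is a swapped-in alternative, not part of the original answer: a non-capturing group consumes the phrase, so the alternatives may differ in length.

```python
import pandas as pd

df = pd.DataFrame({"sentence": [
    "His name is Paul. He's in jail.",
    "Her name is Allison. She's a doctor.",
    "He is named Steve. He's an engineer.",
]})

# Lookbehind version from the answer: returns a DataFrame with a column named 0.
lookbehind = df["sentence"].str.extract(r"(?<= name is |is named )(\w+)")

# Alternative without lookbehind: only the capture group is returned;
# expand=False gives a Series because there is exactly one group.
grouped = df["sentence"].str.extract(r"(?:name is|is named)\s+(\w+)", expand=False)

print(lookbehind)
print(grouped.tolist())  # ['Paul', 'Allison', 'Steve']
```

Rows that match neither phrase come back as NaN in both versions, so the result stays aligned with the original DataFrame.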
