Pandas: assign matching elements from a list to a DataFrame column (if present)

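Problem description

I am trying to assign elements from a list to a pandas DataFrame column whenever the text in a row starts with that substring.

Code: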

import numpy as np
import pandas as pd

searchwords = ['harry','harry potter','lotr','secret garden']

l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
      'Harry Potter is very famous',
      'I enjoyed reading Harry Potter series',
      'LOTR is also a great book along',
      'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()

Data preview:

   id   text
0   1   harry potter is a great book
1   2   harry potter is very famous
2   3   i enjoyed reading harry potter series
3   4   lotr is also a great book along
4   5   have you read secret garden as well?

What I tried:

df.loc[df['text'].str.startswith(tuple(searchwords)),'tags'] if (df['text'].str.startswith(tuple(searchwords))) == True else np.NaN

Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong? I thought that with == True you could assign the values in if/else logic.

Looking for output like this:

   id   text                                     tags
0   1   harry potter is a great book             harry;harry potter
1   2   harry potter is very famous              harry;harry potter
2   3   i enjoyed reading harry potter series    NaN
3   4   lotr is also a great book along          lotr
4   5   have you read secret garden as well?     NaN

Tags: python, pandas

Solution


Try using apply:

df['tags'] = df.text.apply(
    lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)

This gives you a tags column containing a list of the matching tags for each row.

If you prefer NaN over an empty list [], you can do that in a second step.
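The original screenshot of the result is not available, so here is a minimal self-contained sketch that rebuilds the example data from the question and prints the resulting tags. The short variable name w is only illustrative.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': [
        'Harry Potter is a great book',
        'Harry Potter is very famous',
        'I enjoyed reading Harry Potter series',
        'LOTR is also a great book along',
        'Have you read Secret Garden as well?',
    ],
})
df['text'] = df['text'].str.lower()

# For each row, collect every search word the text starts with.
df['tags'] = df['text'].apply(
    lambda text: [w for w in searchwords if text.startswith(w)]
)

print(df['tags'].tolist())
# [['harry', 'harry potter'], ['harry', 'harry potter'], [], ['lotr'], []]
```

Rows 3 and 5 get an empty list because their text mentions a search word without starting with it.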

df['tags'] = df.tags.apply(
    lambda current_tag: float('nan') if len(current_tag)==0 else current_tag
)
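As for the ValueError in the question: a Python if needs a single True or False, while str.startswith on a column returns one boolean per row, so the condition is ambiguous. A row-wise or mask-based approach avoids the if/else entirely. The sketch below is one possible way (not the only one) to get the semicolon-separated strings from the desired output, with NaN where nothing matches. The intermediate name joined is an assumption made for illustration.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': [
        'harry potter is a great book',
        'harry potter is very famous',
        'i enjoyed reading harry potter series',
        'lotr is also a great book along',
        'have you read secret garden as well?',
    ],
})

# Join all matching search words with ';' (empty string if none match).
joined = df['text'].apply(
    lambda text: ';'.join(w for w in searchwords if text.startswith(w))
)

# Keep values where the mask is True; the other rows become NaN.
df['tags'] = joined.where(joined != '')

print(df[['id', 'tags']])
```

Series.where applies the boolean mask element by element, which is what the if/else in the question was trying to do for the whole column at once.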
