Match DataFrame column values against list items and return the matching list item

Question

I have a DataFrame and want to check whether the values of one column match the values of a list. My first attempt:

import pandas as pd

dataframe = pd.DataFrame({'Description': ['foo blah', 'new foo', 'newfoo', 'bar','random']})
keywords_list = ["foo", "bar"]

dataframe = dataframe['Description'].str.split(expand = True).isin(keywords_list).any()

print(dataframe)
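For reference, that first attempt prints one boolean per word position (column 0 and column 1 of the split result) rather than one per row, because any() defaults to axis=0. A minimal sketch of a per-row check, assuming an exact whole-word match is what is wanted, passes axis=1 instead. Note that 'newfoo' is still not matched this way, since splitting on whitespace never yields the word 'foo' for it:

```python
import pandas as pd

dataframe = pd.DataFrame({'Description': ['foo blah', 'new foo', 'newfoo', 'bar', 'random']})
keywords_list = ["foo", "bar"]

# Split each description on whitespace: one column per word position,
# shorter rows are padded with None
words = dataframe['Description'].str.split(expand=True)

# isin() flags each word equal to a keyword; any(axis=1) collapses the flags
# per row, whereas the default any() (axis=0) collapses them per column
row_has_keyword = words.isin(keywords_list).any(axis=1)
print(row_has_keyword)
```

This only detects a match; it does not say which keyword matched, which is what the replacement below needs.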

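I want each value of dataframe['Description'] that contains a keyword from keywords_list to be replaced by that keyword, something like a replace. Neither of the following works: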

dataframe['Description'] = [x.strip().replace(' ', keywords_list[x]) for x in dataframe['Description']]

or

dataframe['Description'] = np.where(df['Description'].isin(keywords_list), df['Site'], '')
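The list comprehension fails because keywords_list[x] indexes a list with a string, which raises a TypeError, and isin() only tests for exact equality, so 'foo blah' never matches 'foo'. A minimal sketch of the list-comprehension idea in plain Python, assuming substring matching and that the first keyword in list order should win when several match; first_keyword is a hypothetical helper name:

```python
import pandas as pd

dataframe = pd.DataFrame({'Description': ['foo blah', 'new foo', 'newfoo', 'bar', 'random']})
keywords_list = ["foo", "bar"]


def first_keyword(text, keywords):
    """Hypothetical helper: return the first keyword (in list order) that occurs
    as a substring of text, or text unchanged when none does."""
    return next((k for k in keywords if k in text), text)


# Replace each description by its first matching keyword, keeping it otherwise
dataframe['Description'] = [first_keyword(x, keywords_list) for x in dataframe['Description']]
print(dataframe)
```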

So, given the original DataFrame:

  Description
0    foo blah
1     new foo
2      newfoo
3         bar
4      random

the result should be:

  Description
0         foo
1         foo
2         foo
3         bar
4      random

Tags: python, dataframe, replace

Answer


I would build a helper DataFrame with one column per keyword, so that a loop over the list can prepare the expected values.

It is then enough to apply combine_first to those columns to get the expected result:

import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Description': ['foo blah', 'new foo', 'newfoo', 'bar','random']})
keywords_list = ["foo", "bar"]

# compute a tmp dataframe with a column per keyword, holding the keyword
# as value where there is a match (NaN elsewhere)
tmp = pd.DataFrame(index = dataframe.index)
for k in keywords_list:
    # regex=False: treat the keyword as a literal substring, not a regex pattern
    tmp.loc[dataframe['Description'].str.contains(k, regex=False), k] = k

# compute the result by combining all the keyword columns;
# earlier keywords in the list take precedence when several match
tmp['result'] = np.nan
for k in keywords_list:
    tmp['result'] = tmp['result'].combine_first(tmp[k])

# update the initial dataframe, keeping the original text where no keyword matched
dataframe['Description'] = tmp['result'].combine_first(dataframe['Description'])

This gives, as expected:

  Description
0         foo
1         foo
2         foo
3         bar
4      random
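A more compact alternative, offered as a sketch rather than as part of the answer above, is a single regex alternation with str.extract, falling back to the original text with fillna. It assumes the keywords should be matched literally (hence re.escape). One difference from the combine_first approach: extract returns the match that occurs first in the string, whereas the loop above prefers the keyword that comes first in keywords_list, so 'bar foo' would give 'bar' here and 'foo' above:

```python
import re

import pandas as pd

dataframe = pd.DataFrame({'Description': ['foo blah', 'new foo', 'newfoo', 'bar', 'random']})
keywords_list = ["foo", "bar"]

# Build one alternation pattern such as "(foo|bar)"; re.escape guards against
# keywords containing regex metacharacters
pattern = "(" + "|".join(map(re.escape, keywords_list)) + ")"

# extract() returns the text captured by the group, or NaN when nothing matches;
# fillna() then keeps the original description for rows without a keyword
dataframe['Description'] = (
    dataframe['Description']
    .str.extract(pattern, expand=False)
    .fillna(dataframe['Description'])
)
print(dataframe)
```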
