用于测试的数据。

import pandas as pd

data1 = {'Bucket':['basket', 'bushel', 'peck', 'box'], 'Summary':['This is a basket of red apples. They are sour.', 'We found a bushel of fruit. They are red and sweet.', 'There is a peck of pears that taste sweet. They are very green.', 'We have a box of plums. They are sour and have a great color.']}

data2 = {'Category':['Fruit', 'Color'], 'Filters':['apple, pear, plum, grape', 'red, purple, green']}

df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

print(df1)

   Bucket                                            Summary
0  basket     This is a basket of red apples. They are sour.
1  bushel  We found a bushel of fruit. They are red and s...
2    peck  There is a peck of pears that taste sweet. The...
3     box  We have a box of plums. They are sour and have...

print(df2)

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

这行脚本将表中的 Category 列转换为列表以供以后使用。

category_list =  df2['Category'].values

category_list = list(set(category_list))

尝试匹配文本。

for item in category_list:

    item = df2.loc[df2['Category'] == item]

    filter_list =  item['Filters'].values

    filter_list = list(set(filter_list))

    df1 = df1 [df1 ['Summary'].isin(filter_list)]

我希望第一个数据框有分配给它的类别，用逗号分隔。

结果：

Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel         Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box         Fruit  We have a box of plums. They are sour and have...

我希望这很清楚。我已经用头撞了一个星期了。

先感谢您

标签： pythonpandasdataframefiltering

解决方案

使用pandas.Series.str.contains通过循环检查过滤器：

df2['Filters']=[key.replace(' ','') for key in df2['Filters']]
df2['Filters']=df2['Filters'].apply(lambda x : x.split(','))
Fruit=pd.DataFrame([df1['Summary'].str.contains(key) for key in df2.set_index('Category')['Filters']['Fruit']]).any()
Color=pd.DataFrame([df1['Summary'].str.contains(key) for key in df2.set_index('Category')['Filters']['Color']]).any()
print(Fruit)
print(Color)

0     True
1    False
2     True
3     True
dtype: bool 

0     True
1     True
2     True
3    False
dtype: bool

然后使用np.where和Series.str.cat来获取数据帧输出：

df1['Fruit']=np.where(Fruit,'Fruit','')
df1['Color']=np.where(Color,'Color','')
df1['Category']=df1['Fruit'].str.cat(df1['Color'],sep=', ')
df1=df1[['Bucket','Category','Summary']]
print(df1)

   Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel       , Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box       Fruit,   We have a box of plums. They are sour and have...

到 n 类别过滤器：

df2['Filters']=[key.replace(' ','') for key in df2['Filters']]
df2['Filters']=df2['Filters'].apply(lambda x : x.split(','))
Categories=[pd.Series(np.where(( pd.DataFrame([df1['Summary'].str.contains(key) for key in df2.set_index('Category')['Filters'][category_filter]]).any() ),category_filter,'')) for category_filter in df2['Category']]
df1['Category']=Categories[0].str.cat(Categories[1:],sep=', ')
df1=df1.reindex(columns=['Bucket','Category','Summary'])
print(df1)

   Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel       , Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box       Fruit,   We have a box of plums. They are sour and have...

python - 如果数据框中包含来自另一个数据框中的任何元素，如何在数据框中分配类别？

问题描述

用于测试的数据。

这行脚本将表中的 Category 列转换为列表以供以后使用。

尝试匹配文本。

解决方案

推荐阅读