Python: comparing DataFrames of unequal length against text, with a True/False flag, to get a column output

Problem description

I have the following two DataFrames:

df1

Animal         Categ_Class
--------------------------
Cat            Soft
Dog            Soft
Dinosaur       Hard

df2

Text                               Animal_Exist
-----------------------------------------------
The Cat is purring                  True
Cat drank the milk                  True
Lizard is crawling over the wall    False
The dinosaurs are extinct now       True

The Animal_Exist column in df2 is derived from whether a value of df1.Animal appears in df2.Text.
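
For illustration only, a flag like this could be derived with a case-insensitive substring test. This is a minimal sketch that rebuilds the two frames from the tables above; the use of Series.str.contains and re.escape is an assumption, not necessarily how the column was originally produced.

```python
import re

import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"Animal": ["Cat", "Dog", "Dinosaur"]})
df2 = pd.DataFrame({
    "Text": [
        "The Cat is purring",
        "Cat drank the milk",
        "Lizard is crawling over the wall",
        "The dinosaurs are extinct now",
    ],
})

# Alternation of all animal names: "Cat|Dog|Dinosaur".
# re.escape guards against names that contain regex metacharacters.
pattern = "|".join(re.escape(name) for name in df1["Animal"])

# True where any animal name occurs in the text, ignoring case.
df2["Animal_Exist"] = df2["Text"].str.contains(pattern, case=False)
print(df2)
```

Because this is a substring test, "dinosaurs" counts as a match for "Dinosaur", which agrees with the last row of df2.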

I need help understanding what code to write so that I get output like this:

Output

Text                               Animal_Exist   Categ_Class
--------------------------------------------------------------
The Cat is purring                  True          Soft
Cat drank the milk                  True          Soft
Lizard is crawling over the wall    False         NA
The dinosaurs are extinct now       True          Hard

I am new to Python and have been trying various approaches for several days. Any help is appreciated.

Regards.

Tags: python, pandas

Solution


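Use Series.str.extract to pull the matching Animal value out of each Text, convert it to lowercase, and then use Series.map to look up its Categ_Class: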

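import re

# Lookup Series: lowercase Animal -> Categ_Class
s = df1.assign(Animal = df1['Animal'].str.lower()).set_index('Animal')['Categ_Class']
# Alternation pattern with one capture group, e.g. (cat|dog|dinosaur)
pat = f'({"|".join(s.index)})'
# Extract the first case-insensitive match, lowercase it, and map it to its category
cat = df2['Text'].str.extract(pat, expand=False, flags=re.I).str.lower().map(s)

# Animal_Exist is True wherever a category was found
df2 = df2.assign(Animal_Exist = cat.notna(), Categ_Class = cat)
print(df2)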
                               Text  Animal_Exist Categ_Class
0                The Cat is purring          True        Soft
1                Cat drank the milk          True        Soft
2  Lizard is crawling over the wall         False         NaN
3     The dinosaurs are extinct now          True        Hard
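
To try the approach end to end, the following self-contained sketch rebuilds both frames from the tables in the question and applies the same extract-then-map steps. The names lookup, pattern, matched, categ and result are illustrative, and wrapping each name in re.escape is an added precaution that is not part of the answer above.

```python
import re

import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Dinosaur"],
    "Categ_Class": ["Soft", "Soft", "Hard"],
})
df2 = pd.DataFrame({
    "Text": [
        "The Cat is purring",
        "Cat drank the milk",
        "Lizard is crawling over the wall",
        "The dinosaurs are extinct now",
    ],
})

# Lookup Series: lowercase animal name -> category.
lookup = (
    df1.assign(Animal=df1["Animal"].str.lower())
       .set_index("Animal")["Categ_Class"]
)

# One capture group holding every name; re.escape (an addition here) keeps
# regex metacharacters in a name from being read as pattern syntax.
pattern = "(" + "|".join(re.escape(name) for name in lookup.index) + ")"

# First case-insensitive match per row, lowercased to line up with the lookup index.
matched = df2["Text"].str.extract(pattern, expand=False, flags=re.IGNORECASE).str.lower()

# Rows without a match stay missing after the lookup.
categ = matched.map(lookup)

result = df2.assign(Animal_Exist=categ.notna(), Categ_Class=categ)
print(result)
```

Two properties of this approach are worth keeping in mind: matching is substring-based, so "Cat" would also match inside a word such as "Category", and str.extract keeps only the first match in each row, so a text that mentions several animals receives the category of the first one found.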
