Search columns for a specific set of text and, if the text is found, enter a new text string in a new column (pandas)

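Problem description

I am trying to find instances where the word "gas", "diesel", or "ev" appears (case-insensitively) in any column of my dataframe. If any version of these words is found in a column, I want to enter an abbreviation for the fuel type in a new column named "FUEL".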

excerpt of my dataframe

SUMN                 SOUN               MATN   
Light duty vehicle   Diesel Tire wear   Rubber
Heavy duty diesel    Non-catalyst       Diesel
Light duty truck     catalyst           Gasoline
Medium duty vehicle  EV brake wear      brakes

What I'm hoping to output
SUMN                 SOUN               MATN      FUEL
Light duty vehicle   Diesel Tire wear   Rubber    DSL
Heavy duty diesel    Non-catalyst       Diesel    DSL
Light duty truck     catalyst           Gasoline  GAS
Medium duty vehicle  EV brake wear      brakes    ELEC

How can I do this?

I have gotten as far as checking a single column for one type of string, but I am stuck on how to go beyond that.

df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)

Tags: python, python-3.x, pandas

Solution


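Here is an approach that uses apply with str.contains to check all columns for each word. Finally, we map each word to the correct abbreviation, e.g. ev -> ELEC.

Note that I use (?i) in my regex, which makes it case-insensitive: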

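# Map each search word to the abbreviation written into FUEL
mapping = {'gas': 'GAS', 'diesel': 'DSL', 'ev': 'ELEC'}

for word, abbrev in mapping.items():
    # True for rows where any column contains the word (no capture group, so pandas does not warn)
    m = df.apply(lambda x: x.str.contains(f'(?i){word}')).any(axis=1)
    df.loc[m, 'FUEL'] = abbrev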

Output

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC
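
For comparison, the same result can be reached without a loop. The following is only a sketch of an alternative, not the approach used in the answer above: it rebuilds the sample dataframe from the question, joins the three text columns into one string per row, and pulls out the first matching word with str.extract before mapping it. The names text and pattern are introduced here for illustration, and it assumes every column holds text.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "SUMN": ["Light duty vehicle", "Heavy duty diesel",
             "Light duty truck", "Medium duty vehicle"],
    "SOUN": ["Diesel Tire wear", "Non-catalyst", "catalyst", "EV brake wear"],
    "MATN": ["Rubber", "Diesel", "Gasoline", "brakes"],
})

mapping = {"gas": "GAS", "diesel": "DSL", "ev": "ELEC"}

# Join the text columns into one string per row so a single regex pass covers them all
text = df[["SUMN", "SOUN", "MATN"]].astype(str).agg(" ".join, axis=1)

# One alternation pattern built from the mapping keys; (?i) makes it case-insensitive
pattern = "(?i)(" + "|".join(mapping) + ")"

# Take the first matching word per row, lowercase it, and map it to its abbreviation
df["FUEL"] = text.str.extract(pattern, expand=False).str.lower().map(mapping)

print(df)
```

Two differences are worth keeping in mind. If a row mentions more than one fuel word, this sketch keeps the first one that appears in the joined text, whereas the loop keeps whichever word comes last in the loop order. Both versions match substrings, so 'ev' would also match a word such as "level"; adding \b word boundaries would tighten that, but 'gas' would then no longer match "Gasoline".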
