Extract values from a column

Problem description

I have a column that contains several pieces of data separated by hyphens. For example:

column A
TTT-Changing Car-BBBB-KKKK
TTT-KKKK - Changing device-KKKK
Releasing device-RRRR-KKKK-TTTT
RRRR-BBBB-Switching Car-TTTT
Login issue -RRRR-KKKK-TTTT
CCCC-Activation issue-RRRR-KKKK-TTTT

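I have a list of words that I want to look up in column A and map to a value in column B. For example, if column A contains "Changing", "change" or "a change", column B should get "Change"; if it contains "Activation" or "Registration", column B should get "Activation", and so on.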

I'm looking for something similar to the Excel formula IF(ISNUMBER(SEARCH(...))), but in Python.
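For reference, a rough pandas counterpart of nested IF(ISNUMBER(SEARCH(...))) formulas could look like the sketch below. It assumes the sample data above and a hypothetical rules dictionary built from the keywords in the question: str.contains(case=False) plays the role of Excel's case-insensitive SEARCH, and numpy.select plays the role of the nested IFs, returning the label of the first condition that is true.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# Hypothetical lookup table: label for column B -> words to search for.
# Earlier entries win when a cell matches several labels, like nested IFs.
rules = {
    "Change": ["changing", "change", "a change"],
    "Activation": ["activation", "registration"],
}

# One boolean mask per label: True where the cell contains any of its words
# (case-insensitive substring test, similar to ISNUMBER(SEARCH(...))).
conditions = [
    df["column A"].str.contains("|".join(map(re.escape, words)), case=False, regex=True)
    for words in rules.values()
]

# np.select takes the label of the first True condition, else the default.
df["column B"] = np.select(conditions, list(rules), default="")
```

Rows that match none of the words get an empty string here; pass a different default if you prefer another placeholder.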

Thanks,

Tags: python, python-3.x, pandas, dataframe

Solution


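You can use the str.extract method, which returns the first match of the regex capture group in each row (NaN where there is no match):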

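# expand=False returns a Series (one capture group) instead of a one-column DataFrame
df['column B'] = df['column A'].str.extract(r'(Changing[^-]*)', expand=False)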

df
                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT              NaN
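The pattern above only looks for "Changing". A self-contained sketch that rebuilds the sample frame and extracts the phrase for several keywords at once, case-insensitively, could look like this; the keyword list is an assumption taken from the question.

```python
import re

import pandas as pd

df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# Hypothetical keyword list; (?i) makes the whole pattern case-insensitive.
keywords = ["changing", "change", "activation", "registration"]
pat = r"(?i)((?:" + "|".join(map(re.escape, keywords)) + r")[^-]*)"

# Capture the keyword plus everything up to the next hyphen; NaN if no keyword.
df["column B"] = df["column A"].str.extract(pat, expand=False)
```

With this data, rows 0, 1 and 5 get "Changing Car", "Changing device" and "Activation issue", and the other rows stay NaN.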

Edit

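If you want to replace the content with a label instead, consider using a dictionary that maps each keyword to its label. Rows that contain none of the keywords keep their original text: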

dct = {'changing': 'Change',
       'change':'Change',
       'activation':'Activation',
       'registration':'Activation'}

pat = f"(?i).*\\b({'|'.join(dct.keys())})\\b.*"

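# regex=True is required for a callable replacement (the default became regex=False in pandas 2.0);
# the match is case-insensitive, so lower-case it before looking it up in dct
df['column A'].str.replace(pat, lambda m: dct[m.group(1).lower()], regex=True)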
0                             Change
1                             Change
2    Releasing device-RRRR-KKKK-TTTT
3       RRRR-BBBB-Switching Car-TTTT
4        Login issue -RRRR-KKKK-TTTT
5                         Activation
Name: column A, dtype: object
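If column B should hold only the label, with NaN where no keyword occurs, one option is to extract the keyword first and then map it through the dictionary. This is a sketch that reuses the dct from the answer and assumes keywords must match as whole words.

```python
import re

import pandas as pd

df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

dct = {'changing': 'Change',
       'change': 'Change',
       'activation': 'Activation',
       'registration': 'Activation'}

# Whole-word, case-insensitive match on any dictionary key.
pat = r"(?i)\b(" + "|".join(map(re.escape, dct)) + r")\b"

# Pull out the matched keyword (NaN if none), lower-case it,
# then translate it to its label with the dictionary.
df["column B"] = df["column A"].str.extract(pat, expand=False).str.lower().map(dct)
```

Unlike str.replace, this leaves column A untouched and puts NaN in column B for rows without a keyword.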
