How to check whether a text column contains specific strings in pandas

Problem description

I have the following DataFrame in pandas:

 job_desig             salary
 senior analyst        12
 junior researcher     5
 scientist             20
 sr analyst            12

Now I want to generate a column that flags each row whose job_desig contains any word from the list sr, as follows:

 sr = ['senior','sr']
 job_desig             salary     senior_profile
 senior analyst        12         1  
 junior researcher     5          0
 scientist             20         0 
 sr analyst            12         1

This is what I am currently trying in pandas:

 df['senior_profile'] = [1 if x.str.contains(sr) else 0 for x in 
                        df['job_desig']]

Tags: python, pandas

Solution


You can join all values of the list with | (regex OR), pass the resulting pattern to Series.str.contains, and finally cast the result to integer so that True/False maps to 1/0:

df['senior_profile'] = df['job_desig'].str.contains('|'.join(sr)).astype(int)
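For reference, here is a self-contained sketch of the one-liner above that rebuilds the sample DataFrame from the question. The use of re.escape is an added precaution that is not part of the original answer; it only matters if a keyword contains regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})
sr = ["senior", "sr"]

# Build a regex alternation: 'senior|sr'.
# re.escape is an assumption added here, not part of the original answer.
pattern = "|".join(re.escape(word) for word in sr)

# str.contains returns a boolean Series; astype(int) maps True/False to 1/0.
df["senior_profile"] = df["job_desig"].str.contains(pattern).astype(int)
print(df)
```

If the column may contain missing values, passing na=False to str.contains keeps the result boolean so the integer cast does not fail.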

If necessary, use word boundaries so that only whole words match:

pat = '|'.join(r"\b{}\b".format(x) for x in sr)
df['senior_profile'] = df['job_desig'].str.contains(pat).astype(int)

print (df)
           job_desig  salary  senior_profile
0     senior analyst      12               1
1  junior researcher       5               0
2          scientist      20               0
3         sr analyst      12               1
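To see why word boundaries can matter, the sketch below compares both patterns on a few made-up job titles. The titles "usr support" and "seniority planner" are hypothetical and are not part of the question's data; they are chosen because they contain "sr" and "senior" as substrings but not as whole words.

```python
import pandas as pd

# Hypothetical titles, chosen to expose substring false positives.
jobs = pd.Series(["sr analyst", "usr support", "senior analyst", "seniority planner"])
sr = ["senior", "sr"]

plain = "|".join(sr)                                  # 'senior|sr'
bounded = "|".join(r"\b{}\b".format(x) for x in sr)   # '\bsenior\b|\bsr\b'

# Without boundaries, any substring occurrence counts as a match.
without_boundaries = jobs.str.contains(plain).astype(int).tolist()
# With boundaries, the keyword must appear as a whole word.
with_boundaries = jobs.str.contains(bounded).astype(int).tolist()

print(without_boundaries)  # [1, 1, 1, 1]
print(with_boundaries)     # [1, 0, 1, 0]
```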

A solution with sets, if the list contains only single-word values:

df['senior_profile'] = [int(bool(set(sr).intersection(x.split()))) for x in df['job_desig']]
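The set-based variant can be sketched as a small helper, which also shows why it only works for single-word keywords: each title is split on whitespace, so a multi-word keyword can never equal a single token. The helper name has_keyword and the extra row "usr support" are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist",
                  "sr analyst", "usr support"],  # last row is hypothetical
})
sr = {"senior", "sr"}


def has_keyword(text, keywords):
    """Return 1 if any whitespace-separated token of text is in keywords."""
    # has_keyword is a hypothetical helper introduced for illustration.
    return int(not keywords.isdisjoint(text.split()))


df["senior_profile"] = [has_keyword(x, sr) for x in df["job_desig"]]
print(df)

# A multi-word keyword never matches, because titles are split into single tokens.
multi_word_flag = has_keyword("vice president sales", {"vice president"})
print(multi_word_flag)  # 0
```

Because it compares whole tokens, this version behaves like the word-boundary regex for space-separated titles, but it is case-sensitive and treats punctuation as part of a token.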
