Add values to a new column based on a string contained in another column

Problem description

I have a DataFrame:

    date        descriptions           Code
1. 1/1/2020     this is aPple          6546
2. 21/8/2019    this is fan for him    4478
3. 15/3/2020    this is ball of hockey 5577
4. 12/2/2018    this is Green apple    7899
5. 13/3/2002    this is iron fan       7788
6. 14/5/2020    this ball is soft      9991

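I want to create a new column 'category'. If the descriptions column contains the expression apple, fan, or ball (in upper or lower case), the value A001, F009, or B099 respectively should be entered in the category column. The required DataFrame would be: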

    date        descriptions           Code   category
1. 1/1/2020     this is aPple          6546   A001
2. 21/8/2019    this is fan for him    4478   F009
3. 15/3/2020    this is ball of hockey 5577   B099
4. 12/2/2018    this is Green apple    7899   A001
5. 13/3/2002    this is iron fan       7788   F009
6. 14/5/2020    this ball is soft      9991   B099

Tags: python, pandas

Solution


Use str.extract to pull the matching keyword out of the string column, then map it to its category code:

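# Map each lowercase keyword to its category code
d = {'apple': 'A001', 'ball': 'B099', 'fan': 'F009'}

df['category'] = (
    df.descriptions
      .str.lower()    # lowercase so that 'aPple' matches 'apple'
      # expand=False returns a Series even when the frame has a single row
      .str.extract('(' + '|'.join(d.keys()) + ')', expand=False)
      .map(d)         # keyword -> category code; NaN when nothing matched
)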
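
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same idea. It is one possible variant, not the only way to do it: it assumes the keywords should match case-insensitively via `flags=re.IGNORECASE`, and it escapes the keywords with `re.escape` in case one ever contains a regex metacharacter. The name `pattern` is introduced here purely for illustration.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["1/1/2020", "21/8/2019", "15/3/2020",
             "12/2/2018", "13/3/2002", "14/5/2020"],
    "descriptions": ["this is aPple", "this is fan for him",
                     "this is ball of hockey", "this is Green apple",
                     "this is iron fan", "this ball is soft"],
    "Code": [6546, 4478, 5577, 7899, 7788, 9991],
})

# Lowercase keyword -> category code
d = {"apple": "A001", "ball": "B099", "fan": "F009"}

# One capture group with all keywords as alternatives: (apple|ball|fan)
pattern = "(" + "|".join(re.escape(k) for k in d) + ")"

df["category"] = (
    df["descriptions"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)  # first keyword found
    .str.lower()   # normalise 'aPple' -> 'apple' so it is a key of d
    .map(d)        # rows without any keyword stay NaN
)

print(df)
```

Note that this matches substrings, so a description containing "fantastic" would also be tagged as fan. If whole-word matching is wanted, wrapping the group in `\b` word boundaries is one option.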
