Grouping by the text in a comma-separated DataFrame column

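Problem description

I have a DataFrame with comma-separated columns. I want to group the data by each individual value in those comma-separated columns.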

Date        Investment Type                                    Medium
1/1/2000    Mutual Fund, Stocks, Fixed Deposit, Real Estate    Own, Online,Through Agent
1/2/2000    Mutual Fund, Stocks, Real Estate                   Own
1/3/2000    Fixed Deposit                                      Online
1/3/2000    Mutual Fund, Fixed Deposit, Real Estate            Through Agent
1/2/2000    Stocks                                             Own, Online, Through Agent

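I need to group by Medium and Investment Type as shown below. The medium is supplied as input to the software I am writing.

Medium        Investment Type   Date
Online        Stocks            1/2/2000,1/1/2000
Own           Mutual Fund       1/1/2000,1/3/2000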

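I have filtered the data using the input I receive, and I do get results, but I cannot get them into the aggregated format I want. I am new to Python and Pandas. Thanks for your help.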

Tags: python, json, pandas, dataframe

Solution

First, extract the requested values from the Medium column using Series.str.findall with a regular expression that uses word boundaries:

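L = ['Online', 'Own']
# Build an alternation such as \bOnline\b|\bOwn\b so that only whole words match
pat = '|'.join(r"\b{}\b".format(x) for x in L)
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# Remove rows where none of the requested values matched (empty string)
df = df[df['New_Medium'].astype(bool)]

print(df)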
       Date                                  Investment Type   New_Medium
0  1/1/2000  Mutual Fund, Stocks, Fixed Deposit, Real Estate  Own, Online
1  1/2/2000                 Mutual Fund, Stocks, Real Estate          Own
2  1/3/2000                                    Fixed Deposit       Online
4  1/2/2000                                           Stocks  Own, Online
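
To see what the word-boundary pattern does on its own, here is a minimal sketch using only the standard `re` module. The helper name `extract_media` is hypothetical, and the `re.escape` call is an added precaution that is not part of the answer's code; it only matters if an input value contains regex metacharacters.

```python
import re

L = ['Online', 'Own']
# re.escape is an assumption added here; the answer formats the raw values directly
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in L)

def extract_media(cell):
    """Return the requested media found in one comma-separated cell (hypothetical helper)."""
    return re.findall(pat, cell)

print(pat)
print(extract_media('Own, Online,Through Agent'))  # ['Own', 'Online']
print(extract_media('Through Agent'))              # []
print(extract_media('Owner, Online'))              # ['Online']
```

The last call shows why the `\b` anchors matter: without them, `Own` would also match inside `Owner`.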

Finally, build every combination of the split values with itertools.product, then aggregate the dates with join:

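from itertools import product
import pandas as pd

# Split every column on commas, then take the Cartesian product of the
# resulting lists so each row expands into single-valued combinations
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                      for j in product(*i)], columns=df.columns)
df = df1.groupby(['Investment Type','New_Medium'])['Date'].agg(', '.join).reset_index()
print(df)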
  Investment Type New_Medium                          Date
0   Fixed Deposit     Online            1/1/2000, 1/3/2000
1   Fixed Deposit        Own                      1/1/2000
2     Mutual Fund     Online                      1/1/2000
3     Mutual Fund        Own            1/1/2000, 1/2/2000
4     Real Estate     Online                      1/1/2000
5     Real Estate        Own            1/1/2000, 1/2/2000
6          Stocks     Online            1/1/2000, 1/2/2000
7          Stocks        Own  1/1/2000, 1/2/2000, 1/2/2000
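
As an alternative sketch (not the method used in the answer), the same reshaping can be done with `Series.str.split` followed by `DataFrame.explode`, which assumes pandas 0.25 or later. The DataFrame below is rebuilt from the question's sample data, and the name `wanted` is a hypothetical stand-in for the input list. Unlike the output above, this version also drops repeated dates within a group, which is an optional assumption about what the asker wants.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Date': ['1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000', '1/2/2000'],
    'Investment Type': [
        'Mutual Fund, Stocks, Fixed Deposit, Real Estate',
        'Mutual Fund, Stocks, Real Estate',
        'Fixed Deposit',
        'Mutual Fund, Fixed Deposit, Real Estate',
        'Stocks',
    ],
    'Medium': ['Own, Online,Through Agent', 'Own', 'Online',
               'Through Agent', 'Own, Online, Through Agent'],
})

wanted = ['Online', 'Own']  # hypothetical input list of media to keep
cols = ['Investment Type', 'Medium']

# Turn each comma-separated cell into a list, then give every list item its own row
long_df = df.assign(**{col: df[col].str.split(r',\s*') for col in cols})
for col in cols:
    long_df = long_df.explode(col)

# Keep only the requested media
long_df = long_df[long_df['Medium'].isin(wanted)]

# Join the dates per group; dict.fromkeys removes duplicates while keeping order
result = (long_df.groupby(cols)['Date']
          .agg(lambda s: ', '.join(dict.fromkeys(s)))
          .reset_index())
print(result)
```

Filtering with `isin` after exploding replaces the regex step, since each row then holds exactly one medium.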
