How to group by two columns and count the words in the last column in pandas (or Python)

Problem description

1) For example, I have 3 columns, as shown below:

 date      categories     contents  
 2018-01   fish_tank1     Goldfish Gombessa Goosefish Gopher rockfish   
 2018-01   fish_tank2     Grass carp Goosefish Grayling mullet shark  
 2018-02   fish_tank2     Goosefish Gopher rockfish Grayling mullet shark  
 2018-01   fish_tank1     carp Goosefish Grayling Goldfish Gombessa   
 2018-02   fish_tank2     carp Goosefish Grayling Grass carp Goosefish  
 2018-03   fish_tank3     Grass carp Goosefish Grayling mullet shark  
 2018-03   fish_tank2     Goosefish Gopher rockfish Goosefish Grayling  

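2) I have been trying something like df.groupby(['date','categories']).agg(df.contents.str.split(expand=True).stack().value_counts()) to get a result like the one below, but I have not been able to figure it out over the last few days.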

    date   categories       contents  
 2018-01   fish_tank1  2    Goldfish    2   
                            Gombessa    2   
                            Goosefish   2    
                            Gopher      1   
                            rockfish    1   
                            ......   
           fish_tank2      Grass    1   
                           carp     1   
                           .....  
 2018-02   fish_tank2     Goosefish    3  
                          Grayling     2  
                          Gopher       1  
                          ........    
........................  

3) Can anyone give me some insight into how to get the result I want?

Tags: python, pandas

Solution


Use:

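from collections import Counter
# Split each contents string into a list of words
df['contents2'] = df['contents'].str.split()
# Concatenate the word lists within each (date, categories) group, then count each word
df.groupby(['date', 'categories'])['contents2'].apply(lambda x: Counter(x.sum()))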

Output

date     categories           
2018-01  fish_tank1  Goldfish     2.0
                     Gombessa     2.0
                     Goosefish    2.0
                     Gopher       1.0
                     Grayling     1.0
                     carp         1.0
                     rockfish     1.0
         fish_tank2  Goosefish    1.0
                     Grass        1.0
                     Grayling     1.0
                     carp         1.0
                     mullet       1.0
                     shark        1.0
2018-02  fish_tank2  Goosefish    3.0
                     Gopher       1.0
                     Grass        1.0
                     Grayling     2.0
                     carp         2.0
                     mullet       1.0
                     rockfish     1.0
                     shark        1.0
2018-03  fish_tank2  Goosefish    2.0
                     Gopher       1.0
                     Grayling     1.0
                     rockfish     1.0
         fish_tank3  Goosefish    1.0
                     Grass        1.0
                     Grayling     1.0
                     carp         1.0
                     mullet       1.0
                     shark        1.0
Name: contents2, dtype: float64
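
The counts above come back as floats. A possible alternative, sketched here under the assumption that pandas 0.25 or later is available (for DataFrame.explode), is to put each word on its own row and let value_counts do the counting per group, which yields integer counts. The DataFrame below is rebuilt from the sample data in the question; the variable name word_counts is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2018-01", "2018-01", "2018-02", "2018-01",
             "2018-02", "2018-03", "2018-03"],
    "categories": ["fish_tank1", "fish_tank2", "fish_tank2", "fish_tank1",
                   "fish_tank2", "fish_tank3", "fish_tank2"],
    "contents": [
        "Goldfish Gombessa Goosefish Gopher rockfish",
        "Grass carp Goosefish Grayling mullet shark",
        "Goosefish Gopher rockfish Grayling mullet shark",
        "carp Goosefish Grayling Goldfish Gombessa",
        "carp Goosefish Grayling Grass carp Goosefish",
        "Grass carp Goosefish Grayling mullet shark",
        "Goosefish Gopher rockfish Goosefish Grayling",
    ],
})

word_counts = (
    df.assign(contents=df["contents"].str.split())  # each cell becomes a list of words
      .explode("contents")                          # one row per word
      .groupby(["date", "categories"])["contents"]
      .value_counts()                               # count each word within each group
)
print(word_counts)
```

The result is a Series indexed by date, categories and word. Within each group the words are ordered by descending count rather than alphabetically, so the row order differs from the output above even though the counts agree.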
