Remove substring and merge rows in python/pandas

Problem description

my df:

   description               total      average      number
0 NFL football (white) L     49693        66       1007
1 NFL football (white) XL    79682        74       1198
2 NFL football (white) XS    84943        81       3792
3 NFL football (white) S     78371        73       3974
4 NFL football (blue) L      99482        92       3978
5 NFL football (blue) M      32192        51       3135
6 NFL football (blue) XL     75343        71       2879
7 NFL football (red) XXL     84391        79       1192
8 NFL football (red) XS      34727        57       992
9 NFL football (red) L       44993        63       1562

What I would like to do is remove the sizes and be left with a sum total, mean average and sum number for each colour of football:

   description               total      average    number
0 NFL football (white)       292689       74       9971
1 NFL football (blue)        207017       71       9992
2 NFL football (red)         164111       66       3746

Any suggestions much appreciated!

Tags: python, pandas, merge, substring

Solution


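You can groupby on a reformatted version of the description field, without modifying the original contents of description. The reformatting splits each value on spaces with .str.split(), drops the last part (the size), and rejoins the remaining parts with .str.join(). Then aggregate with .agg().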

To match the desired output, round the aggregated values with .round() and cast them to integers with .astype().
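To see what the grouping key looks like before any aggregation, here is a minimal sketch. The `description` Series is a hypothetical three-row stand-in for `df['description']`, and the name `key` is only for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for df['description'].
description = pd.Series([
    "NFL football (white) L",
    "NFL football (white) XL",
    "NFL football (blue) M",
])

# Split on spaces, drop the last token (the size), and rejoin the rest.
key = description.str.split(" ").str[:-1].str.join(" ")

print(key.tolist())
```

The original Series is not modified; `key` is a new Series that only serves as the grouping label.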

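# Group on the description with its last space-separated token (the size) removed;
# df['description'] itself is left unchanged.
(df.groupby(df['description'].str.split(' ').str[:-1].str.join(' '))
   .agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
   .round(0)       # round the mean to a whole number
   .astype(int)    # cast the aggregated columns to integers
   .reset_index()  # turn the group key back into a regular column
)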

Result:

            description   total  average  number
0   NFL football (blue)  207017       71    9992
1    NFL football (red)  164111       66    3746
2  NFL football (white)  292689       74    9971
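The rows above come out as blue, red, white because groupby sorts the group keys by default. If the order from the question (white, blue, red) matters, passing `sort=False` keeps the groups in order of first appearance. The sketch below is a self-contained variant under two assumptions: the DataFrame is rebuilt by hand from the question's table (with row 6 read as "(blue) XL"), and the key is derived with `.str.rsplit(' ', n=1)` instead of split/join, which is a swapped-in alternative rather than the answer's own method.

```python
import pandas as pd

# Data copied from the question; row 6 is assumed to read "(blue) XL".
df = pd.DataFrame({
    "description": [
        "NFL football (white) L", "NFL football (white) XL",
        "NFL football (white) XS", "NFL football (white) S",
        "NFL football (blue) L", "NFL football (blue) M",
        "NFL football (blue) XL", "NFL football (red) XXL",
        "NFL football (red) XS", "NFL football (red) L",
    ],
    "total": [49693, 79682, 84943, 78371, 99482, 32192, 75343, 84391, 34727, 44993],
    "average": [66, 74, 81, 73, 92, 51, 71, 79, 57, 63],
    "number": [1007, 1198, 3792, 3974, 3978, 3135, 2879, 1192, 992, 1562],
})

# rsplit with n=1 splits only at the last space, so [0] is everything before the size.
key = df["description"].str.rsplit(" ", n=1).str[0]

result = (
    df.groupby(key, sort=False)  # sort=False keeps first-appearance order
      .agg({"total": "sum", "average": "mean", "number": "sum"})
      .round(0)
      .astype(int)
      .reset_index()
)
print(result)
```

Both ways of building the key assume the size is always the last space-separated token of the description; a value without a trailing size would lose its last word instead.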
