首页 > 解决方案 > How to aggregate data and plot all groups

问题描述

I have the following dataset :

df=pd.read_csv('https://raw.githubusercontent.com/michalis0/DataMining_and_MachineLearning/master/data/sales.csv')

I want to visualize the average sales by types of Customer (Segment).

I calculated the average sales by segment with this :

average_sales = df.groupby(['Segment','OrderYear'],as_index=True)['Sales'].agg({"mean"})
print(average_sales)

And I get the following output:

                           mean
Segment     OrderYear            
Consumer    2015       251.633302
            2016       238.200804
            2017       223.269145
            2018       200.469005
Corporate   2015       212.641424
            2016       189.902305
            2017       263.348456
            2018       243.634951
Home Office 2015       290.234240
            2016       222.101830
            2017       226.382196
            2018       242.532951

Now I want to plot it in a line chart with the year on the x_axis and average sales on y_axis but everytime I try I only get one line for mean while I want one line for each Segment. One line for 'Consumer', one for 'Corporate', one for 'Home Office'. I think it might be because Segment is an index rather than a column but I still can't plot by segment.

标签: pythonpandasmatplotlibplotbar-chart

解决方案


  • 最好用 来塑造日期.pivot_table,这会导致数据框处于正确的形状以进行绘制。
    • 使用现有的 groupby 进行绘图,需要 unstacking:
      • df.groupby(['Segment','OrderYear'], as_index=True)['Sales'].agg({"mean"}).unstack(level=0).plot()
  • 直接ptpandas.DataFrame.plot
  • 这种类型的聚合数据应以分组条形显示,但也提供了折线图。
    • 多段线图更难阅读
    • 斜率暗示的信息可能不完全正确
  • 如果您有兴趣添加注释,请参阅如何绘制和注释分组条形 seaborn / matplotlib如何绘制和注释分组条形图。
  • python 3.8.12, pandas 1.3.4,中测试matplotlib 3.4.3
import pandas as pd

df=pd.read_csv('https://raw.githubusercontent.com/michalis0/DataMining_and_MachineLearning/master/data/sales.csv')

# convert to datetime
df['Order Date'] = pd.to_datetime(df['Order Date'])

# extract year
df['OrderYear'] = df['Order Date'].dt.year

# pivot table
pt = df.pivot_table(index='OrderYear', columns='Segment', values='Sales', aggfunc='mean')

# display(pt)
Segment      Consumer   Corporate  Home Office
OrderYear                                     
2015       251.633302  212.641424   290.234240
2016       238.629760  196.834867   222.101830
2017       223.269145  264.486862   228.730257
2018       200.368580  243.595111   242.532951

# plot bar
ax1 = pt.plot(kind='bar', rot=0)

# or plot line
ax2 = pt.plot(xticks=pt.index)

在此处输入图像描述

在此处输入图像描述

样本数据

  • 如果 github repo 不再可用
Order Date,Segment,Sales
08/04/2018,Home Office,195.76
23/05/2015,Consumer,17.96
01/12/2018,Consumer,406.368
26/03/2016,Consumer,40.032
04/11/2015,Corporate,8.376
14/11/2017,Consumer,48.576
08/05/2017,Consumer,10.368
25/11/2015,Consumer,539.92
17/06/2017,Consumer,319.41
14/01/2015,Corporate,61.96
01/12/2018,Consumer,105.584
24/11/2016,Consumer,91.392
12/07/2015,Consumer,123.136
26/01/2018,Consumer,11.84
01/03/2015,Consumer,137.352
06/12/2016,Consumer,19.92
07/04/2017,Consumer,3.64
17/09/2016,Consumer,4228.704
29/09/2018,Home Office,7.968
17/08/2018,Corporate,2518.29
04/11/2015,Home Office,275.94
20/06/2017,Consumer,17.712
09/09/2018,Consumer,6.56
03/03/2017,Corporate,563.43
11/10/2017,Consumer,27.72
08/12/2018,Consumer,19.44
01/06/2015,Home Office,47.88
28/10/2017,Corporate,756.8
31/07/2016,Consumer,2309.65
08/11/2016,Corporate,4.712
20/10/2016,Corporate,16.02
23/12/2018,Corporate,367.96
15/02/2016,Corporate,134.97
29/12/2015,Consumer,23.976
05/10/2018,Home Office,39.92
25/06/2016,Home Office,31.104
28/10/2017,Consumer,47.952
25/09/2015,Home Office,3.264
18/12/2016,Corporate,18.432
07/09/2018,Consumer,25.16
26/06/2017,Home Office,8.02
16/06/2018,Consumer,18.54
06/12/2016,Consumer,198.272
04/05/2018,Corporate,9.396
23/10/2018,Consumer,10.272
21/02/2017,Corporate,39.98
22/07/2015,Home Office,19.68
29/09/2018,Home Office,27.968
03/08/2015,Consumer,218.75
07/10/2018,Home Office,18.936
18/04/2016,Consumer,115.44
04/04/2016,Consumer,644.076
03/07/2018,Home Office,24.56
10/11/2016,Consumer,577.584
12/05/2018,Consumer,87.4
21/02/2017,Home Office,3.762
18/08/2018,Consumer,21.38
13/07/2016,Consumer,11.808
17/12/2018,Consumer,66.284
02/12/2015,Corporate,58.36
01/12/2015,Consumer,45.84
23/05/2016,Home Office,850.5
14/10/2015,Corporate,22.92
23/10/2018,Corporate,11.56
20/07/2015,Corporate,41.94
16/06/2016,Consumer,133.98
02/09/2015,Consumer,21.24
11/11/2017,Corporate,95.968
03/10/2015,Home Office,6.192
19/11/2018,Consumer,25.06
25/08/2015,Consumer,40.096
29/12/2018,Consumer,34.58
05/12/2018,Consumer,11.07
23/07/2017,Consumer,4.448
05/03/2016,Consumer,11.212
09/06/2015,Consumer,911.424
21/11/2016,Consumer,10.92
13/02/2018,Consumer,10.71
27/04/2016,Consumer,1379.92
30/10/2018,Home Office,33.94
08/08/2016,Consumer,447.86
07/12/2016,Consumer,79.92
21/08/2018,Corporate,33.18
26/01/2015,Home Office,19.44
09/06/2015,Consumer,1706.184
26/09/2016,Consumer,79.056
05/04/2016,Home Office,547.136
27/10/2018,Corporate,5.607
03/07/2016,Consumer,294.93
16/11/2015,Home Office,169.45
08/12/2015,Corporate,60.416
23/11/2016,Consumer,16.56
05/10/2018,Home Office,75.792
19/03/2016,Consumer,17.568
21/08/2017,Corporate,2887.056
25/04/2016,Corporate,21.34
14/05/2017,Corporate,4.768
03/11/2016,Home Office,42.6
21/10/2017,Consumer,22.92
10/07/2018,Corporate,41.91
16/11/2018,Consumer,811.28
17/09/2018,Corporate,10.776
01/12/2018,Home Office,62.958
07/12/2018,Consumer,374.376
19/11/2018,Consumer,821.88
16/06/2018,Consumer,23.92
19/05/2017,Consumer,242.9
06/06/2017,Corporate,105.52
05/12/2015,Corporate,29.94
12/08/2018,Consumer,299.99
08/04/2018,Home Office,41.95
04/10/2015,Consumer,95.648
25/11/2017,Consumer,194.352
18/09/2016,Corporate,11.68
20/12/2016,Home Office,11.696
24/04/2017,Consumer,3.984
14/05/2015,Corporate,310.88
22/09/2015,Consumer,579.528
02/05/2015,Consumer,26.136
19/08/2015,Corporate,69.456
08/07/2018,Corporate,28.91
26/11/2015,Corporate,7.312
24/06/2018,Consumer,21.744
12/11/2018,Consumer,221.024
27/08/2016,Consumer,3.08
18/11/2018,Consumer,127.386
21/11/2016,Corporate,246.1328
12/05/2017,Consumer,120.0
30/12/2017,Home Office,481.32
20/07/2016,Consumer,913.43
23/11/2018,Corporate,10.688
23/04/2015,Home Office,22.336
17/09/2016,Consumer,3.264
20/10/2016,Consumer,24.56
04/06/2017,Consumer,14.94
19/11/2016,Consumer,5.984
30/07/2016,Consumer,209.93
17/09/2016,Consumer,110.96
12/10/2016,Consumer,263.96
02/09/2017,Consumer,65.94
12/10/2016,Consumer,81.96
14/05/2016,Home Office,198.272
09/12/2018,Corporate,37.208
23/05/2017,Consumer,122.382
23/09/2018,Consumer,199.95
28/12/2015,Corporate,704.25
19/01/2018,Consumer,6.0
12/10/2016,Home Office,19.9
14/11/2016,Corporate,37.0
03/10/2018,Home Office,6.63
20/07/2015,Consumer,104.85
10/09/2015,Consumer,1487.04
12/10/2018,Corporate,39.984
23/12/2015,Corporate,56.52
17/11/2016,Consumer,106.32
18/03/2015,Home Office,1856.19
01/09/2016,Home Office,1088.76
05/07/2016,Home Office,19.0
03/11/2015,Home Office,6.72
28/05/2017,Consumer,22.72
13/06/2018,Home Office,164.736
26/09/2016,Consumer,239.8
12/10/2018,Consumer,17.9
02/10/2018,Corporate,21.984
12/11/2018,Home Office,23.12
21/01/2018,Home Office,242.94
09/08/2015,Consumer,2060.744
25/04/2016,Consumer,128.058
04/03/2018,Corporate,15.25
04/08/2017,Home Office,35.06
18/12/2016,Corporate,55.936
19/12/2016,Consumer,675.96
12/07/2016,Consumer,659.168
06/04/2015,Corporate,70.95
19/05/2018,Home Office,681.408
09/07/2016,Consumer,153.36
21/08/2016,Home Office,4.28
22/05/2018,Consumer,22.344
26/08/2015,Consumer,17.34
19/09/2016,Corporate,66.36
06/11/2018,Home Office,449.568
21/11/2017,Consumer,21.568
24/12/2017,Home Office,27.882
09/07/2015,Home Office,23.92
05/08/2016,Corporate,33.488
20/11/2017,Consumer,2.628
07/03/2015,Corporate,481.568
25/11/2017,Consumer,59.98
14/07/2018,Consumer,276.69
03/10/2015,Consumer,14.48
28/07/2017,Home Office,302.72
05/09/2017,Corporate,43.6
16/03/2016,Home Office,17.52
02/09/2017,Home Office,84.272
22/06/2015,Consumer,170.058
08/07/2018,Home Office,86.376
01/11/2016,Home Office,3.168
04/11/2017,Consumer,11.376
18/12/2018,Consumer,46.672
05/12/2017,Consumer,465.18

推荐阅读