How to aggregate a metric and plot the groups separately

Problem description

I have this dataset:

import pandas as pd

df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]
df['date_m'] = pd.to_datetime(df.year.astype(str) + '-' + df.month.astype(str))

I want to make a line plot grouped by month and campaign, so I tried this code:

df['sales'].groupby(df['date_m','campaign']).mean().plot.line()

But I get the error message KeyError: ('date_m', 'campaign'). Any help would be appreciated.

Tags: python, pandas, matplotlib, seaborn

Solution


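  • How to plot usually depends on the shape of the DataFrame.
  • .groupby followed by an aggregation and .reset_index() creates a long-format DataFrame, which works well with seaborn.
  • .pivot_table creates a wide-format DataFrame, which is easy to plot with pandas.DataFrame.plot.

DataFrame from .groupby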

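  • df['sales'].groupby(...) is incorrect because df['sales'] selects a single column of the DataFrame; none of the other columns are available by name.
  • The KeyError itself comes from df['date_m','campaign'], which looks up one column keyed by the tuple ('date_m', 'campaign'); no such column exists.
  • .groupby with an aggregation reshapes the DataFrame into long format, which works well with seaborn.lineplot.
    • Specify the hue parameter to separate the lines by 'campaign'.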
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# perform groupby and reset the index
dfg = df.groupby(['date_m','campaign'])['sales'].mean().reset_index()

# display(dfg.head())
      date_m  campaign  sales
0 2011-01-01         0  10000
1 2011-01-01         1   7000
2 2011-02-01         0  11000
3 2011-02-01         1   8000
4 2011-03-01         0  12000

# plot with seaborn
sns.lineplot(data=dfg, x='date_m', y='sales', hue='campaign')
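
The Series-based call from the question can also be made to work, which may help show where the KeyError comes from. The following is a minimal sketch, assuming the same data as in the question (rebuilt here so the snippet runs on its own); the names s and dfg_alt are introduced only for illustration.

```python
import pandas as pd

# Rebuild the question's data so this snippet is self-contained
df = pd.DataFrame({
    'year': 2011,
    'month': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    'campaign': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'sales': [10000, 11000, 12000, 10500, 10000, 9500,
              7000, 8000, 5000, 6000, 6000, 7000],
})
# Assemble the first day of each month from year/month/day columns
df['date_m'] = pd.to_datetime(df[['year', 'month']].assign(day=1))

# df['date_m', 'campaign'] asks for ONE column named by a tuple, hence the KeyError
try:
    df['date_m', 'campaign']
    raised = False
except KeyError:
    raised = True

# Passing a list of Series as the grouping keys works on a single column
s = df['sales'].groupby([df['date_m'], df['campaign']]).mean()

# Long format, same columns as the DataFrame-based groupby
dfg_alt = s.reset_index()
```

dfg_alt can then be passed to sns.lineplot in the same way as dfg. Grouping on the DataFrame and then selecting ['sales'] is usually easier to read, so this variant is mainly useful for understanding the original error.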


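DataFrame from .pivot_table

  • .pivot_table shapes the DataFrame correctly for plotting with pandas.DataFrame.plot, and it has an aggregation parameter (aggfunc).
    • The DataFrame is shaped into wide format.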
# pivot the dataframe into the correct shape for plotting
dfp = df.pivot_table(index='date_m', columns='campaign', values='sales', aggfunc='mean')

# display(dfp.head())
campaign        0     1
date_m                 
2011-01-01  10000  7000
2011-02-01  11000  8000
2011-03-01  12000  5000
2011-04-01  10500  6000
2011-05-01  10000  6000

# plot the dataframe
dfp.plot()
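
The long and wide shapes are closely related: unstacking the grouped result on 'campaign' should give the same wide table that .pivot_table produces. The sketch below assumes the same data as in the question (rebuilt so it runs on its own); wide_groupby and wide_pivot are illustrative names.

```python
import pandas as pd

# Rebuild the question's data so this snippet is self-contained
df = pd.DataFrame({
    'year': 2011,
    'month': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    'campaign': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'sales': [10000, 11000, 12000, 10500, 10000, 9500,
              7000, 8000, 5000, 6000, 6000, 7000],
})
df['date_m'] = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Long result of groupby, then move 'campaign' from the index into columns
wide_groupby = df.groupby(['date_m', 'campaign'])['sales'].mean().unstack('campaign')

# The same wide shape produced directly by pivot_table
wide_pivot = df.pivot_table(index='date_m', columns='campaign',
                            values='sales', aggfunc='mean')

# Both tables hold one row per month and one column per campaign
same_values = (wide_groupby.to_numpy() == wide_pivot.to_numpy()).all()
```

Calling .plot() on either table draws one line per campaign, so the choice between the two is mostly a matter of which reads more clearly in context.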

Plotting directly with matplotlib

fig, ax = plt.subplots(figsize=(8, 6))
for v in df.campaign.unique():
    # select the data based on the campaign
    data = df[df.campaign.eq(v)]
    # this is only necessary if there is more than one value per date
    data = data.groupby(['date_m','campaign'])['sales'].mean().reset_index()

    ax.plot('date_m', 'sales', data=data, label=f'{v}')
plt.legend(title='campaign')
plt.show()

Notes

  • Package versions:
    • pandas v1.2.4
    • seaborn v0.11.1
    • matplotlib v3.3.4
