Can I set a default value for every x tick using matplotlib and pandas?

Problem description

I have the following code:

# Ratings by day, divided by Staff member

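from datetime import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

# df is assumed to be loaded beforehand from the dataset linked below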

by_staff = df.groupby('User ID')

plt.figure(figsize=(15,8))

# Those are used to calculate xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0

for index, data in by_staff:

    by_day = data.groupby('Date')
    
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))

    plt.plot_date(x, y, marker='o', label=index, markersize=12)

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

ticks = pd.date_range(xmin, xmax, freq='D')

plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))

plt.gcf().autofmt_xdate()

plt.grid()

plt.legend([a for a, b in by_staff],
          title="Ratings given",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.show()

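If there is no data for a given day, I would like the value shown at that particular xtick to be 0. Currently, this is the plot that is displayed:

[Image: the plot displayed]

I tried some Googling, but I can't seem to describe my problem properly. How can I solve this?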

My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv

Tags: python, pandas, matplotlib

Solution


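Let's try to simplify the task by letting pandas aggregate the data. We group by date and user ID at the same time and then unstack the dataframe. This allows us to fill missing data points with a preset value such as 0. The form x = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0) is a compact chain of a = df.groupby(["Date",'User ID']), b = a.count(), c = b.Value, and x = c.unstack(fill_value=0). You can print each intermediate result of these chained pandas operations to see what it does.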
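To make the chain easier to follow, here is a minimal sketch that runs the four steps one at a time. The toy data is hypothetical and only stands in for the ratings CSV; the column names Date, User ID and Value are taken from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the ratings CSV
# (column names follow the question; the values are made up).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"]
    ),
    "User ID": ["A", "A", "B", "A"],
    "Value": [5, 4, 3, 5],
})

a = df.groupby(["Date", "User ID"])  # one group per (date, user) pair
b = a.count()                        # number of rows in each group
c = b.Value                          # Series indexed by (Date, User ID)
x = c.unstack(fill_value=0)          # users become columns, absent pairs become 0

# Print the intermediate results to see what each step contributes.
print(b)
print(c)
print(x)
```

In this toy data, user B has no rating on 2021-01-02, so that cell only exists after unstacking and is filled with 0.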

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])

#by_staff = df.groupby(["Date",'User ID']) - group entries by date and ID
#.count - count identical date-ID pairs
#.Value - use only this column
#.unstack(fill_value=0) bring resulting data from long to wide form
#and fill missing data with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)

ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

#label only whole-number counts; the plotted values are ratings per day,
#so the upper limit is taken from by_staff, not from the Value column
max_count = int(by_staff.to_numpy().max())
plt.yticks(range(max_count + 1))
#this is not really necessary, it just labels zero differently
#labels = ["No rating"] + [str(i) for i in range(1, max_count + 1)]
#ax.set_yticklabels(labels)

plt.gcf().autofmt_xdate()
plt.grid()

plt.show()

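Sample output: [image]

Obviously, where several entries coincide on the same day and count, the markers overlap and you will see only one of them.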
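One caveat: unstack(fill_value=0) only fills gaps for dates that occur at least once in the data. If a day has no rating from any user, it has no row at all. A possible way to cover that case, sketched here on a hypothetical wide table rather than the real dataset, is to reindex over a full daily range:

```python
import pandas as pd

# Hypothetical wide table shaped like the result of unstack(fill_value=0);
# 2021-01-02 is absent because nobody submitted a rating that day.
by_staff = pd.DataFrame(
    {"A": [2, 1], "B": [1, 0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-03"]),
)

# Build every calendar day between the first and last date, then reindex;
# rows that did not exist before are created and filled with 0.
all_days = pd.date_range(by_staff.index.min(), by_staff.index.max(), freq="D")
full = by_staff.reindex(all_days, fill_value=0)

print(full)
```

The reindexed frame can then be plotted in the same way as by_staff in the answer's code.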

