python - How do I reindex month and year columns to insert missing data?
Question
Consider the following dataframe:
df = pd.read_csv("data.csv")
print(df)
  Category  Year     Month  Count1  Count2
0        a  2017  December       5       9
1        a  2018   January       3       5
2        b  2017   October       7       6
3        b  2017  November       4       1
4        b  2018     March       3       3
I want to get this:
   Category  Year     Month  Count1  Count2
0         a  2017   October
1         a  2017  November
2         a  2017  December       5       9
3         a  2018   January       3       5
4         a  2018  February
5         a  2018     March
6         b  2017   October       7       6
7         b  2017  November       4       1
8         b  2017  December
9         b  2018   January
10        b  2018  February
11        b  2018     March       3       3
This is what I have so far:
months = {"January": 1, "February": 2, "March": 3, "April": 4, "May": 5, "June": 6, "July": 7, "August": 8, "September": 9, "October": 10, "November": 11, "December": 12}
df["Date"] = pd.to_datetime(10000 * df["Year"] + 100 * df["Month"].apply(months.get) + 1, format="%Y%m%d")
date_min = df["Date"].min()
date_max = df["Date"].max()
new_index = pd.MultiIndex.from_product([df["Category"].unique(), pd.date_range(date_min, date_max, freq="M")], names=["Category", "Date"])
df = df.set_index(["Category", "Date"]).reindex(new_index).reset_index()
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month_name()
df = df[["Category", "Year", "Month", "Count1", "Count2"]]
In the resulting dataframe the last month (March) is missing, and every "Count1" and "Count2" value is NaN.
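Both symptoms are consistent with a frequency mismatch: the "Date" column holds first-of-month timestamps, while freq="M" is the month-end alias, so none of the generated dates match the existing ones and the range stops at the end of February. The following is a minimal sketch of that diagnosis and one possible fix using freq="MS" (month start). It assumes the sample data shown in the question, built inline instead of read from data.csv; the names month_names, month_end, all_months and fixed are illustrative.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: matches data.csv).
df = pd.DataFrame({
    "Category": ["a", "a", "b", "b", "b"],
    "Year": [2017, 2018, 2017, 2017, 2018],
    "Month": ["December", "January", "October", "November", "March"],
    "Count1": [5, 3, 7, 4, 3],
    "Count2": [9, 5, 6, 1, 3],
})

# Map English month names to numbers and build first-of-month timestamps.
month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
month_number = {name: i + 1 for i, name in enumerate(month_names)}
df["Date"] = pd.to_datetime(pd.DataFrame({
    "year": df["Year"],
    "month": df["Month"].map(month_number),
    "day": 1,
}))

# Month-end frequency: dates fall on the last day of each month, so they
# never equal the first-of-month values in df["Date"], and March is cut off.
month_end = pd.date_range(df["Date"].min(), df["Date"].max(),
                          freq=pd.offsets.MonthEnd())

# Month-start frequency: dates line up with df["Date"] and include March.
all_months = pd.date_range(df["Date"].min(), df["Date"].max(), freq="MS")

new_index = pd.MultiIndex.from_product(
    [df["Category"].unique(), all_months], names=["Category", "Date"]
)
fixed = (
    df.set_index(["Category", "Date"])[["Count1", "Count2"]]
    .reindex(new_index)   # missing (Category, Date) pairs become NaN rows
    .reset_index()
)
fixed["Year"] = fixed["Date"].dt.year
fixed["Month"] = fixed["Date"].dt.month_name()
fixed = fixed[["Category", "Year", "Month", "Count1", "Count2"]]
print(fixed)
```

With the month-start range, the original five rows keep their counts and only the inserted rows are NaN.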
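Answer
This is a little more involved because you need to fill in the categories as well as the missing dates. One solution is to build a separate dataframe for each category and then concatenate them.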
# Build a first-of-month timestamp for every row.
df['Date'] = pd.to_datetime('1 ' + df.Month.astype(str) + ' ' + df.Year.astype(str))
# One row per month start between the earliest and latest date.
df_ix = pd.Series(1, index=df.Date.sort_values()).resample('MS').first().reset_index()
df_list = []
for cat in df.Category.unique():
    # Right-merge onto the full month list so missing months appear as NaN rows.
    df_temp = (df.query('Category==@cat')
               .merge(df_ix, on='Date', how='right')
               .get(['Date', 'Category', 'Count1', 'Count2'])
               .sort_values('Date')
               )
    df_temp['Category'] = cat
    df_temp = df_temp.fillna(0)
    # Assign the columns directly so the integer dtype is kept.
    df_temp[['Count1', 'Count2']] = df_temp[['Count1', 'Count2']].astype(int)
    df_list.append(df_temp)
df2 = pd.concat(df_list, ignore_index=True)
df2['Month'] = df2.Date.apply(lambda x: x.strftime('%B'))
df2['Year'] = df2.Date.apply(lambda x: x.year)
df2.drop('Date', axis=1)
# returns:
   Category  Count1  Count2     Month  Year
0         a       0       0   October  2017
1         a       0       0  November  2017
2         a       5       9  December  2017
3         a       3       5   January  2018
4         a       0       0  February  2018
5         a       0       0     March  2018
6         b       7       6   October  2017
7         b       4       1  November  2017
8         b       0       0  December  2017
9         b       0       0   January  2018
10        b       0       0  February  2018
11        b       3       3     March  2018
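Note that this output fills the inserted months with 0, whereas the expected output in the question leaves them blank. If that distinction matters, one option (a sketch that is not part of the original answer) is to skip fillna(0) and cast to pandas' nullable "Int64" dtype, which keeps whole-number counts while marking missing values as <NA>. The small counts frame below is a hypothetical stand-in for the count columns after the merge.

```python
import pandas as pd

# Hypothetical stand-in: count columns after the merge, with one missing month.
counts = pd.DataFrame({"Count1": [5.0, None, 3.0], "Count2": [9.0, None, 5.0]})

# Nullable integers: missing rows stay missing (<NA>) instead of becoming 0.
nullable = counts.astype("Int64")

# Approach used in the answer: missing rows become 0.
filled = counts.fillna(0).astype(int)

print(nullable)
print(filled)
```

Which form is preferable depends on whether a missing month should count as zero activity or as unknown.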