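pandas - How to create a function that converts monthly data to daily and weekly data in a pandas DataFrame?
Problem description
I have the following monthly data in a DataFrame, and I need to convert it to weekly, daily and biweekly data.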
date chair_price vol_chair
01-09-2018 23 30
01-10-2018 53 20
Daily: chair_price stays the same and vol_chair is divided by the number of days in the month.
Weekly: chair_price stays the same and vol_chair is divided by the number of weeks in the month.
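To make the two rules concrete, here is a small sketch that computes both divisors for the sample months. It assumes (a reading of the expected output below, not something the question states) that "number of weeks in a month" means the number of Sundays in that month, i.e. the weekly labels pandas produces with its default `'W'` (`W-SUN`) frequency: 5 in September 2018 and 4 in October 2018, which gives 30/5 = 6 and 20/4 = 5. The helper name `month_divisors` is hypothetical.

```python
import pandas as pd

def month_divisors(month_start):
    """Hypothetical helper: (days in the month, Sundays in the month).

    Assumes each week is counted by its closing Sunday, matching the
    default 'W' (W-SUN) labels in the expected weekly output.
    """
    month_start = pd.Timestamp(month_start)
    days = month_start.days_in_month
    # MonthEnd(0) rolls a mid-month date forward to the last day of its month
    month_end = month_start + pd.offsets.MonthEnd(0)
    sundays = len(pd.date_range(month_start, month_end, freq='W-SUN'))
    return days, sundays

for start, vol in [('2018-09-01', 30), ('2018-10-01', 20)]:
    days, weeks = month_divisors(start)
    # daily share of vol_chair, then weekly share of vol_chair
    print(start, vol / days, vol / weeks)
```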
Expected output:
Daily:
date chair_price vol_chair
01-09-2018 23 1
02-09-2018 23 1
03-09-2018 23 1
..
30-09-2018 23 1
01-10-2018 53 0.64
..
31-10-2018 53 0.64
Weekly:
date chair_price vol_chair
02-09-2018 23 6
09-09-2018 23 6
16-09-2018 23 6
23-09-2018 23 6
30-09-2018 23 6
07-10-2018 53 5
14-10-2018 53 5
..
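I am using the code below for the vol column. Is there a quick way to do it all together, i.e. keep the price the same and, for vol, work out the number of weeks (or days) in each month and divide by it?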
df.resample('W').ffill().agg(lambda x: x/4)
df.resample('D').ffill().agg(lambda x: x/30)
I also need something like calendar.monthrange(2012,1)[1] to get the number of days in each month.
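For reference, this is how the `calendar` functions involved behave. `calendar.monthrange(year, month)` returns `(weekday of the first day, number of days)`, so index `[1]` is the day count. Note that `len(calendar.monthcalendar(...))`, which `func_count_number_of_weeks` below relies on, counts every Monday-first week that touches the month, partial weeks included. For October 2018 that is 5, which would give 20/5 = 4 rather than the 5 shown in the expected weekly output.

```python
import calendar

# monthrange -> (weekday of day 1 with Monday == 0, number of days in the month)
print(calendar.monthrange(2012, 1))  # January 2012 starts on a Sunday

# number of days, as needed for the daily split
days_sep = calendar.monthrange(2018, 9)[1]
days_oct = calendar.monthrange(2018, 10)[1]

# monthcalendar returns one list per (possibly partial) Monday-first week
weeks_sep = len(calendar.monthcalendar(2018, 9))
weeks_oct = len(calendar.monthcalendar(2018, 10))
print(days_sep, days_oct, weeks_sep, weeks_oct)
```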
import calendar
import pandas as pd

def func_count_number_of_weeks(row):
    # counts every Monday-first week that touches the month, partial weeks included
    return len(calendar.monthcalendar(row['DateRange'].year, row['DateRange'].month))

def func_convert_from_monthly(df, col, category, columns):
    # expects 'DateRange' as a datetime column (not as the index)
    if category == "Daily":
        df['number_of_days'] = df['DateRange'].dt.days_in_month
        for column in columns:
            df[column] = df[column] / df['number_of_days']
        df.drop('number_of_days', axis=1, inplace=True)
    elif category == "Weekly":
        df['number_of_weeks'] = df.apply(func_count_number_of_weeks, axis=1)
        for column in columns:
            df[column] = df[column] / df['number_of_weeks']
        df.drop('number_of_weeks', axis=1, inplace=True)
    return df

def func_resample_from_monthly(df, col, category):
    # move the date column into a DatetimeIndex; dates look like 01-09-2018 (day first)
    df.set_index(col, inplace=True)
    df.index = pd.to_datetime(df.index, dayfirst=True)
    if category == "Daily":
        df = df.resample('D').ffill()
    elif category == "Weekly":
        df = df.resample('W').ffill()
    return df
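A quick way to see why resampling the monthly rows directly is not enough: upsampling with `resample('D').ffill()` only spans the range from the first to the last timestamp in the index, so the final month contributes just its first day. This sketch uses the sample data from the question and is an illustration, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'date': ['01-09-2018', '01-10-2018'],
                   'chair_price': [23, 53],
                   'vol_chair': [30, 20]})
df = df.set_index('date')
df.index = pd.to_datetime(df.index, dayfirst=True)

daily = df.resample('D').ffill()
# The upsampled index stops at the last monthly timestamp (2018-10-01),
# so October is represented by a single row
print(daily.index.min(), daily.index.max(), len(daily))
```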
Solution
Use:
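import pandas as pd

df = pd.DataFrame({'date': ['01-09-2018', '01-10-2018'],
                   'chair_price': [23, 53],
                   'vol_chair': [30, 20]})
#convert the date column to a DatetimeIndex (dates are day first)
df = df.set_index('date')
df.index = pd.to_datetime(df.index, dayfirst=True)
#add a helper row for the next month so the last month is resampled in full
#(pd.concat instead of DataFrame.append, which was removed in pandas 2.0)
idx = df.index[-1] + pd.offsets.MonthBegin(1)
df = pd.concat([df, df.iloc[[-1]].rename({df.index[-1]: idx})])
#resample with forward filling values, remove last helper row
#df1 = df.resample('D').ffill().iloc[:-1]
df1 = df.resample('W').ffill().iloc[:-1]
#divide vol_chair by the number of resampled rows in each month
df1['vol_chair'] = df1['vol_chair'] / df1.resample('MS')['vol_chair'].transform('size')
print (df1)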
chair_price vol_chair
date
2018-09-02 23 6.0
2018-09-09 23 6.0
2018-09-16 23 6.0
2018-09-23 23 6.0
2018-09-30 23 6.0
2018-10-07 53 5.0
2018-10-14 53 5.0
2018-10-21 53 5.0
2018-10-28 53 5.0
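Since the question asks for a function, the steps above can be wrapped into one. This is a sketch under assumptions: the name `convert_from_monthly` and its parameters are hypothetical; it expects one row per month with the date in a day-first string column; and it counts rows per month by grouping on `to_period('M')` rather than `resample('MS').transform`. Passing `'2W'` for biweekly data should follow the same logic, but only `'D'` and `'W'` are checked here against the expected output.

```python
import pandas as pd

def convert_from_monthly(df, date_col, freq, divide_cols):
    """Spread monthly rows over a finer frequency (hypothetical helper).

    freq: a pandas offset alias such as 'D' or 'W'.
    divide_cols: columns split evenly across the rows of each month;
    all other columns are repeated unchanged.
    """
    out = df.set_index(date_col)
    out.index = pd.to_datetime(out.index, dayfirst=True)
    # helper row at the start of the following month so the last month is filled completely
    nxt = out.index[-1] + pd.offsets.MonthBegin(1)
    out = pd.concat([out, out.iloc[[-1]].rename({out.index[-1]: nxt})])
    # upsample with forward fill, then drop the helper row
    out = out.resample(freq).ffill().iloc[:-1]
    # number of resampled rows in each calendar month, aligned to out.index
    size = pd.Series(1, index=out.index).groupby(out.index.to_period('M')).transform('size')
    for col in divide_cols:
        out[col] = out[col] / size
    return out

df = pd.DataFrame({'date': ['01-09-2018', '01-10-2018'],
                   'chair_price': [23, 53],
                   'vol_chair': [30, 20]})
daily = convert_from_monthly(df, 'date', 'D', ['vol_chair'])
weekly = convert_from_monthly(df, 'date', 'W', ['vol_chair'])
print(weekly)
```

The helper row only exists to extend the resampled range and is removed before dividing, so its values never reach the result. Because the divisor is the number of resampled rows per month, September is split over 5 weekly rows and October over 4, as in the expected output.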