Converting a weekly forecast (pandas DataFrame) to a monthly format

Problem description

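I have a process that produces a DataFrame containing forecasts for products (and versions) in a weekly format. The column names are week-commencing (wc) Monday dates, stored as strings. Example: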

product     version     2021-06-07     2021-06-14     2021-06-21     2021-06-28
   a           1           500            400            300            200
   a           2           750            600            450            200
   b           1           200            150            100            100
   b           2           500            400            300            200

I have been asked to change the forecast to a monthly one instead of a weekly one. Example:

product     version       Jun-21         Jul-21         Aug-21         Sep-21
   a           1           350             x              x              x
   a           2           500             x              x              x
   b           1           100             x              x              x
   b           2           350             x              x              x

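The numbers are for illustration only. What I am trying to do is average the weekly columns (per row) to create the monthly output, but in an accurate way: if a weekly column is wc 26 February, then only 3 days' worth of its value should be included in the February average and only 4 days' worth in the March average.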
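To make the day split described above concrete, here is a minimal sketch that counts how many of a week's seven days fall in each month. The helper name `days_per_month` is hypothetical, and the year 2018 is an assumption, chosen only because 26 February 2018 is a Monday in a non-leap year.

```python
from collections import Counter

import pandas as pd


def days_per_month(week_start):
    """Count how many of the 7 days starting at week_start fall in each month."""
    days = pd.date_range(week_start, periods=7, freq="D")
    # Label every day with its year and month, then count the labels
    return dict(Counter(days.strftime("%Y-%m")))


# Week commencing Monday 26 February 2018 (assumed year, non-leap)
split = days_per_month("2018-02-26")
print(split)  # {'2018-02': 3, '2018-03': 4}
```

These per-month day counts are the weights a day-accurate monthly average would need.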

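I know this is only a matter of formatting/bucketing, but I am struggling to come up with a solution because I have never had to do anything like this before.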

I am not expecting a full solution, but a pointer in the right direction on how to approach the task would be appreciated.

Tags: python, pandas, forecasting

Solution


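This takes several steps, because you need to count the days in the month, work out which days spill into the next month, do the arithmetic, and move those values forward. This should do the trick.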

import pandas as pd
import numpy as np

df = pd.DataFrame({'product': ['a', 'a', 'b', 'b'],
 'version': [1, 2, 1, 2],
 '6/7/2021': [500, 750, 200, 500],
 '6/14/2021': [400, 600, 150, 400],
 '6/21/2021': [300, 450, 100, 300],
 '6/28/2021': [200, 200, 100, 200],
 })

# Convert data to long format
df = df.melt(id_vars=['product','version'], var_name='date')
# Convert date to datetime object
df['date'] = pd.to_datetime(df['date'])

# Add 7 days to the day of the month to compare to the number of days in a month
df['month_day'] = df['date'].dt.day + 7

# Get the number of days in the month
df['days_in_month'] = df['date'].dt.daysinmonth

# Subtract to see how many days the current date would extend into the next month
df['overrun'] = df['month_day']-df['days_in_month']

# Calculate the portion of each value to push forward into the next month
df['push_forward'] = np.where(df['overrun']>0, df['value']/df['days_in_month']*df['overrun'], 0)

# Reduce the current values by the amount to be pushed forward
df['value'] = df['value'] - df['push_forward']

# Copy the records with a push_forward value to a new dataframe
df2 = df.loc[df['push_forward']>0].copy()

# Drop the push_forward column
df.drop(columns='push_forward', inplace=True)

# Add a week to the date values of records with a push_forward value
df2['date'] = df2['date']+pd.DateOffset(weeks=1)

# Merge the pushed data back to the original dataframe
df = df.merge(df2[['product','version','date','push_forward']], on=['product','version','date'], how='outer')

# Fill null values
df.fillna(0, inplace=True)

# Add the push forward values to their respective weekly values
df['value'] = df['value'] + df['push_forward']

# Convert date to just the month
df['date'] = df['date'].dt.strftime('%Y-%m')

# Group and take the average
df = df.groupby(['product','version','date'])['value'].mean().reset_index()


# Create the final pivot table
df.pivot_table(index=['product','version'], columns='date', values='value')

Output

            date       2021-06    2021-07
product version     
      a        1    341.666667  33.333333
               2    491.666667  33.333333
      b        1    133.333333  16.666667
               2    341.666667  33.333333
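
The code above moves `value / days_in_month * overrun` into the following month, which is where the 33.33 for July comes from. If the intent is instead to weight each weekly value by how many of its seven days fall in each month, as the question describes, one possible sketch is shown below. It is an alternative reading, not the approach used above: it assumes the ISO-dated column names from the question, and the names `weekly`, `spills`, `days_this`, `days_next`, `parts` and `monthly` are illustrative.

```python
import pandas as pd

# Sample data from the question, with ISO-formatted week-commencing column names
df = pd.DataFrame({
    "product": ["a", "a", "b", "b"],
    "version": [1, 2, 1, 2],
    "2021-06-07": [500, 750, 200, 500],
    "2021-06-14": [400, 600, 150, 400],
    "2021-06-21": [300, 450, 100, 300],
    "2021-06-28": [200, 200, 100, 200],
})

# Long format: one row per product, version and week
weekly = df.melt(id_vars=["product", "version"], var_name="week_start")
weekly["week_start"] = pd.to_datetime(weekly["week_start"])

# A week covers week_start .. week_start + 6 days, so it spans at most two months
weekly["week_end"] = weekly["week_start"] + pd.Timedelta(days=6)
spills = weekly["week_end"].dt.month != weekly["week_start"].dt.month

# Days of the week that land in the following month (0 if the week does not spill)
weekly["days_next"] = weekly["week_end"].dt.day.where(spills, 0)
weekly["days_this"] = 7 - weekly["days_next"]

# One row per (week, month) pair, carrying the number of days as the weight
this_part = weekly.assign(month=weekly["week_start"].dt.to_period("M"),
                          days=weekly["days_this"])
next_part = weekly[spills].assign(month=lambda d: d["week_end"].dt.to_period("M"),
                                  days=lambda d: d["days_next"])
parts = pd.concat([this_part, next_part], ignore_index=True)

# Day-weighted average of the weekly values within each month
parts["weighted"] = parts["value"] * parts["days"]
totals = parts.groupby(["product", "version", "month"])[["weighted", "days"]].sum()
monthly = (totals["weighted"] / totals["days"]).unstack("month")

# Label the columns like "Jun-21" (month abbreviations depend on the locale)
monthly.columns = [p.strftime("%b-%y") for p in monthly.columns]
print(monthly)
```

Under this weighting, product a version 1 comes out at (500 + 400 + 300) * 7 + 200 * 3 = 9000 over 24 days, i.e. 375 for June, and 200 for July, because only four July days are covered by the sample data. A month that is only partly covered by the weekly columns is averaged over the covered days only, so the edges of the forecast horizon may need separate handling.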
