首页 > 解决方案 > 如何在给定的时间范围内从熊猫的其他列中分配特定值?

问题描述

我想为接下来的 12 个月创建滚动预测,该月的结果和条目也必须成为数据框的一部分(稍后它将作为更大数据框的一部分写入 excel)。

需要根据标准计算新数据框的条目,即预测月份在 start_date 和 start_date 之间 + 持续时间也在预测的 12 个月范围内。如果满足这些条件,则应在此处写入持续时间的值。

预期产出

为此,我想我必须使用 numpy.where(),但是我无法绕开它。

我遇到了Use lambda with pandas to calculate a new column conditional on existing column,但经过一番尝试后,我得出结论,这不是我的情况的全部真相。

import numpy as np
import pandas as pd
import datetime as dt

months = ["Jan", "Feb", "Mrz", "Apr", "Mai", "Jun", "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"]
cur_month = dt.date.today().month - 1
cur_year = dt.date.today().year

d = {'start_date': ['2020-12-23', '2021-02-08', '2021-06-11', '2022-01-07'], 'duration': [12, 6, 8, 3], 
'effort': [0.3, 0.5, 1.2, 0.1]}
df = pd.DataFrame(data=d)
i = 0
while i < 12:
    # this creates the header rows for the 12 month period
    next_month = months[(cur_month + i) % len(months)]
    
    # here goes the calculation/condition I am stuck with...
    df[next_month] = np.where(...)
    i += 1

标签: python-3.xpandasnumpy

解决方案


所以我想出了这个并且似乎工作,我还添加了一些逻辑来为项目在一个月中的某个时间开始的案例加权,所以我们得到了一个更准确的工作量数字。

d = {"id": [1,2,3,4], "start_date": ['2020-12-23', '2021-02-08', '2021-06-11', '2022-01-07'], "duration": [12, 6, 8, 3], 
"effort": [0.3, 0.5, 1.2, 0.1]}

df = pd.DataFrame(data=d)
df["EndDates"] = df["start_date"].dt.to_period("M") + df_["duration"]
i = 0
forecast = pd.Series(pd.period_range(today, freq="M", periods=12))

while i < 12:
    next_month = months[(cur_month + i) % len(months)]
    df[next_month] = ""

    for index, row in df.iterrows():
        df_tmp = df.loc[df['id'] == int(row['id'])]
        if not df_tmp.empty and pd.notna(df_tmp["start_date"].item()):
            if df_tmp["start_date"].item().to_period("M") <= forecast[i] <= df_tmp["EndDates"].item():

                # For the current month let's calculate with the remaining value
                if i == 0:
                    act_enddate = monthrange(today.year, today.month)[1]
                    weighter = 1 - (int(today.day) / int(act_enddate))
                    df.at[index, next_month] = round(df_tmp['effort'].values[0] * weighter,
                                                                ndigits=2)
                # If it is the first entry for the oppty, how many FTEs will be needed for the first month
                # of the assignment
                elif df_tmp["start_date"].item().to_period("M") == forecast[i]:
                    first_day = df_tmp["start_date"].item().day
                    if first_day != 1:
                        months_enddate = monthrange(forecast[i].year, forecast[i].month)[1]
                        weighter = 1 - (int(first_day) / int(months_enddate))
                        df.at[index, next_month] = round(df_tmp['effort'].values[0] * weighter,
                                                                    ndigits=2)
                    else:
                        df.at[index, next_month] = df_tmp['effort'].values[0]
                # How many FTEs are needed for the last month of the assignment
                elif df_tmp["EndDates"].item() == forecast[i]:
                    end_day = df_tmp["start_date"].item().day
                    if end_day != 1:
                        months_enddate = monthrange(forecast[i].year, forecast[i].month)[1]
                        weighter = int(end_day) / int(months_enddate)
                        df.at[index, next_month] = round(df_tmp['Umrechnung in FTEs'].values[0] * weighter,
                                                                    ndigits=2)
                    else:
                        continue
                else:
                    df.at[index, next_month] = df_tmp['effort'].values[0]

推荐阅读