python-3.x - 如何在给定的时间范围内从熊猫的其他列中分配特定值?
问题描述
我想为接下来的 12 个月创建滚动预测,该月的结果和条目也必须成为数据框的一部分(稍后它将作为更大数据框的一部分写入 excel)。
需要根据标准计算新数据框的条目,即预测月份在 start_date 和 start_date 之间 + 持续时间也在预测的 12 个月范围内。如果满足这些条件,则应在此处写入持续时间的值。
为此,我想我必须使用 numpy.where(),但是我无法绕开它。
我遇到了Use lambda with pandas to calculate a new column conditional on existing column,但经过一番尝试后,我得出结论,这不是我的情况的全部真相。
import numpy as np
import pandas as pd
import datetime as dt
months = ["Jan", "Feb", "Mrz", "Apr", "Mai", "Jun", "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"]
cur_month = dt.date.today().month - 1
cur_year = dt.date.today().year
d = {'start_date': ['2020-12-23', '2021-02-08', '2021-06-11', '2022-01-07'], 'duration': [12, 6, 8, 3],
'effort': [0.3, 0.5, 1.2, 0.1]}
df = pd.DataFrame(data=d)
i = 0
while i < 12:
# this creates the header rows for the 12 month period
next_month = months[(cur_month + i) % len(months)]
# here goes the calculation/condition I am stuck with...
df[next_month] = np.where(...)
i += 1
解决方案
所以我想出了这个并且似乎工作,我还添加了一些逻辑来为项目在一个月中的某个时间开始的案例加权,所以我们得到了一个更准确的工作量数字。
d = {"id": [1,2,3,4], "start_date": ['2020-12-23', '2021-02-08', '2021-06-11', '2022-01-07'], "duration": [12, 6, 8, 3],
"effort": [0.3, 0.5, 1.2, 0.1]}
df = pd.DataFrame(data=d)
df["EndDates"] = df["start_date"].dt.to_period("M") + df_["duration"]
i = 0
forecast = pd.Series(pd.period_range(today, freq="M", periods=12))
while i < 12:
next_month = months[(cur_month + i) % len(months)]
df[next_month] = ""
for index, row in df.iterrows():
df_tmp = df.loc[df['id'] == int(row['id'])]
if not df_tmp.empty and pd.notna(df_tmp["start_date"].item()):
if df_tmp["start_date"].item().to_period("M") <= forecast[i] <= df_tmp["EndDates"].item():
# For the current month let's calculate with the remaining value
if i == 0:
act_enddate = monthrange(today.year, today.month)[1]
weighter = 1 - (int(today.day) / int(act_enddate))
df.at[index, next_month] = round(df_tmp['effort'].values[0] * weighter,
ndigits=2)
# If it is the first entry for the oppty, how many FTEs will be needed for the first month
# of the assignment
elif df_tmp["start_date"].item().to_period("M") == forecast[i]:
first_day = df_tmp["start_date"].item().day
if first_day != 1:
months_enddate = monthrange(forecast[i].year, forecast[i].month)[1]
weighter = 1 - (int(first_day) / int(months_enddate))
df.at[index, next_month] = round(df_tmp['effort'].values[0] * weighter,
ndigits=2)
else:
df.at[index, next_month] = df_tmp['effort'].values[0]
# How many FTEs are needed for the last month of the assignment
elif df_tmp["EndDates"].item() == forecast[i]:
end_day = df_tmp["start_date"].item().day
if end_day != 1:
months_enddate = monthrange(forecast[i].year, forecast[i].month)[1]
weighter = int(end_day) / int(months_enddate)
df.at[index, next_month] = round(df_tmp['Umrechnung in FTEs'].values[0] * weighter,
ndigits=2)
else:
continue
else:
df.at[index, next_month] = df_tmp['effort'].values[0]
推荐阅读
- oracle - 是否可以在 Xamarin Android 中使用 Oracle ManagedDataAccess?
- ruby-on-rails - 如何将选择选项作为下拉列表中的第一个选项?
- python - 在我的索引列上使用 where 子句来获取数据范围是否会使查询更快
- javascript - 在 React 渲染中循环对象
- php - Android Studio:org.json.JSONArray 类型无法转换为 JSONObject
- python - 尝试使用一个命令多次从会话中获取 JSON X 时间
- math - 按位运算符的代数
- regex - 使用正则表达式查找/替换两个 html 标签
- apache-spark - 数据整理和 ETL(提取、转换和加载)有什么区别?
- python - 如何找到具有唯一标识符的列表的平均值