python-3.x - Create a new column in a df based on an appended list of dictionaries, looping over the list of dictionaries (Pandas)
Problem description
I have a df and a list of dictionaries, as shown below.
df:
Date t_factor
2020-02-01 5
2020-02-02 23
2020-02-03 14
2020-02-04 23
2020-02-05 23
2020-02-06 23
2020-02-07 30
2020-02-08 29
2020-02-09 100
2020-02-10 38
2020-02-11 38
2020-02-12 38
2020-02-13 70
2020-02-14 70
REQUEST_OBJ = {
    "blue": {
        "best": [
            {'type': 'quadratic',
             'from': '2020-02-03T20:00:00.000Z',
             'to': '2020-02-06T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'linear',
             'from': '2020-02-06T20:00:00.000Z',
             'to': '2020-02-10T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'polynomial',
             'from': '2020-02-10T20:00:00.000Z',
             'to': '2020-02-14T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
    }
}
Step 1: From that, I would like to change the "best" list in the dictionary as shown below.
Step 1.1: Sort the list based on the value of the "from" key in each dictionary.
[
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}]
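Step 1.1 can be sketched with the standard library alone. This is a minimal sketch, assuming the dictionaries are trimmed to the "type", "from" and "to" keys; parse_ts is a hypothetical helper that plays the role of pd.Timestamp as the sort key.

```python
from datetime import datetime

best = [
    {"type": "polynomial", "from": "2020-02-10T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-06T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-06T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
]


def parse_ts(s):
    """Parse an ISO-8601 string with a trailing 'Z' into an aware datetime."""
    # fromisoformat() only accepts a literal 'Z' from Python 3.11 on,
    # so swap it for an explicit UTC offset to support older versions too.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


# sorted() returns a new list; the input list is left unchanged.
best_sorted = sorted(best, key=lambda d: parse_ts(d["from"]))
print([d["type"] for d in best_sorted])  # ['quadratic', 'linear', 'polynomial']
```

Because every "from" string here has the same fixed format and the same UTC suffix, sorting on the raw strings would give the same order; parsing them keeps the sort correct if the offsets ever differ.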
Step 1.2: Add a dictionary whose "from" value is the minimum date of df and whose "to" value is the "from" date of the first dictionary in the sorted list, with "days" = 0 and "coef" = [0.1,0.1,0.1,0.1,0.1,0.1].
{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
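A minimal sketch of Step 1.2, assuming the df's "Date" column is available as a list of datetime.date objects (2020-02-01 to 2020-02-14). to_stamp and make_entry are hypothetical helpers; to_stamp copies the fixed 20:00 UTC time that the question's add_dct writes.

```python
from datetime import date

dates = [date(2020, 2, d) for d in range(1, 15)]  # the df's Date column


def to_stamp(d):
    # Mirror the question's format: the time is always written as 20:00 UTC.
    return d.strftime("%Y-%m-%dT20:00:00.000Z")


def make_entry(kind, start, end):
    # Hypothetical helper: added entries use days=0 and the default coefficients.
    return {"type": kind, "from": start, "to": end, "days": 0, "coef": [0.1] * 6}


first_from = "2020-02-03T20:00:00.000Z"  # "from" of the first sorted dictionary
df_first = make_entry("df_first", to_stamp(min(dates)), first_from)
print(df_first["from"], df_first["to"])
# 2020-02-01T20:00:00.000Z 2020-02-03T20:00:00.000Z
```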
Step 1.3: Add a dictionary whose "from" value is 7 days after the minimum date of df and whose "to" value is one day after its "from" value.
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-09T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
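Step 1.3 only differs in how the dates are derived. A sketch under the same assumption (the dates are plain datetime.date values) adds 7 and 8 days to the minimum date with timedelta.

```python
from datetime import date, timedelta

dates = [date(2020, 2, d) for d in range(1, 15)]  # the df's Date column


def to_stamp(d):
    # Same fixed 20:00 UTC time as in the question's add_dct.
    return d.strftime("%Y-%m-%dT20:00:00.000Z")


dmin = min(dates)
df_mid = {
    "type": "df_mid",
    "from": to_stamp(dmin + timedelta(days=7)),  # 2020-02-08
    "to": to_stamp(dmin + timedelta(days=8)),    # 2020-02-09
    "days": 0,
    "coef": [0.1] * 6,
}
```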
Step 1.4: Add a dictionary whose "from" value is the maximum date of df and whose "to" value is the same as its "from" value.
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
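For Step 1.4, a sketch under the same assumptions: both "from" and "to" are built from the maximum date.

```python
from datetime import date

dates = [date(2020, 2, d) for d in range(1, 15)]  # the df's Date column

# "from" and "to" are identical: the last date, at the fixed 20:00 UTC time.
stamp = max(dates).strftime("%Y-%m-%dT20:00:00.000Z")
df_last = {"type": "df_last", "from": stamp, "to": stamp, "days": 0, "coef": [0.1] * 6}
```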
Step 1.5: Sort all the dictionaries based on their "from" dates.
Expected Output:
[{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-09T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
]
Step 1.6:
Replace the "to" value of each dictionary with the "from" value of the next dictionary. The "to" value of the last dictionary stays as it is.
Expected output:
[{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-04T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-05T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-08T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
]
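Steps 1.5 and 1.6 can be sketched in plain Python by sorting the combined list in place and then pairing each dictionary with its successor. This is a minimal sketch, assuming the "from" values from REQUEST_OBJ (so "linear" and "polynomial" start on 2020-02-06 and 2020-02-10, not on the dates in the hand-written expected output) and dictionaries trimmed to "type", "from" and "to"; parse_ts is a hypothetical helper.

```python
from datetime import datetime


def parse_ts(s):
    # Replace the trailing 'Z' so fromisoformat() also works before Python 3.11.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


entries = [
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-06T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-06T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-10T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "df_mid", "from": "2020-02-08T20:00:00.000Z", "to": "2020-02-09T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
]

# Step 1.5: sort every entry by its "from" timestamp.
entries.sort(key=lambda d: parse_ts(d["from"]))

# Step 1.6: each "to" becomes the next entry's "from"; the last "to" is kept.
for current, nxt in zip(entries, entries[1:]):
    current["to"] = nxt["from"]
```

zip(entries, entries[1:]) stops one item early, which is what leaves the last "to" unchanged; it does the same job as the shift(-1).fillna(...) trick in the question's code, without the round trip through a DataFrame.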
Based on the updated list of dictionaries, I would like to create a new column in df whose values depend on the "type" and the date range specified by each dictionary.
Explanation (a0 … a5 are the six values in "coef"):
if "type" == "df_first":
    df['new_col'] = df['t_factor']   (only for the rows between the "from" and "to" dates of that dictionary)
elif "type" == "df_mid":
    df['new_col'] = df['t_factor']   (only for the rows between the "from" and "to" dates of that dictionary)
elif "type" == "quadratic":
    df['new_col'] = a0 + a1*(T) + a2*(T)**2 + previous value of df['new_col']
    where T = 1 one day after the "from" date of that dictionary, with T counted in days from the Date value
elif "type" == "linear":
    df['new_col'] = a0 + a1*(T) + previous value of df['new_col']
    where T = 1 one day after the "from" date of that dictionary
elif "type" == "polynomial":
    df['new_col'] = a0 + a1*(T) + a2*(T)**2 + a3*(T)**3 + a4*(T)**4 + a5*(T)**5 + previous value of df['new_col']
    where T = 1 on the start date of that dictionary
elif "type" == "df_last":
    df['new_col'] = df['t_factor']   (only for the rows between the "from" and "to" dates of that dictionary)
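The rules above are recursive (each curve value adds to the previous row's value), so one way to read them is a row-by-row loop. This is a sketch of that interpretation only, with several assumptions: the df is given as two plain lists, "previous value" means the value in the preceding row, T counts days since the dictionary's "from" date for all three curve types (T = 0 on the "from" date), each range includes both endpoints, and a boundary row shared by two ranges takes the value from the later dictionary. DEGREE is a hypothetical lookup table.

```python
from datetime import date

# The question's df as plain lists: Date and t_factor.
dates = [date(2020, 2, d) for d in range(1, 15)]
t_factor = [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70]

# Sorted entries after Step 1.6 as (type, from, to, coef); times are dropped.
coef = [0.1] * 6
entries = [
    ("df_first", date(2020, 2, 1), date(2020, 2, 3), coef),
    ("quadratic", date(2020, 2, 3), date(2020, 2, 6), coef),
    ("linear", date(2020, 2, 6), date(2020, 2, 8), coef),
    ("df_mid", date(2020, 2, 8), date(2020, 2, 10), coef),
    ("polynomial", date(2020, 2, 10), date(2020, 2, 14), coef),
    ("df_last", date(2020, 2, 14), date(2020, 2, 14), coef),
]

# Hypothetical lookup: highest power of T used by each curve type.
DEGREE = {"quadratic": 2, "linear": 1, "polynomial": 5}

new_col = [None] * len(dates)
for kind, start, end, c in entries:
    for i, d in enumerate(dates):
        if not start <= d <= end:  # both endpoints are included
            continue
        if kind in ("df_first", "df_mid", "df_last"):
            new_col[i] = t_factor[i]
        else:
            T = (d - start).days  # 0 on the "from" date, 1 one day later
            increment = sum(c[k] * T ** k for k in range(DEGREE[kind] + 1))
            # "previous value" = value already stored for the preceding row
            prev = new_col[i - 1] if i > 0 else 0
            new_col[i] = increment + prev
```

If df is sorted by Date, the resulting list could be assigned back with df['new_col'] = new_col.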
I tried the code below:
df = pd.read_csv(StringIO("""Date t_factor
2020-02-01 5
2020-02-02 23
2020-02-03 14
2020-02-04 23
2020-02-05 23
2020-02-06 23
2020-02-07 30
2020-02-08 29
2020-02-09 100
2020-02-10 38
2020-02-11 38
2020-02-12 38
2020-02-13 70
2020-02-14 70"""), sep="\s+", parse_dates=[0])
REQUEST_OBJ = {
"blue": {
"best": [
{'type': 'quadratic',
'from': '2020-02-03T20:00:00.000Z',
'to': '2020-02-06T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
{'type': 'linear',
'from': '2020-02-06T20:00:00.000Z',
'to': '2020-02-10T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
{'type': 'polynomial',
'from': '2020-02-10T20:00:00.000Z',
'to': '2020-02-14T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
}
}
def add_dct(lst, _type, _from, _to):
lst.append({
'type': _type,
'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
'days': 0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
})
def fn_graph(df, REQUEST_OBJ):
REQUIRED_KEYS = ["blue"]
for bluewhite_category in REQUIRED_KEYS:
print(bluewhite_category)
if bluewhite_category in REQUEST_OBJ.keys():
for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
print(bestworst_category)
param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
dmin, dmax = df['Date'].min(), df['Date'].max()
#sort input list based on d['from']
param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
# add a dictionary with d['from'] = dmin
param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
# add a dictionary with d['from'] as data_end
param_obj_list = add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
# add dictionary with d['from'] as projection end
param_obj_list = add_dct(param_obj_list, 'df_last', dmax, dmax)
# sort the final list of dictionary based on d['from']
param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
# Replace the 'to' date as from of previous dictionary
df1ist = pd.DataFrame(param_obj_list)
df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
param_obj_list = df1ist.to_dict('r')
print(param_obj_list)
kind = bluewhite_category + '_' + bestworst_category
df['time_function'] = np.nan
for d in param_obj_list:
a0, a1, a2, a3, a4, a5 = d['coef']
start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')
T = df['Date'].sub(pd.Timestamp(start)).dt.days
mask = df['Date'].between(start, end, inclusive=True)
if d['type'] == 'df_first':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'quadratic':
df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['new_col'].ffill()
elif d['type'] == 'linear':
df.loc[mask, 'time_function'] = a0 + a1 * T + df['new_col'].ffill()
elif d['type'] == 'polynomial':
df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
(T)**3 + a4*(T)**4 + a5*(T)**5 + df['new_col'].ffill()
elif d['type'] == 'df_mid':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'df_mid':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'df_last':
df.loc[mask, 'time_function'] = df['t_factor']
else:
return df
return df
fn_graph(df, REQUEST_OBJ)
And I am getting the error below:
AttributeError: 'NoneType' object has no attribute 'append'
Solution
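Here is how to fix the error. add_dct appends to the list in place and has no return statement, so it returns None. Assigning its result back to param_obj_list therefore replaces the list with None, and the next add_dct call fails on None.append(...). Just change
param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
to
add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
and do the same for the 'df_mid' and 'df_last' calls.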
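The root cause can be reproduced without pandas. In this minimal sketch, add_item is a hypothetical stand-in for add_dct: like add_dct, it appends to the list it receives and has no return statement.

```python
def add_item(lst, item):
    """Stand-in for add_dct: appends in place and has no return statement."""
    lst.append(item)  # so the function implicitly returns None


items = [1]
result = add_item(items, 2)
print(result, items)  # None [1, 2]

# The buggy pattern: assigning the result rebinds items to None ...
items = add_item(items, 3)
# ... so the next call tries None.append(...) and fails.
try:
    add_item(items, 4)
except AttributeError as exc:
    msg = str(exc)
    print(msg)  # 'NoneType' object has no attribute 'append'
```

An alternative fix is to end add_dct with return lst, which would make the original assignment style work as well; the fix shown here keeps add_dct unchanged and simply drops the assignment.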
Here is the complete code:
df = pd.read_csv(StringIO("""Date t_factor
2020-02-01 5
2020-02-02 23
2020-02-03 14
2020-02-04 23
2020-02-05 23
2020-02-06 23
2020-02-07 30
2020-02-08 29
2020-02-09 100
2020-02-10 38
2020-02-11 38
2020-02-12 38
2020-02-13 70
2020-02-14 70"""), sep="\s+", parse_dates=[0])
REQUEST_OBJ = {
"blue": {
"best": [
{'type': 'quadratic',
'from': '2020-02-03T20:00:00.000Z',
'to': '2020-02-06T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
{'type': 'linear',
'from': '2020-02-06T20:00:00.000Z',
'to': '2020-02-10T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
{'type': 'polynomial',
'from': '2020-02-10T20:00:00.000Z',
'to': '2020-02-14T20:00:00.000Z',
'days': 3,
'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
}
}
def add_dct(lst, _type, _from, _to):
lst.append({
'type': _type,
'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
'days': 0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
})
def fn_graph(df, REQUEST_OBJ):
REQUIRED_KEYS = ["blue"]
for bluewhite_category in REQUIRED_KEYS:
print(bluewhite_category)
if bluewhite_category in REQUEST_OBJ.keys():
for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
print(bestworst_category)
param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
dmin, dmax = df['Date'].min(), df['Date'].max()
#sort input list based on d['from']
param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
# add a dictionary with d['from'] = dmin
add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
# add a dictionary with d['from'] as data_end
add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
# add dictionary with d['from'] as projection end
add_dct(param_obj_list, 'df_last', dmax, dmax)
# sort the final list of dictionary based on d['from']
param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
# Replace the 'to' date as from of previous dictionary
df1ist = pd.DataFrame(param_obj_list)
df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
param_obj_list = df1ist.to_dict('r')
print(param_obj_list)
kind = bluewhite_category + '_' + bestworst_category
df['time_function'] = np.nan
for d in param_obj_list:
a0, a1, a2, a3, a4, a5 = d['coef']
start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')
T = df['Date'].sub(pd.Timestamp(start)).dt.days
mask = df['Date'].between(start, end, inclusive=True)
if d['type'] == 'df_first':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'quadratic':
df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['new_col'].ffill()
elif d['type'] == 'linear':
df.loc[mask, 'time_function'] = a0 + a1 * T + df['new_col'].ffill()
elif d['type'] == 'polynomial':
df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
(T)**3 + a4*(T)**4 + a5*(T)**5 + df['new_col'].ffill()
elif d['type'] == 'df_mid':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'df_mid':
df.loc[mask, 'time_function'] = df['t_factor']
elif d['type'] == 'df_last':
df.loc[mask, 'time_function'] = df['t_factor']
else:
return df
return df
fn_graph(df, REQUEST_OBJ)