python - Updating rows of a pandas DataFrame based on other rows
Question
I have a pandas DataFrame with the columns (pk1, pk2, type, qty_6, qty_7). The type column takes the values predicted_90, override_90, predicted_50, override_50. For each (pk1, pk2) combination, if the override_50 / override_90 rows contain any values other than NaN, I want to update the predicted_50 / predicted_90 rows of my DataFrame with the override_50 / override_90 values respectively. Additionally, I want to capture this change in boolean columns named qty_6_overridden and qty_7_overridden, and capture the difference between the two values in columns qty_6_dev and qty_7_dev, where qty_6_dev = qty_6 overridden - qty_6 predicted.
Example DataFrame:
data=[
['B01FV0FBX4','2019-01-13','predicted_90',2207.931,2217.841],
['B01FV0FBX4','2019-01-13','predicted_50',1561.033,1521.567],
['B01FV0FBX4','2019-01-13','override_90',1973.000,np.NaN],
['B01FV0FBX4','2019-01-13','override_50',1233.000,np.NaN],
['B01FV0FBX4','2019-01-06','override_50',np.NaN,1233.000],
['B01FV0FBX4','2019-01-06','predicted_50',1210.129,1213.803],
['B01FV0FBX4','2019-01-06','override_90',np.NaN,1973.000],
['B01FV0FBX4','2019-01-06','predicted_90',1911.205,1921.594]
]
df = pd.DataFrame(data,columns=['pk1','pk2', 'type', 'qty_6', 'qty_7'])
Expected output:
data=[
['B01FV0FBX4','2019-01-13','predicted_90',1973.000,2217.841,-234.931,0,True,False],
['B01FV0FBX4','2019-01-13','predicted_50',1233.000,1521.567,-328.033,0,True,False],
['B01FV0FBX4','2019-01-13','override_90',1973.000,np.NaN,0,0,False,False],
['B01FV0FBX4','2019-01-13','override_50',1233.000,np.NaN,0,0,False,False],
['B01FV0FBX4','2019-01-06','override_50',np.NaN,1233.000,0,0,False,False],
['B01FV0FBX4','2019-01-06','predicted_50',1210.129,1213.000,0,-0.803,False,True],
['B01FV0FBX4','2019-01-06','override_90',np.NaN,1973.000,0,0,False,False],
['B01FV0FBX4','2019-01-06','predicted_90',1911.205,1973.000,0,51.406,False,True]
]
df = pd.DataFrame(data,columns=['pk1','pk2', 'type', 'qty_6', 'qty_7','qty_6_dev','qty_7_dev', 'qty_6_overridden','qty_7_overridden'])
In the example you can see that, for quantities that have an override, the predicted quantity is swapped with the override value, and the corresponding 'qty_6_dev', 'qty_7_dev', 'qty_6_overridden' and 'qty_7_overridden' columns are filled in.
I was able to write a solution. It works, but it looks horrible and is hard for anyone else to understand.
import pandas as pd
import numpy as np
import math
data=[
['B01FV0FBX4','2019-01-13','predicted_90',2207.931,2217.841],
['B01FV0FBX4','2019-01-13','predicted_50',1561.033,1521.567],
['B01FV0FBX4','2019-01-13','override_90',1973.000,np.NaN],
['B01FV0FBX4','2019-01-13','override_50',1233.000,np.NaN],
['B01FV0FBX4','2019-01-06','override_50',np.NaN,1233.000],
['B01FV0FBX4','2019-01-06','predicted_50',1210.129,1213.803],
['B01FV0FBX4','2019-01-06','override_90',np.NaN,1973.000],
['B01FV0FBX4','2019-01-06','predicted_90',1911.205,1921.594]
]
df = pd.DataFrame(data,columns=['pk1','pk2', 'type', 'qty_6', 'qty_7'])
override_map = {
    "predicted_50": "override_50",
    "predicted_90": "override_90"
}
def transform_df(df):
    transformed_df = pd.DataFrame()
    for index, row in df.iterrows():
        row_type = row['type']
        row_pk1 = row['pk1']
        row_pk2 = row['pk2']
        if row_type in override_map.keys():
            override_type = override_map.get(row_type)
        else:
            # Override rows themselves are passed through unchanged.
            for i in range(6, 8):
                qty_dev_col = 'qty_' + str(i) + '_dev'
                qty_override_col = 'qty_' + str(i) + '_overridden'
                row[qty_dev_col] = 0
                row[qty_override_col] = False
            transformed_df = transformed_df.append(row, ignore_index=True)
            continue
        # Find the matching override row for this predicted row.
        corr_df = df.loc[(df.type == override_type)
                         & (df.pk1 == row_pk1)
                         & (df.pk2 == row_pk2)]
        for i in range(6, 8):
            qty_col = 'qty_' + str(i)
            qty_dev_col = 'qty_' + str(i) + '_dev'
            qty_override_col = 'qty_' + str(i) + '_overridden'
            if not math.isnan(corr_df[qty_col].values[0]) and (corr_df[qty_col].values[0] != row[qty_col]):
                row[qty_dev_col] = corr_df[qty_col].values[0] - row[qty_col]
                row[qty_col] = corr_df[qty_col].values[0]
                row[qty_override_col] = True
            else:
                row[qty_dev_col] = 0
                row[qty_override_col] = False
        transformed_df = transformed_df.append(row, ignore_index=True)
    return transformed_df
x1 = transform_df(df)
Is there a better way to do this with lambdas or something else? It also takes forever to run on larger DataFrames.
Solution
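The answer body was not preserved in this copy of the page. As a hedged sketch of one common way to speed this up, the row-by-row `iterrows`/`append` loop can be replaced by a single left merge of the predicted rows against the override rows, followed by vectorized column arithmetic. The function name `transform_df_fast` and the temporary `_ov_*` column names below are mine, not from the original answer:

```python
import numpy as np
import pandas as pd

data = [
    ['B01FV0FBX4', '2019-01-13', 'predicted_90', 2207.931, 2217.841],
    ['B01FV0FBX4', '2019-01-13', 'predicted_50', 1561.033, 1521.567],
    ['B01FV0FBX4', '2019-01-13', 'override_90', 1973.000, np.nan],
    ['B01FV0FBX4', '2019-01-13', 'override_50', 1233.000, np.nan],
    ['B01FV0FBX4', '2019-01-06', 'override_50', np.nan, 1233.000],
    ['B01FV0FBX4', '2019-01-06', 'predicted_50', 1210.129, 1213.803],
    ['B01FV0FBX4', '2019-01-06', 'override_90', np.nan, 1973.000],
    ['B01FV0FBX4', '2019-01-06', 'predicted_90', 1911.205, 1921.594],
]
df = pd.DataFrame(data, columns=['pk1', 'pk2', 'type', 'qty_6', 'qty_7'])

OVERRIDE_MAP = {'predicted_50': 'override_50', 'predicted_90': 'override_90'}

def transform_df_fast(df):
    out = df.copy()
    # For predicted rows: the type of the override row that may replace them.
    # Override rows map to NaN and therefore never match anything in the merge.
    out['_ov_type'] = out['type'].map(OVERRIDE_MAP)
    # Override rows reshaped into a lookup table keyed by (pk1, pk2, type).
    ov = (df.loc[df['type'].isin(OVERRIDE_MAP.values())]
            .rename(columns={'type': '_ov_type', 'qty_6': '_ov_6', 'qty_7': '_ov_7'}))
    out = out.merge(ov, on=['pk1', 'pk2', '_ov_type'], how='left')
    for col, ov_col in (('qty_6', '_ov_6'), ('qty_7', '_ov_7')):
        ov_vals = out[ov_col].to_numpy()
        # Apply an override only where it exists and actually differs.
        mask = ~np.isnan(ov_vals) & (ov_vals != out[col].to_numpy())
        out[f'{col}_dev'] = np.where(mask, ov_vals - out[col], 0.0)
        out[f'{col}_overridden'] = mask
        out[col] = np.where(mask, ov_vals, out[col])
    return out.drop(columns=['_ov_type', '_ov_6', '_ov_7'])

result = transform_df_fast(df)
```

This produces the same values as the loop (the new columns come out in a slightly different order), but the work is done in a handful of vectorized operations, so it should scale much better on large DataFrames than calling `DataFrame.append` once per row.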