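python - Reset two cumulative-sum columns whenever either one resets
Problem description
I am stuck on the following problem:
I have this dataset (dummy values):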
item price units
A 10 1
B 20 5
C 5 8
D 6 3
For each of the two columns I compute a cumulative sum that restarts once a threshold would be exceeded:
threshold_price = 31
threshold_units = 8

# Running sum of price that restarts when the threshold would be exceeded
ls = []
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.price <= threshold_price:
        cumsum += row.price
    else:
        last_reset = cumsum
        cumsum = row.price
    ls.append(cumsum)
df['cs_price'] = ls

# Same logic for units, computed independently of price
ls = []
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.units <= threshold_units:
        cumsum += row.units
    else:
        last_reset = cumsum
        cumsum = row.units
    ls.append(cumsum)
df['cs_units'] = ls
From this I get:
item price units cs_price cs_units
A 10 1 10 1
B 20 5 30 6
C 5 8 5 8
D 6 3 11 3
I want a new column, call it "store", that starts at 0 and is incremented every time either of the two columns resets.
Given the example above:
item price units cs_price cs_units store
A 10 1 10 1 0
B 20 5 30 6 0
C 5 8 5 8 1 #+1 because of price
D 6 3 11 3 2 #+1 because of units
The main problem with this approach is that when a new store is assigned, cs_price and cs_units do not both reset, which is what I need.
The ideal output would be:
item price units cs_price cs_units store
A 10 1 10 1 0
B 20 5 30 6 0
C 5 8 5 8 1 #+1 because of price, cs_units resets
D 6 3 6 3 2 #+1 because of units, cs_price resets
E 7 4 13 7 2
F 8 2 8 2 3
G 20 2 28 4 3
H 6 3 6 3 4
My current output (which is not what I want) is:
item price units cs_price cs_units store
A 10 1 10 1 0
B 20 5 30 6 0
C 5 8 5 8 1 #+1 because of cs_price
D 6 3 11 3 1
E 7 4 18 7 1
F 8 2 26 2 2 #+1 because of cs_units but cs_price does not reset
G 20 2 20 4 2
H 6 3 26 7 2
Full code (I left out the match columns in the tables above; I hope it is still readable):
threshold_price = 31
threshold_units = 8

# Running sum of price that restarts when the threshold would be exceeded
ls = []
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.price <= threshold_price:
        cumsum += row.price
    else:
        last_reset = cumsum
        cumsum = row.price
    ls.append(cumsum)
df['cs_price'] = ls

# Same logic for units, computed independently of price
ls = []
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.units <= threshold_units:
        cumsum += row.units
    else:
        last_reset = cumsum
        cumsum = row.units
    ls.append(cumsum)
df['cs_units'] = ls

# A reset shows up as a drop relative to the previous row
df['match'] = df.cs_price < df.cs_price.shift()
df["match"] = df["match"].astype(int)
df['match2'] = df.cs_units < df.cs_units.shift()
df["match2"] = df["match2"].astype(int)

# Count resets per column, then take the larger count as the store
df["store_prep"] = df["match"].cumsum()
df["store_prep2"] = df["match2"].cumsum()
df["store"] = df[["store_prep","store_prep2"]].max(axis=1)
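Solution
Because your "reset" condition depends on both the price and units columns, you should loop over the two columns together. During the loop, record for each row whether a reset happened, then take the cumsum of that list of resets to get your store column.
Try this: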
import pandas as pd
import numpy as np

df = pd.DataFrame({'item': ['A','B','C','D','E','F','G','H'],
                   'price': [10,20,5,6,7,8,20,6],
                   'units': [1,5,8,3,4,2,2,3]})

threshold_price = 31
threshold_units = 8

reset = []
list_price = []
list_units = []
cs_price = 0
cs_units = 0

for price, units in zip(df['price'].to_list(), df['units'].to_list()):
    # If either running total would exceed its threshold, restart both
    if cs_price + price > threshold_price or cs_units + units > threshold_units:
        reset.append(True)
        cs_price = price
        cs_units = units
    else:
        reset.append(False)
        cs_price += price
        cs_units += units
    list_price.append(cs_price)
    list_units.append(cs_units)

df['cs_price'] = list_price
df['cs_units'] = list_units
# Each True counts as 1, so the running sum of resets is the store index
df['store'] = np.cumsum(reset)
print(df)
item price units cs_price cs_units store
0 A 10 1 10 1 0
1 B 20 5 30 6 0
2 C 5 8 5 8 1
3 D 6 3 6 3 2
4 E 7 4 13 7 2
5 F 8 2 8 2 3
6 G 20 2 28 4 3
7 H 6 3 6 3 4
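If you need the same logic for more than two columns, the loop above can be wrapped in a small helper. This is only a sketch: the function name `assign_stores`, its `thresholds` argument, and the `cs_` column prefix are hypothetical names chosen for illustration, not part of pandas or of the original answer.

```python
import pandas as pd


def assign_stores(df, thresholds):
    """Return a copy of df with cs_<col> columns and a store column.

    thresholds is a dict mapping a column name to the maximum running
    total allowed per store (hypothetical helper, for illustration only).
    """
    cols = list(thresholds)
    totals = {c: 0 for c in cols}      # current running total per column
    running = {c: [] for c in cols}    # running totals recorded per row
    store = 0
    stores = []

    for values in zip(*(df[c].tolist() for c in cols)):
        row = dict(zip(cols, values))
        # A reset is triggered as soon as any one column would overflow
        if any(totals[c] + row[c] > thresholds[c] for c in cols):
            store += 1
            totals = dict(row)         # every running total restarts together
        else:
            for c in cols:
                totals[c] += row[c]
        for c in cols:
            running[c].append(totals[c])
        stores.append(store)

    out = df.copy()
    for c in cols:
        out['cs_' + c] = running[c]
    out['store'] = stores
    return out


df = pd.DataFrame({'item': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'price': [10, 20, 5, 6, 7, 8, 20, 6],
                   'units': [1, 5, 8, 3, 4, 2, 2, 3]})
result = assign_stores(df, {'price': 31, 'units': 8})
```

With the sample data, `result` should contain the same cs_price, cs_units and store values as the table above. Note that in this sketch a row whose value alone exceeds a threshold still opens a new store by itself; whether that is acceptable depends on your data.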