Reset two cumulative-sum columns whenever either one resets

Problem description

I'm stuck on the following problem:

I have this dataset (dummy values):

item    price    units
A         10       1
B         20       5
C         5        8
D         6        3

For each of the two columns I compute a cumulative sum that restarts once it would exceed a threshold:

threshold_price = 31
threshold_units = 8

ls = [] 
cumsum = 0 
last_reset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.price <= threshold_price: 
        cumsum += row.price 
    else: 
        last_reset = cumsum 
        cumsum = row.price 
    ls.append(cumsum) 
df['cs_price'] = ls

ls = [] 
cumsum = 0 
last_reset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.units <= threshold_units: 
        cumsum += row.units 
    else: 
        last_reset = cumsum 
        cumsum = row.units 
    ls.append(cumsum) 
df['cs_units'] = ls

From this I get:

item    price    units   cs_price   cs_units
A         10       1        10         1
B         20       5        30         6
C         5        8         5         8
D         6        3        11         3

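I want a new column, which we can call "store", that starts at 0, changes at the first reset of either of the two columns, and increments with every reset after that.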

Given the example above:

item    price    units   cs_price   cs_units   store
A         10       1        10         1         0
B         20       5        30         6         0
C         5        8        5          8         1 #+1 because of price
D         6        3        11         3         2 #+1 because of units

The main problem I have with this approach is that cs_price and cs_units do not both reset when a new store is assigned, which is what I need.

The ideal output would be:

   item    price    units   cs_price   cs_units   store
    A         10       1        10         1         0
    B         20       5        30         6         0
    C         5        8        5          8         1 #+1 because of price, cs_units resets
    D         6        3        6          3         2 #+1 because of units, cs_price resets
    E         7        4        13         7         2
    F         8        2        8          2         3
    G         20       2        28         4         3
    H         6        3        6          3         4

My current output (which is not what I want) is:

   item    price    units   cs_price   cs_units   store
    A         10       1        10         1         0
    B         20       5        30         6         0
    C         5        8        5          8         1 #+1 because of cs_price
    D         6        3        11         3         1 
    E         7        4        18         7         1
    F         8        2        26         2         2 #+1 because of cs_units but cs_price does not reset 
    G         20       2        20         4         2
    H         6        3        26         7         2

Full code (I left out the match columns above; I hope it is still readable):

threshold_price = 31
threshold_units = 8

ls = [] 
cumsum = 0 
last_reset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.price <= threshold_price: 
        cumsum += row.price 
    else: 
        last_reset = cumsum 
        cumsum = row.price 
    ls.append(cumsum) 
df['cs_price'] = ls

ls = [] 
cumsum = 0 
last_reset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.units <= threshold_units: 
        cumsum += row.units 
    else: 
        last_reset = cumsum 
        cumsum = row.units 
    ls.append(cumsum) 
df['cs_units'] = ls

df['match'] = df.cs_price < df.cs_price.shift()
df["match"] = df["match"].astype(int)

df['match2'] = df.cs_units < df.cs_units.shift()
df["match2"] = df["match2"].astype(int)

df["store_prep"] = df["match"].cumsum()
df["store_prep2"] = df["match2"].cumsum()
df["store"] = df[["store_prep","store_prep2"]].max(axis=1)

Tags: python, pandas, dataframe

Solution


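Because your "reset" condition depends on both the price and units columns, you should loop over the two columns together. During the loop, record for each row whether a reset happened, then take the cumsum of that list of reset flags to get your store column.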

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'item': ['A','B','C','D','E','F','G','H'], 
                   'price': [10,20,5,6,7,8,20,6], 
                   'units': [1,5,8,3,4,2,2,3]})

threshold_price = 31
threshold_units = 8

reset = []

list_price = [] 
list_units = []

cs_price = 0 
cs_units = 0

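# Walk both columns in lockstep so that a single check decides the reset for both.
for price, units in zip(df['price'].to_list(), df['units'].to_list()):
    if cs_price + price > threshold_price or cs_units + units > threshold_units:
        # Either total would pass its threshold: start a new store from this row.
        reset.append(True)
        cs_price = price
        cs_units = units
    else:
        # Both totals still fit: keep accumulating in the current store.
        reset.append(False)
        cs_price += price
        cs_units += units
    list_price.append(cs_price)
    list_units.append(cs_units)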
    
df['cs_price'] = list_price
df['cs_units'] = list_units

# Each True counts as 1, so the running sum of the flags is the store number.
df['store'] = np.cumsum(reset)
print(df)

  item  price  units  cs_price  cs_units  store
0    A     10      1        10         1      0
1    B     20      5        30         6      0
2    C      5      8         5         8      1
3    D      6      3         6         3      2
4    E      7      4        13         7      2
5    F      8      2         8         2      3
6    G     20      2        28         4      3
7    H      6      3         6         3      4
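
If more than two columns need to share the same reset, the loop above can be wrapped in a small helper. The following is only a sketch under assumptions: the function name cumsum_with_shared_reset and the thresholds dictionary (column name mapped to the largest allowed running total) are hypothetical and do not come from the original answer. It applies the same rule as the loop above: restart every running sum as soon as any one of them would exceed its threshold.

```python
import pandas as pd


def cumsum_with_shared_reset(df, thresholds):
    """Running sums over several columns that all restart together.

    `thresholds` (hypothetical parameter) maps a column name to the largest
    running total allowed for that column. Returns a copy of `df` with one
    `cs_<column>` column per key plus a `store` column counting resets.
    """
    cols = list(thresholds)
    running = dict.fromkeys(cols, 0)
    totals = {col: [] for col in cols}
    store, stores = 0, []

    for values in zip(*(df[col].tolist() for col in cols)):
        row = dict(zip(cols, values))
        # Reset every running sum as soon as any one would pass its threshold.
        if any(running[col] + row[col] > thresholds[col] for col in cols):
            store += 1
            running = dict(row)
        else:
            for col in cols:
                running[col] += row[col]
        for col in cols:
            totals[col].append(running[col])
        stores.append(store)

    out = df.copy()
    for col in cols:
        out[f"cs_{col}"] = totals[col]
    out["store"] = stores
    return out


df = pd.DataFrame({'item': list('ABCDEFGH'),
                   'price': [10, 20, 5, 6, 7, 8, 20, 6],
                   'units': [1, 5, 8, 3, 4, 2, 2, 3]})

result = cumsum_with_shared_reset(df, {'price': 31, 'units': 8})
print(result)

# Cross-check: within each store, a plain grouped cumsum gives the same totals.
assert (result.groupby('store')['price'].cumsum() == result['cs_price']).all()
assert (result.groupby('store')['units'].cumsum() == result['cs_units']).all()
```

The two assertions at the end illustrate a useful property: once store is known, cs_price and cs_units are just per-store cumulative sums, so the sequential loop is only needed to find where the stores begin. As in the loop above, a first row that already exceeds a threshold counts as a reset, so store would start at 1 in that case.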

