Tricky conversion and running baseline using Python

Problem description

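I have a dataframe that I would like to group by id and date, adding values based on quarter and id, and subtracting values based on quarter, id and count. The catch is that once a value has been added or subtracted, the result becomes the new baseline.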

df

Data

id  date1   name1   pwr1    name2   pwr2    position    base
aa  q1.22   hey     10                      10          20
aa  q1.22   ok      5                       10          20
aa  q2.22   hello   2                       10          20
bb  q1.22                                   5           50
bb  q222    sure    2                       5           50
bb  q222    yes     4                       5           50
bb  q322                    hiya       1    5           50
bb  q422    no      2       love       5    5           50
bb  q422    k       2       like       5    5           50
cc  q122                    oh         2    100         75
cc  q122                    kay        2    100         75
cc  q222                                    100         75

Desired

id  date    consumed    retro   position    base
aa  q122    15          0       8           5
aa  q222    2           0       7           3
bb  q122    0           0       5           50
bb  q222    6           0       3           44
bb  q322    0           1       4           45
bb  q422    4           10      4           51
cc  q122    0           4       102         79
cc  q222    0           0       102         79

Logic

For id aa with a date of q122, consumed, retro, position and base are derived as follows:

Group id 'aa' and add up the 'pwr1' column values (10 + 5 = 15). This sum is the consumed value.
The starting position is 10, and the starting base is 20.
In 'aa' q122 the count is 2 (hey and ok), so we subtract 2 from the starting position of 10, which gives us:
10 - 2 = 8
The base for 'aa' q122 is now 5, because we subtract 15 (the sum of the 'pwr1' values) from 20 (the starting base).

The retro column is derived by **adding** up the grouped values in column 'pwr2'. This sum is then added to the base value for that group and date. **The count is also added to the position** if there is any retro value.
Example:

cc q122 has pwr2 values of 2 and 2 (2 + 2 = 4) and a count of 2 (oh and kay).
We add the pwr2 sum of 4 to the base: 75 + 4 = 79.
We add the count of 2 to the position: 100 + 2 = 102.
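
The two rules above can be sketched as a single per-group aggregation. This is only an illustration on the aa q1.22 and cc q122 rows, assuming blank cells are read as NaN; the names `rows`, `per_group`, `n_consumed`, `n_retro`, `delta_position` and `delta_base` are hypothetical and do not appear in the original data.

```python
import numpy as np
import pandas as pd

# Rows for aa/q1.22 and cc/q122 from the data above; blank cells assumed NaN.
rows = pd.DataFrame({
    "id":    ["aa", "aa", "cc", "cc"],
    "date1": ["q1.22", "q1.22", "q122", "q122"],
    "name1": ["hey", "ok", np.nan, np.nan],
    "pwr1":  [10, 5, np.nan, np.nan],
    "name2": [np.nan, np.nan, "oh", "kay"],
    "pwr2":  [np.nan, np.nan, 2, 2],
})

per_group = rows.groupby(["id", "date1"]).agg(
    consumed=("pwr1", "sum"),      # sum of pwr1, subtracted from the base
    retro=("pwr2", "sum"),         # sum of pwr2, added to the base
    n_consumed=("pwr1", "count"),  # non-null pwr1 rows, subtracted from the position
    n_retro=("pwr2", "count"),     # non-null pwr2 rows, added to the position
).reset_index()

# Net change each group applies to the previous position and base.
per_group["delta_position"] = per_group["n_retro"] - per_group["n_consumed"]
per_group["delta_base"] = per_group["retro"] - per_group["consumed"]
print(per_group)
```

For aa this gives a position change of -2 and a base change of -15 (10 - 2 = 8 and 20 - 15 = 5); for cc it gives +2 and +4 (100 + 2 = 102 and 75 + 4 = 79), matching the worked examples.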

What I'm doing

An SO member helped with this code, but I am trying to adjust it further for this scenario:

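import pandas as pd

def f(x):
    # One output row per group: sum the values, and take the difference in
    # non-null counts as the net change in position for that group.
    # The result is named 'position_change' so that it does not clash with
    # the 'position' grouping key when the index is reset.
    d = {'consumed': [x['pwr1'].sum()],
         'retro': [x['pwr2'].sum()],
         'position_change': [x['pwr2'].count() - x['pwr1'].count()]}
    return pd.DataFrame(d)


# Group keys follow the column names in the data above ('date1', 'position').
out = df.groupby(['id', 'date1', 'position']) \
        .apply(f).reset_index().drop(columns='level_3')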

I am still researching. Any suggestions are appreciated.

Tags: python, pandas, numpy

Solution
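
One possible approach, offered as a sketch rather than a definitive answer: aggregate per id and quarter, then accumulate the per-quarter changes within each id so that every quarter starts from the previous quarter's result. The code below assumes that blank cells are missing values, that 'q1.22' and 'q122' denote the same quarter, and that sorting the quarter labels as strings gives chronological order (true here because every row falls in year 22). The names `raw`, `n1`, `n2`, `start_position` and `start_base` are hypothetical helpers.

```python
import io
import pandas as pd

# The table from the question as CSV; blank cells become NaN when parsed.
raw = """id,date1,name1,pwr1,name2,pwr2,position,base
aa,q1.22,hey,10,,,10,20
aa,q1.22,ok,5,,,10,20
aa,q2.22,hello,2,,,10,20
bb,q1.22,,,,,5,50
bb,q222,sure,2,,,5,50
bb,q222,yes,4,,,5,50
bb,q322,,,hiya,1,5,50
bb,q422,no,2,love,5,5,50
bb,q422,k,2,like,5,5,50
cc,q122,,,oh,2,100,75
cc,q122,,,kay,2,100,75
cc,q222,,,,,100,75
"""
df = pd.read_csv(io.StringIO(raw))

# Assumption: 'q1.22' and 'q122' are the same quarter, so drop the dot.
df["date"] = df["date1"].str.replace(".", "", regex=False)

# Per (id, quarter): sums, non-null counts, and the starting position/base.
# groupby sorts the keys, which keeps the quarters in order for this data.
out = df.groupby(["id", "date"], as_index=False).agg(
    consumed=("pwr1", "sum"),
    retro=("pwr2", "sum"),
    n1=("pwr1", "count"),
    n2=("pwr2", "count"),
    start_position=("position", "first"),
    start_base=("base", "first"),
)

# Running baseline: accumulate each quarter's change within its id.
position_change = (out["n2"] - out["n1"]).groupby(out["id"]).cumsum()
base_change = (out["retro"] - out["consumed"]).groupby(out["id"]).cumsum()
out["position"] = out["start_position"] + position_change
out["base"] = out["start_base"] + base_change

out = out[["id", "date", "consumed", "retro", "position", "base"]]
out = out.astype({"consumed": int, "retro": int, "base": int})
print(out)
```

Under these assumptions the result should reproduce the Desired table above. If the data spans more than one year, the quarter labels would need to be converted to a sortable form (for example with `pd.PeriodIndex`) before the cumulative sum, since string order would no longer be chronological.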

