首页 > 解决方案 > Change column values in groupby

问题描述

Here is a dataframe:

           id         value next_value      step
0       10764  612486.33921   800000.0       0.0
1       10764  612486.33921  1000000.0  200000.0
2       10822   97604.30520   100000.0       0.0
3       10823   97805.45851   100000.0       0.0
4       10823   97805.45851   165000.0   65000.0
5       10823   97805.45851   250000.0   85000.0
6       10823   97805.45851   400000.0  150000.0
7       10823   97805.45851   600000.0  200000.0
8       10823   97805.45851   800000.0  200000.0
9       10823   97805.45851  1000000.0  200000.0
10      10823   97805.45851  1200000.0  200000.0
11      10882   33843.33701    40000.0       0.0
12      10882   33843.33701    60000.0   20000.0
13      10882   33843.33701    90000.0   30000.0
14      10882   33843.33701   125000.0   35000.0
15      10882   33843.33701   150000.0   25000.0
16      10882   33843.33701   175000.0   25000.0
17      10882   33843.33701   205000.0   30000.0
18      10882   33843.33701   230000.0   25000.0
19      10882   33843.33701   300000.0   70000.0
20      10882   33843.33701   440000.0  140000.0
21      10758  241890.94525   385000.0       0.0
22      10817  138519.73560   150000.0       0.0
23      10817  138519.73560   200000.0   50000.0
24      10817  138519.73560   250000.0   50000.0
25      10817  138519.73560   300000.0   50000.0
26      10817  138519.73560   350000.0   50000.0
27      10817  138519.73560   400000.0   50000.0
28      10859   73140.30048    80000.0       0.0
29      10859   73140.30048   100000.0   20000.0

I need to add to step the first difference between next_value and value in a group by id.

I.e. for id=10764 step = step + 800000.0 - 612486.33921,

for id=10822 step = step + 100000.0 - 97604.30520,

etc.

I've tried this:

df['step'] += df.groupby('id')[['next_value', 'value']].apply
    (lambda x: x.iloc[0, 0] - x.iloc[0, 1])

But it gives me all step values equal to NaN.

What can I do here?

标签: pythonpandasgroup-by

解决方案


Use groupby id and get the first value. Then, use s for df.step

In [717]: s = df.groupby('id').transform('first')

In [718]: df.step + s.next_value - s.value
Out[718]:
0     187513.66079
1     387513.66079
2       2395.69480
3       2194.54149
4      67194.54149
5      87194.54149
6     152194.54149
7     202194.54149
8     202194.54149
9     202194.54149
10    202194.54149
11      6156.66299
12     26156.66299
13     36156.66299
14     41156.66299
15     31156.66299
16     31156.66299
17     36156.66299
18     31156.66299
19     76156.66299
20    146156.66299
21    143109.05475
22     11480.26440
23     61480.26440
24     61480.26440
25     61480.26440
26     61480.26440
27     61480.26440
28      6859.69952
29     26859.69952
dtype: float64

Details

In [719]: s
Out[719]:
           value  next_value  step
0   612486.33921    800000.0   0.0
1   612486.33921    800000.0   0.0
2    97604.30520    100000.0   0.0
3    97805.45851    100000.0   0.0
4    97805.45851    100000.0   0.0
5    97805.45851    100000.0   0.0
6    97805.45851    100000.0   0.0
7    97805.45851    100000.0   0.0
8    97805.45851    100000.0   0.0
9    97805.45851    100000.0   0.0
10   97805.45851    100000.0   0.0
11   33843.33701     40000.0   0.0
12   33843.33701     40000.0   0.0
13   33843.33701     40000.0   0.0
14   33843.33701     40000.0   0.0
15   33843.33701     40000.0   0.0
16   33843.33701     40000.0   0.0
17   33843.33701     40000.0   0.0
18   33843.33701     40000.0   0.0
19   33843.33701     40000.0   0.0
20   33843.33701     40000.0   0.0
21  241890.94525    385000.0   0.0
22  138519.73560    150000.0   0.0
23  138519.73560    150000.0   0.0
24  138519.73560    150000.0   0.0
25  138519.73560    150000.0   0.0
26  138519.73560    150000.0   0.0
27  138519.73560    150000.0   0.0
28   73140.30048     80000.0   0.0
29   73140.30048     80000.0   0.0

推荐阅读