首页 > 解决方案 > 带移位的计算列

问题描述

这是基本数据帧:

 g_accessor  number_opened  number_closed
0     49 - 20            3.0            1.0
1     50 - 20            2.0           14.0
2     51 - 20            1.0            6.0
3     52 - 20            0.0            6.0
4      1 - 21            1.0            4.0
5      2 - 21            3.0            5.0
6      3 - 21            4.0           11.0
7      4 - 21            2.0            7.0
8      5 - 21            6.0           10.0
9      6 - 21            2.0            8.0
10     7 - 21            4.0            9.0
11     8 - 21            2.0            3.0
12     9 - 21            2.0            1.0
13    10 - 21            1.0           11.0
14    11 - 21            6.0            3.0
15    12 - 21            3.0            3.0
16    13 - 21            2.0            6.0
17    14 - 21            5.0            9.0
18    15 - 21            9.0           13.0
19    16 - 21            7.0            7.0
20    17 - 21            9.0            4.0
21    18 - 21            3.0            8.0
22    19 - 21            6.0            3.0
23    20 - 21            6.0            1.0
24    21 - 21            3.0            5.0
25    22 - 21            5.0            3.0
26    23 - 21            1.0            0.0

我想添加一个number_active依赖于先前值的计算新列。为此,我正在尝试使用pd.DataFrame.shift(),如下所示:

# Creating new column and setting all rows to 0
df['number_active'] = 0
# Active from previous period
PREVIOUS_PERIOD_ACTIVE = 22
# Calculating active value for first period in the DataFrame, based on `PREVIOUS_PERIOD_ACTIVE`
df.iat[0,3] = (df.iat[0,1] + PREVIOUS_PERIOD_ACTIVE) - df.iat[0,2]
# Calculating all columns using DataFrame.shift()
df['number_active'] = (df['number_opened'] + df['number_active'].shift(1)) - df['number_closed']
# Recalculating first active value as it was overwritten in the previous step. 
df.iat[0,3] = (df.iat[0,1] + PREVIOUS_PERIOD_ACTIVE) - df.iat[0,2]

结果:

   g_accessor  number_opened  number_closed  number_active
0     49 - 20            3.0            1.0           24.0
1     50 - 20            2.0           14.0           12.0
2     51 - 20            1.0            6.0           -5.0
3     52 - 20            0.0            6.0           -6.0
4      1 - 21            1.0            4.0           -3.0
5      2 - 21            3.0            5.0           -2.0
6      3 - 21            4.0           11.0           -7.0
7      4 - 21            2.0            7.0           -5.0
8      5 - 21            6.0           10.0           -4.0
9      6 - 21            2.0            8.0           -6.0
10     7 - 21            4.0            9.0           -5.0
11     8 - 21            2.0            3.0           -1.0
12     9 - 21            2.0            1.0            1.0
13    10 - 21            1.0           11.0          -10.0
14    11 - 21            6.0            3.0            3.0
15    12 - 21            3.0            3.0            0.0
16    13 - 21            2.0            6.0           -4.0
17    14 - 21            5.0            9.0           -4.0
18    15 - 21            9.0           13.0           -4.0
19    16 - 21            7.0            7.0            0.0
20    17 - 21            9.0            4.0            5.0
21    18 - 21            3.0            8.0           -5.0
22    19 - 21            6.0            3.0            3.0
23    20 - 21            6.0            1.0            5.0
24    21 - 21            3.0            5.0           -2.0
25    22 - 21            5.0            3.0            2.0
26    23 - 21            1.0            0.0            1.0

奇怪的是,似乎只有第一个活动值(索引 1)被正确计算(因为索引 0 处的值是通过 独立计算的df.iat)。对于其余的值,由于某种原因,它似乎number_closed被解释为负值。

我错过了什么/做错了什么?

标签: pythonpandasdataframe

解决方案


您假设在计算当前行时上一行的结果可用。这不是 pandas 计算的工作方式。Pandas 计算单独处理每一行,除非您应用多行操作,例如cumsumand shift

我会用一个最小的例子来计算活跃的数字:

df = pandas.DataFrame({'ignore': ['a','b','c','d','e'], 'number_opened': [3,4,5,4,3], 'number_closed':[1,2,2,1,2]})
df['number_active'] = df['number_opened'].cumsum() + 22 - df['number_closed'].cumsum()

这给出了以下结果:

忽视 number_opened number_closed number_active
0 一种 3 1 24
1 b 4 2 26
2 C 5 2 29
3 d 4 1 32
4 e 3 2 33

您的问题中的代码以及我的最小示例给出了:

忽视 number_opened number_closed number_active
0 一种 3 1 24
1 b 4 2 26
2 C 5 2 3
3 d 4 1 3
4 e 3 2 1

推荐阅读