Calculating the difference in value between columns

Problem description

I have a dataframe with YYYYMM columns that contain monthly totals on the row level:

| yearM | feature  | 201902 | 201903 | 201904 | 201905 | ... | 202009 |
|-------|----------|--------|--------|--------|--------|-----|--------|
| 0     | feature1 | NaN    | NaN    | 9.0    | 32.0   | ... | ...    |
| 1     | feature2 | 1.0    | 1.0    | 1.0    | 4.0    | ... | ...    |
| 2     | feature3 | NaN    | 1.0    | 4.0    | 8.0    | ... | ...    |
| 3     | feature4 | 9.0    | 15.0   | 19.0   | 24.0   | ... | ...    |
| 4     | feature5 | 33.0   | 67.0   | 99.0   | 121.0  | ... | ...    |
| 5     | feature6 | 12.0   | 15.0   | 17.0   | 19.0   | ... | ...    |
| 6     | feature7 | 1.0    | 8.0    | 15.0   | 20.0   | ... | ...    |
| 7     | feature8 | NaN    | NaN    | 1.0    | 9.0    | ... | ...    |

I would like to convert the totals into month-over-month changes. The feature column should be excluded from the calculation, since I need to keep the feature names. The yearM label above the index is a leftover from pivoting a dataframe to get the YYYYMM values on the column level.
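For context, a frame of this shape can come out of a pivot roughly as sketched below. The long-format table and its column names (feature, yearM, total) are hypothetical, chosen only to illustrate where the yearM label comes from; they are not taken from the original data.

```python
import pandas as pd

# Hypothetical long-format data: one row per (feature, month) with a total.
long_df = pd.DataFrame({
    "feature": ["feature1", "feature1", "feature2", "feature2", "feature2"],
    "yearM": ["201904", "201905", "201903", "201904", "201905"],
    "total": [9.0, 32.0, 1.0, 1.0, 4.0],
})

# pivot moves the yearM values to the column level; a (feature, month)
# pair with no row in long_df becomes NaN.
wide = long_df.pivot(index="feature", columns="yearM", values="total")

# reset_index turns feature back into a regular column. The column axis
# keeps the name "yearM", which is why it is printed above the integer index.
df = wide.reset_index()
print(df)
```

If the label is unwanted, df.columns.name = None removes it without touching the data.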

This is what the output should look like:

| yearM | feature  | 201902 | 201903 | 201904 | 201905 | ... | 202009 |
|-------|----------|--------|--------|--------|--------|-----|--------|
| 0     | feature1 | NaN    | 0.0    | 9.0    | 23.0   | ... | ...    |
| 1     | feature2 | 1.0    | 0.0    | 0.0    | 3.0    | ... | ...    |
| 2     | feature3 | NaN    | 1.0    | 3.0    | 5.0    | ... | ...    |
| 3     | feature4 | 9.0    | 6.0    | 4.0    | 5.0    | ... | ...    |
| 4     | feature5 | 33.0   | 34.0   | 32.0   | 22.0   | ... | ...    |
| 5     | feature6 | 12.0   | 3.0    | 2.0    | 2.0    | ... | ...    |
| 6     | feature7 | 1.0    | 7.0    | 7.0    | 5.0    | ... | ...    |
| 7     | feature8 | NaN    | 0.0    | 1.0    | 8.0    | ... | ...    |

The row level values now represent the change compared to the previous month instead of having the total for the month.

I know that I should start by filling the NaN values in the first column, 201902, with 0:

df['201902'] = df['201902'].fillna(0)

I could also calculate them one by one with something similar to this:

df['201902'] = df['201902'].fillna(0) - df['201901'].fillna(0)
df['201903'] = df['201903'].fillna(0) - df['201902'].fillna(0)
df['201904'] = df['201904'].fillna(0) - df['201903'].fillna(0)
...
...

Hopefully there's a smarter solution, though.

Tags: python-3.x, pandas

Solution


Use iloc or drop to select the other columns, then call diff with axis=1 to get the row-wise differences.

monthly_change = df.iloc[:, 1:].fillna(0).diff(axis=1)
# or
# monthly_change = df.drop(['feature'], axis=1).fillna(0).diff(axis=1)
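
The one-liner above can be expanded into a self-contained sketch that also keeps the feature column. The sample frame is copied from the first four month columns of the question. One assumption is labeled in the code: diff(axis=1) returns NaN for the first month because there is no earlier column to subtract, so the sketch copies the original 201902 values back to match the expected output in the question.

```python
import numpy as np
import pandas as pd

# Sample data: the first four month columns from the question.
df = pd.DataFrame({
    "feature": [f"feature{i}" for i in range(1, 9)],
    "201902": [np.nan, 1.0, np.nan, 9.0, 33.0, 12.0, 1.0, np.nan],
    "201903": [np.nan, 1.0, 1.0, 15.0, 67.0, 15.0, 8.0, np.nan],
    "201904": [9.0, 1.0, 4.0, 19.0, 99.0, 17.0, 15.0, 1.0],
    "201905": [32.0, 4.0, 8.0, 24.0, 121.0, 19.0, 20.0, 9.0],
})

month_cols = df.columns.drop("feature")  # every column except feature
totals = df[month_cols].fillna(0)        # treat missing totals as 0

# diff(axis=1) subtracts the previous column from each column, so the
# first month has nothing to compare against and comes back as NaN.
monthly_change = totals.diff(axis=1)

# Assumption: the first month keeps its original total (including NaN),
# as in the expected output of the question.
monthly_change[month_cols[0]] = df[month_cols[0]]

# Put the feature names back in front of the monthly changes.
result = pd.concat([df[["feature"]], monthly_change], axis=1)
print(result)
```

For feature3 this gives 4.0 in 201905 (8.0 minus 4.0), whereas the expected-output table in the question lists 5.0; the other rows match the table.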
