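python - Calculate the percentage change between multiple pandas DataFrames
Problem description
Suppose I have two different pandas DataFrames with exactly the same structure: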
df1:
+---+---------+------+------+------+
| | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count | 10 | 10 | 10 |
+---+---------+------+------+------+
| 1 | mean | 4 | 5 | 5 |
+---+---------+------+------+------+
| 2 | stddev | 3 | 3 | 3 |
+---+---------+------+------+------+
| 3 | min | 0 | -1 | 5 |
+---+---------+------+------+------+
| 4 | max | 100 | 56 | 47 |
+---+---------+------+------+------+
and df2:
+---+---------+------+------+------+
| | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count | 15 | 15 | 5 |
+---+---------+------+------+------+
| 1 | mean | 2 | 2.5 | 2.5 |
+---+---------+------+------+------+
| 2 | stddev | 3 | 3 | 3 |
+---+---------+------+------+------+
| 3 | min | 0 | -1 | 5 |
+---+---------+------+------+------+
| 4 | max | 50 | 56 | 47 |
+---+---------+------+------+------+
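To make the examples reproducible, the two tables can be built directly in pandas. This is a minimal sketch that only transcribes the values shown above. The names df1 and df2 follow the question, and the label column is assumed to be called summary, as in the tables.

```python
import pandas as pd

# Row labels shared by both summary tables
labels = ["count", "mean", "stddev", "min", "max"]

# df1 and df2 transcribed from the question's tables
df1 = pd.DataFrame({"summary": labels,
                    "col1": [10, 4, 3, 0, 100],
                    "col2": [10, 5, 3, -1, 56],
                    "col3": [10, 5, 3, 5, 47]})
df2 = pd.DataFrame({"summary": labels,
                    "col1": [15, 2, 3, 0, 50],
                    "col2": [15, 2.5, 3, -1, 56],
                    "col3": [5, 2.5, 3, 5, 47]})

# Same shape, same columns, same row labels in the same order
print(df1.shape, df2.shape)
```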
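For each entry, I want to compute the percentage change between the values of the two DataFrames. I know there is a function pct_change(), but it only works within a single pandas DataFrame. The desired output is: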
+---+---------+------+------+------+
| | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count | 50% | 50% | -50% |
+---+---------+------+------+------+
| 1 | mean | -50% | -50% | -50% |
+---+---------+------+------+------+
| 2 | stddev | 0% | 0% | 0% |
+---+---------+------+------+------+
| 3 | min | 0% | 0% | 0% |
+---+---------+------+------+------+
| 4 | max | -50% | 0% | 0% |
+---+---------+------+------+------+
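For reference, each cell of the desired output is the relative change with df1 as the baseline. Here df1 and df2 stand for the matching cells of the two tables:

$$
\text{pct} = \frac{\text{df2} - \text{df1}}{\text{df1}} \times 100 = \left(\frac{\text{df2}}{\text{df1}} - 1\right) \times 100
$$

For example, count in col1 gives (15 - 10) / 10 × 100 = 50%. The second form corresponds directly to a divide, subtract 1, multiply by 100 sequence. The formula is undefined when the df1 value is 0. That is why the min row of col1, which is 0 in both tables, needs an explicit decision to show up as 0%.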
Solution
Set the string column summary as the index of both DataFrames, then divide them with DataFrame.div, subtract 1 with DataFrame.sub, and multiply by 100 with DataFrame.mul:
df = df2.set_index('summary').div(df1.set_index('summary')).sub(1).mul(100).reset_index()
print (df)
  summary  col1  col2  col3
0   count  50.0  50.0 -50.0
1    mean -50.0 -50.0 -50.0
2  stddev   0.0   0.0   0.0
3     min   NaN   0.0   0.0
4     max -50.0   0.0   0.0
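Below is a self-contained sketch of the same div/sub/mul approach that also reproduces the exact table from the question. In the plain version, the min row of col1 becomes NaN because 0/0 is undefined. Treating "0 in both frames" as 0% is an assumption made here to match the desired output, not something pandas does on its own. The string formatting step is likewise optional.

```python
import pandas as pd

labels = ["count", "mean", "stddev", "min", "max"]
df1 = pd.DataFrame({"summary": labels,
                    "col1": [10, 4, 3, 0, 100],
                    "col2": [10, 5, 3, -1, 56],
                    "col3": [10, 5, 3, 5, 47]})
df2 = pd.DataFrame({"summary": labels,
                    "col1": [15, 2, 3, 0, 50],
                    "col2": [15, 2.5, 3, -1, 56],
                    "col3": [5, 2.5, 3, 5, 47]})

# Index both frames by the label column so div() aligns row by row and column by column
old = df1.set_index("summary")
new = df2.set_index("summary")

# (new / old - 1) * 100: percentage change relative to df1
pct = new.div(old).sub(1).mul(100)

# Assumption: a cell that is 0 in both frames counts as 0% (as in the desired output)
# instead of NaN; cells where only the old value is 0 still come out as +/-inf
pct = pct.mask((old == 0) & (new == 0), 0.0)

# Optional: format as "50%"-style strings to match the question's table
out = pct.apply(lambda col: col.map("{:.0f}%".format)).reset_index()
print(out)
```

Masking happens before formatting, so any genuinely undefined cell (baseline 0, new value non-zero) stays visible as inf rather than being silently hidden.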
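Edit:
If you need pct_change between consecutive DataFrames in a list (df1 vs df2, df2 vs df3, ...), concatenate them with keys, add summary as an extra index level, and group by the original row position. Note that pct_change returns fractions (0.5 means 50%), not percentages: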
L = [df1, df2]
df = (pd.concat(L, keys=range(len(L)))
.set_index('summary', append=True)
.groupby(level=1)
.pct_change())
print (df)
                col1  col2  col3
    summary
0 0 count        NaN   NaN   NaN
  1 mean         NaN   NaN   NaN
  2 stddev       NaN   NaN   NaN
  3 min          NaN   NaN   NaN
  4 max          NaN   NaN   NaN
1 0 count        0.5   0.5  -0.5
  1 mean        -0.5  -0.5  -0.5
  2 stddev       0.0   0.0   0.0
  3 min          NaN   0.0   0.0
  4 max         -0.5   0.0   0.0
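As an alternative to the groupby/pct_change version, the consecutive comparisons can be computed pairwise with zip. This drops the all-NaN block for the first frame and yields percentages directly. This is a sketch, not the answer's method. The df3 values are hypothetical, made up only to show the chaining, and the level name step is an illustrative choice.

```python
import pandas as pd

labels = ["count", "mean", "stddev", "min", "max"]
df1 = pd.DataFrame({"summary": labels, "col1": [10, 4, 3, 0, 100],
                    "col2": [10, 5, 3, -1, 56], "col3": [10, 5, 3, 5, 47]})
df2 = pd.DataFrame({"summary": labels, "col1": [15, 2, 3, 0, 50],
                    "col2": [15, 2.5, 3, -1, 56], "col3": [5, 2.5, 3, 5, 47]})
# Hypothetical third snapshot, only to demonstrate the df2 -> df3 step
df3 = pd.DataFrame({"summary": labels, "col1": [30, 2, 6, 0, 25],
                    "col2": [15, 5, 3, -1, 28], "col3": [10, 2.5, 3, 5, 47]})

L = [df1, df2, df3]

# Index every frame by its label column so cells align on (summary, column)
indexed = [d.set_index("summary") for d in L]

# Compare each frame with the previous one; key = position of the newer frame in L
changes = {
    i: cur.div(prev).sub(1).mul(100)
    for i, (prev, cur) in enumerate(zip(indexed, indexed[1:]), start=1)
}

# Stack the pairwise results into one frame indexed by (step, summary)
result = pd.concat(changes, names=["step", "summary"])
print(result)
```

The step keys line up with the outer index level of the groupby output (1 means df2 vs df1), so the two approaches are easy to compare. The 0/0 cell in col1 is still NaN here.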