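python - Filling NaN or blanks with the max value in pandas
Problem description
I have a problem with a large DataFrame. Here is a small snippet. I want to fill the last column, E, with the maximum value if the block contains any values, and otherwise leave it empty. This is the data: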
import numpy as np

# Each block of six rows (D running from -1 to 4) shares one value of C.
d = {'A': [4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074],
     'B': ['SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746'],
     'C': [201926, 201926, 201926, 201926, 201926, 201926, 201909, 201909, 201909, 201909, 201909,
           201909, 201933, 201933, 201933, 201933, 201933, 201933],
     'D': [-1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4],
     'E': [np.nan, 1000, 1000, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, 3000, 3000, np.nan]}
It looks like this:
A B C D E
0 4000074 SP000796746 201926 -1 NaN
1 4000074 SP000796746 201926 0 1000.0
2 4000074 SP000796746 201926 1 1000.0
3 4000074 SP000796746 201926 2 NaN
4 4000074 SP000796746 201926 3 NaN
5 4000074 SP000796746 201926 4 NaN
6 4000074 SP000796746 201909 -1 NaN
7 4000074 SP000796746 201909 0 NaN
8 4000074 SP000796746 201909 1 NaN
9 4000074 SP000796746 201909 2 NaN
10 4000074 SP000796746 201909 3 NaN
11 4000074 SP000796746 201909 4 NaN
12 4000074 SP000796746 201933 -1 NaN
13 4000074 SP000796746 201933 0 NaN
14 4000074 SP000796746 201933 1 NaN
15 4000074 SP000796746 201933 2 3000.0
16 4000074 SP000796746 201933 3 3000.0
17 4000074 SP000796746 201933 4 NaN
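My goal is to fill column E with the maximum value whenever there is any value within a block running from -1 to 4 in column D. If the block has no values, it should stay empty. So it should look like this: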
A B C D E
0 4000074 SP000796746 201926 -1 1000.0
1 4000074 SP000796746 201926 0 1000.0
2 4000074 SP000796746 201926 1 1000.0
3 4000074 SP000796746 201926 2 1000.0
4 4000074 SP000796746 201926 3 1000.0
5 4000074 SP000796746 201926 4 1000.0
6 4000074 SP000796746 201909 -1 NaN
7 4000074 SP000796746 201909 0 NaN
8 4000074 SP000796746 201909 1 NaN
9 4000074 SP000796746 201909 2 NaN
10 4000074 SP000796746 201909 3 NaN
11 4000074 SP000796746 201909 4 NaN
12 4000074 SP000796746 201933 -1 3000.0
13 4000074 SP000796746 201933 0 3000.0
14 4000074 SP000796746 201933 1 3000.0
15 4000074 SP000796746 201933 2 3000.0
16 4000074 SP000796746 201933 3 3000.0
17 4000074 SP000796746 201933 4 3000.0
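My code looks like this:
import pandas as pd

df = pd.DataFrame(d)
# Index labels of the rows where a new block starts (D == -1).
indx = df[df['D'] == -1].index.values
for i, j in zip(indx[:-1], indx[1:]):
    # Overwrite E in the block [i, j-1] with that block's maximum.
    df.loc[i:j-1, 'E'] = df.loc[i:j-1, 'E'].max()
    if j == indx[-1]:
        # The last block runs from j to the end of the frame.
        df.loc[j:, 'E'] = df.loc[j:, 'E'].max()
It does not work well on very large DataFrames. Maybe someone has a different approach or a correction for my code.
Thanks!!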
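Alternatively, if the missing values are filled with 0 instead of the maximum, the result should look like this:
A B C D E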
0 4000074 SP000796746 201926 -1 0
1 4000074 SP000796746 201926 0 1000.0
2 4000074 SP000796746 201926 1 1000.0
3 4000074 SP000796746 201926 2 0
4 4000074 SP000796746 201926 3 0
5 4000074 SP000796746 201926 4 0
6 4000074 SP000796746 201909 -1 NaN
7 4000074 SP000796746 201909 0 NaN
8 4000074 SP000796746 201909 1 NaN
9 4000074 SP000796746 201909 2 NaN
10 4000074 SP000796746 201909 3 NaN
11 4000074 SP000796746 201909 4 NaN
12 4000074 SP000796746 201933 -1 0
13 4000074 SP000796746 201933 0 0
14 4000074 SP000796746 201933 1 0
15 4000074 SP000796746 201933 2 3000.0
16 4000074 SP000796746 201933 3 3000.0
17 4000074 SP000796746 201933 4 0
Solution
You can use groupby.transform with max, starting a new group at each -1 in column D by means of cumsum. Then fillna the original column with the result.
df['E'] = df['E'].fillna(df['E'].groupby(df['D'].eq(-1).cumsum()).transform('max'))
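To see what the one-liner does, it can help to split it into its intermediate steps. The following is only a sketch on a reduced, made-up frame (nine rows instead of eighteen, columns D and E only); the names block_id, block_max and filled are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Reduced stand-in for the question's data: three blocks, each starting at D == -1.
df = pd.DataFrame({
    "D": [-1, 0, 1, -1, 0, 1, -1, 0, 1],
    "E": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan, np.nan, 3000, 3000],
})

# eq(-1) flags the first row of each block; cumsum turns the flags into block ids.
block_id = df["D"].eq(-1).cumsum()

# Per-block maximum, broadcast back to every row (NaN for a block with no values).
block_max = df["E"].groupby(block_id).transform("max")

# fillna replaces only the missing entries; existing values are left untouched.
filled = df["E"].fillna(block_max)

print(pd.DataFrame({"block_id": block_id, "E": df["E"], "filled": filled}))
```

Because the grouping key is computed with vectorized operations, there is no Python-level loop over blocks, which is presumably why this scales better than slicing with df.loc block by block.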
Edit: to fill with zeros instead, you can do this:
mask = df['E'].groupby(df['D'].eq(-1).cumsum()).transform('any')
df.loc[mask, 'E'] = df.loc[mask, 'E'].fillna(0)
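As a small variation, here is a sketch of the zero-fill on a reduced, made-up frame. It differs from the snippet above in one respect, which is my own assumption rather than part of the answer: the mask is built from notna() so that it tests for the presence of a value rather than its truthiness, meaning a block whose only values are 0 would still count as having a value. The names block_id and has_value are illustrative.

```python
import numpy as np
import pandas as pd

# Reduced stand-in for the question's data: three blocks, each starting at D == -1.
df = pd.DataFrame({
    "D": [-1, 0, 1, -1, 0, 1, -1, 0, 1],
    "E": [np.nan, 1000, np.nan, np.nan, np.nan, np.nan, np.nan, 3000, 3000],
})

# Same block ids as before: a new id starts at every D == -1.
block_id = df["D"].eq(-1).cumsum()

# True for every row of a block that holds at least one non-missing E value.
has_value = df["E"].notna().groupby(block_id).transform("any")

# Fill with 0 only inside those blocks; blocks without any value stay NaN.
df.loc[has_value, "E"] = df.loc[has_value, "E"].fillna(0)

print(df)
```

For the data in the question both masks should select the same rows, since every non-missing value there is non-zero.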