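python - How to fill missing data in a DataFrame based on multiple conditions in the row and on the data above
Problem description
I have the following DataFrame: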
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00   482.5291    511.2351
2       2       1       NULL     18/6/2020 5:00   467.8704    519.9842
3       1       1       NULL     18/6/2020 5:00   390.5945    480.2252
2186    3       1       NULL     18/6/2020 5:00   0                        0.5
2520    2       1       NULL     18/6/2020 5:00   0                        21
2840    1       1       NULL     18/6/2020 6:00   0                        12
4       1       1       NULL     18/6/2020 6:00   389.2222    480.2252
5       3       1       NULL     18/6/2020 6:00   516.0907    511.2351
6       2       1       NULL     18/6/2020 6:00   450.5216    519.9842
7       3       1       NULL     18/6/2020 6:00   397.9998    511.2351
8       2       1       NULL     18/6/2020 7:00   456.9486    519.9842
9       1       1       NULL     18/6/2020 7:00   414.6932    480.2252
1939    2       1       NULL     18/6/2020 7:00   0                        24
2462    3       1       NULL     18/6/2020 7:00   0                        3
3075    1       1       NULL     18/6/2020 7:00   0                        3.5
1
......
......
......
114678  1       1       NULL     18/6/2018 22:00  343.5955
114798  3       1       NULL     18/6/2018 22:00  191.2512
114888  2       1       NULL     18/6/2018 22:00  190.5125
114657  2       1       NULL     18/6/2018 22:00  414.6432
114738  1       1       NULL     18/6/2018 22:00  429.43
114885  3       1       NULL     18/6/2018 23:00  361.3246
114756  1       1       NULL     18/6/2018 23:00  409.51
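I need to fill in Theoretical where it is empty, but only in rows where Scrap is also empty.
The rule is: when LineID is 3, Theoretical is always 511.2351; when it is 2, Theoretical is always 519.9842; and when it is 1, Theoretical is always 480.2252. But when a row has a Scrap value, Theoretical should stay empty.
I can't seem to work out how to do this with a forward-fill approach.
I tried the following code, but every row other than the ones it filled became NaN.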
df['Theoretical'] = np.select([(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),],
(511.2351,519.9842,480.2252), np.nan)
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00
2       2       1       NULL     18/6/2020 5:00
3       1       1       NULL     18/6/2020 5:00
2186    3       1       NULL     18/6/2020 5:00
2520    2       1       NULL     18/6/2020 5:00
2840    1       1       NULL     18/6/2020 6:00
4       1       1       NULL     18/6/2020 6:00
5       3       1       NULL     18/6/2020 6:00
6       2       1       NULL     18/6/2020 6:00
7       3       1       NULL     18/6/2020 6:00
8       2       1       NULL     18/6/2020 7:00
9       1       1       NULL     18/6/2020 7:00
1939    2       1       NULL     18/6/2020 7:00
2462    3       1       NULL     18/6/2020 7:00
3075    1       1       NULL     18/6/2020 7:00
1
......
......
......
114678  1       1       NULL     18/6/2018 22:00  343.5955    480.2252
114798  3       1       NULL     18/6/2018 22:00  191.2512    511.2351
114888  2       1       NULL     18/6/2018 22:00  190.5125    519.9842
114657  2       1       NULL     18/6/2018 22:00  414.6432    519.9842
114738  1       1       NULL     18/6/2018 22:00  429.43      480.2252
114885  3       1       NULL     18/6/2018 23:00  361.3246    511.2351
114756  1       1       NULL     18/6/2018 23:00  409.51      480.2252
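One likely reason for the NaN rows in the attempt above is the third argument to np.select: it is the value used for every row that matches none of the conditions, so passing np.nan overwrites the values that were already present. A minimal sketch of one way around this, assuming missing values are encoded as 0 (as the conditions in the attempt suggest), passes the existing column as the default instead. The sample frame and the names targets and needs_fill are hypothetical, made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 0 stands for "missing", as in the attempt above.
df = pd.DataFrame({
    "LineID": [3, 2, 1, 3, 1],
    "Production": [482.5291, 467.8704, 0.0, 191.2512, 343.5955],
    "Theoretical": [511.2351, 519.9842, 0.0, 0.0, 0.0],
    "Scrap": [0.0, 0.0, 12.0, 0.0, 0.0],
})

# Fixed Theoretical value per LineID, taken from the question.
targets = {3: 511.2351, 2: 519.9842, 1: 480.2252}

# Rows that should be filled: production present, Theoretical and Scrap both "missing".
needs_fill = (df["Production"] > 0) & (df["Theoretical"] == 0) & (df["Scrap"] == 0)

conditions = [((df["LineID"] == line) & needs_fill).to_numpy() for line in targets]
choices = list(targets.values())

# default=existing column: rows matching no condition keep their current value.
df["Theoretical"] = np.select(conditions, choices, default=df["Theoretical"].to_numpy())
print(df)
```

With this default, rows that already had a Theoretical value and rows that carry a Scrap value are left as they were.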
I need it to look like this:
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00   482.5291    511.2351
2       2       1       NULL     18/6/2020 5:00   467.8704    519.9842
3       1       1       NULL     18/6/2020 5:00   390.5945    480.2252
2186    3       1       NULL     18/6/2020 5:00   0                        0.5
2520    2       1       NULL     18/6/2020 5:00   0                        21
2840    1       1       NULL     18/6/2020 6:00   0                        12
4       1       1       NULL     18/6/2020 6:00   389.2222    480.2252
5       3       1       NULL     18/6/2020 6:00   516.0907    511.2351
6       2       1       NULL     18/6/2020 6:00   450.5216    519.9842
7       3       1       NULL     18/6/2020 6:00   397.9998    511.2351
8       2       1       NULL     18/6/2020 7:00   456.9486    519.9842
9       1       1       NULL     18/6/2020 7:00   414.6932    480.2252
1939    2       1       NULL     18/6/2020 7:00   0                        24
2462    3       1       NULL     18/6/2020 7:00   0                        3
3075    1       1       NULL     18/6/2020 7:00   0                        3.5
1
......
......
......
114678  1       1       NULL     18/6/2018 22:00  343.5955    480.2252
114798  3       1       NULL     18/6/2018 22:00  191.2512    511.2351
114888  2       1       NULL     18/6/2018 22:00  190.5125    519.9842
114657  2       1       NULL     18/6/2018 22:00  414.6432    519.9842
114738  1       1       NULL     18/6/2018 22:00  429.43      480.2252
114885  3       1       NULL     18/6/2018 23:00  361.3246    511.2351
114756  1       1       NULL     18/6/2018 23:00  409.51      480.2252
Solution
This is certainly not the best solution, but you could try the following:
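import numpy as np
import pandas as pd

# Small sample frame reproducing the pattern in the question
df_new = pd.DataFrame({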
"LineID":[1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
"Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
"Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10]
})
df_new
    LineID  Theoretical  Scrap
0        1     480.2252    NaN
1        2     519.9842    0.5
2        3     511.2351   21.0
3        1          NaN    NaN
4        2          NaN   24.0
5        1          NaN    NaN
6        1          NaN   40.0
7        2          NaN   34.0
8        3          NaN    NaN
9        1          NaN    0.4
10       2          NaN    NaN
11       1          NaN   10.0
# Fill Theoretical only where both Theoretical and Scrap are missing, one LineID at a time
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252
df_new
    LineID  Theoretical  Scrap
0        1     480.2252    NaN
1        2     519.9842    0.5
2        3     511.2351   21.0
3        1     480.2252    NaN
4        2          NaN   24.0
5        1     480.2252    NaN
6        1          NaN   40.0
7        2          NaN   34.0
8        3     511.2351    NaN
9        1          NaN    0.4
10       2     519.9842    NaN
11       1          NaN   10.0
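The three .loc assignments above repeat the same mask once per LineID. As a possible variation on the same idea (a sketch, not necessarily better), the mask can be built once and the per-line value looked up with Series.map. The frame df_demo and the dictionary theoretical_by_line are hypothetical names introduced here; the values in the dictionary come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; NaN marks missing values, as in the answer above.
df_demo = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 3],
    "Theoretical": [480.2252, np.nan, np.nan, np.nan, np.nan, np.nan],
    "Scrap": [np.nan, 0.5, np.nan, np.nan, 24.0, 3.0],
})

# Fixed Theoretical value per LineID, taken from the question.
theoretical_by_line = {1: 480.2252, 2: 519.9842, 3: 511.2351}

# Only rows where both Theoretical and Scrap are missing get filled.
mask = df_demo["Theoretical"].isna() & df_demo["Scrap"].isna()

# Look up the value for each masked row's LineID; assignment aligns on the index.
df_demo.loc[mask, "Theoretical"] = df_demo.loc[mask, "LineID"].map(theoretical_by_line)
print(df_demo)
```

Adding another line then only requires a new dictionary entry, and rows with a Scrap value are never touched because they fall outside the mask.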