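python - Python equivalent of R's tidyr::complete that allows specifying additional values
Problem description
I'm trying to recreate an R script and I'm stuck on how to reproduce this pipeline in Python. I'm analyzing cumulative production at different factories and need to normalize their cumulative production hours so they can be compared.
The pipeline looks like this: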
Norm_hrs <- Cum_df %>%
  group_by(Name) %>%
  complete(Cum_hrs = seq(0, max(Cum_hrs), 730.5))
It takes this:
Name Cum_Hrs A B C
Factory 1 1 0 1.887861 3.775722
Factory 1 251 0 2104.335728 21932.57871
Factory 1 611 0 2324.586178 37498.99722
Factory 1 1208 0 4361.588197 65235.05541
Factory 2 48 0 1517.840244 6604.770432
Factory 2 163 0 3370.461172 17252.70972
Factory 2 822 0 13284.87786 71918.78308
Factory 2 1541 0 21476.93602 134569.0388
Factory 2 2285 0 32053.99192 225895.1477
Factory 2 3028 0 42299.41357 340798.6151
Factory 2 3699 0 50125.85599 462145.5438
Factory 2 4436 0 56715.74945 584474.9989
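The answer below assumes this table is already loaded as a pandas DataFrame named `df`. A minimal way to build it by hand, as a sketch: the column names come from the table header, and the name `df` is an assumption chosen to match the answer's code.

```python
import pandas as pd

# Input table from the question, typed in by hand
df = pd.DataFrame({
    'Name': ['Factory 1'] * 4 + ['Factory 2'] * 8,
    'Cum_Hrs': [1, 251, 611, 1208,
                48, 163, 822, 1541, 2285, 3028, 3699, 4436],
    'A': [0] * 12,
    'B': [1.887861, 2104.335728, 2324.586178, 4361.588197,
          1517.840244, 3370.461172, 13284.87786, 21476.93602,
          32053.99192, 42299.41357, 50125.85599, 56715.74945],
    'C': [3.775722, 21932.57871, 37498.99722, 65235.05541,
          6604.770432, 17252.70972, 71918.78308, 134569.0388,
          225895.1477, 340798.6151, 462145.5438, 584474.9989],
})

# Per-factory maximum Cum_Hrs (1208 and 4436): the upper bound of each grid
print(df.groupby('Name')['Cum_Hrs'].max())
```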
and turns it into this:
Name Cum_Hrs A B C
Factory 1 1 0 1.887861 3.775722
Factory 1 251 0 2104.335728 21932.57871
Factory 1 611 0 2324.586178 37498.99722
Factory 1 730.5 NA NA NA
Factory 1 1208 0 4361.588197 65235.05541
Factory 2 48 0 1517.840244 6604.770432
Factory 2 163 0 3370.461172 17252.70972
Factory 2 730.5 NA NA NA
Factory 2 822 0 13284.87786 71918.78308
Factory 2 1461 NA NA NA
Factory 2 1541 0 21476.93602 134569.0388
Factory 2 2191.5 NA NA NA
Factory 2 2285 0 32053.99192 225895.1477
Factory 2 2922 NA NA NA
Factory 2 3028 0 42299.41357 340798.6151
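This in turn lets me interpolate the NA values in the data frame to get normalized time steps.
Solution
Simply concatenate a sequence data frame of incremental Cum_Hrs values for every unique Name: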
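For the interpolation step itself, one possible approach is sketched below. It assumes that "normalized" means linear interpolation against Cum_Hrs (not against row position) within each factory. The `interp_df` name and the choice of `limit_area='inside'` (leave the 0-hour row as NaN instead of extrapolating) are my own assumptions. It uses the Factory 1 rows of the completed table.

```python
import numpy as np
import pandas as pd

# Factory 1 rows of the completed table; the grid rows carry NaN
final_df = pd.DataFrame({
    'Name': ['Factory 1'] * 6,
    'Cum_Hrs': [0.0, 1.0, 251.0, 611.0, 730.5, 1208.0],
    'A': [np.nan, 0, 0, 0, np.nan, 0],
    'B': [np.nan, 1.887861, 2104.335728, 2324.586178, np.nan, 4361.588197],
    'C': [np.nan, 3.775722, 21932.57871, 37498.99722, np.nan, 65235.05541],
})

parts = []
for name, g in final_df.groupby('Name'):
    # method='index' weights by the Cum_Hrs values, not by row position;
    # limit_area='inside' fills only NaNs that lie between two known values
    filled = (g.set_index('Cum_Hrs')[['A', 'B', 'C']]
                .interpolate(method='index', limit_area='inside')
                .reset_index())
    filled.insert(0, 'Name', name)
    parts.append(filled)

interp_df = pd.concat(parts, ignore_index=True)
print(interp_df)
```

Keeping only the grid rows afterwards (for example `interp_df[interp_df['Cum_Hrs'] % 730.5 == 0]`) would give one row per normalized time step.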
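import numpy as np
import pandas as pd

# One frame of grid points (0, 730.5, 1461, ...) per factory, stopping below max(Cum_Hrs)
seq_df = pd.concat([pd.DataFrame({'Name': name,
                                  'Cum_Hrs': np.arange(0, g['Cum_Hrs'].max(), 730.5)})
                    for name, g in df.groupby('Name')])

# Stack grid rows under the data (missing A, B, C become NaN), sort, restore column order
final_df = (pd.concat([df, seq_df], sort=True)
              .sort_values(['Name', 'Cum_Hrs'])
              .reset_index(drop=True)
              .reindex(columns=df.columns)
           )
print(final_df)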
# Name Cum_Hrs A B C
# 0 Factory 1 0.0 NaN NaN NaN
# 1 Factory 1 1.0 0.0 1.887861 3.775722
# 2 Factory 1 251.0 0.0 2104.335728 21932.578710
# 3 Factory 1 611.0 0.0 2324.586178 37498.997220
# 4 Factory 1 730.5 NaN NaN NaN
# 5 Factory 1 1208.0 0.0 4361.588197 65235.055410
# 6 Factory 2 0.0 NaN NaN NaN
# 7 Factory 2 48.0 0.0 1517.840244 6604.770432
# 8 Factory 2 163.0 0.0 3370.461172 17252.709720
# 9 Factory 2 730.5 NaN NaN NaN
# 10 Factory 2 822.0 0.0 13284.877860 71918.783080
# 11 Factory 2 1461.0 NaN NaN NaN
# 12 Factory 2 1541.0 0.0 21476.936020 134569.038800
# 13 Factory 2 2191.5 NaN NaN NaN
# 14 Factory 2 2285.0 0.0 32053.991920 225895.147700
# 15 Factory 2 2922.0 NaN NaN NaN
# 16 Factory 2 3028.0 0.0 42299.413570 340798.615100
# 17 Factory 2 3652.5 NaN NaN NaN
# 18 Factory 2 3699.0 0.0 50125.855990 462145.543800
# 19 Factory 2 4383.0 NaN NaN NaN
# 20 Factory 2 4436.0 0.0 56715.749450 584474.998900
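One difference from tidyr::complete: pd.concat appends every grid point, so if a factory already had a row at exactly Cum_Hrs = 730.5 (or another multiple), that time would appear twice, whereas complete() only fills in combinations that are missing. A sketch of an outer-merge variant that avoids this, shown on the Factory 1 rows of the question's table (only column B kept for brevity). `add_grid_rows` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

def add_grid_rows(df, step=730.5):
    """Hypothetical helper: add missing (Name, Cum_Hrs) grid points without duplicates."""
    # Use float keys on both sides so int and float Cum_Hrs values merge cleanly
    df = df.astype({'Cum_Hrs': float})
    # Same per-factory grid as the answer: 0, step, 2*step, ... below max(Cum_Hrs)
    seq_df = pd.concat([pd.DataFrame({'Name': name,
                                      'Cum_Hrs': np.arange(0, g['Cum_Hrs'].max(), step)})
                        for name, g in df.groupby('Name')])
    # Outer merge on both keys: a grid point that already exists is matched
    # rather than appended, so no (Name, Cum_Hrs) pair appears twice
    return (df.merge(seq_df, on=['Name', 'Cum_Hrs'], how='outer')
              .sort_values(['Name', 'Cum_Hrs'])
              .reset_index(drop=True))

# Factory 1 rows from the question's table
df = pd.DataFrame({
    'Name': ['Factory 1'] * 4,
    'Cum_Hrs': [1, 251, 611, 1208],
    'B': [1.887861, 2104.335728, 2324.586178, 4361.588197],
})
print(add_grid_rows(df))
```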
A similar process can be handled in base R. Base R (non-tidyverse) code is often easier to translate to pandas:
- seq ==> np.arange
- by ==> pd.DataFrame.groupby
- data.frame ==> pd.DataFrame
- do.call + rbind ==> pd.concat
- order ==> pd.DataFrame.sort_values
- row.names=NULL ==> pd.DataFrame.reset_index(drop=True)
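One caveat on the first mapping: np.arange always excludes the stop value, while R's seq(0, max, 730.5) includes max whenever it is an exact multiple of the step. With the question's data neither factory's maximum (1208, 4436) is an exact multiple, so the results match. A small helper can reproduce R's behaviour; `r_seq` is a hypothetical name, it only handles a positive step, and the tolerance is an assumption meant to absorb floating-point error.

```python
import numpy as np

def r_seq(start, end, by):
    """Hypothetical helper: mimic R's seq(start, end, by) for a positive `by`."""
    # Number of whole steps that fit; the tiny tolerance keeps exact multiples
    # such as 1461 / 730.5 from being floored down by rounding error
    n = int(np.floor((end - start) / by + 1e-10))
    return start + np.arange(n + 1) * by

print(np.arange(0, 1461, 730.5).tolist())  # [0.0, 730.5]
print(r_seq(0, 1461, 730.5).tolist())      # [0.0, 730.5, 1461.0]
```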
R
# BUILD SEQUENCE DATA FRAME
seq_df = do.call(rbind, by(df, df$Name, function(sub)
  data.frame(Name = sub$Name[[1]],
             Cum_Hrs = seq(0, max(sub$Cum_Hrs), 730.5),
             A = NA, B = NA, C = NA))
)

# CONCATENATE REFERENCING EVERY COLUMN
final_df = rbind(df, seq_df)

# SORT ROWS AND RESET ROW NAMES
final_df = with(final_df, data.frame(final_df[order(Name, Cum_Hrs),], row.names=NULL))
final_df
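For comparison, the base R version maps onto pandas almost line for line using the list above; this sketch keeps each R call as a trailing comment. It runs on a four-row slice of the question's table, and unlike the first Python snippet it creates A, B and C explicitly as NaN, just as the R code does with NA.

```python
import numpy as np
import pandas as pd

# Four rows of the question's table (two per factory) so the block runs on its own
df = pd.DataFrame({
    'Name': ['Factory 1', 'Factory 1', 'Factory 2', 'Factory 2'],
    'Cum_Hrs': [611.0, 1208.0, 822.0, 1541.0],
    'A': [0.0, 0.0, 0.0, 0.0],
    'B': [2324.586178, 4361.588197, 13284.87786, 21476.93602],
    'C': [37498.99722, 65235.05541, 71918.78308, 134569.0388],
})

# by(df, df$Name, function(sub) data.frame(...))
pieces = [pd.DataFrame({'Name': sub['Name'].iloc[0],                          # sub$Name[[1]]
                        'Cum_Hrs': np.arange(0, sub['Cum_Hrs'].max(), 730.5),  # seq(...)
                        'A': np.nan, 'B': np.nan, 'C': np.nan})                # A = NA, ...
          for _, sub in df.groupby('Name')]
seq_df = pd.concat(pieces)                                  # do.call(rbind, ...)

final_df = pd.concat([df, seq_df])                          # rbind(df, seq_df)
final_df = (final_df.sort_values(['Name', 'Cum_Hrs'])       # order(Name, Cum_Hrs)
                    .reset_index(drop=True))                # row.names = NULL
print(final_df)
```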