r - Convert a long data frame to wide format based on measurement conditions in R
Problem description
I have a data frame like this:
ID <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B")
ToolID <- c("CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B",
"CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B")
Step <- c("Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F",
"Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F")
Measurement <- c("Length","Breadth","Width","Height",NA,NA,NA,NA,
"Length","Breadth","Width","Height",NA,NA,NA,NA)
Passfail <- c("Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass",
"Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass")
Points <- c(7,5,3,4,0,0,0,0,17,15,13,14,0,0,0,0)
Average <- c(7.5,6.5,7.1,6.6,NA,NA,NA,NA,17.5,16.5,17.1,16.6,NA,NA,NA,NA)
Sigma <- c(2.5,2.5,2.1,2.6,NA,NA,NA,NA,12.5,12.5,12.1,12.6,NA,NA,NA,NA)
Tool <- c("ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2",
"ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2")
Dose <- c(NA,NA,NA,NA,17.1,NA,NA,17.3,NA,NA,NA,NA,117.1,NA,NA,117.3)
Machine <- c("CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2",
"CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2")
df1 <- data.frame(ID,ToolID,Step,Measurement,Passfail,Points,Average,Sigma,Tool,Dose,Machine)
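I am trying to convert this long data frame to wide format using the following conditions.
1) For each ID, if Measurement is not NA, pivot ToolID, Step and Measurement with Passfail, Points, Average and Sigma.
So the resulting columns would be CCP_A_Step_A_Length_Points, CCP_A_Step_A_Length_Average, CCP_A_Step_A_Length_Sigma, CCP_A_Step_A_Length_Passfail
and so on.
2) For each ID, if Measurement is NA, pivot ToolID and Step with Tool, Dose and Machine.
So the resulting columns would be IOT_B_Step_D__Tool, IOT_B_Step_D_Dose, IOT_B_Step_D_Machine
and so on.
I want all of this in a single data frame, which in this case is a data frame with 2 rows (one per ID).
This is my desired output: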
ID CCP_A_Step_A_Length_Points CCP_A_Step_A_Length_Average CCP_A_Step_A_Length_Sigma CCP_A_Step_A_Length_Passfail CCP_A_Step_A_Breadth_Points CCP_A_Step_A_Breadth_Average
A 7 7.5 2.5 Pass 5 6.5
B 17 17.5 12.5 Pass 15 16.5
CCP_A_Step_A_Breadth_Sigma CCP_A_Step_A_Breadth_Passfail CCQ_A_Step_B_Width_Points CCQ_A_Step_B_Width_Average CCQ_A_Step_B_Width_Sigma CCQ_A_Step_B_Width_Passfail
2.5 Pass 3 7.1 2.1 Fail
12.5 Pass 13 17.1 12.1 Fail
CCQ_A_Step_C_Height_Points CCQ_A_Step_C_Height_Average CCQ_A_Step_C_Height_Sigma CCQ_A_Step_C_Height_Passfail IOT_B_Step_D__Tool IOT_B_Step_D_Dose IOT_B_Step_D_Machine
4 6.6 2.6 Fail COB_1 17.1 CO2,CO6
14 16.6 12.6 Fail COB_1 117.1 CO2,CO6
CCP_B_Step_D__Tool CCP_B_Step_D_Dose CCP_B_Step_D_Machine CCQ_B_Step_E__Tool CCQ_B_Step_E_Dose CCQ_B_Step_E_Machine IOT_B_Step_F__Tool IOT_B_Step_F_Dose IOT_B_Step_F_Machine
COB_2 NA CO2,CO3,CO4 COB_1 NA CO2,CO3 COB_2 17.3 CO2
COB_2 NA CO2,CO3,CO4 COB_1 NA CO2,CO3 COB_2 117.3 CO2
I am trying to do it like this, but I am not getting it right.
library(reshape2)
df3 <- dcast(df1, ID + ToolID + Step + Measurement~ Passfail+Points+Average+Sigma)
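Could someone point me in the right direction? I want to apply this to a much larger dataset, so a fast solution would help me a lot.
Solution
I believe this should get you what you want: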
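library(dplyr)
library(tidyr)

# Rows without a measurement: only Tool, Dose and Machine are pivoted
df_na <- df1 %>%
  filter(is.na(Measurement)) %>%
  select(ID, ToolID, Step, Tool, Dose, Machine) %>%
  as_tibble()
# Rows with a measurement: only Passfail, Points, Average and Sigma are pivoted
df_nna <- df1 %>%
  filter(!is.na(Measurement)) %>%
  select(ID, ToolID, Step, Measurement, Passfail, Points, Average, Sigma) %>%
  as_tibble()
# gather() stacks columns of mixed type, so every value ends up as character
df_nna_wide <- df_nna %>%
  gather(key = key, value = value, -ID, -ToolID, -Step, -Measurement) %>%
  mutate(key = paste(ToolID, Step, Measurement, key, sep = '_')) %>%
  select(ID, key, value) %>%
  arrange(ID, key, value) %>%
  spread(key = key, value = value)
df_na_wide <- df_na %>%
  gather(key = key, value = value, -ID, -ToolID, -Step) %>%
  mutate(key = paste(ToolID, Step, key, sep = '_')) %>%
  select(ID, key, value) %>%
  arrange(ID, key, value) %>%
  spread(key = key, value = value)
# Combine both halves into one row per ID
df_wide <- df_nna_wide %>%
  left_join(df_na_wide, by = 'ID')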
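The same two-step split can also be written with pivot_wider(), which keeps each column's type instead of coercing everything to character. This is only a sketch, assuming tidyr 1.0.0 or later; df_small is a hypothetical cut-down copy of df1 (two measured rows and one unmeasured row per ID) so the block runs on its own, and the names measured_wide, unmeasured_wide and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down copy of df1 from the question: two measured rows
# and one unmeasured row per ID (values copied from the original vectors).
df_small <- data.frame(
  ID          = c("A", "A", "A", "B", "B", "B"),
  ToolID      = c("CCP_A", "CCQ_A", "IOT_B", "CCP_A", "CCQ_A", "IOT_B"),
  Step        = c("Step_A", "Step_B", "Step_D", "Step_A", "Step_B", "Step_D"),
  Measurement = c("Length", "Width", NA, "Length", "Width", NA),
  Passfail    = c("Pass", "Fail", "Pass", "Pass", "Fail", "Pass"),
  Points      = c(7, 3, 0, 17, 13, 0),
  Average     = c(7.5, 7.1, NA, 17.5, 17.1, NA),
  Sigma       = c(2.5, 2.1, NA, 12.5, 12.1, NA),
  Tool        = c("ABC_1", "ABD_1", "COB_1", "ABC_1", "ABD_1", "COB_1"),
  Dose        = c(NA, NA, 17.1, NA, NA, 117.1),
  Machine     = c("CO2", "CO3", "CO2,CO6", "CO2", "CO3", "CO2,CO6"),
  stringsAsFactors = FALSE
)

# Measured rows: spread Points/Average/Sigma/Passfail, one column per
# ToolID + Step + Measurement combination. Columns not listed are dropped.
measured_wide <- df_small %>%
  filter(!is.na(Measurement)) %>%
  pivot_wider(
    id_cols     = ID,
    names_from  = c(ToolID, Step, Measurement),
    values_from = c(Points, Average, Sigma, Passfail),
    names_glue  = "{ToolID}_{Step}_{Measurement}_{.value}"
  )

# Unmeasured rows: spread Tool/Dose/Machine per ToolID + Step combination.
unmeasured_wide <- df_small %>%
  filter(is.na(Measurement)) %>%
  pivot_wider(
    id_cols     = ID,
    names_from  = c(ToolID, Step),
    values_from = c(Tool, Dose, Machine),
    names_glue  = "{ToolID}_{Step}_{.value}"
  )

# One row per ID; full_join keeps IDs that appear in only one half.
wide <- full_join(measured_wide, unmeasured_wide, by = "ID")
```

Replacing df_small with df1 should give the full table. The unmeasured columns come out with a single underscore (IOT_B_Step_D_Tool), unlike the double underscore in the desired output.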
If you have a very large dataset, data.table may suit your needs better, but I am not familiar enough with its syntax to write a solution with it.
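For reference, one possible data.table version is sketched below. It is not part of the original answer and has not been benchmarked here. It relies on dcast() accepting several value.var columns at once; dt_small is a hypothetical cut-down copy of df1, and move_prefix() is an illustrative helper that moves the value.var prefix dcast() adds to the end of each column name.

```r
library(data.table)

# Hypothetical cut-down copy of df1 from the question: two measured rows
# and one unmeasured row per ID (values copied from the original vectors).
dt_small <- data.table(
  ID          = c("A", "A", "A", "B", "B", "B"),
  ToolID      = c("CCP_A", "CCQ_A", "IOT_B", "CCP_A", "CCQ_A", "IOT_B"),
  Step        = c("Step_A", "Step_B", "Step_D", "Step_A", "Step_B", "Step_D"),
  Measurement = c("Length", "Width", NA, "Length", "Width", NA),
  Passfail    = c("Pass", "Fail", "Pass", "Pass", "Fail", "Pass"),
  Points      = c(7, 3, 0, 17, 13, 0),
  Average     = c(7.5, 7.1, NA, 17.5, 17.1, NA),
  Sigma       = c(2.5, 2.1, NA, 12.5, 12.1, NA),
  Tool        = c("ABC_1", "ABD_1", "COB_1", "ABC_1", "ABD_1", "COB_1"),
  Dose        = c(NA, NA, 17.1, NA, NA, 117.1),
  Machine     = c("CO2", "CO3", "CO2,CO6", "CO2", "CO3", "CO2,CO6")
)

measure_cols <- c("Points", "Average", "Sigma", "Passfail")
tool_cols    <- c("Tool", "Dose", "Machine")

# Cast each half separately; every value column keeps its own type.
measured_wide <- dcast(dt_small[!is.na(Measurement)],
                       ID ~ ToolID + Step + Measurement,
                       value.var = measure_cols)
unmeasured_wide <- dcast(dt_small[is.na(Measurement)],
                         ID ~ ToolID + Step,
                         value.var = tool_cols)

# dcast() names the columns "<value.var>_<ToolID>_<Step>..."; this
# illustrative helper moves the value.var prefix to the end of the name.
move_prefix <- function(x, prefixes) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")_(.*)$")
  sub(pattern, "\\2_\\1", x)
}
setnames(measured_wide, move_prefix(names(measured_wide), measure_cols))
setnames(unmeasured_wide, move_prefix(names(unmeasured_wide), tool_cols))

# One row per ID; all = TRUE keeps IDs that appear in only one half.
wide_dt <- merge(measured_wide, unmeasured_wide, by = "ID", all = TRUE)
```

To run this on the full data, setDT(df1) or as.data.table(df1) would take the place of dt_small. If a ToolID/Step/Measurement combination occurs more than once per ID, dcast() falls back to counting rows, so duplicates would need to be resolved first.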