首页 > 解决方案 > Concatenate Rows If Value of Another Column In Next Row Is Empty

问题描述

I have a data set as shown in Input table below. I want to combine rows (4,5,6), rows (8,9) and rows (11,12) of Input table such that they share the same ID as shown in row 4,8 and 11 in the Output table below.

I tried merge(), but that didn't work as expected. The key here is the ID column which has unique values.

Any suggestions on how I can achieve this efficiently?

Input

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    c   -0.5 5.5   V   UI-004
5    d              
6    e              
7        -45 125  Ohms  UI-005
8    f     2        kV  UI-006
9    g              
10   h   500        V   UI-007
11   i    15        kV  UI-008
12   j              
13   k                  UI-009

dput() of Input

structure(list(Name = c(NA, "a", "b", "c", "d", "e", NA, "f", 
"g", "h", "i", "j", "k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, NA, 
NA, -45, 2, NA, 500, 15, NA, NA), Val2 = c(5.5, 2.5, 5.5, 5.5, 
NA, NA, 125, NA, NA, NA, NA, NA, NA), Unit = c("V", "V", "V", 
"V", NA, NA, "Ohms", "kV", NA, "V", "kV", NA, NA), ID = c("UI-001", 
"UI-002", "UI-003", "UI-004", NA, NA, "UI-005", "UI-006", NA, 
"UI-007", "UI-008", NA, "UI-009")), row.names = c(NA, -13L), class = 
c("tbl_df", "tbl", "data.frame"))

Output

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    cde -0.5 5.5   V   UI-004      
5        -45  125 Ohms  UI-005
6    fg    2        kV  UI-006  
7    h   500        V   UI-007
8    ij   15        kV  UI-008
9    k                  UI-009

dput() of Output

structure(list(Name = c(NA, "a", "b", "cde", NA, "fg", "h", "ij", 
"k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, -45, 2, 500, 15, NA), 
Val2 = c(5.5, 2.5, 5.5, 5.5, 125, NA, NA, NA, NA), Unit = c("V", 
"V", "V", "V", "Ohms", "kV", "V", "kV", NA), ID = c("UI-001", 
"UI-002", "UI-003", "UI-004", "UI-005", "UI-006", "UI-007", 
"UI-008", "UI-009")), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

标签: rconcatenation

解决方案


我们可能会使用

out <- df[!is.na(df$ID), ]
out$Name[!is.na(out$Name)] <- tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")[!is.na(out$Name)]
out
#    Name  Val1  Val2 Unit     ID
# 1  <NA>  -0.5   5.5    V UI-001
# 2     a  -0.5   2.5    V UI-002
# 3     b  -0.5   5.5    V UI-003
# 4   cde  -0.5   5.5    V UI-004
# 7  <NA> -45.0 125.0 Ohms UI-005
# 8    fg   2.0    NA   kV UI-006
# 10    h 500.0    NA    V UI-007
# 11   ij  15.0    NA   kV UI-008
# 13    k    NA    NA <NA> UI-009

第一行去掉了IDis的所有行NA。然后

tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")
#     1     2     3     4     5     6     7     8     9 
#  "NA"   "a"   "b" "cde"  "NA"  "fg"   "h"  "ij"   "k" 

构造正确的值Name!is.na(out$Name)为我们提供out应该修改的行(这是必需的,因为"NA"与 不同NA)。


推荐阅读