r - 拆分后如何重新排序列?
问题描述
我有一个包含国家列表的数据框,它已使用该csplit
函数进行拆分。
代码如下:-
df <- data.frame(country = c("India, South Africa", "United Kingdom, United States, India",
"England, Australia, South Africa, Germany, United States"))
splitstackshape::cSplit(df, "country", sep = ", ")
# country_1 country_2 country_3 country_4 country_5
#1: India South Africa <NA> <NA> <NA>
#2: United Kingdom United States India <NA> <NA>
#3: England Australia South Africa Germany United States
我希望以这样一种方式重新排列列,即country_1
列应该包含United States
或<NA>
。同样对于country_2
and country_3
,它应该分别是India
or<NA>
和United Kingdom
or <NA>
。从column_4
病房开始,它可以按照行中的顺序进行。
预期输出如下,
#Expected Output
# country_1 country_2 country_3 country_4 country_5 country_6 country_7
#1 <NA> India <NA> South Africa <NA> <NA> <NA>
#2 United States India United Kingdom <NA> <NA> <NA> <NA>
#3 United States <NA> <NA> England Australia South Africa Germany
解决方案
一个非常丑陋的解决方案,使用apply
:
df1 <- splitstackshape::cSplit(df, "country", sep = ", ")
n <- length(unique(na.omit(unlist(df1))))
as.data.frame(t(apply(df1, 1, function(x) {
x1 <- rep(NA, n)
if(any(x == 'United States', na.rm = TRUE)) x1[1] <- 'United States'
if(any(x == 'India', na.rm = TRUE)) x1[2] <- 'India'
if(any(x == 'United Kingdom', na.rm = TRUE)) x1[3] <- 'United Kingdom'
temp <- setdiff(x, x1)
if(length(temp)) x1[4:(4 + length(temp) - 1)] <- temp
x1
})))
# V1 V2 V3 V4 V5 V6 V7
#1 <NA> India <NA> South Africa <NA> <NA> <NA>
#2 United States India United Kingdom <NA> <NA> <NA> <NA>
#3 United States <NA> <NA> England Australia South Africa Germany