How to reorder columns after splitting?

Problem description

I have a data frame containing lists of countries, which has been split using the cSplit function.

The code is as follows:

df <- data.frame(country = c("India, South Africa", "United Kingdom, United States, India",
                             "England, Australia, South Africa, Germany, United States"))
splitstackshape::cSplit(df, "country", sep = ", ")
 
#        country_1     country_2    country_3 country_4     country_5
#1:          India  South Africa         <NA>      <NA>          <NA>
#2: United Kingdom United States        India      <NA>          <NA>
#3:        England     Australia South Africa   Germany United States

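I want to rearrange the columns so that country_1 contains either United States or <NA>. Likewise, country_2 and country_3 should contain India or <NA> and United Kingdom or <NA>, respectively. From country_4 onwards, the remaining values can follow the order in which they appear in the row.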

The expected output is as follows:

#Expected Output
# country_1    country_2    country_3        country_4     country_5   country_6     country_7
#1 <NA>            India     <NA>            South Africa  <NA>        <NA>          <NA>
#2 United States   India     United Kingdom  <NA>          <NA>        <NA>          <NA>
#3 United States   <NA>      <NA>            England       Australia   South Africa  Germany

Tags: r, data-manipulation, data-wrangling

Solution


A rather ugly solution, using apply:

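# Split the comma-separated strings into separate columns
df1 <- splitstackshape::cSplit(df, "country", sep = ", ")
# Number of distinct countries across all rows, used as the output width
n <- length(unique(na.omit(unlist(df1))))

as.data.frame(t(apply(df1, 1, function(x) {
      x1 <- rep(NA, n)
      # Slots 1 to 3 are reserved for the three fixed countries
      if(any(x == 'United States', na.rm = TRUE)) x1[1] <- 'United States'
      if(any(x == 'India', na.rm = TRUE)) x1[2] <- 'India'
      if(any(x == 'United Kingdom', na.rm = TRUE)) x1[3] <- 'United Kingdom'
      # Remaining countries in row order; NA drops out because x1 still contains NA
      temp <- setdiff(x, x1)
      if(length(temp)) x1[4:(4 + length(temp) - 1)] <- temp
      x1
})))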

#             V1    V2             V3           V4        V5           V6      V7
#1          <NA> India           <NA> South Africa      <NA>         <NA>    <NA>
#2 United States India United Kingdom         <NA>      <NA>         <NA>    <NA>
#3 United States  <NA>           <NA>      England Australia South Africa Germany
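
The same idea can be written without hard-coding each country in its own `if` statement. The following is only a sketch, not part of the original answer: it uses base R's `strsplit` in place of `cSplit`, and the names `priority` and `reorder_countries` are hypothetical, introduced here for illustration. It assumes every priority country gets a fixed leading column and that all other values are appended in their original row order.

```r
df <- data.frame(country = c("India, South Africa",
                             "United Kingdom, United States, India",
                             "England, Australia, South Africa, Germany, United States"))

# Assumed fixed column order for the leading columns (hypothetical name)
priority <- c("United States", "India", "United Kingdom")

# Hypothetical helper: one fixed column per priority value, then the rest
reorder_countries <- function(strings, priority, sep = ", ") {
  # as.character() guards against factor input (default in R < 4.0)
  parts <- strsplit(as.character(strings), sep, fixed = TRUE)
  # Values outside `priority`, kept in their original row order
  rest <- lapply(parts, function(p) p[!p %in% priority])
  n_rest <- max(lengths(rest), 0L)
  rows <- lapply(seq_along(parts), function(i) {
    # Priority value if present in this row, otherwise NA
    fixed <- ifelse(priority %in% parts[[i]], priority, NA_character_)
    # Pad with NA so every row has the same length
    c(fixed, rest[[i]], rep(NA_character_, n_rest - length(rest[[i]])))
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- paste0("country_", seq_len(ncol(out)))
  out
}

result <- reorder_countries(df$country, priority)
print(result)
```

Unlike the apply version, the output width here is the number of priority countries plus the largest number of remaining values in any row, and the columns are named country_1, country_2, and so on, as in the expected output.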
