Split a string into components

Problem description

For example, I have a data table with several columns:

column A                      column B
key_500:station and loc       2
spectra:key_600:type          9
alpha:key_100:number          12

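I want to split the values in column A into components and create new columns: the number after key_ goes into var1, the text after key_NNN: goes into var2, and column A keeps only the prefix before key_ (or effect when there is no prefix).

The final data table I expect looks like this: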

column A   column B  var1  var2
effect     2         500   station and loc
spectra    9         600   type
alpha      12        100   number

Tags: r, parsing, extract

Solution


Using tidyr::extract, you can pull specific parts of the string out with a regular expression. Each capture group in the regex becomes one of the new columns.

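library(dplyr)  # provides %>% and mutate()

# var1 = digits after "key_" (converted to integer), var2 = text after "key_NNN:"
tidyr::extract(df, columnA, into = c('var1', 'var2'), regex = 'key_(\\d+):(.*)', 
               convert = TRUE, remove = FALSE) %>%
  # Keep only the prefix before "key_"; use 'effect' when nothing precedes it
  dplyr::mutate(columnA = sub(':?key_.*', '', columnA), 
                columnA = replace(columnA, columnA == '', 'effect'))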

#  columnA var1            var2 columnB
#1  effect  500 station and loc       2
#2 spectra  600            type       9
#3   alpha  100          number      12
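To see exactly what the two capture groups in key_(\\d+):(.*) pick up, here is a small base-R sketch using regexec() and regmatches(). It is not part of the original answer, and the names x, m, groups, var1 and var2 are just illustrative; x holds the sample values from column A.

```r
# Sample values from column A (illustrative vector)
x <- c("key_500:station and loc", "spectra:key_600:type", "alpha:key_100:number")

# regexec() records where the full match and each capture group start;
# regmatches() turns those positions into the matched substrings
m <- regmatches(x, regexec("key_(\\d+):(.*)", x))

# Each element of m is c(full match, group 1, group 2); stack them into a matrix
groups <- do.call(rbind, m)
var1 <- as.integer(groups[, 2])  # digits after "key_"
var2 <- groups[, 3]              # everything after "key_NNN:"

print(var1)
# [1] 500 600 100
print(var2)
```

Group 1 is what extract() writes to var1 (as an integer because of convert = TRUE), and group 2 is what it writes to var2.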

If you prefer data.table, you can break this down into steps:

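library(data.table)
setDT(df)  # convert df to a data.table in place
# var1 = digits after "key_" (as integer, like convert = TRUE above), var2 = text after "key_NNN:"
df[, c('var1', 'var2') := .(as.integer(sub('.*key_(\\d+).*', '\\1', columnA)), 
                            sub('.*key_\\d+:', '', columnA))]
# Keep only the prefix before "key_"; use 'effect' when nothing precedes it
df[, columnA := sub(':?key_.*', '', columnA)]
df[, columnA := replace(columnA, columnA == '', 'effect')]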
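The same steps also work in base R with no packages. This is a sketch that is not from the original answer: it rebuilds the sample df so it runs on its own and reuses the same regular expressions as the data.table version, assigning with $ instead of :=.

```r
# Sample data (same values as the data section)
df <- data.frame(
  columnA = c("key_500:station and loc", "spectra:key_600:type", "alpha:key_100:number"),
  columnB = c(2L, 9L, 12L),
  stringsAsFactors = FALSE
)

# Number after "key_" and the text after "key_NNN:"
df$var1 <- as.integer(sub(".*key_(\\d+).*", "\\1", df$columnA))
df$var2 <- sub(".*key_\\d+:", "", df$columnA)

# Prefix before "key_" (the optional ":" is dropped too); "effect" if nothing is left
df$columnA <- sub(":?key_.*", "", df$columnA)
df$columnA[df$columnA == ""] <- "effect"

print(df)
```

The new columns are appended after columnB, so the column order matches the expected table in the question.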

Data used above:

df <-  structure(list(columnA = c("key_500:station and loc", 
"spectra:key_600:type", "alpha:key_100:number"), 
columnB = c(2L, 9L, 12L)), class = "data.frame", row.names = c(NA, -3L))
