How do I extract part of each cell value across columns?

Problem description

I have a data frame like this:

df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
                               "37723803","13925456","37713534","1085246853"),
                    q006_2 = c("1098160170","89009521","9726314","28076230","63451251",
                               "1090421499","37124019"),
                    q006_3 = c("52118967","41915062","1088245358","79277706","91478662",
                               "80048634")), 
               class = "data.frame", row.names = c(NA, -8L))

I know how to use substr in data.table to extract the last five digits of each number in a single column, but I want to do this for all columns.

n_last <- 5  

df1[, `q006_1`:= substr(q006_1, nchar(q006_1) - n_last + 1, nchar(q006_1))]

How can I do this for all columns?

Tags: r, data.table, substring

Solution


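With data.table this can be done as follows. (Your example data is incomplete: the first column has 8 entries, the second has 7, and the third has 6. The data used here, padded to 8 entries per column, is given at the end.)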

library(data.table)

#or `cols <- names(df1)` if you want to apply it on all columns and this is not just an example
cols <- c("q006_1", "q006_2", "q006_3") 

setDT(df1)[ , (cols):= lapply(.SD, function(x){
                                   sub('.*(?=.{5}$)', '', x, perl = TRUE)}),
             .SDcols = cols][]

#     q006_1 q006_2 q006_3
# 1:  86880  60170  18967
# 2:  93806  09521  15062
# 3:  92464  26314  45358
# 4:  93586  76230  77706
# 5:  23803  51251  78662
# 6:  25456  21499  48634
# 7:  13534  24019  76230
# 8:  46853  76230  76230
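
To see why the pattern works, it can help to try it on a plain character vector first. The following is a minimal sketch; `x`, `n_last` and `pattern` are illustrative names, not part of the answer above. The greedy `.*` consumes the start of the string, and the lookahead `(?=.{5}$)` only succeeds when exactly five characters remain, so only the leading part is removed. Strings shorter than five characters do not match and come back unchanged.

```r
# Illustrative vector; the last element is deliberately shorter than 5 characters
x <- c("1098686880", "9892464", "123")
n_last <- 5

# Build the pattern from n_last so the width is not hard-coded
pattern <- sprintf(".*(?=.{%d}$)", n_last)

# perl = TRUE is needed because lookahead is a PCRE feature
last_digits <- sub(pattern, "", x, perl = TRUE)
print(last_digits)
```

Building the pattern with `sprintf` is one way to keep the number of digits configurable; the answer's hard-coded `{5}` behaves the same for `n_last = 5`.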

Data:

df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
                               "37723803","13925456","37713534","1085246853"),
                    q006_2 = c("1098160170","89009521","9726314","28076230",
                               "63451251","1090421499","37124019","28076230"),
                    q006_3 = c("52118967","41915062","1088245358","79277706",
                               "91478662","80048634","28076230","28076230")),
                class = c("data.frame"), row.names = c(NA, -8L))
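
The `substr` call from the question can be applied to every column in the same way. The sketch below uses a small hypothetical table `dt` rather than the full data, and takes `cols` from `names(dt)` on the assumption that every column should be processed.

```r
library(data.table)

# Hypothetical three-row subset of the example data
dt <- data.table(
  q006_1 = c("1098686880", "18493806", "9892464"),
  q006_2 = c("1098160170", "89009521", "9726314")
)

n_last <- 5
cols <- names(dt)  # assumption: process all columns

# Overwrite each selected column with its last n_last characters
dt[, (cols) := lapply(.SD, function(x) {
  substr(x, nchar(x) - n_last + 1, nchar(x))
}), .SDcols = cols]

print(dt)
```

This keeps `n_last` as a variable instead of embedding it in a regular expression. Strings shorter than `n_last` are returned whole, because `substr` treats a start position below 1 as 1.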
