How to rewrite my data frame transformation with data.table

Problem description

I have a data frame:

ID       value
1      he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY
2      Attaching package: ‘magrittr’. Natural language support but running in an English locale
2      Attaching package: ‘DT’. Natural language support but running in an English locale
2      Attaching package: ‘anytime’. Natural language support but running in an English locale
3      package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information
4      Warning messages:type 'demo()' for some demos, 'help()' for on-line help
4      Warning messages:'help.start()' for an HTML browser interface to help

How to create it:

library(data.table)

ID <- c(1,2,2,2,3,4,4)
value <- c("he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY",
           "Attaching package: ‘magrittr’. Natural language support but running in an English locale",
           "Attaching package: ‘DT’. Natural language support but running in an English locale",
           "Attaching package: ‘anytime’. Natural language support but running in an English locale",
           "package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information",
           "Warning messages:type 'demo()' for some demos, 'help()' for on-line help",
           "Warning messages:'help.start()' for an HTML browser interface to help")

df <- data.table(ID, value)

I transform it with this code:

library(dplyr)

df_patterns <- df %>% 
  mutate(pattern = stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+"),
         pattern = coalesce(stringr::str_extract(pattern, "^Attaching package:|Warning messages:"), pattern),
         id_type = case_when(ID %in% c(1, 5) ~ "extra_type")
  ) %>%  
  group_by(ID, pattern) %>%
  # carry id_type through; summarise() would otherwise drop it
  summarise(example = sample(value, 1), id_type = first(id_type)) %>%
  ungroup() %>%
  mutate(pattern = coalesce(pattern, example))

The output is:

ID  pattern                example                                                                                                          id_type
1   he following object    he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY   extra_type
2   Attaching package:     Attaching package: ‘anytime’. Natural language support but running in an English locale                        NA
3   package ‘ggplot2’ was  package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information                  NA
4   Warning messages:      Warning messages:'help.start()' for an HTML browser interface to help                                          NA

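This is also the desired output. As you can see, I create a new column, pattern, and group the table by it. I also add a column, example, that holds one example value for each pattern.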

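How can I rewrite this transformation with data.table? Instead of mutate and the other dplyr functions, I want to use data.table's own features, but I'm not good at it yet. I tried this, but I don't know what to do next: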

df_patterns <- df[, c("pattern", "id_type") := list(
  pattern = coalesce(stringr::str_extract(pattern= stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+"), "^Attaching package:|Warning messages:"),pattern= stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+")),
  case_when(ID %in% c(1, 5) ~ "extra_type")), by = ID, pattern]

Tags: r, dataframe, data.table

Solution


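Dropping every dependency except data.table, the following should match your expected output (although the example column will of course differ from run to run unless you set a seed):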

library(data.table)

df_patterns <- 
  # copy() so that := does not modify df by reference
  copy(df)[, pattern := fcase(
               startsWith(value, "Attaching package:"), "Attaching package:",
               startsWith(value, "Warning messages:"), "Warning messages:",
               # catch-all condition: every remaining row gets its first three words
               rep(TRUE, nrow(df)), sub("((\\S+\\s+){2}\\S+).+", "\\1", value)
             )][, 
                .(
                  example = sample(value, 1), 
                  id_type = fifelse(ID %in% c(1,5),  "extra_type", NA_character_)
                ), 
                by = .(ID, pattern)]


   ID               pattern                                                                                                      example    id_type
1:  1   he following object he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY extra_type
2:  2    Attaching package:                           Attaching package: ‘DT’. Natural language support but running in an English locale       <NA>
3:  3 package ‘ggplot2’ was                package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information       <NA>
4:  4     Warning messages:                                     Warning messages:type 'demo()' for some demos, 'help()' for on-line help       <NA>
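If you would rather stay closer to your own attempt, here is a sketch (assuming the stringr package is installed) that keeps the two stringr regexes from the dplyr version and swaps dplyr's `coalesce()` for data.table's `fcoalesce()`. A few things in the attempt had to change: grouping on several columns is written `by = .(ID, pattern)`; `pattern` has to exist before you can group by it, so adding it with `:=` and aggregating are separate `[` calls; and `pattern =` inside `str_extract()` names that function's regex argument, not a new column. The intermediate name `dt` is just for illustration, and `set.seed()` only makes the `sample()` pick reproducible.

```r
library(data.table)

# The example data from the question
df <- data.table(
  ID = c(1, 2, 2, 2, 3, 4, 4),
  value = c(
    "he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY",
    "Attaching package: ‘magrittr’. Natural language support but running in an English locale",
    "Attaching package: ‘DT’. Natural language support but running in an English locale",
    "Attaching package: ‘anytime’. Natural language support but running in an English locale",
    "package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information",
    "Warning messages:type 'demo()' for some demos, 'help()' for on-line help",
    "Warning messages:'help.start()' for an HTML browser interface to help"
  )
)

set.seed(1)  # makes sample() below reproducible

# Step 1: add `pattern` by reference on a copy, mirroring the two mutate() steps:
# first the first three words, then a known prefix found in them wins
dt <- copy(df)
dt[, pattern := stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+")]
dt[, pattern := fcoalesce(
  stringr::str_extract(pattern, "^Attaching package:|Warning messages:"),
  pattern
)]

# Step 2: one row per (ID, pattern) group, like group_by() + summarise()
df_patterns <- dt[, .(
  example = sample(value, 1),
  id_type = fifelse(ID %in% c(1, 5), "extra_type", NA_character_)
), by = .(ID, pattern)]

# Step 3: as in the final mutate(), fall back to the example if no pattern was found
df_patterns[, pattern := fcoalesce(pattern, example)]

print(df_patterns)
```

For a deterministic result without a seed, `example = value[1L]` takes the first row of each group instead of a random one.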
