Extracting multiple numeric values from a string in R

Problem description

I have a dataset, and I want to extract only the numeric values from the following string:

{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}

Specifically, I want to extract them into two separate columns (043 and 681). Is there a way to do this given the symbols in the string?

Tags: r

Solution


We can use str_extract_all, which returns every run of digits in each string as a list of character vectors:

library(stringr)
str_extract_all(str1, "\\d+")[[1]]
#[1] "043" "681"
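
For readers who prefer not to load stringr, the same extraction can be sketched in base R. This is an alternative and not the method used in the answer; it redefines the sample string from the Data section so that it runs on its own, and the variable name `digits` is only illustrative.

```r
# Sample string, identical to `str1` in the Data section
str1 <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"

# gregexpr() locates every run of digits; regmatches() pulls the matches out.
# The result is a list with one element per input string, hence the [[1]].
digits <- regmatches(str1, gregexpr("[0-9]+", str1))[[1]]
digits
#[1] "043" "681"
```

The values stay as character strings, which preserves the leading zero in "043"; converting with as.numeric() would turn it into 43.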

If there are multiple elements, we can extract the digits for each one and spread them into columns:

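library(dplyr)
library(tidyr)
library(purrr)  # provides set_names()
tibble(col1 = str2) %>%
    # extract every run of digits; each row becomes a character vector
    mutate(col1 = str_extract_all(col1, "\\d+")) %>%
    # spread each vector across columns; recent tidyr versions need names_sep for unnamed elements
    unnest_wider(col1, names_sep = "_") %>%
    # rename the resulting columns to col1, col2, ...
    set_names(str_c('col', seq_along(.)))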

Output

# A tibble: 2 x 2
#  col1  col2 
#  <chr> <chr>
#1 043   681  
#2 313   681  
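
The multi-row case can also be sketched without the tidyverse. The example below is an assumption-laden alternative and not the answer's method: it assumes every string contains the same number of digit runs (otherwise rbind() recycles values with a warning), and the names `matches` and `out` are illustrative only.

```r
# Sample vector, identical to `str2` in the Data section
str2 <- c('{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}',
          '{  "What are the last three digits of your zip code?": "313",  "What are the last three digits of your phone number?": "681"}')

# One character vector of digit runs per input string
matches <- regmatches(str2, gregexpr("[0-9]+", str2))

# Stack the vectors row-wise into a character matrix, then convert to a data frame
out <- as.data.frame(do.call(rbind, matches), stringsAsFactors = FALSE)
names(out) <- paste0("col", seq_along(out))
out
```

Each row of `out` corresponds to one input string, with the columns kept as character so that leading zeros are not lost.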

Data

str1 <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"

str2 <- c('{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}', '{  "What are the last three digits of your zip code?": "313",  "What are the last three digits of your phone number?": "681"}')
