How to get the word closest to a reference word in purrr

Problem description

I have a list as follows:

list(c("\n", "\n", "oesophagus graded  and fine\n", 
"\n", "\n", "\n", "stomach and  antrum  altough with some rfa response rfa\n", 
"\n", "mucosa washed a lot\n", "\n", "treated with halo rfa ultra \n", 
"\n", "total of 100 times\n", "\n", "duodenum looks ok"))

I want to extract, from one list, the word that is closest to a word taken from another list.

The output I want is

antrum:rfa

My first list is:

EventList<-c("rfa", "apc", "dilat", "emr", "clip", "grasp", "probe", "iodine", 
"acetic", "nac", "peg", "botox")

My second list is:

tofind<-"ascending|descending|sigmoid|rectum|transverse|caecum|splenic|ileum|rectosigmoid|ileocaecal|hepatic|colon|terminal|terminal ileum|ileoanal|prepouch|pouch|stomach|antrum|duodenum|oesophagus|goj|ogj|cardia|anastomosis"

The code I am using is:

EventList %>%
        map(
          ~words %>%
            str_which(paste0('^.*', .x)) %>%
            map_chr(
              ~words[1:.x] %>%
                str_c(collapse = ' ') %>%

                str_extract_all(regex(tofind, ignore_case = TRUE)) %>%
                map_if(is_empty, ~ NA_character_) %>%
                flatten_chr()%>%
                `[[`(1) %>%

                .[length(.)]
            ) %>%
            paste0(':', .x)
        ) %>%
        unlist() %>%
        str_subset('.+:')

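This gives me the event (rfa in this case), but instead of assigning it to antrum, it assigns it to oesophagus.

So the event is paired with the first term from tofind that is found, not with the term closest to the event.

I suspect that these lines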

`[[`(1) %>%

 .[length(.)]

are the culprit, but I don't know how to change them so that I get the closest term rather than the first one.

Tags: r

Solution


The following gives you, for each matched element of EventList, the last tofind match found in the text up to that point:

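library(purrr)    # map(), map2_chr(), map_if(), is_empty()
library(stringr)  # str_which(), str_c(), str_extract_all(), str_subset()
library(dplyr)    # last()

# `words` is not defined in the post; the code appears to expect a list of
# character vectors with one word per element.
map(EventList, 
    function(event) {
      # positions of the words that match this event
      indices <- map(words, str_which, pattern = event)
      # for each position: join the words up to it, extract all tofind matches, keep the last
      map(indices, function(i) 
        map2_chr(words, i, ~ .x[seq_len(.y)] %>% 
               str_c(collapse = ' ') %>% 
               str_extract_all(regex(tofind, ignore_case = TRUE), simplify = TRUE) %>% 
               last()) %>%
          map_if(is_empty, ~ NA_character_)
        ) %>% 
        unlist() %>% 
        paste0(':', event)
    })  %>%
  unlist() %>%
  # keep only the entries that have a term in front of the colon
  str_subset('.+:')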

# [1] "antrum:rfa"     "oesophagus:rfa"
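
The same idea, taking the last location term that appears before each event, can also be sketched in base R without any packages. This is only an illustration under stated assumptions, not the method from the answer above: the text is tokenised on whitespace, "closest" is taken to mean the nearest preceding term, and the shortened `text`, `EventList` and `tofind` values are cut-down samples of the data in the question. Because it works on single tokens, multi-word terms such as "terminal ileum" would need separate handling.

```r
# Cut-down sample data (assumption: a subset of the question's input)
text <- c("oesophagus graded  and fine\n",
          "stomach and  antrum  altough with some rfa response rfa\n",
          "duodenum looks ok")
EventList <- c("rfa", "apc")
tofind <- "stomach|antrum|duodenum|oesophagus"

# Assumption: split the whole text into words on runs of whitespace
words <- strsplit(trimws(paste(text, collapse = " ")), "\\s+")[[1]]

# Positions of event words and of location words (whole-token matches only)
event_idx <- which(words %in% EventList)
site_idx  <- grep(paste0("^(", tofind, ")$"), words, ignore.case = TRUE)

# For each event, take the location word with the largest position before it
nearest <- vapply(event_idx, function(i) {
  before <- site_idx[site_idx < i]
  if (length(before) == 0) NA_character_ else words[max(before)]
}, character(1))

# Drop events with no preceding location, then build unique "site:event" pairs
keep   <- !is.na(nearest)
result <- unique(paste0(nearest[keep], ":", words[event_idx][keep]))
print(result)
```

Taking `max(before)` is what makes this pick the nearest preceding term instead of the first one in the text; both occurrences of rfa in the sample follow antrum, so they collapse into a single pair after `unique()`.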
