Lead or lag function to get multiple values, not just the nth value

Question

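I have a tibble with one word per row. I want to create a new variable with a function that searches for a keyword and, if the keyword is found, builds a string made up of the keyword plus the 3 words before and after it.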

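The code below is close, but instead of grabbing all three words before and after my keyword, it grabs only the single word exactly 3 positions before and after.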

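library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times", 
                       "it", "was", "the", "worst", "of", "times"))
# lag(words, 3) and lead(words, 3) each return only the one word that is
# exactly 3 positions away, so row 6 becomes "the times the"
df <- df %>% mutate(chunks = ifelse(words == "times", 
                                    paste(lag(words, 3), 
                                          words, 
                                          lead(words, 3), sep = " "),
                                    NA))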

The most intuitive solution would be if the lag function could do something like lead(words, 1:3), but that doesn't work.

Obviously, I could quickly do this by hand with paste(lead(words,3), lead(words,2), lead(words,1), ... lag(words,3)).
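As a rough sketch of automating that manual paste, one could generate every lag/lead offset with lapply() and then combine them row by row. This assumes dplyr's lag()/lead() and a window of k = 3 words on each side; the names `k`, `shifted` and `window` are illustrative and not from the original post.

```r
library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

k <- 3  # assumed window size on each side of the keyword

# One vector per offset: lag(words, 3), lag(words, 2), lag(words, 1),
# words, lead(words, 1), lead(words, 2), lead(words, 3)
shifted <- c(
  lapply(k:1, function(n) lag(df$words, n)),
  list(df$words),
  lapply(1:k, function(n) lead(df$words, n))
)

# Bind the offsets into a matrix and paste each row together, dropping
# the NAs that lag()/lead() produce at the start and end of the column
window <- apply(do.call(cbind, shifted), 1,
                function(row) paste(na.omit(row), collapse = " "))

df <- df %>% mutate(chunks = ifelse(words == "times", window, NA))
```

Changing `k` widens or narrows the window without writing out each lag()/lead() call by hand.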

A tidyverse solution would be ideal, but any solution would help. Any help would be greatly appreciated.

Tags: r, dplyr, lag, lead

Answer


One option is sapply:

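library(dplyr)

df %>%
  mutate(
    chunks = ifelse(
      words == "times",
      # For every row x, paste rows x - 3 through x + 3 together,
      # clipped so the window never runs past the first or last row
      sapply(
        seq_len(nrow(.)),
        function(x) paste(words[max(1, x - 3):min(x + 3, nrow(.))], collapse = " ")
      ),
      NA
    )
  )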

Output:

# A tibble: 12 x 2
   words chunks                      
   <chr> <chr>                       
 1 it    NA                          
 2 was   NA                          
 3 the   NA                          
 4 best  NA                          
 5 of    NA                          
 6 times the best of times it was the
 7 it    NA                          
 8 was   NA                          
 9 the   NA                          
10 worst NA                          
11 of    NA                          
12 times the worst of times   

Although it doesn't use lead or lag explicitly, it achieves the same result.
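The same clipped-window idea could be wrapped in a reusable function so the keyword and window size become parameters. `keyword_window()` is a hypothetical helper written for illustration, not part of dplyr; it uses vapply() so the result is always a character vector, and it only builds a string for rows that match the keyword.

```r
library(dplyr)

# Hypothetical helper: for each position in `words`, return the keyword plus
# up to `k` words on either side if it matches `keyword`, otherwise NA
keyword_window <- function(words, keyword, k = 3) {
  n <- length(words)
  vapply(seq_len(n), function(i) {
    if (!identical(words[i], keyword)) return(NA_character_)
    # Clip the window to the first and last element
    paste(words[max(1, i - k):min(n, i + k)], collapse = " ")
  }, character(1))
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

df %>% mutate(chunks = keyword_window(words, "times", k = 3))
```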

