Using R to select lines below a regular-expression match, with the number of lines given by a numeric vector

Problem description

This is a follow-up to a previous question.

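Here, I need to extract X lines below each regular-expression match, where X comes from a numeric vector. I am also trying to loop the regex selection over that vector, which is named range_Labels in this example and is defined in the code below.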

Suppose I have a text file named File that looks like this:

[1] "            2015  YOUTH RISK BEHAVIOR SURVEY RESULTS"                                
 [2] "                            Puerto Rico High School Survey"                          
 [3] "                                               Codebook"                             
 [4] " Data    Variable                              Question      Unweighted  Weighted"   
 [5] "Location  Name                              Code and Label   Frequency  Percentage"  
 [6] "17-17      Q1     How old are you?"                                                  
 [7] "                  1                  12 years old or younger        9         0.7"   
 [8] "                  2                  13 years old                  63         4.2"   
 [9] "                  3                  14 years old                 242        17.0"   
[10] "                  4                  15 years old                 317        21.3"   
[11] "                  5                  16 years old                 487        27.0"   
[12] "                  6                  17 years old                 399        23.0"   
[13] "                  7                  18 years old or older        100         6.8"   
[14] "                                     Missing                        4"               
[15] "18-18      Q2     What is your sex?"                                                 
[16] "                  1                  Female                       822        51.8"   
[17] "                  2                  Male                         790        48.2"   
[18] "                                     Missing                        9"               
[19] "19-19      Q3     In what grade are you?"                                            
[20] "                  1                  9th grade                    393        28.0"   
[21] "                  2                  10th grade                   378        25.6"   
[22] "                  3                  11th grade                   544        23.2"   
[23] "                  4                  12th grade                   300        23.1"   
[24] "                  5                  Ungraded or other grade        4         0.2"   
[25] "                                     Missing                        2"               
[26] "20-20      Q4     Are you Hispanic or Latino?"                                       
[27] "                  1                  Yes                        1,524        95.8"   
[28] "                  2                  No                            66         4.2"   
[29] "                                     Missing                       31"               
[30] "                                                                                   1"
[31] "

Below is the code I am using, adapted from earlier code:

 index_qm <- grep("\\?$", File, perl = TRUE) # qm = question mark (?)

index_qm

index_Missing <- grep("Missing", File, perl = TRUE)
index_Missing

range_Labels <- index_Missing - index_qm - 1

range_Labels

sum(range_Labels)

# Extract labels 2 ----------------------------------------------------------

Labels <- NA

Labels <- 
    lapply(grep("\\?", File, perl = TRUE),

              FUN = function(x){

                  for (i in seq(range_Labels)){ # range_Labels[i] is 7, 2, 5, 2

                      Labels <- File[x + 1:range_Labels]
                }
                  print(Labels)
              }
         )

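The final result, Labels, should be a list whose length equals length(range_Labels), i.e. 4, and each element of Labels should have as many entries as the corresponding element of range_Labels = 7, 2, 5, 2.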

The final result I am looking for is:

q1_label <- c("12 years old or younger",
               "13 years old", "14 years old", 
               "15 years old", "16 years old", 
               "17 years old", "18 years old or older")
#
q2_label <- c("Female", "Male")

#
q3_label <- c("9th grade", "10th grade", 
                 "11th grade", "12th grade", 
                 "Ungraded")

#
q4_label <- c("Yes", "No")

I know I am missing a basic looping concept here, but I cannot figure it out.

Thank you very much.

Tags: r, regex, for-loop

Solution
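One possible approach, offered as a sketch rather than a definitive answer: in the question's code, `1:range_Labels` is given the whole vector, and `:` only uses its first element (with a warning), so every question would get 7 lines. The `for` loop also reassigns the same value on each pass. Pairing each match index with its own count, for example with `Map()`, avoids both problems. The `File` vector below is a shortened, hypothetical excerpt of the codebook with approximate spacing, and `clean_label()` is a helper name introduced only for this illustration.

```r
# Shortened stand-in for the `File` character vector from the question
# (hypothetical excerpt; spacing is approximate).
File <- c(
  "Location  Name                              Code and Label   Frequency  Percentage",
  "18-18      Q2     What is your sex?",
  "                  1                  Female                       822        51.8",
  "                  2                  Male                         790        48.2",
  "                                     Missing                        9",
  "20-20      Q4     Are you Hispanic or Latino?",
  "                  1                  Yes                        1,524        95.8",
  "                  2                  No                            66         4.2",
  "                                     Missing                       31"
)

index_qm      <- grep("\\?$", File, perl = TRUE)    # lines ending in "?"
index_Missing <- grep("Missing", File, perl = TRUE) # line closing each block
range_Labels  <- index_Missing - index_qm - 1       # label lines per question

# Walk over both vectors in parallel: one start index and one count per call.
Lines <- Map(function(start, n) File[start + seq_len(n)],
             index_qm, range_Labels)

# Illustrative helper: strip the leading answer code and the trailing
# frequency and percentage columns, keeping only the label text.
clean_label <- function(x) {
  x <- sub("^\\s*\\d+\\s+", "", x, perl = TRUE)
  x <- sub("\\s+[0-9,]+\\s+[0-9.]+\\s*$", "", x, perl = TRUE)
  trimws(x)
}

Labels <- lapply(Lines, clean_label)

# Name each element after its question id (Q2, Q4, ...).
names(Labels) <- regmatches(File[index_qm],
                            regexpr("Q[0-9]+", File[index_qm]))
```

With this layout, `lengths(Labels)` should match `range_Labels`, and individual vectors can be pulled out with `Labels[["Q2"]]` instead of creating separate `q1_label`, `q2_label` variables. `seq_len(n)` is used rather than `1:n` so that a question with zero label lines would return an empty vector instead of counting down.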

