Problem description

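Using the iris data to illustrate my problem: I want to do a partial match on ".5" and get the index of the matching position (in my real data, ".5" is actually the string "_mutations").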

I plan to loop over each row, perform a partial match, and get the index of the first match. I have tried the following:

idx = regexpr(pattern, txt[i,], ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)[1]

idx = regexec(pattern, txt[i,], ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)[1]

gregexpr(pattern, txt[i,], ignore.case = FALSE, perl = FALSE,
          fixed = FALSE, useBytes = FALSE)

stri_locate_first_regex(txt[i,], pattern)  # from the stringi package

str_detect(txt[i,], pattern)  # from the stringr package

The example data looks like this:

library(ggplot2)  # not required for iris, which ships with base R (datasets)
txt = iris
pattern = ".5"

The expected result is the index of the first match.

Tags: r

Solution

Replace every value with TRUE where it matches the pattern, and FALSE otherwise

df <- iris
# [] notation preserves structure
df[] <- lapply(X = df, function(x) {
    grepl(pattern = ".5",
          x = as.character(x),
          fixed = TRUE)
})
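
The fixed = TRUE argument matters for this pattern. Here is a small sketch of how ".5" behaves as a regular expression versus as a literal string. The input strings are made up for illustration and are not taken from the iris data:

```r
# Illustrative inputs only; these strings are not taken from iris.
x <- c("3.5", "150", "5")

# As a regular expression, "." matches any single character,
# so ".5" also matches the "15" inside "150".
regex_hits <- grepl(".5", x)

# With fixed = TRUE the pattern is a literal dot followed by 5.
literal_hits <- grepl(".5", x, fixed = TRUE)

print(regex_hits)
print(literal_hits)
```

A pattern such as "_mutations" contains no regex metacharacters, so both calls should agree for it. Using fixed = TRUE anyway avoids surprises if the search string changes later.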

Get the positions of the TRUE values in each column

sapply(X = df, which)

Result

# $Sepal.Length
# [1]  34  37  42  54  55  81  82  90  91 105 111 117 148
# 
# $Sepal.Width
# [1]   1  18  28  37  41  44  70  73  90  99 107 109 114 147
# 
# $Petal.Length
# [1]   4   8  10  11  16  20  22  28  32  33  35  40  49  52  56  61  67  69  79  80
# [21]  85  86 107 113 117 138
# 
# $Petal.Width
# [1]  24  52  53  55  62  67  69  73  79  85  87 101 110 120 134 145
# 
# $Species
# integer(0)
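
If only the first match per column is needed, as the question asks, one possible sketch keeps just the first position from which(). The name first_idx is illustrative, and the block rebuilds the logical data frame so that it runs on its own:

```r
# Rebuild the logical data frame so this block is self-contained.
df <- iris
df[] <- lapply(X = df, function(x) {
    grepl(pattern = ".5", x = as.character(x), fixed = TRUE)
})

# which(x)[1] is the first TRUE position, or NA when a column has no match.
first_idx <- vapply(X = df, FUN = function(x) which(x)[1], FUN.VALUE = integer(1))

print(first_idx)
```

Columns with no match at all, such as Species, come back as NA rather than integer(0), which keeps the result a plain named vector.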

Notes

There are many ways to solve this problem. I like this solution because the result is very readable, but I think that is largely a matter of taste.
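
As one of those other ways, here is a row-wise sketch. It assumes that "index" in the question means the column number of the first matching cell in each row, which the question does not state explicitly. The names m and first_col are illustrative:

```r
# Logical matrix: one row per observation, one column per variable.
m <- sapply(X = iris, FUN = function(x) {
    grepl(pattern = ".5", x = as.character(x), fixed = TRUE)
})

# For each row, the column number of the first match (NA if the row has none).
first_col <- apply(X = m, MARGIN = 1, FUN = function(r) which(r)[1])

print(head(first_col))
```

Rows without any matching cell yield NA, so they can be filtered with is.na() if needed.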

r - Partial string matching in each row of a data frame
