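r - Extract a word from a string
Problem description
I want to extract only the Scaffold token, together with its scaffold number, from each string (for example, Scaffold6). Any ideas?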
[2] "KQ415657.1 isolate UCB-ISO-001 unplaced genomic scaffold Scaffold5, whole genome shotgun sequence"
[3] "ABCD0100000.1 isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
[4] "ABCDD0100001.1 isolate UCB-ISO-001 Scaffold8_contig_1, whole genome shotgun sequence"
[5] "ABCD0100002.1 isolate UCB-ISO-001 Scaffold2_contig_1, whole genome shotgun sequence"
[6] "ABCD0100003.1 isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
[7] "ABCD0100004.1 isolate UCB-ISO-001 Scaffold2_contig_1, whole genome shotgun sequence"
[8] "ABCD0100005.1 isolate UCB-ISO-001 Scaffold7_contig_1, whole genome shotgun sequence"
[9] "ABCD0100006.1 isolate UCB-ISO-001 Scaffold8_contig_1, whole genome shotgun sequence"
Solution
Is this stored as a character vector or as a data.frame? Does each line always contain exactly one Scaffold string?
If it is just a vector:
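library(stringr)  # attaching stringr also makes the %>% pipe available
STRING <- c("This is some vector Scaffold1", "some Scaffold20 string with stuff")
# Split each string on spaces, then keep only the pieces that contain "Scaffold"
str_split(string = STRING, pattern = " ") %>%
  lapply(function(x) x[grepl("Scaffold", x)]) %>%
  unlist()
[1] "Scaffold1"  "Scaffold20"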
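Splitting on spaces would return pieces such as "Scaffold6_contig_1," or "Scaffold5," for the lines in the question, because the suffix and the comma stay attached. A minimal sketch of an alternative, assuming every line contains one token of the form Scaffold followed by digits, is to match that pattern directly with base R. The variable names x, m, scaffold and scaffold_number are illustrative only.

```r
# Two sample lines copied from the question
x <- c(
  "KQ415657.1 isolate UCB-ISO-001 unplaced genomic scaffold Scaffold5, whole genome shotgun sequence",
  "ABCD0100000.1 isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
)

# regexpr() locates the first "Scaffold<digits>" in each string (case-sensitive,
# so the lowercase word "scaffold" is skipped); regmatches() extracts the match.
m <- regexpr("Scaffold[0-9]+", x)
scaffold <- regmatches(x, m)

# Drop the "Scaffold" prefix to keep only the number
scaffold_number <- as.integer(sub("Scaffold", "", scaffold))

scaffold         # "Scaffold5" "Scaffold6"
scaffold_number  # 5 6
```

Note that regmatches() silently drops elements without a match, so the result can be shorter than the input if some line has no Scaffold token.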
If you can put it in a data.frame, it may be tidier:
library(tidyverse)
# 8 columns are enough for these toy strings; longer strings need more pieces
data.frame(String = STRING, stringsAsFactors = FALSE) %>%
  separate(String, paste0("V", 1:8), remove = FALSE) %>%  # split on non-alphanumeric characters
  gather(key, val, starts_with("V")) %>%                  # one row per piece
  filter(grepl("Scaffold", val)) %>%
  select(-key)
                             String        val
1 some Scaffold20 string with stuff Scaffold20
2     This is some vector Scaffold1  Scaffold1
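If the goal is one row per input string with the scaffold name and number as extra columns, a pattern-based extraction may be simpler than splitting into a fixed number of columns. The following is only a sketch: the data frame df, its column names, and the third sample string (a hypothetical line with no match) are assumptions added for illustration.

```r
library(stringr)

df <- data.frame(
  String = c("This is some vector Scaffold1",
             "some Scaffold20 string with stuff",
             "a line without a match"),  # hypothetical line, not from the question
  stringsAsFactors = FALSE
)

# str_extract() returns NA where the pattern does not match, so rows stay aligned
df$scaffold <- str_extract(df$String, "Scaffold[0-9]+")

# The lookbehind (?<=Scaffold) keeps only the digits that follow "Scaffold"
df$number <- as.integer(str_extract(df$String, "(?<=Scaffold)[0-9]+"))

df
```

Because non-matching rows become NA rather than disappearing, this form makes it easy to check whether every line really contains a Scaffold token.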