Grep a specific section or number/word from a text file with R

Problem description

I have a text file that contains many different output sections from a research analysis. The text file looks like this...

Zone  1         

Dist.   Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone       Zone
Tr.(cm) Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries    Time
======= ============ ====== ============ ====== ============ ====== ============ ========== ============
 626.29 000:00:29.90    480 000:00:05.25     52 000:00:24.85     11 000:00:11.75          1 000:01:00.00
 489.99 000:00:23.20    401 000:00:07.30     75 000:00:29.45      5 000:00:11.65          0 000:01:00.00
-----------------------------------------------------------------------------------------------------

Zone Totals

Dist.   Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone       Zone
Tr.(cm) Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries    Time
======= ============ ====== ============ ====== ============ ====== ============ ========== ============
5661.08 000:04:39.30   4360 000:00:55.35    572 000:04:25.35     81 000:02:23.85          1 000:10:00.00
======= ============ ====== ============ ====== ============ ====== ============ ==========     
-----------------------------------------------------------------------------------------------------

Block Summary
-------------
Dist.      Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone
Trav.(cm)  Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries
========== ============ ====== ============ ====== ============ ====== ============ ==========
    626.29 000:00:29.90    480 000:00:05.25     52 000:00:24.85     11 000:00:11.75          1
    489.99 000:00:23.20    401 000:00:07.30     75 000:00:29.45      5 000:00:11.65          0

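How can I grep only the Zone Totals section? More specifically, I want to grep only the "Dist. Tr." number in the "Zone Totals" section. But I would be happy to get the whole section and then trim the lines where needed.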

I was thinking of something like this...

dist_move = apply(data.frame(grep("Totals",dat)+1, grep("Block",dat)-2),1,function(x) (dat[x[1]:x[2]]))

But it just grabs all of the lines.

Tags: r

Solution


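Assuming the file created in the Note at the end, read it in, find the "Zone Totals" line, and read the first number on the 5th line after it. This uses no packages and works for both single and multiple Zone Totals sections.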

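# read every line, stripping leading and trailing whitespace
L <- trimws(readLines("test-file.dat"))
# take the 5th line after each "Zone Totals" header, keep only the text
# before the first space, and parse it as a number
scan(text = sub(" .*", "", L[grep("Zone Totals", L) + 5]), quiet = TRUE)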
## [1] 5661.08
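
To illustrate the claim that the same line also handles several Zone Totals sections, here is a minimal sketch. It assumes an in-memory character vector instead of the file, with abbreviated columns, and the second section's values are made up purely for illustration.

```r
# Hypothetical input with two "Zone Totals" sections (columns abbreviated;
# the values in the second section are invented for this illustration).
L <- trimws(c(
  "Zone Totals",
  "",
  "Dist.   Time",
  "Tr.(cm) Amb.",
  "======= ============",
  "5661.08 000:04:39.30",
  "======= ============",
  "",
  "Zone Totals",
  "",
  "Dist.   Time",
  "Tr.(cm) Amb.",
  "======= ============",
  " 123.45 000:00:10.00",
  "======= ============"
))

# grep() returns the index of every header line; adding 5 points at each data row
idx <- grep("Zone Totals", L) + 5

# keep only the first space-delimited field of each data row and parse it
dist <- scan(text = sub(" .*", "", L[idx]), quiet = TRUE)
dist
```

Because `grep()` is vectorised, `dist` holds one value per section, in file order. The fixed offset of 5 only holds as long as every section has the same layout between the header and its data row.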

Or this slightly shorter variation:

L <- readLines("test-file.dat")
# parse the data row as a one-row table and take its first column
read.table(text = L[grep("Zone Totals", L) + 5])[[1]]
## [1] 5661.08
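
The question also asks about getting the whole section and trimming it afterwards. One possible sketch of that, which is not part of the answer above: it assumes exactly one "Zone Totals" header followed later by one "Block Summary" header, and it uses an abbreviated in-memory copy of the sample rather than the file.

```r
# Abbreviated copy of the sample report (only the first three columns),
# held in memory for illustration.
L <- c(
  "Zone  1",
  "",
  "Dist.   Time         Amb.",
  "Tr.(cm) Amb.         Cnts.",
  "======= ============ ======",
  " 626.29 000:00:29.90    480",
  " 489.99 000:00:23.20    401",
  "---------------------------",
  "",
  "Zone Totals",
  "",
  "Dist.   Time         Amb.",
  "Tr.(cm) Amb.         Cnts.",
  "======= ============ ======",
  "5661.08 000:04:39.30   4360",
  "======= ============ ======",
  "---------------------------",
  "",
  "Block Summary",
  "-------------"
)

start <- grep("^Zone Totals", L)     # header line of the wanted section
end   <- grep("^Block Summary", L)   # first line of the following section

# everything from the header up to the line before the next section
# (assumes exactly one match for each marker)
section <- L[start:(end - 1)]

# data rows are the lines that begin with a digit, so no fixed offset is needed
rows   <- grep("^[[:space:]]*[0-9]", section, value = TRUE)
totals <- read.table(text = rows)
totals[[1]]
```

Selecting rows by pattern rather than by a fixed offset tolerates extra blank or header lines inside the section, at the cost of assuming that only data rows start with a digit.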

Note

Lines <- "Zone  1         

Dist.   Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone       Zone
Tr.(cm) Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries    Time
======= ============ ====== ============ ====== ============ ====== ============ ========== ============
 626.29 000:00:29.90    480 000:00:05.25     52 000:00:24.85     11 000:00:11.75          1 000:01:00.00
 489.99 000:00:23.20    401 000:00:07.30     75 000:00:29.45      5 000:00:11.65          0 000:01:00.00
-----------------------------------------------------------------------------------------------------

Zone Totals

Dist.   Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone       Zone
Tr.(cm) Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries    Time
======= ============ ====== ============ ====== ============ ====== ============ ========== ============
5661.08 000:04:39.30   4360 000:00:55.35    572 000:04:25.35     81 000:02:23.85          1 000:10:00.00
======= ============ ====== ============ ====== ============ ====== ============ ==========     
-----------------------------------------------------------------------------------------------------

Block Summary
-------------
Dist.      Time         Amb.   Time         Ster.  Time         Vert.  Vert.        Zone
Trav.(cm)  Amb.         Cnts.  Ster.        Cnts.  Rest.        Cnts.  Time         Entries
========== ============ ====== ============ ====== ============ ====== ============ ==========
    626.29 000:00:29.90    480 000:00:05.25     52 000:00:24.85     11 000:00:11.75          1
    489.99 000:00:23.20    401 000:00:07.30     75 000:00:29.45      5 000:00:11.65
"
cat(Lines, file = "test-file.dat")
