Extracting a number from text files in R

Problem description

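I have multiple text files. From each one I want to read the line containing "never classified (0)" and extract the number on that line, together with the file name, into a data frame.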

files <- list.files(path= "directory/info/", pattern= "*.txt", full.names = TRUE)

data <- lapply(files, function(x) {

  datxt <- read.table(x, sep = "\t", header = TRUE, stringsAsFactors = FALSE)


  for (i in 1:length(datxt)){
    i = gsub("\\never classified (0)", "", i)
    }

 return(data.frame(file=x,NoOfReturn=i))
})

A sample text file looks like this:

LASzip compression (version 3.4r1 c2 50000): POINT10 2
reporting minimum and maximum for all LAS point record entries ...
  X                   0        527
  Y                   0       2009
  Z                   0        241
  intensity           1        314
  return_number       1          1
  number_of_returns   1          1
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      0          0
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     0          0
number of first returns:        2781080
number of intermediate returns: 0
number of last returns:         2781080
number of single returns:       2781080
overview over number of returns of given pulse: 2781080 0 0 0 0 0 0
histogram of classification of points:
         2781080  never classified (0)

I would like to return the file name and 2781080 as a data frame.

Tags: r

Solution


This should work:

# pattern is a regular expression, not a glob, so match a literal ".txt" suffix
files <- list.files(path = "directory/info/", pattern = "\\.txt$", full.names = TRUE)
data <- lapply(files, function(x) {
  # the data we're interested in doesn't seem to be a table 
  # easier to read it in as a character vector
  datxt <- readLines(x)

  # keep only the line with the text we're looking for
  datxt <- datxt[grepl(pattern = "never classified (0)", x = datxt, fixed = TRUE)]

  # get the number from that line
  n <- sub(pattern = "never classified (0)", replacement = "", x = datxt, fixed = TRUE)
  n <- as.numeric(trimws(n))

  return(data.frame(file = x, NoOfReturn = n))
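})

# lapply() returns a list of one-row data frames; bind them into a single data frame
data <- do.call(rbind, data)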
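
For anyone who wants to try the approach without the real files, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the question: `extract_never_classified` is a hypothetical helper name, the two temporary files stand in for the contents of `directory/info/`, and a file that lacks the "never classified (0)" line is reported as `NA` rather than raising an error.

```r
# Hypothetical helper (name is illustrative): return the count that precedes
# "never classified (0)" in one report file, as a one-row data frame.
extract_never_classified <- function(path) {
  lines <- readLines(path)

  # fixed = TRUE treats the parentheses literally instead of as a regex group
  hit <- grep("never classified (0)", lines, fixed = TRUE, value = TRUE)

  # Assumption: a file without the line yields NA so every file gives one row
  n <- if (length(hit) > 0) {
    as.numeric(trimws(sub("never classified (0)", "", hit[1], fixed = TRUE)))
  } else {
    NA_real_
  }

  data.frame(file = basename(path), NoOfReturn = n, stringsAsFactors = FALSE)
}

# Demo input in a temporary directory (stand-in for the real files)
demo_dir <- tempfile("info")
dir.create(demo_dir)
writeLines(c("histogram of classification of points:",
             "         2781080  never classified (0)"),
           file.path(demo_dir, "a.txt"))
writeLines("number of first returns:        10",
           file.path(demo_dir, "b.txt"))

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# One row per file, combined into a single data frame
result <- do.call(rbind, lapply(files, extract_never_classified))
print(result)
```

Using `basename()` keeps only the file name in the `file` column; drop it if the full path is preferred.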
