My R script is not picking up all the files in a folder

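Problem description

My R script tries to aggregate the Excel spreadsheets found in the different subfolders of the Concerned Files folder (see the code below) and put all of the data into one master file. However, the script seems to pick the files it copies from at random, and running the code produces the error below. I assume this error is the reason it does not pick up every file in the folder?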

all_some_data <- rbind(all_some_data, temp) 
Error in rbind(deparse.level, ...) : 
numbers of columns of arguments do not match

Full code:

library(readxl)

# List of people's names whose folders are searched. For our purposes, only one name is used.
managers <- c("Name")
# Directory that contains all the files
directory <- 'C:/Users/Username/OneDrive/Desktop/Testing/Concerned Files/'

# Create an empty data frame
all_some_data <-
  setNames(
    data.frame(matrix(ncol = 8, nrow = 0)),
    c("Employee", "ID", "Overtime", "Regular", "Total", "Start", "End", "Manager")
  )

# Loop through managers to get time sheets, then add each file to the combined data frame
for (i in managers) {
  # List all the extract files in this manager's folder.
  # list.files() expects a regular expression, not a glob such as "*.xls".
  files <-
    list.files(
      path = paste(directory, i, "/", sep = ""),
      pattern = "\\.xlsx?$",
      full.names = FALSE,
      recursive = FALSE
    )
  str(files)

  # For each file, get the start and end date of the period, remove unnecessary
  # columns, rename columns and add the manager name
  for (j in files) {
    temp <- read_excel(paste(directory, i, "/", j, sep = ""), skip = 8)

    # A bunch of manipulations with the data being copied over. Code not relevant to the problem

    all_some_data <- rbind(all_some_data, temp)
  }
}

Tags: r

Solution


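The most likely cause of the problem is that one or more of the files contain an extra column. rbind stops with an error as soon as it reaches such a file, so every file after it is never read, which is why only some of the files appear to be picked up.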
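To confirm which file is the odd one out, it can help to compare the column layout of every table before binding them. The following is a minimal sketch in base R; `column_report` is a hypothetical helper, and the file names and data are made up so that the example runs on its own.

```r
# Hypothetical helper: summarise the column layout of each table in a named list.
column_report <- function(tables) {
  data.frame(
    file = names(tables),
    n_cols = unname(vapply(tables, ncol, integer(1))),
    columns = unname(vapply(
      tables,
      function(x) paste(names(x), collapse = ", "),
      character(1)
    )),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up stand-ins for the tables that read_excel() would return.
tables <- list(
  "week1.xlsx" = data.frame(Employee = "A", Total = 40),
  "week2.xlsx" = data.frame(Employee = "B", Total = 38),
  "week3.xlsx" = data.frame(Employee = "C", Total = 41, Notes = "extra")
)

report <- column_report(tables)

# Flag the files whose column count differs from the most common one.
typical <- as.integer(names(which.max(table(report$n_cols))))
odd_files <- report$file[report$n_cols != typical]
print(report)
print(odd_files)
```

With the real spreadsheets, the list could be built by calling read_excel on each path and naming the elements after the files; that part is left out here because it depends on the folder contents.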

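A potential solution, which also improves performance, is to use the bind_rows function from the dplyr package. It is more fault tolerant than base R's rbind: columns are matched by name, and columns missing from a table are filled with NA.

Wrap the body of your inner loop in an lapply call, then call bind_rows once on the whole list of data frames.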

# i and files come from the enclosing loop over managers
output <- lapply(files, function(j) {
  temp <- read_excel(paste(directory, i, "/", j, sep = ""), skip = 8)

  # A bunch of manipulations with the data being copied over.
  # Code not relevant to the problem

  temp  # this is the value returned to the list
})
all_some_data <- dplyr::bind_rows(output)
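The difference between the two functions can be seen on a small example. This is a sketch with made-up data frames (`a` and `b` are illustrative, not taken from the question), assuming dplyr is installed.

```r
library(dplyr)

# Two made-up tables; the second has an extra column.
a <- data.frame(Employee = "A", Total = 40)
b <- data.frame(Employee = "B", Total = 38, Notes = "extra column")

# Base rbind() signals an error when the column counts differ;
# the message is captured here instead of stopping the script.
rbind_result <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))

# bind_rows() matches columns by name and fills the gaps with NA.
combined <- bind_rows(a, b)

# Passing a named list together with .id records which table each row came from.
tagged <- bind_rows(list(week1 = a, week2 = b), .id = "source")
print(combined)
print(tagged)
```

bind_rows is not unconditionally tolerant: if the same column is numeric in one file and character in another, it still stops with an error, so the per-file manipulations may need to coerce column types first.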
