首页 > 解决方案 > excel batch load with specific sheet

问题描述

Im trying to batch load several xlsx files to a single dataframe. So far i have this piece of code which works great.

file.list <- list.files(path = "base-files/", pattern='*.xlsx', full.names = TRUE)
test1 <- sapply(file.list, read_xlsx, simplify=FALSE) %>%  bind_rows(.id = "id")

My issue is related with the number of sheets each xlsx file has. I'm specifically looking to load only sheet # 2 from each file. Is there any way i could add a sheet flag into the sapply function?

EDIT: After running your recommended code, i get some errors. I guess i need to install perl?

file.list <- list.files(path = "base-files/", pattern='*.xlsx', full.names = TRUE)
bind_rows(.id = "id")
test1 = lapply(file.list, function(x) {
                                        sheet_no <- if(sheetCount(x) == 1) 1 else 2
                                        read_xlsx(x, sheet = sheet_no)
                                       }) %>%   bind_rows(.id = 'id')



Error in findPerl(verbose = verbose) : 
perl executable not found. Use perl= argument to specify the correct path. 

EDIT Just installed perl for windows https://www.activestate.com/products/perl/downloads/

标签: rexceldataframe

解决方案


We can specify the sheet number

sapply(file.list, read_xlsx, sheet = 2, simplify=FALSE) %>%  
       bind_rows(.id = "id")

According to ?read_xlsx

sheet - Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first sheet.


There is a sheetCount function from gdata

library(gdata)
lapply(file.list, function(x) {
          sheet_no <- if(sheetCount(x) == 1) 1 else 2
          read_xlsx(x, sheet = sheet_no)
    }) %>%
    bind_rows(.id = 'id')
             

推荐阅读