Reading an xlsx file with multiple sheets in R to remove duplicates

Problem description

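I have an Excel file that contains multiple worksheets. My main goal is to remove every row that appears more than once within a sheet, and this has to be done for each sheet. I wrote the code below, but it only reads the first sheet and gives "..." in the first row and first column. Can someone help me figure out where I might be going wrong? Thanks in advance.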

config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1] 
count_of_xl_sheets <- length(config_xl_sheets_name) 
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
  assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
  updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)

Tags: r, dplyr, tibble, readxl, r-xlsx

Solution


Here is my approach:

library(readxl)
library(data.table)
library(openxlsx)
file.to.read   <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)

# read each sheet from the file into a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
  data <- setDT(readxl::read_excel(file.to.read, sheet = x))
  # remove duplicates
  data[!duplicated(data), ]
  })

# create a new workbook
wb <- createWorkbook()
# create new worksheets and write to them, one per deduplicated sheet
for (i in seq_along(L)) {
  addWorksheet(wb, sheets.to.read[i])
  writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")
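
As far as I can tell from the code in the question, the loop assigns to the same variable (updateddata) on every pass, so only one sheet's result is left when write.xlsx is finally called. The sketch below illustrates that difference with a small in-memory list standing in for the workbook. The sheet names, columns, and values are made up for illustration, and the last step assumes the openxlsx package is installed (it is skipped otherwise).

```r
# Hypothetical stand-in for the sheets read from the workbook;
# the sheet names and columns are invented for this example.
sheets <- list(
  Config = data.frame(id = c(1, 1, 2), value = c("a", "a", "b")),
  Rates  = data.frame(id = c(3, 4, 4, 4), value = c("x", "y", "y", "y"))
)

# Pattern from the question: one variable is reassigned on every pass,
# so after the loop it holds only the result for the last sheet.
for (i in seq_along(sheets)) {
  updateddata <- unique(sheets[[i]])
}
last_only <- updateddata

# Keeping one result per sheet: lapply() returns a list of the same
# length as its input and preserves the sheet names.
deduped <- lapply(sheets, unique)

# Optional: write each list element to its own worksheet.
# openxlsx::write.xlsx() accepts a named list of data frames and
# creates one sheet per element, named after the list names.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  out_file <- tempfile(fileext = ".xlsx")
  openxlsx::write.xlsx(deduped, out_file)
}
```

Passing the named list straight to openxlsx::write.xlsx() is a shorter alternative to the createWorkbook() / addWorksheet() loop above. The explicit loop is still the better fit if each sheet needs its own styling or column widths.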
