How to write a .csv with quoted fields using readr::write_delim()

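Question

I am trying to build a file in S3 using write_delim, and I want the values to be enclosed in double quotes ("). I can't tell whether write_delim simply has no argument for this, in which case I would need to fall back on base R functions, or whether I am doing something wrong. This is what I have tried: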

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)



Tags: r, readr

Answer


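If I understand you correctly, you want to write a csv to your S3 bucket whose values are enclosed in quotation marks.

From the s3write_using documentation:

FUN: For s3write_using, a function to which x and a file path will be passed (in that order).

So you only need to define a function that takes an R object as its first argument and writes a quoted csv to the path passed as its second argument.

If you are really concerned about optimization, readr::write_delim is certainly faster than write.csv, but the data.table package has an even faster function, fwrite, which supports quoting in the same way as write.csv: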

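write_quoted_csv <- function(object, path)
{
  # Convert to a data.table by reference (no copy is made)
  data.table::setDT(object)
  # quote = TRUE quotes the column names and character/factor fields
  data.table::fwrite(object, path, quote = TRUE)
  # Convert back so the caller's object is a plain data.frame again
  data.table::setDF(object)
}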
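
Before wiring the function into s3write_using, it can be tried on a local temporary file. The following is only a minimal sketch: sample_df, tmp_path and quoted_lines are hypothetical names introduced for illustration, and it assumes the data.table package is installed. The function definition is repeated so the snippet runs on its own.

```r
# Same helper as above, repeated so this snippet is self-contained.
write_quoted_csv <- function(object, path)
{
  data.table::setDT(object)
  data.table::fwrite(object, path, quote = TRUE)
  data.table::setDF(object)
}

# Hypothetical two-row data frame, used only for this check.
sample_df <- data.frame(id = 1:2, name = c("a", "b"))

# Write to a temporary file instead of S3.
tmp_path <- tempfile(fileext = ".csv")
write_quoted_csv(sample_df, tmp_path)

# Read the raw lines back to inspect the quoting.
quoted_lines <- readLines(tmp_path)
cat(quoted_lines, sep = "\n")
```

This only exercises the (object, path) calling convention that the FUN documentation describes; the upload itself is not tested here.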

Let's test it against write_delim using a data frame with 50,000 rows:

df <- data.frame(a = 1:50000, 
                 b = 50001:100000, 
                 c = rep(LETTERS[1:10], each = 5000))

microbenchmark::microbenchmark(
  readr      = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
  data.table = write_quoted_csv(df, "~/test_datatable.csv"), 
  times      = 100)
# Unit: milliseconds
#        expr       min       lq      mean    median        uq       max neval
#       readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254   100
#  data.table  20.80768  22.8940  26.25808  24.92915  27.69624  54.55789   100

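You can see that the data.table method is about 10 times faster (median 24.9 ms versus 262.9 ms). Even so, write_delim as called here does not add quotes, whereas fwrite quotes the column names and the character column: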

cat(readLines("~/test_readr.csv", 10), sep = "\n")
#> a,b,c
#> 1,50001,A
#> 2,50002,A
#> 3,50003,A
#> 4,50004,A
#> 5,50005,A
#> 6,50006,A
#> 7,50007,A
#> 8,50008,A
#> 9,50009,A
cat(readLines("~/test_datatable.csv", 10), sep = "\n")
#> "a","b","c"
#> 1,50001,"A"
#> 2,50002,"A"
#> 3,50003,"A"
#> 4,50004,"A"
#> 5,50005,"A"
#> 6,50006,"A"
#> 7,50007,"A"
#> 8,50008,"A"
#> 9,50009,"A"

So, for a very fast approach, you can write your S3 file like this:

s3write_using(file_filtered,
              FUN = write_quoted_csv,
              object = paste0(output_path, "file-", lubridate::today(), ".csv"),
              bucket = input_bucket)
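
If staying with readr is preferred, newer readr releases (2.0.0 onward, to the best of my knowledge) give write_delim a quote argument that accepts "needed", "all" or "none". The quote = "double" value tried in the question is not one of these. The sketch below assumes such a readr version is installed; write_all_quoted, demo_df and demo_path are hypothetical names used only for illustration.

```r
# Assumes readr >= 2.0.0, where write_delim() has a `quote` argument.
# write_all_quoted is a hypothetical wrapper with the (x, file) signature
# that s3write_using expects for FUN.
write_all_quoted <- function(x, file)
{
  readr::write_delim(x, file, delim = ",", na = "", quote = "all")
}

# Hypothetical data and a temporary path, for a local check only.
demo_df <- data.frame(id = 1:2, name = c("a", "b"))
demo_path <- tempfile(fileext = ".csv")
write_all_quoted(demo_df, demo_path)

# Inspect the raw lines to see which fields were quoted.
demo_lines <- readLines(demo_path)
cat(demo_lines, sep = "\n")
```

The wrapper could then be passed as FUN in the same way as write_quoted_csv. It is slower than fwrite according to the benchmark above, so it is mainly useful when avoiding an extra dependency matters more than speed.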
