首页 > 解决方案 > Read certain columns if exist using read.csv.sql from sqldf

问题描述

I have very large csv files (2.3 GB). I only want to read certain columns that could be or could not be there.

I am using the following code that was suggested here Only read limited number of columns

library(sqldf) 
loc <- read.csv.sql("data.csv",
                    sql = "select locID, City, CRESTA, Latitude, Longitude from file",
                    sep = ",")

How can I deal with the situation when for example the column "City" is not in the csv?

标签: sqlrsqldf

解决方案


这会找出哪些列可用,将它们的名称与所需列的名称相交并仅读取这些列。

library(sqldf)

nms_wanted <- c("locID", "City", "CRESTA", "Latitude", "Longitude")
nms_avail <- names(read.csv("data.csv", nrows = 0))
nms <- intersect(nms_avail, nms_wanted)
fn$read.csv.sql("data.csv", "select `toString(nms)` from file")

推荐阅读