Apply the same code to multiple .csv files

Question

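I have several .csv files in the same format, each logging the pumping rate once per minute. I need to total the pumping for each day. Some of the data contain error text (when the data logger was off) or negative values (which mean the flow was 0). I wrote the code below to do this. How can I loop over multiple files instead of copying and pasting the code for every month? All files are named "Mon_Year_Well_Flows.csv". I tried a for loop and lapply without success. Also, I'm very new to R, so I know my code is probably very inefficient.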

First rows of the first month's data file, "Jul_2017_Well_Flows.csv"

Date        DW_20   DW_24A  DW_25A   DW_26A DW_27A  DW_28   DW_29
9/1/18 0:00 995.88  1110.62 1229.14  -0.09  4.5    1100.95  913.33
9/1/18 0:01 1002.43 1115.85 1231.59  -0.09  4.5    1107.63  909.06
9/1/18 0:02 1007.01 1123.39 1236.75  -0.09  4.51   1108.37  935
9/1/18 0:03 1007.17 1121.69 1234.58  -0.09  4.52   1105.64  901.35
9/1/18 0:04 1005.27 1122.86 1233.25  -0.09  4.53   1107.56  911.15
9/1/18 0:05 1001.37 1116.39 1229.89  -0.09  4.54   1103.66  937.93
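The timestamps in these rows use a two-digit year (9/1/18), which matters when the Date column is converted with as.Date(): `%y` reads a two-digit year, while `%Y` expects four digits. A quick base-R check, using three timestamps typed in the sample's format (the third one is made up to show a second day):

```r
# Three timestamps in the same format as the sample rows
stamps <- c("9/1/18 0:00", "9/1/18 0:01", "9/2/18 0:00")

# %y = two-digit year; the trailing " 0:00" time of day is ignored by as.Date()
days <- as.Date(stamps, format = "%m/%d/%y")
print(days)

# %Y = four-digit year, so "18" is taken literally as the year 18
wrong <- as.Date(stamps, format = "%m/%d/%Y")
print(wrong)
```

Dropping the time of day this way is also what lets all 1,440 minute readings of a day share one Day value for grouping.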

Code for the first month's data file

#Load dplyr for select(), bind_cols(), group_by(), summarize_all() and %>%
library(dplyr)
#Load data
data <- read.csv("Jul_2017_Well_Flows.csv", header = TRUE)
#Create new data frame with date info
data1 <- data.frame("Date" = data$Date)
#Convert error text to NA (text columns are factors before R 4.0, character from R 4.0 on)
index <- sapply(data, function(x) is.factor(x) || is.character(x))
data[index] <- lapply(data[index], function(x) suppressWarnings(as.numeric(as.character(x))))
#Convert all NA values to 0
data[is.na(data)] <- 0
#Convert all negative pumping rates to 0
data[,-1][data[,-1] < 0] <- 0
#Add back original date column
data <- select(data, -Date)
data <- bind_cols(data, data1)
#Drop the time of day; dates look like 9/1/18 0:00, so the year has two digits (%y)
data$Day <- as.Date(data$Date, '%m/%d/%y')
Jul_2017 <- data %>%
  #Remove date column
  select(-Date) %>%
  #Group all data according to day
  group_by(Day) %>%
  #Sum all daily well data by day
  summarize_all(sum)
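The cleaning and daily summing above can also be wrapped in a single function, which is what makes looping over files (or calling lapply) straightforward. The sketch below is base R rather than dplyr, uses a hypothetical helper name `daily_totals()`, assumes column 1 is the "m/d/yy h:mm" timestamp and every other column is a well, and runs on a tiny made-up data frame rather than the real file:

```r
# Hypothetical helper: cleans one month of logger data and returns daily totals.
# Assumes column 1 is the timestamp and all other columns are well readings.
daily_totals <- function(df) {
  wells <- df[-1]
  # Logger error text becomes NA (as.numeric gives NA for non-numeric text)
  wells[] <- lapply(wells, function(x) suppressWarnings(as.numeric(as.character(x))))
  # Missing and negative readings both count as zero flow
  wells[is.na(wells) | wells < 0] <- 0
  # Keep only the calendar day; %y because the year has two digits
  day <- as.Date(as.character(df[[1]]), format = "%m/%d/%y")
  # Sum every well column within each day
  aggregate(wells, by = list(Day = day), FUN = sum)
}

# Made-up sample in the same layout (values are illustrative only)
sample_df <- data.frame(
  Date   = c("9/1/18 0:00", "9/1/18 0:01", "9/2/18 0:00"),
  DW_20  = c("995.88", "Error", "1002.43"),
  DW_26A = c(-0.09, 4.5, 2),
  stringsAsFactors = FALSE
)

daily_sample <- daily_totals(sample_df)
print(daily_sample)
```

Because the function never refers to a particular month, the same call works for every Mon_Year_Well_Flows.csv file once it has been read in.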

After copying and pasting the code above for each month, I bind all of the resulting data frames together with:

combined <- bind_rows(Jul_2017, Aug_2017....)

Tags: r

Answer


I'll answer this part of the question:

How can I loop over multiple files instead of copying and pasting the code every month?

To get started, one approach is to first get a list of the file names in the directory (here, a folder called temp). Try:

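library(dplyr)

#pattern is a regular expression, not a wildcard: match names ending in _Well_Flows.csv
filenames <- list.files("temp", pattern = "_Well_Flows\\.csv$", full.names = TRUE)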


#Load data
data <- read.csv(filenames[[1]], header = TRUE) # read in the first file as usual


#Create new data frame with date info
data1 <- data.frame("Date" = data$Date)
#Convert error text to NA (text columns are factors before R 4.0, character from R 4.0 on)
index <- sapply(data, function(x) is.factor(x) || is.character(x))
data[index] <- lapply(data[index], function(x) suppressWarnings(as.numeric(as.character(x))))
#Convert all NA values to 0
data[is.na(data)] <- 0
#Convert all negative pumping rates to 0
data[,-1][data[,-1] < 0] <- 0
#Add back original date column
data <- select(data, -Date)
data <- bind_cols(data, data1)
#Drop the time of day; dates look like 9/1/18 0:00, so the year has two digits (%y)
data$Day <- as.Date(data$Date, '%m/%d/%y')
Jul_2017 <- data %>%
  #Remove date column
  select(-Date) %>%
  #Group all data according to day
  group_by(Day) %>%
  #Sum all daily well data by day
  summarize_all(sum)


#Create a storage place for the combined data frames, starting with the first file
#(bind_rows() also works with a single data frame)
combined <- bind_rows(Jul_2017)

#seq_along(filenames)[-1] gives 2, 3, ... and is empty when there is only one file
for (i in seq_along(filenames)[-1]) {
  #Load data
  data <- read.csv(filenames[[i]], header = TRUE) # read in the i-th file
  #Create new data frame with date info
  data1 <- data.frame("Date" = data$Date)
  #Convert error text to NA (text columns are factors before R 4.0, character from R 4.0 on)
  index <- sapply(data, function(x) is.factor(x) || is.character(x))
  data[index] <- lapply(data[index], function(x) suppressWarnings(as.numeric(as.character(x))))
  #Convert all NA values to 0
  data[is.na(data)] <- 0
  #Convert all negative pumping rates to 0
  data[,-1][data[,-1] < 0] <- 0
  #Add back original date column
  data <- select(data, -Date)
  data <- bind_cols(data, data1)
  #Drop the time of day; the year has two digits, hence %y
  data$Day <- as.Date(data$Date, '%m/%d/%y')
  temp_month <- data %>%
    #Remove date column
    select(-Date) %>%
    #Group all data according to day
    group_by(Day) %>%
    #Sum all daily well data by day
    summarize_all(sum)

  combined <- bind_rows(combined, temp_month)
}
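An alternative to the for loop is to put the per-file steps in one function and let lapply() call it once per file, then stack the results with do.call(rbind, ...). The sketch below is self-contained base R: it writes two tiny made-up files into a temporary folder so it can run anywhere, `process_month()` is a hypothetical name, and the Month column taken from the file name is an extra the original code does not produce.

```r
# Write two tiny made-up monthly files into a temporary folder
folder <- file.path(tempdir(), "well_flows_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("Date,DW_20,DW_26A",
             "7/1/17 0:00,10,-0.09",
             "7/1/17 0:01,Error,4.5"),
           file.path(folder, "Jul_2017_Well_Flows.csv"))
writeLines(c("Date,DW_20,DW_26A",
             "8/1/17 0:00,20,1",
             "8/2/17 0:00,30,2"),
           file.path(folder, "Aug_2017_Well_Flows.csv"))

# Hypothetical per-file function: read, clean, and sum by day
process_month <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  wells <- df[-1]
  # Error text -> NA, then NA and negative readings -> 0
  wells[] <- lapply(wells, function(x) suppressWarnings(as.numeric(as.character(x))))
  wells[is.na(wells) | wells < 0] <- 0
  # Calendar day only; two-digit year
  day <- as.Date(df$Date, format = "%m/%d/%y")
  out <- aggregate(wells, by = list(Day = day), FUN = sum)
  # Record which file each row came from, e.g. "Jul_2017"
  out$Month <- sub("_Well_Flows\\.csv$", "", basename(path))
  out
}

# Only pick up files that follow the Mon_Year_Well_Flows.csv naming scheme
filenames <- list.files(folder, pattern = "_Well_Flows\\.csv$", full.names = TRUE)

# One data frame per file, then stack them and sort by day
monthly <- lapply(filenames, process_month)
combined <- do.call(rbind, monthly)
combined <- combined[order(combined$Day), ]
print(combined)
```

list.files() returns names alphabetically (Aug before Jul), which is why the result is re-sorted by Day. With dplyr loaded, bind_rows(monthly) could replace do.call(rbind, monthly).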
