Using lapply on multiple files

Question

library(readr)        # read_csv
library(stringr)      # str_to_title
library(countrycode)  # countrycode
library(plyr)         # ddply, numcolwise
library(ggplot2)      # plotting

BorderData07 <- read_csv("Downloads/BorderData/BorderApprehension2007.csv")
BorderData08 <- read_csv("Downloads/BorderData/BorderApprehension2008.csv")
BorderData07[is.na(BorderData07)] <- 0
BorderData08[is.na(BorderData08)] <- 0
BorderData07$CITIZENSHIP <- str_to_title(BorderData07$CITIZENSHIP)
BorderData07$Region <- countrycode(sourcevar = BorderData07$CITIZENSHIP, origin = "country.name", destination = "region")
BorderData07[nrow(BorderData07), 26] <- "Total"
World_Region <- ddply(BorderData07, "Region", numcolwise(sum))
ggplot(World_Region, aes(x = Region, y = Total)) + geom_col(width = 0.5, position = position_dodge(3), fill = 'blue', alpha = 0.5) +
  scale_y_log10() + coord_flip() + geom_text(aes(label = Total), alpha = 1.0, check_overlap = TRUE) +
  ggtitle("Apprehension By World Region Totals in 2007")

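I am trying to use lapply to run this code on the csv file for each year of my border data. The only differences between the years are the ending of the csv file name and the title of the plot. My knowledge of lapply is very limited, and I have not been able to work out how to get it running properly.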

Tags: r, for-loop, lapply

Answer


Put everything you want to apply to each file into a function:

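# Assumes readr, stringr, countrycode, plyr, and ggplot2 are already loaded.
apply_fun <- function(file) {
  x <- read_csv(file)
  # Take the year from the file name, e.g. "BorderApprehension2007.csv" -> "2007"
  year <- str_extract(basename(file), '\\d+')
  x[is.na(x)] <- 0
  x$CITIZENSHIP <- str_to_title(x$CITIZENSHIP)
  x$Region <- countrycode(sourcevar = x$CITIZENSHIP, origin = "country.name", destination = "region")
  # Label the last row (the totals row) in column 26, as in the question
  x[nrow(x), 26] <- "Total"
  World_Region <- ddply(x, "Region", numcolwise(sum))
  # The plot is the last expression, so it is the function's return value
  ggplot(World_Region, aes(x = Region, y = Total)) + 
    geom_col(width = 0.5, position = position_dodge(3), fill = 'blue', alpha = 0.5) + 
    scale_y_log10() + coord_flip() +  
    geom_text(aes(label = Total), alpha = 1.0, check_overlap = TRUE) +  
    # Trailing space after "in" so the title reads "... in 2007"
    ggtitle(paste0("Apprehension By World Region Totals in ", year))
}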
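
The title depends on pulling the year out of each file name, so it can help to check that step on its own. Below is a minimal sketch in base R, assuming the file names contain a four-digit year as in the question. `extract_year` is a hypothetical helper introduced only for illustration; it is not part of the original answer.

```r
# Hypothetical helper: pull the first four-digit run out of a file name.
# Assumes the year appears in the file name itself, not in the directory.
extract_year <- function(path) {
  name <- basename(path)  # drop the directory part of the path
  m <- regmatches(name, regexpr("[0-9]{4}", name))
  if (length(m) == 0) NA_character_ else m
}

files <- c("Downloads/BorderData/BorderApprehension2007.csv",
           "Downloads/BorderData/BorderApprehension2008.csv")

# vapply guarantees one character value per file
years <- vapply(files, extract_year, character(1), USE.NAMES = FALSE)

# paste() inserts a space between its arguments, unlike paste0()
titles <- paste("Apprehension By World Region Totals in", years)
print(titles)
```

Using `paste()` rather than `paste0()` is one way to avoid forgetting the space before the year.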

Then use lapply:

filename <- list.files('Downloads/BorderData/', pattern = '\\.csv$', full.names = TRUE)
list_plots <- lapply(filename, apply_fun)
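
lapply returns a list with one element per file, in the order given by list.files. The sketch below shows the same read-then-process pattern on throwaway data using base R only, so it can be run without the border files. The temporary directory, the file names, the columns, and the `sum_total` function are all made up for illustration.

```r
# Self-contained sketch of the lapply-over-files pattern (base R only).
# Everything below is demo data, not the real border files.
tmp_dir <- file.path(tempdir(), "border_demo")
dir.create(tmp_dir, showWarnings = FALSE)

write.csv(data.frame(CITIZENSHIP = c("MEXICO", "CANADA"), Total = c(10, NA)),
          file.path(tmp_dir, "Demo2007.csv"), row.names = FALSE)
write.csv(data.frame(CITIZENSHIP = c("MEXICO", "CANADA"), Total = c(7, 3)),
          file.path(tmp_dir, "Demo2008.csv"), row.names = FALSE)

# Hypothetical per-file function: read, replace NA with 0, return the total.
sum_total <- function(file) {
  x <- read.csv(file)
  x[is.na(x)] <- 0
  sum(x$Total)
}

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
totals <- lapply(files, sum_total)

# Name each result by the year in its file name so it can be looked up later.
file_names <- basename(files)
names(totals) <- regmatches(file_names, regexpr("[0-9]{4}", file_names))
print(totals[["2007"]])
```

The same naming step should work for list_plots, after which an individual plot can be displayed by printing the corresponding list element, for example list_plots[[1]].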
