How to calculate the mean across multiple CSV files


In Python, I can calculate the mean across multiple CSV files with the approach below.

If file1.csv through file100.csv are all in the same directory, you can use this Python script:

#!/usr/bin/env python3

N = 100
mean_sum = 0
std_sum = 0
for i in range(1, N + 1):
    with open(f"file{i}.csv") as f:
        mean_sum += float(f.readline().split(",")[1])
        std_sum += float(f.readline().split(",")[1])

print(f"Mean of means: {mean_sum / N}")
print(f"Mean of stds: {std_sum / N}")
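The same calculation can be sketched without hard-coding N, by collecting whatever files match file*.csv. This is only a sketch under the same assumption the script above makes: line 1 of each file holds the mean and line 2 the std, each in the second comma-separated field. The function name mean_of_stats is made up for illustration.

```python
from pathlib import Path


def mean_of_stats(paths):
    """Average the per-file mean and std over the given CSV files.

    Assumes line 1 holds the mean and line 2 the std, each in the
    second comma-separated field (the layout the script above reads).
    """
    paths = list(paths)
    if not paths:
        raise ValueError("no input files")
    mean_sum = 0.0
    std_sum = 0.0
    for path in paths:
        with open(path) as f:
            mean_sum += float(f.readline().split(",")[1])
            std_sum += float(f.readline().split(",")[1])
    n = len(paths)
    return mean_sum / n, std_sum / n


if __name__ == "__main__":
    # Pick up every file matching file*.csv in the current directory.
    files = sorted(Path(".").glob("file*.csv"))
    if files:
        mean_of_means, mean_of_stds = mean_of_stats(files)
        print(f"Mean of means: {mean_of_means}")
        print(f"Mean of stds: {mean_of_stds}")
```

Dividing by the number of files actually found, rather than a fixed N, avoids a wrong average when some files are missing.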

How could this be implemented in R?

"Everything can be coded", Eric :)


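The following is based on the {tidyverse}, a set of packages that work well together. What I have written is close to pseudo-code, but it should get you going. Obviously, you will have to adapt and rename things to fit your project, variable names, etc.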


library(readr)     # package to read tabular data
library(dplyr)     # main workhorse to crunch data
library(purrr)     # functional programming for iterations/loops

pth <- "my-data-folder"    # provide path to your data

# create a list of file names in your folder
## you may need to fine-tune the regular expression to select the files you are looking for
## full.names = TRUE gives you the path/name of your data files
## \\. is the way to "escape" the dot before the csv extension; $ anchors the match at the end of the name

fns <- list.files(path = pth, pattern = "file.*\\.csv$", full.names = TRUE)

# write a function that reads the file and calculates your stats
## you can "summarise" stats over a table

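my_function <- function(.fn){
  df <- read_csv(.fn)     # read the file
  ## my_target_variable is a placeholder: replace it with your column name
  ## (a name with hyphens would be parsed as subtraction unless wrapped in backticks)
  df %>%
    summarise(MEAN = mean(my_target_variable)    # calc mean of your file/data
              , SD = sd(my_target_variable)      # calc sd of the data
    )
}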

# iterate with purrr::map := take list of filenames and apply your function to each list entry
## map_dfr() provides a data frame, you can use "only" map() to get a list
## for testing purposes you can truncate the list of filenames with fns[1:3] for the
## first 3 files, other

ds <- fns %>% 
   purrr::map_dfr(.f = my_function)


ds is a table with the columns MEAN and SD, with one row per file.
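For comparison with the Python in the question, the same per-file summary can be sketched in Python using only the standard library. This is a rough equivalent, not a translation of the R code: it assumes each CSV has a header row, and the names summarise_file, summarise_folder and the column passed in are hypothetical.

```python
import csv
import statistics
from pathlib import Path


def summarise_file(path, column):
    """Return the mean and sample standard deviation of one column."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return {
        "file": Path(path).name,
        "MEAN": statistics.mean(values),
        # statistics.stdev is the sample SD (n - 1 denominator), like R's sd();
        # it needs at least two values per file.
        "SD": statistics.stdev(values),
    }


def summarise_folder(folder, column, pattern="file*.csv"):
    """Apply summarise_file to every matching file, one dict per file."""
    return [summarise_file(p, column) for p in sorted(Path(folder).glob(pattern))]
```

Calling summarise_folder("my-data-folder", "value") would return a list of dicts, one per file, playing the role of the rows of ds; here "value" stands in for whatever your target column is called.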
