How to calculate the mean from multiple CSV files

Problem description

In Python, the following approach computes the mean over multiple CSV files.

If file1.csv through file100.csv are all in the same directory, you can use this Python script:

#!/usr/bin/env python3

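N = 100          # number of files: file1.csv ... file100.csv
mean_sum = 0
std_sum = 0
for i in range(1, N + 1):
    with open(f"file{i}.csv") as f:
        # line 1 holds the mean and line 2 the std, each in the second comma-separated field
        mean_sum += float(f.readline().split(",")[1])
        std_sum += float(f.readline().split(",")[1])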

print(f"Mean of means: {mean_sum / N}")
print(f"Mean of stds: {std_sum / N}")

How can this be implemented in R?

Tags: python, r

Solution


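"Everything can be coded", Eric :)

It is hard to help if you do not provide a minimal reproducible example and describe what you have tried so far and where you are stuck.

The following is based on the {tidyverse}, a set of packages that work well together. What I wrote is more or less pseudo-code that should get you going. Obviously, you will have to adapt and rename things to suit your project, variable names, etc.

Good luck: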

library(readr)     # package to read tabular data
library(dplyr)     # main workhorse to crunch data
library(purrr)     # functional programming for iterations/loops

pth <- "my-data-folder"    # provide path to your data

# create a list of file names in your folder
## you may need to fine-tune the regular expression to select the files you are looking for
## full.names gives you the path/name of your data files
## \\.csv is the way to "escape" the dot of the csv file extension; $ anchors it to the end of the name

fns <- list.files(path = pth, pattern = "file.*\\.csv$", full.names = TRUE)

# write a function that reads the file and calculates your stats
## you can "summarise" stats over a table
## my_target_variable is a placeholder: replace it with the name of your column

my_function <- function(.fn){
  df <- read_csv(.fn)     # read the file
  df %>% 
    summarise(MEAN = mean(my_target_variable)    # calc mean of your file/data
              , SD = sd(my_target_variable))     # calc sd of the data
}

# iterate with purrr::map := take list of filenames and apply your function to each list entry
## map_dfr() returns a data frame; use plain map() to get a list instead
## for testing purposes you can truncate the list of filenames, e.g. fns[1:3] for the
## first 3 files

ds <- fns %>% 
   purrr::map_dfr(.f = my_function)

ds

ds is a table with MEAN and SD columns, one row per file.
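For comparison with the script in the question, the same "summarise each file, then combine" pattern can be sketched in Python using only the standard library. This is a rough sketch, not a drop-in replacement: the function names summarise_file and summarise_folder, the column name and the glob pattern are all made up for illustration. It also assumes every file has a header row and at least two numeric rows in the target column.

```python
import csv
import statistics
from pathlib import Path


def summarise_file(path, column):
    """Return (mean, sd) of one numeric column of a CSV file with a header row."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    # statistics.stdev is the sample standard deviation (n - 1 denominator),
    # which matches R's sd(); it needs at least two values.
    return statistics.mean(values), statistics.stdev(values)


def summarise_folder(folder, column, pattern="file*.csv"):
    """Summarise every matching file, then average the per-file results."""
    rows = [summarise_file(p, column) for p in sorted(Path(folder).glob(pattern))]
    if not rows:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    means, sds = zip(*rows)   # two tuples: all means, all sds
    return statistics.mean(means), statistics.mean(sds)
```

Calling summarise_folder("my-data-folder", "my_target_variable") would return the mean of the per-file means and the mean of the per-file standard deviations. These are the two numbers the original script prints.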

