Loop through an R data frame and pass rows as arguments to a function

Problem description

I want to loop through a data frame and pass each row as arguments to a function that counts totals in a data frame called df3.

I tried code that uses a traditional for loop, but it produced no result.

I looked at pmap in https://adv-r.hadley.nz/functionals.html#pmap, but I can't see how to apply that example to my code.

Here is a sample of the original data:

dput(head(df3,n=3))
structure(list(id = c("81", "83", "85"), look_work = c("yes", 
"yes", "yes"), current_work = c("no", "yes", "no"), hf_l5k = c("", 
"", ""), ac_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"), 
    ac_5_10k = c("", "1", "1"), hf_11_20k = c("", "", ""), ac_11_20k = c("", 
    "", ""), hf_21_50k = c("", "", ""), ac_21_50k = c("", "", 
    ""), hf_51_100k = c("", "", ""), ac_51_100k = c("", "", ""
    ), hf_m100k = c("", "", ""), ac_m100k = c("", "", ""), s_l1000 = c("", 
    "", ""), se_l1000 = c("", "", "1"), s_1001_1500 = c("", "1", 
    "1"), se_1001_1500 = c("", "", ""), s_2001_3000 = c("", "", 
    ""), se_2001_3000 = c("", "1", ""), s_3001_4000 = c("", "", 
    ""), se_3001_4000 = c("", "", ""), s_4001_5000 = c("", "", 
    ""), se_4001_5000 = c("", "", ""), s_5001_6000 = c("", "", 
    ""), se_5001_6000 = c("", "", ""), s_m6000 = c("", "", ""
    ), se_m6000 = c("", "", ""), s_n_ans = c("", "", ""), se_n_ans = c("", 
    "", ""), before_work = c("no", "NULL", "yes"), keen_move = c("yes", 
    "yes", "no"), city_size = c("village", "more than 500k inhabitants", 
    "more than 500k inhabitants"), gender = c("male", "female", 
    "female"), age = c("18 - 24 years", "18 - 24 years", "more than 50 years"
    ), education = c("secondary", "vocational", "secondary")), row.names = c(NA, 
3L), class = "data.frame")

Here is the data frame of parameters, hf_names:

structure(list(hf_names = c("hf_l5k", "hf_5_10k", "hf_11_20k", 
"hf_21_50k", "hf_51_100k", "hf_m100k"), job = c("hf_l5k_job", 
"hf_5_10k_job", "hf_11_20k_job", "hf_21_50k_job", "hf_51_100k_job", 
"hf_m100k_job"), tot = c("hf_l5k_tot", "hf_5_10k_tot", "hf_11_20k_tot", 
"hf_21_50k_tot", "hf_51_100k_tot", "hf_m100k_tot")), class = "data.frame", row.names = c(NA, 
-6L))

Here is my attempt using a traditional for loop:

library(dplyr)

tot_function <- function(df, filter_tot, col_name1, col_name2) {
  # filter desired columns for all jobs
  filter_tot <- df %>% filter(col_name1=="1") %>% 
  summarise(col_name2 = n()) 
}

for (i in seq_along(hf_names3)) {
  tot_function(df3, hf_names3$tot[i], hf_names3$hf_names[i], hf_names3$job[i])

}
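The loop above runs but yields nothing useful, for several reasons: a top-level for loop does not auto-print, and the function's result is assigned to a local variable and never stored; `seq_along()` on a data frame iterates over its columns (3) rather than its rows (6); inside `filter()`, `col_name1` is a character string, so `col_name1 == "1"` compares the string itself instead of the column it names; and `summarise(col_name2 = n())` creates a column literally called `col_name2`. A minimal sketch of a corrected loop, assuming dplyr 1.0 or later and using the `.data` pronoun and `!!name :=` to work with column names held in strings. The `df3` and `hf_names` below are hypothetical two-column/two-row subsets of the posted data so the block runs on its own:

```r
library(dplyr)

# Hypothetical toy subset of df3 (only the columns used here), taken from the dput() above
df3 <- data.frame(
  hf_l5k   = c("", "", ""),
  hf_5_10k = c("", "1", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical toy subset of the parameter table hf_names
hf_names <- data.frame(
  hf_names = c("hf_l5k", "hf_5_10k"),
  job      = c("hf_l5k_job", "hf_5_10k_job"),
  stringsAsFactors = FALSE
)

tot_function <- function(df, col_name1, col_name2) {
  df %>%
    # .data[[col_name1]] refers to the column whose name is stored in col_name1
    filter(.data[[col_name1]] == "1") %>%
    # !!col_name2 := ... names the output column after the string in col_name2
    summarise(!!col_name2 := n())
}

results <- list()
# Iterate over the ROWS of hf_names, not its columns, and keep each result
for (i in seq_len(nrow(hf_names))) {
  results[[i]] <- tot_function(df3, hf_names$hf_names[i], hf_names$job[i])
}

# Combine the one-row summaries side by side into a single data frame
totals <- bind_cols(results)
print(totals)
```

On an ungrouped data frame, `summarise()` always returns one row, so a column with no matching rows yields 0 instead of disappearing.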

The expected result would be a data frame or a vector:

hf_l5k_jobs hf_l5_10k_jobs
10               193

But this code doesn't produce anything, and the pmap examples only use simple functions such as trim and runif.

Tags: r, dplyr, purrr

Solution


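I don't think you need to overcomplicate this. You can take the column names from hf_names, subset each of those columns from df3, and count the number of 1s in it.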

sapply(hf_names$hf_names, function(x) sum(df3[[x]] == 1))

#    hf_l5k   hf_5_10k  hf_11_20k  hf_21_50k hf_51_100k   hf_m100k 
#         0          2          0          0          0          0 
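The question's expected output is a single row whose columns are named after the jobs, whereas `sapply()` returns a named vector. A small sketch of one way to reshape it, assuming the `job` column of hf_names holds the desired output names; `df3` and `hf_names` are cut down to hypothetical two-column/two-row versions here so the block runs on its own:

```r
# Hypothetical reduced versions of the posted df3 and hf_names
df3 <- data.frame(hf_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"),
                  stringsAsFactors = FALSE)
hf_names <- data.frame(hf_names = c("hf_l5k", "hf_5_10k"),
                       job = c("hf_l5k_job", "hf_5_10k_job"),
                       stringsAsFactors = FALSE)

# Count the "1" entries in each column listed in hf_names$hf_names
counts <- sapply(hf_names$hf_names, function(x) sum(df3[[x]] == 1))

# Rename using the job column, then turn the named vector into a one-row data frame
names(counts) <- hf_names$job
totals <- as.data.frame(as.list(counts))
print(totals)
```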

If you prefer the tidyverse, you can replace sapply with one of the map_* variants:

purrr::map_int(hf_names$hf_names, ~sum(df3[[.]] == 1))
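Since the question asks specifically about pmap, here is a sketch of how `purrr::pmap_int()` could pass each row of hf_names to a function. pmap treats a data frame as a list of columns and calls the function once per row, matching columns to arguments by name, so the function below (hypothetically named `count_ones`) must accept arguments called `hf_names`, `job` and `tot`, or absorb unused ones with `...`. The data are again hypothetical reduced versions of the posted ones:

```r
library(purrr)

# Hypothetical reduced versions of the posted df3 and hf_names
df3 <- data.frame(hf_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"),
                  stringsAsFactors = FALSE)
hf_names <- data.frame(
  hf_names = c("hf_l5k", "hf_5_10k"),
  job      = c("hf_l5k_job", "hf_5_10k_job"),
  tot      = c("hf_l5k_tot", "hf_5_10k_tot"),
  stringsAsFactors = FALSE
)

# Called once per row of hf_names; argument names match its column names.
# `job` and `tot` are unused here, but pmap passes every column, so they must be accepted.
count_ones <- function(hf_names, job, tot) {
  sum(df3[[hf_names]] == "1")
}

totals <- pmap_int(hf_names, count_ones)
# pmap_int() returns an unnamed integer vector here; label it with the job names
names(totals) <- hf_names$job
print(totals)
```

For a simple count like this, the single-column `map_int()` version is enough; pmap becomes useful when the function genuinely needs several columns of each row.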
