r - Categorical "grids" in R
Problem description
I am working with the R programming language. Suppose I have the following data:
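library("dplyr")
# Three numeric columns drawn from normal distributions
df <- data.frame(b = rnorm(100, 5, 5), d = rnorm(100, 2, 2),
                 c = rnorm(100, 10, 10))
# A factor column with five levels, sampled with unequal probabilities
a <- c("a", "b", "c", "d", "e")
a <- sample(a, 100, replace = TRUE, prob = c(0.3, 0.2, 0.3, 0.1, 0.1))
a <- as.factor(a)
df$a <- a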
> head(df)
b d c a
1 3.1316480 0.5032860 4.7362991 a
2 4.3111450 -0.1142736 -0.5841322 c
3 2.8291346 3.6107839 16.0684492 a
4 14.2142245 4.9893987 -1.8145138 a
5 -6.7381302 0.0416782 -7.7675387 c
6 0.4481874 0.3370716 17.4260801 a
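I also have the following function ("my_subset_mean"), which evaluates the mean of column "c" for a given choice of inputs:
my_subset_mean <- function(r1, r2, r3){
  # keep rows whose level of a is in r1, with b > r2 and d < r3
  subset <- df %>% filter(a %in% r1, b > r2, d < r3)
  return(mean(subset$c))
}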
my_subset_mean(r1 = c("a", "b"), r2 = 5, r3 = 1 )
[1] 5.682513
My question: I am trying to evaluate the function "my_subset_mean" for random combinations of "r1", "r2" and "r3". For example:
my_subset_mean(r1 = c("a", "b"), r2 = 5, r3 = 1 )
[1] 11.46365
my_subset_mean(r1 = c("a", "b"), r2 = 5, r3 = 1 )
[1] 11.46365
my_subset_mean(r1 = c("a"), r2 = 2, r3 = 0 )
[1] 14.59809
my_subset_mean(r1 = c("a", "b", "c"), r2 = 3.1, r3 = 0 )
[1] 11.26508
# I am not sure how to get this one to work (i.e. ignore "r1" altogether and only calculate the mean using r2 and r3)
my_subset_mean(r1 = "NA", r2 = 3.1, r3 = 0 )
[1] NaN
etc.
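In the call above, r1 = "NA" is the literal string "NA", which matches no level of a, so the filter keeps zero rows and mean() of an empty vector returns NaN. One way to make the function ignore r1 is to treat a real NA (not the string) as "do not filter on a". The sketch below assumes that convention; the seed and the name my_subset_mean2 are hypothetical, chosen only for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so this sketch is reproducible
df <- data.frame(b = rnorm(100, 5, 5), d = rnorm(100, 2, 2),
                 c = rnorm(100, 10, 10))
df$a <- factor(sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE,
                      prob = c(0.3, 0.2, 0.3, 0.1, 0.1)))

# Hypothetical variant: if r1 is NULL or NA, skip the filter on column a
my_subset_mean2 <- function(r1, r2, r3, data = df) {
  subset <- data %>% filter(b > r2, d < r3)
  if (!is.null(r1) && !all(is.na(r1))) {
    subset <- subset %>% filter(a %in% r1)
  }
  mean(subset$c)
}

my_subset_mean2(r1 = NA, r2 = 3.1, r3 = 0)         # uses only r2 and r3
my_subset_mean2(r1 = c("a", "b"), r2 = 5, r3 = 1)  # same filter as the original
```

Note that an empty subset still yields NaN, for example when no row satisfies both b > r2 and d < r3.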
Is it possible to make a "grid" that contains random values of "r2" and "r3" (e.g., random values between 0 and 5) together with random subsets of "r1" (e.g. "a", "c, d", "b, a, e", "d"):
> head(my_grid)
r2 r3 r1
1 3.1316480 0.5032860 a, b
2 4.3111450 -0.1142736 c, d, e
3 2.8291346 3.6107839 a
4 14.2142245 4.9893987 b, e
5 -6.7381302 0.0416782 NA
6 0.4481874 0.3370716 e
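A grid like this can be built by storing the r1 subsets in a list column, so that each cell holds a character vector (or NA, meaning "ignore r1"). This is a sketch under assumptions: the grid size n_grid, the seed and the helper random_subset() are hypothetical, and r2 and r3 are drawn uniformly from 0 to 5 as in the question's example range.

```r
set.seed(42)  # arbitrary seed for reproducibility
levels_a <- c("a", "b", "c", "d", "e")
n_grid <- 6   # hypothetical number of grid rows

# Hypothetical helper: draw a random subset of the levels, or NA for "no filter"
random_subset <- function() {
  k <- sample(0:length(levels_a), 1)  # k = 0 encodes "ignore r1"
  if (k == 0) NA_character_ else sample(levels_a, k)
}

my_grid <- data.frame(r2 = runif(n_grid, 0, 5), r3 = runif(n_grid, 0, 5))
# Assigning a list of length nrow(my_grid) creates a list column
my_grid$r1 <- replicate(n_grid, random_subset(), simplify = FALSE)
head(my_grid)
```

Printing a list column shows each subset as a comma-separated string, which matches the "a, b" layout in the table above.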
and then evaluate "my_subset_mean" on each row of "my_grid"? For example:
#desired result
> head(final_answer)
r2 r3 r1 my_subset_mean
1 3.1316480 0.5032860 a, b 0.3
2 4.3111450 -0.1142736 c, d, e 0.1
3 2.8291346 3.6107839 a 0.55
4 14.2142245 4.9893987 b, e 0.6
5 -6.7381302 0.0416782 NA 0.51
6 0.4481874 0.3370716 e 0.16
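If factor variables were not involved, I think I could do this with an iterative "for loop". But I am not sure how to "feed" "my_grid" into the function "my_subset_mean". Can someone please show me how to do this?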
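The factor column does not actually prevent a plain for loop: if r1 is stored as a list column, my_grid$r1[[i]] returns the character vector for row i, and %in% compares it against the factor's labels. A minimal sketch, assuming a small hand-written grid (the values are illustrative, not from the question) and an NA-means-no-filter change to the function:

```r
library(dplyr)

set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(b = rnorm(100, 5, 5), d = rnorm(100, 2, 2),
                 c = rnorm(100, 10, 10))
df$a <- factor(sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE,
                      prob = c(0.3, 0.2, 0.3, 0.1, 0.1)))

my_subset_mean <- function(r1, r2, r3) {
  subset <- df %>% filter(b > r2, d < r3)
  # Assumed convention: NA in r1 means "do not filter on column a"
  if (!all(is.na(r1))) subset <- subset %>% filter(a %in% r1)
  mean(subset$c)
}

# Hypothetical grid; r1 is a list column holding one subset per row
my_grid <- data.frame(r2 = c(1, 2.5, 0.5), r3 = c(4, 3, 2))
my_grid$r1 <- list(c("a", "b"), c("c", "d", "e"), NA_character_)

# Feed each row of the grid into the function
my_grid$my_subset_mean <- NA_real_
for (i in seq_len(nrow(my_grid))) {
  my_grid$my_subset_mean[i] <- my_subset_mean(my_grid$r1[[i]],
                                              my_grid$r2[i], my_grid$r3[i])
}
my_grid
```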
Thank you!
Solution
I think this code might help you:
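library(tidyverse)
# Candidate values for each argument
r1_sim <- c("a", "b", "c", "d", "e")
r2_sim <- seq(0, 1, 0.2)
r3_sim <- seq(0, 1, 0.2)
# expand_grid() builds every combination (5 * 6 * 6 = 180 rows);
# rowwise() makes mutate() call my_subset_mean() once per row
expand_grid(r1 = r1_sim, r2 = r2_sim, r3 = r3_sim) %>%
  rowwise() %>%
  mutate(my_subset_mean(r1, r2, r3))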
# A tibble: 180 x 4
# Rowwise:
r1 r2 r3 `my_subset_mean(r1, r2, r3)`
<chr> <dbl> <dbl> <dbl>
1 a 0 0 16.5
2 a 0 0.2 12.9
3 a 0 0.4 12.9
4 a 0 0.6 12.9
5 a 0 0.8 12.9
6 a 0 1 13.4
7 a 0.2 0 16.5
8 a 0.2 0.2 12.9
9 a 0.2 0.4 12.9
10 a 0.2 0.6 12.9
# ... with 170 more rows
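The answer above varies r1 only over single letters. A sketch of the same idea with random multi-letter subsets, assuming r1 is held in a list column: purrr::pmap_dbl() calls the function once per row, taking the i-th element of r1, r2 and r3 in order. The grid size, seed and sampling scheme are illustrative, and the NA handling inside my_subset_mean is an assumed change to the question's function.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(b = rnorm(100, 5, 5), d = rnorm(100, 2, 2),
                 c = rnorm(100, 10, 10))
df$a <- factor(sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE,
                      prob = c(0.3, 0.2, 0.3, 0.1, 0.1)))

my_subset_mean <- function(r1, r2, r3) {
  subset <- df %>% filter(b > r2, d < r3)
  if (!all(is.na(r1))) subset <- subset %>% filter(a %in% r1)  # NA: ignore r1
  mean(subset$c)
}

n_grid <- 10  # hypothetical grid size
my_grid <- tibble(
  r2 = runif(n_grid, 0, 5),
  r3 = runif(n_grid, 0, 5),
  # list column: each cell is a random subset of the levels, or NA
  r1 = map(seq_len(n_grid), function(i) {
    k <- sample(0:5, 1)
    if (k == 0) NA_character_ else sample(levels(df$a), k)
  })
)

final_answer <- my_grid %>%
  mutate(my_subset_mean = pmap_dbl(list(r1, r2, r3), my_subset_mean))
final_answer
```

Unlike rowwise(), pmap_dbl() leaves the result ungrouped and guarantees a plain double column.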