How do I calculate two separate means for a vector with both characters and numbers?

Problem description

I'm new to R. I created a BMI variable and pasted it together with the sex variable so I can see which BMI values belong to M (male) and which to F (female). How do I find the mean of the M group and the F group separately?

I've tried using the substr and gsub functions to strip out the letters, but I'm not sure that's the solution: once the letter is gone, I have no way of knowing whether a BMI value belongs to the M group or the F group.

edit:

I want to calculate the means for M and F separately to learn how to do simple subgroup analyses. I've been trying to learn R independently, and this particular BMI problem is from an old problem set.

edit:

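I see now why pasting doesn't work here: I was treating paste() like PROC FORMAT in SAS. A SAS format only changes how values are displayed, whereas paste() turns the numbers into character strings.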

set.seed(123)
# Simulate sex, height (cm), weight (kg) and waist (in) for 100 people
sex <- sample(x = c("M", "F"), size = 100, replace = TRUE)
height.cm <- rnorm(n = 100, mean = ifelse(sex == "M", 175, 163), sd = 3)
weight.kg <- -110 + height.cm * 1.1 + rnorm(n = 100, sd = 7)
waist.in <- -20 + ifelse(sex == "M", 35, 33) + 0.5 * weight.kg + rnorm(n = 100, sd = 2.5)

# BMI = weight (kg) / height (m)^2
bmi <- weight.kg / (height.cm / 100) ^ 2

# Attach the sex code to each BMI value (this produces text, not numbers)
bmi_sex <- paste(sex, bmi, sep = "")
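
A quick check of why the pasted vector can't be averaged: paste() returns a character vector, so mean() has nothing numeric to work with. This sketch reuses the question's data-generating calls; waist.in is left out because bmi does not depend on it and it is drawn after height and weight.

```r
# Reproduce the question's data (same seed and calls as in the question)
set.seed(123)
sex <- sample(x = c("M", "F"), size = 100, replace = TRUE)
height.cm <- rnorm(n = 100, mean = ifelse(sex == "M", 175, 163), sd = 3)
weight.kg <- -110 + height.cm * 1.1 + rnorm(n = 100, sd = 7)
bmi <- weight.kg / (height.cm / 100) ^ 2

# paste() converts every number to text, so the result is a character vector
bmi_sex <- paste(sex, bmi, sep = "")
class(bmi_sex)    # "character"
head(bmi_sex, 3)  # each element is "M" or "F" followed by the digits

# mean() cannot average text: it warns and returns NA
mean(bmi_sex)     # NA, with "argument is not numeric or logical: returning NA"
```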


Solution


You can calculate the group means directly from the bmi and sex variables:

tapply(bmi, sex, mean)

#       F        M 
#25.81020 27.14678 
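
Another common base-R pattern is to keep the two variables side by side in a data frame, so each BMI value stays tied to its group without any pasting, and let aggregate() compute one mean per group. Storing sex as a factor with descriptive labels is a loose analogue of a SAS format, since the labels change what is printed while bmi stays numeric. The column name dat and the labels "Female"/"Male" are illustrative choices, not part of the original question.

```r
set.seed(123)
sex <- sample(x = c("M", "F"), size = 100, replace = TRUE)
height.cm <- rnorm(n = 100, mean = ifelse(sex == "M", 175, 163), sd = 3)
weight.kg <- -110 + height.cm * 1.1 + rnorm(n = 100, sd = 7)

# Keep sex and BMI as separate columns; label the sex codes via a factor
dat <- data.frame(
  sex = factor(sex, levels = c("F", "M"), labels = c("Female", "Male")),
  bmi = weight.kg / (height.cm / 100) ^ 2
)

# One row per group: the group label and its mean BMI
bmi_means <- aggregate(bmi ~ sex, data = dat, FUN = mean)
bmi_means
```

Because the rows follow the factor levels, Female comes first and Male second, matching the F/M order of tapply().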

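The bmi_sex variable isn't needed for this calculation, but if that is the form in which the input arrives, we can use regular expressions to separate the sex from the actual BMI value. The pattern ".(.*)" captures everything after the first character (the number), and "(.).*" captures only the first character (the sex code):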

tapply(as.numeric(sub(".(.*)", "\\1", bmi_sex)), sub("(.).*", "\\1", bmi_sex), mean)

#       F        M 
#25.81020 27.14678 
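
Since the question mentions substr: as long as the sex code is always exactly one character at the front of each string (an assumption that holds for this data), substr() and substring() can split bmi_sex without regular expressions. One caveat: paste() writes the numbers with a limited number of significant digits, so the recovered values may differ from bmi in the last decimal places.

```r
set.seed(123)
sex <- sample(x = c("M", "F"), size = 100, replace = TRUE)
height.cm <- rnorm(n = 100, mean = ifelse(sex == "M", 175, 163), sd = 3)
weight.kg <- -110 + height.cm * 1.1 + rnorm(n = 100, sd = 7)
bmi <- weight.kg / (height.cm / 100) ^ 2
bmi_sex <- paste(sex, bmi, sep = "")

# Assumes a one-letter prefix: character 1 is the sex code,
# everything from character 2 onward is the BMI written as text
sex_part <- substr(bmi_sex, 1, 1)
bmi_part <- as.numeric(substring(bmi_sex, 2))

# Same grouped mean as before, computed from the split pieces
tapply(bmi_part, sex_part, mean)
```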

We can also write a function that returns the mean BMI for one sex at a time:

get_bmi <- function(bmi, sex, select_sex) {
   mean(bmi[sex == select_sex], na.rm = TRUE)
}

Then call it once per group:

get_bmi(bmi, sex, "F")
#[1] 25.8102
get_bmi(bmi, sex, "M")
#[1] 27.14678
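
To get every group from get_bmi() in one call, a small loop over the distinct sex codes with sapply() works; because the codes are a character vector, sapply() uses them as names, giving a named result like tapply()'s. The function is repeated here so the sketch runs on its own, and the names groups and group_means are only illustrative.

```r
set.seed(123)
sex <- sample(x = c("M", "F"), size = 100, replace = TRUE)
height.cm <- rnorm(n = 100, mean = ifelse(sex == "M", 175, 163), sd = 3)
weight.kg <- -110 + height.cm * 1.1 + rnorm(n = 100, sd = 7)
bmi <- weight.kg / (height.cm / 100) ^ 2

get_bmi <- function(bmi, sex, select_sex) {
  mean(bmi[sex == select_sex], na.rm = TRUE)
}

# sort(unique(sex)) gives c("F", "M"); sapply() names each result by its group
groups <- sort(unique(sex))
group_means <- sapply(groups, function(s) get_bmi(bmi, sex, s))
group_means
```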
