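r - R: How to use a user-defined function on my data
Problem description
I am using R for data analysis, but I have a problem with my code. I wrote my own function that builds a frequency table and applied it to a variable in my data, but R shows an error message.
Can anyone explain why it does not work and suggest a solution?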
> str(diabetes)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 56632 obs. of 30 variables:
$ ID : chr "A308059801" "A308059802" "A308120201" "A308120202" ...
$ year : num 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
$ region : num 1 1 1 1 1 1 1 1 1 1 ...
$ sex : num 1 2 1 2 2 1 2 1 2 1 ...
$ age : num 61 54 33 33 4 65 59 54 49 18 ...
$ edu : chr "3.000000" "2.000000" "3.000000" "4.000000" ...
$ occp : chr "5.000000" "3.000000" "4.000000" "1.000000" ...
$ marri_1 : 'labelled' num 1 1 1 1 2 1 1 1 1 2 ...
..- attr(*, "label")= chr "Marriage Y/N"
$ marri_2 : 'labelled' num 1 1 1 1 8 1 1 1 1 8 ...
..- attr(*, "label")= chr "Marriage status"
$ tins : 'labelled' num 10 20 10 10 10 20 20 10 10 10 ...
..- attr(*, "label")= chr "Insurance registration"
$ D_1_1 : 'labelled' chr "3.000000" "2.000000" "2.000000" "3.000000" ...
..- attr(*, "label")= chr "Self-report health status"
$ DI1_dg : 'labelled' chr "1.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "HBP diagnosis"
$ DI1_pr : 'labelled' chr "1.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "HBP current status"
$ DI1_pt : 'labelled' chr "1.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "HBP care"
$ DE1_dg : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Diabetes diagnosis"
$ DE1_pr : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Diabetes status"
$ DE1_pt : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Diabetes cure"
$ HE_DMdg : 'labelled' chr "0.000000" "0.000000" "0.000000" "0.000000" ...
..- attr(*, "label")= chr "Diabetes doctor diagnosis"
$ HE_BMI : 'labelled' chr "26.177198" "22.807647" "26.562865" "20.863743" ...
..- attr(*, "label")= chr "BMI"
$ HE_DM : 'labelled' chr "2.000000" "3.000000" "1.000000" "1.000000" ...
..- attr(*, "label")= chr "With diagnosis(over 19 year-old)"
$ LQ4_07 : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Barries for physical activity - diabetes"
$ HE_DMfh1 : 'labelled' chr "0.000000" "0.000000" "9.000000" "1.000000" ...
..- attr(*, "label")= chr "Father with diagnosis"
$ HE_DMfh2 : 'labelled' chr "1.000000" "0.000000" "9.000000" "0.000000" ...
..- attr(*, "label")= chr "Mother with diagnosis"
$ HE_DMfh3 : 'labelled' chr "0.000000" "0.000000" "9.000000" "0.000000" ...
..- attr(*, "label")= chr "Sibling with diagnosis"
$ HE_glu : 'labelled' chr "124.000000" "141.000000" "92.000000" "88.000000" ...
..- attr(*, "label")= chr "Diabetes indicator - glucose level"
$ BE5_1 : 'labelled' chr "1.000000" "1.000000" "1.000000" "1.000000" ...
..- attr(*, "label")= chr "Muscle training frequency"
$ LQ4_04 : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Barriers for physical activity - Have heart disease"
$ DF2_dg : 'labelled' chr "8.000000" "8.000000" "8.000000" "8.000000" ...
..- attr(*, "label")= chr "Diagnosed with depression"
$ HE_IHDfh1: 'labelled' chr "0.000000" "0.000000" "9.000000" "0.000000" ...
..- attr(*, "label")= chr "Diagnosed with Ischaemic heart disease"
$ HE_HP : 'labelled' chr "3.000000" "3.000000" "2.000000" "1.000000" ...
..- attr(*, "label")= chr "Hypertension Status (three levels)"
freq_table <- function (y) {
d <- select (y) %>% group_by (y) %>% summarise (n = n ()) %>% mutate (freq = n / sum (n))
}
lapply(diabetes$marri_1, freq_table)
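Solution
select() sits at the start of the pipe, so it needs the data frame as its first argument in addition to the column. Add the data frame as an argument to your function and pipe it into select(). Also, because the column name is stored in the variable y as a string, it has to be converted to a symbol and unquoted with !! before dplyr verbs can use it as a column. A bare !! y inside group_by() would group by the literal string rather than by the column.
library(tidyverse)
# add df as an argument and pipe it into select
freq_table <- function (df, y) {
  # y is a column name passed as a string; sym() makes it a symbol and !! unquotes it
  df %>%
    select (!! sym (y)) %>%
    group_by (!! sym (y)) %>%
    summarise (n = n ()) %>%
    mutate (freq = n / sum (n))  # last expression, so the table is returned visibly
}
freq_table(diabetes, "marri_1")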
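Note that the call in the question, lapply(diabetes$marri_1, freq_table), hands the function each single value of the column in turn, not the column itself. To tabulate several variables, one option is to iterate over column names instead. The following is only a minimal sketch, assuming dplyr is installed; the toy data frame and the name freq_table_by_name are invented for illustration and stand in for diabetes:

```r
library(dplyr)

# Hypothetical stand-in for the diabetes data (values are invented)
toy <- tibble(
  marri_1 = c(1, 1, 2, 1),
  sex     = c(1, 2, 2, 2)
)

# Frequency table for one column, given its name as a string
freq_table_by_name <- function(df, y) {
  df %>%
    group_by(!!sym(y)) %>%
    summarise(n = n()) %>%
    mutate(freq = n / sum(n))
}

# Iterate over column names, not over the values of a single column
vars <- c("marri_1", "sex")
tables <- lapply(vars, function(v) freq_table_by_name(toy, v))
names(tables) <- vars

tables$marri_1
```

The result is a named list with one frequency table per variable, so tables$sex gives the table for sex.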
Or, more simply, you can use base R:
tab <- table(diabetes$marri_1)
tab <- as.data.frame(tab)
names(tab) <- c("marri_1","n")
tab$freq <- tab$n /sum(tab$n)
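The base R steps above can probably be shortened a little with prop.table(), which turns a table of counts into proportions. A minimal sketch, using an invented vector in place of diabetes$marri_1:

```r
# Invented values standing in for diabetes$marri_1
marri_1 <- c(1, 1, 2, 1)

tab <- table(marri_1)                           # counts per category
out <- as.data.frame(tab, responseName = "n")   # columns: marri_1, n
out$freq <- as.vector(prop.table(tab))          # proportions that sum to 1
out
```

Here responseName only renames the count column, so the separate names() assignment is not needed.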
Is this what you were looking for?