r - Group_by and summarize behave strangely and do not provide expected results
Problem description
Although I have used dplyr before, I've run into a problem that I do not fully understand.
The part of a research data set I am working with has more than 2,500 rows. Each row is a respondent from one of the 515 houses in a study.
I want to summarize the number of years each respondent has spent in school (column [, 7]), grouped by house id (column [, 26]). The average across all respondents is 3.65 school years (the sample was taken in Uganda).
Now, when I run the following code:
library(dplyr)
df_house %>%
dplyr::group_by(House = df_house[, 26]) %>%
dplyr::summarise(Avg_school = mean(df_house[,7], na.rm = TRUE))
I get the following result:
# A tibble: 510 x 2
   House Avg_school
   <dbl>      <dbl>
 1     1       3.65
 2     2       3.65
 3     3       3.65
 4     4       3.65
 5     5       3.65
 6     6       3.65
 7     7       3.65
 8     8       3.65
 9     9       3.65
10    10       3.65
# ... with 500 more rows
I have two issues with this. First, summarise obviously does not compute the mean within each house; every group gets the overall mean of 3.65. Second, I only get 510 groups instead of the expected 515 houses.
I have checked both columns with class() and typeof() to make sure they are numeric and double.
Does anybody have an idea why group_by and summarise behave this way?
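Solution
The correct answer was provided by @Ronak Shah. It was indeed the use of column numbers instead of column names that prevented the code from working properly. Inside summarise, the expression df_house[, 7] refers to the entire column of the original data frame rather than to the rows of the current group, so mean() returns the same overall value (3.65) for every house.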
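The difference can be reproduced on a small example. The following is only a sketch on made-up data: the column names house_id and school_years are assumptions (the question gives only the positions 26 and 7), and the toy values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; house_id and school_years are assumed names
# standing in for columns 26 and 7 of the real df_house.
df_house <- data.frame(
  house_id     = c(1, 1, 2, 2, 3),
  school_years = c(2, 4, 6, NA, 5)
)

# Pattern from the question: df_house[, 2] is the full, ungrouped column,
# so every group receives the overall mean (4.25 here).
wrong <- df_house %>%
  group_by(House = house_id) %>%
  summarise(Avg_school = mean(df_house[, 2], na.rm = TRUE))

# Referring to the column by bare name lets dplyr evaluate it per group.
right <- df_house %>%
  group_by(House = house_id) %>%
  summarise(Avg_school = mean(school_years, na.rm = TRUE))

wrong$Avg_school  # 4.25 4.25 4.25
right$Avg_school  # 3 6 5

# Number of distinct house ids, i.e. how many groups to expect.
n_distinct(df_house$house_id)  # 3
```

If only the positions are known, names(df_house)[c(7, 26)] shows the actual column names to use in group_by and summarise. Comparing n_distinct() on the house id column with the expected 515 is one way to check whether the data really contain that many distinct houses.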