R: count by group in a loop and write the output to CSV

Problem description

I have a data frame containing user data:

age = c(45, 21, 32, 33, 46)
gender = c('female', 'female', 'male', 'male', 'female')
income = c('low', 'low', 'medium', 'high', 'low')
education = c('high', 'high', 'high', 'medium', 'medium')

df = data.frame(age, gender ,income, education)

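From this I would like to get a clear list with the count and share of each attribute, which I will then append to a table/CSV. That table should be easy to read for further use; it does not need to be a properly functioning data frame. For a single attribute, something like this: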

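library(dplyr)  # provides count()

users = df                        # the data frame defined above
nusers = nrow(users)              # total number of users
stat = count(users, gender)       # one row per gender, count in column n
stat['sot'] = stat['n'] / nusers  # share of total
write.table(stat, 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)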

For multiple attributes I need the following result:

gender,n,sot
female,10,0.526315789
male,9,0.473684211
income,Freq,sot
low,4,0.210526316
medium,10,0.526315789
high,5,0.263157895
education,Freq,sot
low,8,0.421052632
medium,1,0.052631579
high,10,0.526315789

My (not very skilled) attempts to put this into a loop failed. What is the best way to do this?

Tags: r, loops, count

Solution


You can use sink() for this:

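library(dplyr)

# Count and share of total (sot) for each attribute
n_gen <- df %>% group_by(gender) %>% summarise(Freq = n(), sot = n() / nrow(df))
n_inc <- df %>% group_by(income) %>% summarise(Freq = n(), sot = n() / nrow(df))
n_edu <- df %>% group_by(education) %>% summarise(Freq = n(), sot = n() / nrow(df))

# Redirect console output to the file
sink('export.csv')

# Without a file argument, write.csv() prints to the console, which sink() captures
write.csv(n_gen, row.names = FALSE)
write.csv(n_inc, row.names = FALSE)
write.csv(n_edu, row.names = FALSE)

# Stop redirecting and close the file
sink()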

You can shorten this by writing it as a for loop. Depending on how many columns df has, that may be the preferable approach.
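The loop version is not spelled out in the answer, so the following is only a minimal sketch of one way it might look. The helper name `write_counts`, its arguments, and the choice of columns are assumptions made for illustration. It also swaps the dplyr pipeline for base R `table()`, which avoids having to pass column names as strings into `group_by()`; the `sink()` idea is kept from the answer.

```r
# Sample data from the question
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c('female', 'female', 'male', 'male', 'female'),
  income    = c('low', 'low', 'medium', 'high', 'low'),
  education = c('high', 'high', 'high', 'medium', 'medium')
)

# Hypothetical helper (not from the original answer): writes one
# count/share table per column, stacked one after another in a single file.
write_counts <- function(data, cols, file) {
  sink(file)        # redirect console output to the file
  on.exit(sink())   # restore the console even if an error occurs
  for (col in cols) {
    tab <- table(data[[col]])                 # counts per level
    stat <- data.frame(
      level = names(tab),
      Freq  = as.vector(tab),
      sot   = as.vector(tab) / nrow(data)     # share of total
    )
    names(stat)[1] <- col                     # header row names the attribute
    # No file argument: the output goes to the console, i.e. into the sink
    write.csv(stat, row.names = FALSE, quote = FALSE)
  }
  invisible(file)
}

# Assumption: these are the categorical columns of interest
write_counts(df, c('gender', 'income', 'education'), 'export.csv')
```

Each pass through the loop writes a header line such as `gender,Freq,sot` followed by one line per level, so the file has the stacked layout asked for in the question. Levels appear in alphabetical order because `table()` sorts them; if a different order matters, converting the columns to factors with explicit levels beforehand would be one option.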

