Calculating percentages within groups

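Problem description

I have a set of population data that I have split by age group and area.

Using the sample data below, how can I calculate the proportion within each group and area for all columns?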

area    sex  agegrouping  2011         2012         2013
area1   F    0-4          637.4815661  626.6145185  596.7128164
area1   F    10-14        417.8041418  402.5041888  411.2180838
area1   F    15-19        360.6491372  361.5883403  364.5626384
area1   F    20-24        562.4887445  598.7190796  617.9790937
area1   M    0-4          581.08247    581.11732    556.4439468
area1   M    10-14        408.1015966  379.945334   377.7312704
area1   M    15-19        380.7336397  392.2732017  384.8757803
area1   M    20-24        1089.024655  983.1813181  874.3646633
area2   F    0-4          460.2959017  479.7512631  489.1076221
area2   F    10-14        357.2974721  378.9785589  410.7145251
area2   F    15-19        353.4763328  324.3975914  312.5421936
area2   F    20-24        674.8157905  627.0151556  568.8309423
area2   M    0-4          570.1424505  579.4558621  572.8858648
area2   M    10-14        366.9484728  365.0947588  370.726409
area2   M    15-19        382.3444468  365.0342791  343.5104
area2   M    20-24        645.3627281  624.4575313  577.5540519

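I know I could do this manually column by column, but is there a way to do it all at once (the full dataset runs to 2050)?

The result should look like this (but including all the other year columns and areas):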

area   sex  agegrouping  2011.percent
area1  F    0-4          14.36621575
area1  F    10-14        9.415589032
area1  F    15-19        8.127550019
area1  F    20-24        12.67618562
area1  M    0-4          13.09521181
area1  M    10-14        9.196933521
area1  M    15-19        8.5801722
area1  M    20-24        24.54214205
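
The expected figures above appear to be each value divided by the total of its area for that year, multiplied by 100 (so both sexes and all age groups within an area share one denominator). A quick check in R, using the eight area1 values for 2011 copied from the sample data; the variable names are only illustrative:

```r
# 2011 values for area1, copied from the sample data (F rows first, then M rows)
area1_2011 <- c(637.4815661, 417.8041418, 360.6491372, 562.4887445,
                581.08247, 408.1015966, 380.7336397, 1089.024655)

# Share of each row in the area total, expressed as a percentage
pct <- area1_2011 / sum(area1_2011) * 100

# Compare with the 2011.percent column in the expected output
print(round(pct, 4))
```

If the first value prints as roughly 14.3662, the grouping is by area alone rather than by area and sex.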

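Tags: r

Solution


Here is a dplyr version. It groups by area, which matches the expected output in the question, and returns proportions between 0 and 1 (multiply by 100 for percentages):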

library(dplyr)

dt = read.table(text = "
area    sex  agegrouping  2011         2012         2013
area1   F    0-4          637.4815661  626.6145185  596.7128164
area1   F    10-14        417.8041418  402.5041888  411.2180838
area1   F    15-19        360.6491372  361.5883403  364.5626384
area1   F    20-24        562.4887445  598.7190796  617.9790937
area1   M    0-4          581.08247    581.11732    556.4439468
area1   M    10-14        408.1015966  379.945334   377.7312704
area1   M    15-19        380.7336397  392.2732017  384.8757803
area1   M    20-24        1089.024655  983.1813181  874.3646633
area2   F    0-4          460.2959017  479.7512631  489.1076221
area2   F    10-14        357.2974721  378.9785589  410.7145251
area2   F    15-19        353.4763328  324.3975914  312.5421936
area2   F    20-24        674.8157905  627.0151556  568.8309423
area2   M    0-4          570.1424505  579.4558621  572.8858648
area2   M    10-14        366.9484728  365.0947588  370.726409
area2   M    15-19        382.3444468  365.0342791  343.5104
area2   M    20-24        645.3627281  624.4575313  577.5540519
", header = TRUE)   # read.table() renames the year headers to X2011, X2012, X2013


dt %>%
  group_by(area) %>%                                 # for each area
  mutate_if(is.numeric, ~./sum(.)) %>%               # proportion (0-1) of each numeric column within its area
  rename_if(is.numeric, ~gsub("X", "prc_", .)) %>%   # replace the X prefix added by read.table()
  ungroup()                                          # forget the grouping

# # A tibble: 16 x 6
#    area  sex   agegrouping prc_2011 prc_2012 prc_2013
#    <fct> <fct> <fct>          <dbl>    <dbl>    <dbl>
#  1 area1 F     0-4           0.144    0.145    0.143
#  2 area1 F     10-14         0.0942   0.0930   0.0983
#  3 area1 F     15-19         0.0813   0.0836   0.0871
#  4 area1 F     20-24         0.127    0.138    0.148
#  5 area1 M     0-4           0.131    0.134    0.133
#  6 area1 M     10-14         0.0920   0.0878   0.0903
#  7 area1 M     15-19         0.0858   0.0907   0.0920
#  8 area1 M     20-24         0.245    0.227    0.209
#  9 area2 F     0-4           0.121    0.128    0.134
# 10 area2 F     10-14         0.0938   0.101    0.113
# 11 area2 F     15-19         0.0928   0.0866   0.0857
# 12 area2 F     20-24         0.177    0.167    0.156
# 13 area2 M     0-4           0.150    0.155    0.157
# 14 area2 M     10-14         0.0963   0.0975   0.102
# 15 area2 M     15-19         0.100    0.0975   0.0942
# 16 area2 M     20-24         0.169    0.167    0.158
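
In dplyr 1.0.0 and later, mutate_if() and rename_if() are superseded by across() and rename_with(). The following is a minimal sketch of the same idea in that style, not the original answer's code. It assumes dplyr >= 1.0.0, uses a cut-down four-row stand-in for the sample data, scales the result to percentages, and borrows the ".percent" suffix from the desired output in the question:

```r
library(dplyr)

# Reduced stand-in for the sample data (assumption: two areas, two year columns).
# check.names = FALSE keeps the headers as "2011"/"2012" instead of "X2011"/"X2012".
dt <- read.table(text = "
area   sex  agegrouping  2011         2012
area1  F    0-4          637.4815661  626.6145185
area1  M    0-4          581.08247    581.11732
area2  F    0-4          460.2959017  479.7512631
area2  M    0-4          570.1424505  579.4558621
", header = TRUE, check.names = FALSE)

pct <- dt %>%
  group_by(area) %>%                                           # one denominator per area
  mutate(across(where(is.numeric), ~ .x / sum(.x) * 100)) %>%  # percent of the area total, per year column
  ungroup() %>%
  rename_with(~ paste0(.x, ".percent"), where(is.numeric))     # 2011 -> 2011.percent, and so on

print(pct)
```

Because where(is.numeric) picks up every numeric column, the same pipeline should extend to the year columns up to 2050 without listing them, provided no other numeric columns are present.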
