Group by with percentages and raw numbers

Problem description

I have a dataset that looks like this:

Six columns: area (categorical), amount ($), population (categorical), purpose (categorical), purpose2 (categorical), county (categorical)

I would like to create a table that groups by area and shows the total amount for the area both as a percentage of total amount and as a raw number, as well as the percent of the total number of records/observations per area and total number of records/observations as a raw number.

The code below generates a table of raw numbers but does not show the percent of total:

tabstat amount, by(county) stat(sum count) 

Tags: stata

Solution


There is no canned command that does what you want. You will have to program the table yourself.

Here is a quick example using auto.dta:

. sysuse auto, clear
(1978 Automobile Data)

. tabstat price, by(foreign) stat(sum count)

Summary for variables: price
     by categories of: foreign (Car type)

 foreign |       sum         N
---------+--------------------
Domestic |    315766        52
 Foreign |    140463        22
---------+--------------------
   Total |    456229        74
------------------------------
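For the record counts on their own, a plain one-way tabulate already reports both the raw frequency and the percent of observations per group, so it is mainly the amounts that need extra work. A minimal sketch on the same auto.dta data:

```stata
sysuse auto, clear

* one-way table: Freq. is the raw count per group,
* Percent is that count as a share of all observations
tabulate foreign
```

This covers only the number of records, not the sum of price.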

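You can do the calculations and save the raw numbers in variables as shown below. Each egen total is missing outside its group, and sort places missing values last, so after sorting the first observation holds the group total. That first observation is what local and display pick up when the variable is referenced without a subscript.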

. generate total_obs = _N

. display total_obs
74

. count if foreign == 0
  52

. generate total_domestic_obs = r(N)

. count if foreign == 1
  22

. generate total_foreign_obs = r(N)

. egen total_domestic_price = total(price) if foreign == 0 

. sort total_domestic_price
. local tdp = total_domestic_price

. display total_domestic_price
315766

. egen total_foreign_price = total(price)  if foreign == 1

. sort total_foreign_price
. local tfp = total_foreign_price

. display total_foreign_price
140463

. generate total_price = `tdp' + `tfp' 

. display total_price
456229

For the percentages:

. generate pct_domestic_price = (`tdp' / total_price) * 100

. display pct_domestic_price
69.212173

. generate pct_foreign_price = (`tfp' / total_price) * 100 

. display pct_foreign_price 
30.787828
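The same numbers can also be held in scalars rather than in constant variables, which avoids the sort step. The following is only a sketch of that alternative, not the method used above; the scalar names (s_all, n_all, s_dom, n_dom) are made up for illustration. It also computes the share of observations, which the question asks for.

```stata
sysuse auto, clear

* overall sum of price and number of non-missing prices
quietly summarize price
scalar s_all = r(sum)
scalar n_all = r(N)

* the same for domestic cars only (foreign == 0)
quietly summarize price if foreign == 0
scalar s_dom = r(sum)
scalar n_dom = r(N)

* domestic share of the total amount and of the observations, in percent
display %6.2f 100 * scalar(s_dom) / scalar(s_all)
display %6.2f 100 * scalar(n_dom) / scalar(n_all)
```

The first displayed value should agree with the domestic percentage computed above, up to rounding.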

Edit:

Here is a more automated way of doing the above, without having to specify individual values:

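program define foo

* one numeric variable to total, plus the name of a numeric grouping variable
syntax varlist(min=1 max=1), by(string)

* note: this creates new variables, so it errors if run twice on the same data
generate total_obs = _N
display total_obs

* collect the distinct values of the grouping variable
quietly levelsof `by', local(nlevels)

foreach x of local nlevels {
    * number of observations in this group
    count if `by' == `x'
    quietly generate total_`by'`x'_obs = r(N)

    * group total, missing outside the group; sorting moves it to observation 1
    quietly egen total_`by'`x'_`varlist' = total(`varlist') if `by' == `x' 
    sort total_`by'`x'_`varlist'
    local tvar`x' = total_`by'`x'_`varlist'
    * build up an expression of the form "a + b + ..."
    local tvarall `tvarall' `tvar`x'' +
    display total_`by'`x'_`varlist'
}

* the trailing 0 closes the expression left open by the last "+"
quietly generate total_`varlist' = `tvarall' 0 
display total_`varlist'

* each group total as a percentage of the grand total
foreach x of local nlevels {
    quietly generate pct_`by'`x'_`varlist' = (`tvar`x'' / total_`varlist') * 100
    display pct_`by'`x'_`varlist'
}

end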

The results are identical:

. foo price, by(foreign)
74
  52
315766
  22
140463
456229
69.212173
30.787828

You will obviously need to format the results into a table of your liking.
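One possible way to get such a table, sketched here as an alternative to the program above rather than as part of it, is to collapse the data to one row per group and let egen's pc() function compute the percentages. The new variable names (sum_price, n_obs, pct_price, pct_obs) are made up for this example.

```stata
sysuse auto, clear

* one row per group: total price and number of non-missing prices
collapse (sum) sum_price = price (count) n_obs = price, by(foreign)

* egen pc() rescales a variable to a percentage of its column total
egen pct_price = pc(sum_price)
egen pct_obs = pc(n_obs)

* show raw numbers and percentages side by side
format pct_price pct_obs %6.2f
list foreign sum_price pct_price n_obs pct_obs, noobs
```

Note that collapse replaces the data in memory, so wrap it in preserve and restore if the original observations are still needed afterwards.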

