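python - Code to assign people to different households by age
Question
I am generating a synthetic population in which the number of households of each size, and the age composition for each household size, are known. I am trying to assign people to each of these households by age.
The total number of people in each age group (the column sums) should add up to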
Children 45196
Adult 148949
Senior 12195
and the total number of people living in households of each size (1-20) should be
1 2276
2 9366
3 23739
4 47636
5 42475
6 28338
7 3675
8 3728
9 3672
10 3830
11 3894
12 3792
13 3770
14 3710
15 3795
16 3648
17 3672
18 3744
19 3800
20 3780
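I tried to encode this in Python as a set of linear equations. However, the solution contains negative values, which makes it unusable. The code is below. How can I modify it to generate the total number of people in each age group for each household size?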
import numpy as np

# Total Population
Population = 206340
# Number of Children, Adults and Seniors
Demography = np.array([ 45196, 148949, 12195])
# Number of households by size
Household_size_distribution = np.array([2276, 4683, 7913,11909,8495,4723,525,466,408,383,354,316,290,265,253,228,216,208,200,189])
# Probability that a person of a certain age group belongs to a household of a certain size
Age_Composition = np.array([[7.000e-04, 7.702e-01, 2.291e-01],
[1.890e-02, 8.066e-01, 1.745e-01],
[1.486e-01, 8.027e-01, 4.870e-02],
[2.519e-01, 7.180e-01, 3.010e-02],
[2.732e-01, 6.719e-01, 5.490e-02],
[3.046e-01, 6.337e-01, 6.170e-02]])
# Store Age compositions
x = np.zeros((20,3),dtype=float)
x[:5,:] = Age_Composition[:5,:]
# Age composition is the same for households with 6 or more persons
x[5:20,:] = np.repeat(Age_Composition[5][np.newaxis,:], 15,0)
# Normalize the age compositions column-wise: Children, Adults and Seniors
y = np.zeros((20,3),dtype=float)
y[:,0] = x[:20,0]/np.sum(x[:20,0])
y[:,1] = x[:20,1]/np.sum(x[:20,1])
y[:,2] = x[:20,2]/np.sum(x[:20,2])
# Store Coefficients of 60 variables
w = np.zeros((23,60),dtype=float)
w[:20,:3] = x
z = np.zeros((20,3),dtype=float)
z[:,0] = y[:,0]
w[20] = np.reshape(z,(60))
z = np.zeros((20,3),dtype=float)
z[:,1] = y[:,1]
w[21] = np.reshape(z,(60))
z = np.zeros((20,3),dtype=float)
z[:,2] = y[:,2]
w[22] = np.reshape(z,(60))
rollnumber = np.arange(0,60,step=3)
for i in range(20):
    w[i]=np.roll(w[i],rollnumber[i])
# Ax=B
A = w
B = np.zeros((23,1),dtype=int)
# people per household size = number of households * household size
B[:20,:] = Household_size_distribution[:,None]*np.arange(1,21)[:,None]
B[20:,:] = Demography[:,None]
# Solution to the set of linear equations
f=np.matmul(np.linalg.pinv(A),B)
f=np.reshape(f,60)
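Answer
Since you want to generate synthetic data from a distribution, I think the best approach is to use a randomized method rather than trying to solve it with linear algebra.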
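One likely reason the linear-algebra route produces negative values is that the system is underdetermined (23 equations, 60 unknowns), and np.linalg.pinv returns the minimum-norm solution, which does not enforce non-negativity. The sketch below illustrates this on a toy system; the matrix and right-hand side are made up for illustration and are not the question's data.

```python
import numpy as np

# Toy underdetermined system (2 equations, 3 unknowns); illustrative only.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 0.0])

# The pseudoinverse picks the exact solution with the smallest Euclidean norm.
x_min_norm = np.linalg.pinv(A) @ b

# A non-negative solution also exists, but its norm is larger,
# so the pseudoinverse does not return it.
x_nonneg = np.array([1.0, 0.0, 0.0])

print(x_min_norm)                  # contains a negative entry
print(np.allclose(A @ x_min_norm, b), np.allclose(A @ x_nonneg, b))
```

Both vectors satisfy the equations exactly, so nothing in the pseudoinverse approach favors the non-negative one.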
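The data to be generated has size 206340, so it fits easily in memory.
A simple way to construct a data sample in which category i has frequency f[i] is to set the attribute of f[i] elements to i. We can do this easily with numpy: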
# f is an integer array of category frequencies
category = np.zeros(np.sum(f), dtype=int)
start = 0
for i, fi in enumerate(f):
    category[start:start+fi] = i
    start += fi
Moreover, the frequencies are invariant under permutation.
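As a small check of that claim, the sketch below builds a category array from made-up frequencies, shuffles it, and counts the categories before and after. The values in f and the fixed seed are assumptions for illustration; np.repeat is used here as a vectorized equivalent of the loop.

```python
import numpy as np

f = np.array([3, 1, 2])  # illustrative frequencies, not from the question

# Element i of np.arange(len(f)) is repeated f[i] times: [0 0 0 1 2 2]
category = np.repeat(np.arange(len(f)), f)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
shuffled = rng.permutation(category)

# Counting each category gives back f, with or without the shuffle.
counts_before = np.bincount(category, minlength=len(f))
counts_after = np.bincount(shuffled, minlength=len(f))
print(counts_before, counts_after)
```

Because shuffling only reorders elements, any marginal total fixed before the shuffle still holds afterwards.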
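So what we can do is assign the people to households: ppl_by_household_size[i] people go to households of size i+1. We then assign the age groups according to ppl_by_age_group, and finally shuffle the age_group array.
This gives a random sample in which person i belongs to age group age_group[i] and lives in the household with index household_id[i], which has size household_size[i]. From these arrays you can easily convert the data to the desired format.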
import numpy as np

ppl_by_age_group = [45196, 148949, 12195]
ppl_by_household_size = np.array([
    2276,9366,23739,47636,42475,28338,3675,3728,3672,3830,
    3894,3792,3770,3710,3795,3648,3672,3744,3800,3780])
# check if a solution exists: each total must be divisible by its household size
assert np.all(ppl_by_household_size % np.arange(1, len(ppl_by_household_size)+1) == 0)
assert np.sum(ppl_by_household_size) == np.sum(ppl_by_age_group)
nppl = np.sum(ppl_by_age_group)
age_group = np.zeros(nppl, np.int8)
household_size = np.zeros(nppl, np.int8)
household_id = np.zeros(nppl, np.int32)
cppl = 0
nhouseholds = 0
for i,n in enumerate(ppl_by_household_size):
    # this will assign ppl_by_household_size[i] people to
    # ppl_by_household_size[i] // (i + 1) households
    # so each household will have (i + 1) members
    household_id[cppl:cppl+n] = nhouseholds + np.arange(n) // (i+1)
    household_size[cppl:cppl+n] = i+1
    cppl += n
    nhouseholds += n // (i+1)
cppl = 0
for i,n in enumerate(ppl_by_age_group):
    # this will assign ppl_by_age_group[i] people to
    # age group i
    age_group[cppl:cppl+n] = i
    cppl += n
# shuffle the age groups
age_group = np.random.permutation(age_group)
It runs in 5 ms here.
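As for converting the arrays to the format asked for in the question, one possible sketch is to count people per (household size, age group) and per (household, age group). The arrays are rebuilt here in vectorized form so the block runs on its own; the names size_by_age and household_by_age and the fixed seed are assumptions, not part of the original answer.

```python
import numpy as np

ppl_by_age_group = np.array([45196, 148949, 12195])
ppl_by_household_size = np.array([
    2276, 9366, 23739, 47636, 42475, 28338, 3675, 3728, 3672, 3830,
    3894, 3792, 3770, 3710, 3795, 3648, 3672, 3744, 3800, 3780])
sizes = np.arange(1, len(ppl_by_household_size) + 1)

# Per-person arrays, laid out like the loops: households ordered by size.
household_size = np.repeat(sizes, ppl_by_household_size)
households_per_size = ppl_by_household_size // sizes
n_households = households_per_size.sum()
size_of_household = np.repeat(sizes, households_per_size)
household_id = np.repeat(np.arange(n_households), size_of_household)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
age_group = rng.permutation(np.repeat(np.arange(3), ppl_by_age_group))

# People per (household size, age group): rows are sizes 1..20,
# columns are Children, Adult, Senior.
size_by_age = np.zeros((len(sizes), 3), dtype=np.int64)
np.add.at(size_by_age, (household_size - 1, age_group), 1)

# People per (household, age group): one row per household.
household_by_age = np.zeros((n_households, 3), dtype=np.int64)
np.add.at(household_by_age, (household_id, age_group), 1)

print(size_by_age)
```

The row sums of size_by_age reproduce the household-size totals and the column sums reproduce the age-group totals, and every entry is a non-negative integer by construction.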
In the real world I would not expect age group and household size to be independent, but that is another topic.