Python - Create a matrix of normally distributed columns where each row sums to 1

Question

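I'm working with some election data in pandas. I want to know how Party A's votes would transfer to Party B and Party C across 650 seats if Party A did not exist.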

Assume that, nationally, we know:

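For each seat, I want to generate normally distributed numbers between 0 and 1 such that:

  1. each row sums to 1
  2. the to_B column has a mean of 0.48
  3. the to_C column has a mean of 0.32
  4. the to_dnv column has a mean of 0.2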
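The four requirements can be checked mechanically for any candidate matrix. A minimal sketch; `check_transfer_matrix`, `TARGETS` and both tolerances are hypothetical choices for illustration, not part of the question:

```python
import numpy as np
import pandas as pd

# Target column means from the requirements list.
TARGETS = {"to_B": 0.48, "to_C": 0.32, "to_dnv": 0.2}


def check_transfer_matrix(df, targets=TARGETS, mean_tol=0.02, sum_tol=1e-9):
    """Report which constraints hold; mean_tol and sum_tol are arbitrary tolerances."""
    values = df[list(targets)].to_numpy(dtype=float)
    target_means = np.array(list(targets.values()))
    return {
        # every entry lies in [0, 1]
        "in_unit_interval": bool(((values >= 0) & (values <= 1)).all()),
        # every row sums to 1 (up to floating-point error)
        "rows_sum_to_one": bool(np.allclose(values.sum(axis=1), 1.0, rtol=0, atol=sum_tol)),
        # every column mean is within mean_tol of its target
        "means_close": bool(np.all(np.abs(values.mean(axis=0) - target_means) <= mean_tol)),
    }


# The hand-made rows (seats 1 to 3) from the example table
example = pd.DataFrame(
    {"to_B": [0.5, 0.1, 0.3], "to_C": [0.3, 0.6, 0.3], "to_dnv": [0.2, 0.3, 0.4]},
    index=pd.Index([1, 2, 3], name="seat"),
)
print(check_transfer_matrix(example))
```

For those three rows the first two checks pass, while `means_close` is False: their column means are 0.3, 0.4 and 0.3, which is expected for three made-up rows rather than 650 generated ones.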

As an example, with entirely made-up numbers:

seat  to_B  to_C  to_dnv
1     0.5   0.3   0.2
2     0.1   0.6   0.3
3     0.3   0.3   0.4
...   ...   ...   ...
650   etc.  etc.  etc.

In this hand-made example:

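The motivation is that I will later zip this table together with a separate table I've already built, which contains the election results for each seat. I'll then use these normally distributed numbers to redistribute Party A's votes to B, C and DNV (did not vote).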
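Once the two tables share a seat index, the redistribution itself is plain column arithmetic. A minimal sketch, assuming the question's column names `con`, `lab` and `ld`, and the mapping from the script's README (B = Labour, C = Conservatives, A = Lib Dems); the vote counts are hypothetical toy numbers, not the real 2019 results, and `redistribute_ld` is an illustrative name:

```python
import pandas as pd

# Hypothetical per-seat vote counts (NOT real 2019 figures).
results = pd.DataFrame(
    {"con": [20000, 15000], "lab": [19000, 17000], "ld": [6000, 4000]},
    index=pd.Index([0, 1], name="seat"),
)

# One row of transfer fractions per seat, e.g. one row of a Dirichlet draw.
transfers = pd.DataFrame(
    {"to_B": [0.5, 0.4], "to_C": [0.3, 0.4], "to_dnv": [0.2, 0.2]},
    index=results.index,
)


def redistribute_ld(results: pd.DataFrame, transfers: pd.DataFrame) -> pd.DataFrame:
    """Reassign each seat's Lib Dem votes; the to_dnv fraction simply disappears."""
    out = results.astype(float)  # astype returns a copy, so `results` is untouched
    # Arithmetic aligns on the index, so each seat uses its own transfer row.
    out["lab"] += results["ld"] * transfers["to_B"]
    out["con"] += results["ld"] * transfers["to_C"]
    out["ld"] = 0.0
    return out


after = redistribute_ld(results, transfers)
print(after)
print(results[["con", "lab"]].idxmax(axis=1))  # winner per seat before
print(after[["con", "lab"]].idxmax(axis=1))    # winner per seat after
```

Here seat 0 flips from `con` to `lab` (22000 vs 21800 votes). With the real CSV, the winner would be taken over all party columns, not just `con` and `lab`.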

What is the best way to generate such a matrix? Preferably in pandas.

Code so far:

# -*- coding: utf-8 -*-
"""
Created on Tue May 11 20:23:45 2021

@author: Josh
"""

## IMPORTS

import numpy
import pandas
import sys
import re

## README 
'''
Prerequisite: create a folder called Data in the same folder as this script. Download the 2019 results CSV ("HoC-GE2019-results-by-constituency-csv", 126 KB, Excel Spreadsheet) from the Commons Library and place it in the Data folder: https://commonslibrary.parliament.uk/research-briefings/cbp-8749/
'''

'''
Steps:
    1. Read in 2019 results to a raw numpy array
    2. Construct a normal distribution of 650 ways to redistribute LibDem vote share, 
        around 20% do not transfer
        around 48% transfer to Labour
        around 32% transfer to Conservatives 
    3. Zip this redistribution together with the 2019 results. Compare number of seats before and after to see how seats would change.
    4. Convert to voter percentage array
    5. Run this simulation 1000 times, find the average seat numbers 
'''

# Config
numpy.set_printoptions(threshold=sys.maxsize)

pandas.set_option('display.max_rows', sys.maxsize)
pandas.set_option('display.max_columns', sys.maxsize)
pandas.set_option('display.width', sys.maxsize)

# Variables
parties = ["con", "lab", "ld", "brexit", "green", "snp", "pc", "dup", "sf", "sdlp", "uup", "alliance", "other"]

raw_results_path = 'Data/HoC-GE2019-results-by-constituency-csv.csv'
dtype_dic = {party: int for party in parties}

## STEP 1
raw_results = pandas.read_csv(raw_results_path, dtype=dtype_dic)  # append .values for a raw numpy array

## STEP 2
# ??

## STEP 3
# ??

## STEP 4
# Calculate vote shares of each party

for party in parties:
    raw_results["Share_" + party] = raw_results[party] / raw_results["valid_votes"]

print(raw_results)
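Step 5 of the plan (run the simulation 1000 times and average the seat numbers) could look something like the following. This is a sketch under stated assumptions: `simulate_seats` is a hypothetical helper, the winner of each seat is simply the party with the most votes (first past the post), the Lib Dem share is split with a Dirichlet draw as in the answer, and it uses NumPy's newer `np.random.default_rng` Generator rather than `np.random.seed`. The three-seat table is toy data, not real results.

```python
import numpy as np
import pandas as pd


def simulate_seats(results, parties, alpha=(0.48, 0.32, 0.2), n_sims=1000, seed=None):
    """Average seats per party over n_sims random redistributions of the 'ld' vote.

    Assumes `results` has one vote-count column per name in `parties`
    (including 'ld', 'lab' and 'con').
    """
    rng = np.random.default_rng(seed)
    votes = results[parties].to_numpy(dtype=float)
    i_ld, i_lab, i_con = (parties.index(p) for p in ("ld", "lab", "con"))
    seat_totals = np.zeros(len(parties))
    for _ in range(n_sims):
        # One (to_B, to_C, to_dnv) row per seat; every row sums to 1.
        shares = rng.dirichlet(alpha, size=len(votes))
        sim = votes.copy()
        sim[:, i_lab] += votes[:, i_ld] * shares[:, 0]  # to_B -> Labour
        sim[:, i_con] += votes[:, i_ld] * shares[:, 1]  # to_C -> Conservatives
        sim[:, i_ld] = 0.0                              # to_dnv share is lost
        winners = sim.argmax(axis=1)                    # plurality winner per seat
        seat_totals += np.bincount(winners, minlength=len(parties))
    return pd.Series(seat_totals / n_sims, index=parties)


# Hypothetical three-seat example (NOT real results)
toy = pd.DataFrame({"con": [20000, 15000, 10000],
                    "lab": [19000, 17000, 9000],
                    "ld": [6000, 4000, 12000]})
print(simulate_seats(toy, ["con", "lab", "ld"], n_sims=1000, seed=5))
```

With the script above, a call such as `simulate_seats(raw_results, parties)` should work on the loaded CSV, assuming every name in `parties` is a numeric vote column there.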

Tags: python, python-3.x, pandas, math

Answer


It sounds like np.random.dirichlet could work here:

import numpy as np
import pandas as pd

# Set Seed for reproducibility (Remove if different randoms are needed)
np.random.seed(5)
# Draw 600 rows from a Dirichlet distribution (use size=650 for one row per seat)
a = np.random.dirichlet((.48, .32, .2), size=600)

df = pd.DataFrame(a, columns=['to_B', 'to_C', 'to_dnv'])
df = df.rename_axis('seat')

DataFrame:


print(df)

          to_B      to_C    to_dnv
seat                              
0     0.553653  0.092384  0.353963
1     0.970484  0.029512  0.000005
2     0.897923  0.040126  0.061951
3     0.937764  0.052244  0.009991
4     0.123293  0.000047  0.876660
...        ...       ...       ...
595   0.430808  0.017738  0.551454
596   0.000072  0.034152  0.965775
597   0.616199  0.290054  0.093747
598   0.922872  0.075728  0.001400
599   0.190437  0.756399  0.053163

[600 rows x 3 columns]

Row sums:

print(df.sum(axis=1))

seat
0      1.0
1      1.0
2      1.0
3      1.0
4      1.0
      ... 
595    1.0
596    1.0
597    1.0
598    1.0
599    1.0
Length: 600, dtype: float64

Column means:

print(df.mean(axis=0))

to_B      0.473463
to_C      0.317920
to_dnv    0.208617
dtype: float64
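The column means come out right because a Dirichlet draw with parameters α has E[X_i] = α_i / Σα, and (0.48, 0.32, 0.2) already sums to 1. With Σα = 1, however, the variance α_i(Σα - α_i) / ((Σα)²(Σα + 1)) is large, which is why the output contains extreme rows such as 0.970484 / 0.029512 / 0.000005. Multiplying every parameter by a constant k keeps the means and shrinks each variance to m_i(1 - m_i) / (k + 1), where m_i is the target mean. A sketch of that adjustment; the value k = 50 is a hypothetical choice, and it uses `np.random.default_rng` and `size=650` to match the question's 650 seats:

```python
import numpy as np
import pandas as pd

means = np.array([0.48, 0.32, 0.2])  # target column means; they sum to 1
concentration = 50                   # hypothetical k: larger -> rows cluster closer to `means`

rng = np.random.default_rng(5)
a = rng.dirichlet(means * concentration, size=650)  # one row per seat

df = pd.DataFrame(a, columns=["to_B", "to_C", "to_dnv"]).rename_axis("seat")

# For Dir(k * m): sd_i = sqrt(m_i * (1 - m_i) / (k + 1))
theoretical_sd = pd.Series(np.sqrt(means * (1 - means) / (concentration + 1)), index=df.columns)

print(df.mean(axis=0))
print(pd.DataFrame({"sample_sd": df.std(axis=0), "theoretical_sd": theoretical_sd}))
```

With k = 50 the theoretical standard deviation of to_B is about 0.07; choose k to reflect how much the transfer pattern is expected to vary between seats.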
