
Problem description

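I am currently working with world religion data and want to build a DataFrame with the columns ['country name', 'most adhered religion in the country', 'number of adherents']. However, I run into an error message. My code is below.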

'''

import pandas as pd
import geopandas
import matplotlib.pyplot as plt
import mapclassify
import pyproj
from pyproj import Proj
from matplotlib.patches import Ellipse, Polygon
import datetime
import numpy as np

countries = geopandas.read_file('../data/world/ne_admin_0_countries.geojson')
hse_size = pd.read_csv('../data/world/houseshold_size_2018.csv', skiprows=4, header=0)
rlgn_adhere = pd.read_csv('../data/world/WRP_national.csv', header=0)
religion_cat = []

rlgn_adhere_top = list(rlgn_adhere.columns.values)
for i in range(3,38):
    religion_cat.append(rlgn_adhere_top[i])

country_rlgn_adhere = rlgn_adhere.groupby(['name'], as_index=False)
lastest_rlgn_adhere = country_rlgn_adhere['year'].max()
country_latest_adhere = lastest_rlgn_adhere.merge(rlgn_adhere, on=['year', 'name'], how='left')
col_latest_rlgn_pop = ['year', 'name'] + religion_cat
latest_rlgn_pop = country_latest_adhere[col_latest_rlgn_pop]

pop_rlgn = ''
pop_rlgn_cat_num = pd.DataFrame(columns=['name', 'Country Most Adhered Religion', 'Number of Adherence'])

for x in latest_rlgn_pop['name']:
    maximum = 0
    a = pd.DataFrame()
    a = latest_rlgn_pop[latest_rlgn_pop['name'] == x]
    for y in religion_cat:
        b = pd.Series([])
        b = a[str(y)]
        print(b[0])
        if np.invert(np.isnan(b[0])):
            b = int(b[0])
            if (b > maximum):
                maximum = b
                pop_rlgn = y
    a.insert(0,"Number of Adherence", maximum)
    a.insert(0,"Country Most Adhered Religion", pop_rlgn)
    pop_rlgn_cat_num = pop_rlgn_cat_num.append(a[['name', 'Number of Adherence', 'Country Most Adhered Religion']],sort=True)
latest_rlgn_pop = pd.merge(latest_rlgn_pop, pop_rlgn_cat_num, on=['name'])

country_hse_size = hse_size.groupby(['Country or area'], as_index=False)
latest_size = country_hse_size['Reference date (dd/mm/yyyy)'].max()
avg_hse_size = hse_size[['Country or area', 'Reference date (dd/mm/yyyy)', 'Average household size (number of members)']]
country_latest_size = latest_size.merge(avg_hse_size, on=['Country or area','Reference date (dd/mm/yyyy)'], how='left')
country_latest_size = country_latest_size.dropna()
country_latest_size_unique = country_latest_size.groupby(['Country or area'], as_index=False)
country_latest_size_unique = country_latest_size_unique['Average household size (number of members)'].mean()



countries = countries[['ADMIN', 'geometry']]
countries.columns = ['Country or area', 'geometry']

countries_household_size = countries.merge(country_latest_size_unique, on='Country or area', how='left')

'''

In my nested for loop, the second time line 36 of the script ('print(b[0])') runs, the following error appears in the console:

Traceback (most recent call last):
  File "C:\Users\USER\Anaconda3\envs\MaCT\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\USER\Anaconda3\envs\MaCT\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

I haven't found any leads on this error message. Can anyone help me? Thanks.

Here is a link to the dataset I have been using: Country_Household_Religions_Dataset
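A likely cause, sketched on made-up data: boolean filtering such as `latest_rlgn_pop[latest_rlgn_pop['name'] == x]` keeps the original row labels, so only the first country's row has label 0. On a Series with an integer index, `b[0]` is a label lookup, not a positional one, which would explain `KeyError: 0` for every country after the first. The column name `christianity` and the values below are hypothetical placeholders, not the real WRP columns.

```python
import pandas as pd

# Hypothetical stand-in for latest_rlgn_pop; values are invented for illustration.
df = pd.DataFrame({
    "name": ["Albania", "Brazil"],
    "christianity": [500000, 150000000],
})

# Boolean filtering keeps the original row labels: the "Brazil" row has label 1.
a = df[df["name"] == "Brazil"]
b = a["christianity"]
print(b.index.tolist())  # [1]

# On an integer index, b[0] looks up the *label* 0, which does not exist here.
try:
    b[0]
    raised = False
except KeyError:
    raised = True
print("label lookup b[0] raised KeyError:", raised)

# .iloc[0] is positional: it returns the first element whatever its label is.
print(b.iloc[0])  # 150000000
```

Replacing `b[0]` with `b.iloc[0]` (or calling `a.reset_index(drop=True)` after filtering) should avoid the error in the loop.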

Tags: python, pandas, dataframe
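The nested loop can also be avoided entirely. Below is a minimal sketch, assuming a table shaped like `rlgn_adhere` (one row per `name` and `year`, one numeric column per religion). The toy values and the column names `christianity` and `islam` are hypothetical. It keeps each country's latest year, then takes a row-wise `idxmax` and `max` over the religion columns. It also avoids `DataFrame.append`, which was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

# Hypothetical toy data shaped like rlgn_adhere; numbers are invented.
rlgn_adhere = pd.DataFrame({
    "name": ["Albania", "Albania", "Brazil", "Brazil"],
    "year": [2005, 2010, 2005, 2010],
    "christianity": [600000, 650000, 160000000, 170000000],
    "islam": [1800000, 1900000, None, 30000],
})
religion_cat = ["christianity", "islam"]  # placeholder religion columns

# Keep each country's most recent year (same idea as the groupby/max/merge steps).
latest_year = rlgn_adhere.groupby("name")["year"].transform("max")
latest = rlgn_adhere[rlgn_adhere["year"] == latest_year]

# Drop rows where every religion value is missing, since idxmax has no answer there.
counts = latest[religion_cat].dropna(how="all")

# Row-wise: column name of the largest count, and the count itself (NaN skipped).
pop_rlgn_cat_num = pd.DataFrame({
    "name": latest.loc[counts.index, "name"],
    "Country Most Adhered Religion": counts.idxmax(axis=1),
    "Number of Adherence": counts.max(axis=1),
}).reset_index(drop=True)

print(pop_rlgn_cat_num)
```

Because the religion columns contain NaN, `Number of Adherence` comes out as floats. Add `.astype(int)` if integer counts are needed, as the original loop did with `int(b[0])`.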
