CSV file data organization (structured data)

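Problem description

I have been trying to pull data from this csv file and organize it in a way that lets me view the data more clearly. The goal is to create 2 dictionaries: one that holds the data for the regions listed in the csv, and another that holds the data for the countries in the csv. I am having trouble looping through the data. The csv file starts by listing all the regions; the countries do not begin until the "ID" column reaches 4. This is what I have so far, but I still need help organizing it by region and by country. The link to the csv file is: https://docs.google.com/document/d/1v68_QQX7Tn96l-b0LMO9YZ4ZAn_KWDMUJboa6LEyPr8/edit?usp=sharing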

import csv

f = open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv')

reader = csv.DictReader(f)

data_by_region = {}
data_by_country = {}
answers = []


for line in reader:
  #Collects all the region names
  regions = line['Region/Country/Area'] 
  # Gets All the Years
  years = line['Year']
  # print(regions)



  if regions not in data_by_region:
    data_by_region[regions] = {}

Tags: python, python-3.x, csv

Solution


Maybe this will help:

import csv

data_by_region = {}
data_by_country = {}  # not filled in by this snippet
answers = []

# utf-8-sig drops a leading byte-order mark, if the file has one,
# so the first column name is read cleanly.
with open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv', encoding='utf-8-sig', newline='') as f:
    reader = csv.DictReader(f)

    for line in reader:
        # Name of the region, country or area this row belongs to
        regions = line['Region/Country/Area']
        # Year of this row (not used yet)
        years = line['Year']
        # print(regions)

        # Group every row under its region name
        if regions not in data_by_region:
            data_by_region[regions] = [line]
        else:
            data_by_region[regions].append(line)

# Print the number of rows collected for each region.
for region, data_list in data_by_region.items():
    print('{:>30s}: {} rows.'.format(region, len(data_list)))

Output:

 Total, all countries or areas: 21 rows.
                        Africa: 18 rows.
               Northern Africa: 21 rows.
            Sub-Saharan Africa: 21 rows.
                Eastern Africa: 18 rows.
                 Middle Africa: 18 rows.
               Southern Africa: 18 rows.
                Western Africa: 18 rows.
              Northern America: 18 rows.
...
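
The grouping above keeps regions and countries in the same dictionary. One possible way to get the two dictionaries the question asks for is to switch the target dictionary once the "ID" column reaches 4. This is only a sketch under stated assumptions: it treats the split as positional (every row from the first one with ID 4 onward is taken to be a country), the sample rows and the 'Value' column are made up for illustration, and split_regions_and_countries is a hypothetical helper name, not something from the question.

```python
import csv
import io

# Made-up sample rows for illustration. Only the column names 'ID',
# 'Region/Country/Area' and 'Year' come from the question; 'Value' is
# a hypothetical extra column.
SAMPLE_CSV = """ID,Region/Country/Area,Year,Value
1,"Total, all countries or areas",2010,1.2
2,Africa,2010,2.5
15,Northern Africa,2010,1.7
4,Afghanistan,2010,2.8
8,Albania,2010,-0.3
8,Albania,2015,-0.2
"""


def split_regions_and_countries(csv_file, first_country_id='4'):
    """Group rows by name into a region dict and a country dict.

    Assumption: regions are listed first, and every row from the first
    one whose 'ID' equals first_country_id onward is a country.
    """
    data_by_region = {}
    data_by_country = {}
    target = data_by_region  # rows go here until the first country ID shows up

    for row in csv.DictReader(csv_file):
        if row['ID'].strip() == first_country_id:
            target = data_by_country
        # Create the list on first sight of a name, then append the row
        target.setdefault(row['Region/Country/Area'], []).append(row)

    return data_by_region, data_by_country


regions, countries = split_regions_and_countries(io.StringIO(SAMPLE_CSV))
print(list(regions))
print(list(countries))
```

With the real file, the io.StringIO(SAMPLE_CSV) argument would be replaced by the file object opened with encoding='utf-8-sig', as in the code above. If the ID values in the file turn out not to follow this layout, the switch condition would need to change.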
