Need help building hierarchical data with Python

Problem description

My CSV file looks like this:

name_1,dept_code_1,name_2,dept_code_2,name_3,dept_code_3
ABC,CODE1,ABC CHILD 1,CODE1-1,,
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-1,CODE1-1-1
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-2,CODE1-1-2
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-1,CODE1-2-1
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-2,CODE1-2-2
XYZ,CODE2,XYZ CHILD,CODE2-2,,

I want to build hierarchical data using this logic:

The CSV file may have more columns: name_4,dept_code_4 ... name_n,dept_code_n

Expected output for the CSV above:

ABC,CODE1,,,,,OK
ABC,CODE1,ABC CHILD 1,CODE1-1,,,OK
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-1,CODE1-1-1,OK
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-2,CODE1-1-2,OK
ABC,CODE1,ABC CHILD 2,CODE1-2,,,OK
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-1,CODE1-2-1,OK
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-2,CODE1-2-2,OK
XYZ,CODE2,,,,,OK
XYZ,CODE2,XYZ CHILD,CODE2-1,,,OK

Can anyone guide me on the best practice for doing this in Python?

Here is the code I am trying:

import os
import csv

from io import StringIO

with open(os.path.join('', 'test.csv')) as csv_file:
    content = StringIO(csv_file.read())
    csv_rows = csv.DictReader(content, delimiter=',')
    final_data = {}
    for row in csv_rows:
        if row['name_1']:
            if row['name_1'] not in final_data:
                final_data[row['name_1']] = {}
                final_data[row['name_1']]['dept_code_1'] = row['dept_code_1']
        if row['name_2']:
            if row['name_2'] not in final_data[row['name_1']]:
                final_data[row['name_1']][row['name_2']] = {}
                final_data[row['name_1']][row['name_2']]['dept_code_2'] = row['dept_code_2']
        if row['name_3']:
            if row['name_3'] not in final_data[row['name_1']][row['name_2']]:
                final_data[row['name_1']][row['name_2']][row['name_3']] = {}
                final_data[row['name_1']][row['name_2']][row['name_3']]['dept_code_3'] = row['dept_code_3']

    print(final_data)

Tags: python, python-3.x, data-structures, data-science

Solution


You could take a look at itertools.groupby, which groups consecutive rows that share the same key:

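import csv
from itertools import groupby

# newline='' is what the csv module docs recommend when opening CSV files
with open('test.csv', newline='') as csv_file:
    csv_rows = csv.DictReader(csv_file, delimiter=',')

    # groupby only merges consecutive rows with the same key, so the file
    # must already be ordered by dept_code_1 (the sample above is)
    group_by_dept_code_1 = groupby(csv_rows, lambda n: n['dept_code_1'])

    # each group is a lazy iterator; consume it while the file is still open
    # (this print loop is only an illustration of how to walk the groups)
    for dept_code_1, rows in group_by_dept_code_1:
        print(dept_code_1, [row['name_2'] for row in rows])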
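
As an alternative to grouping, the expected output can also be produced in a single pass that emits every ancestor path once, in the order it first appears. The sketch below is only one possible approach, not a definitive implementation. It assumes that the columns always alternate as name/dept_code pairs, that a row is filled from level 1 down to its deepest level with no gaps, and that the trailing OK is a constant status (the question does not say how it is derived). The function name expand_hierarchy is hypothetical, and the sample data is inlined so the example runs without test.csv.

```python
import csv
from io import StringIO


def expand_hierarchy(rows, status="OK"):
    """Yield one padded row per distinct ancestor path, in first-seen order."""
    seen = set()
    for row in rows:
        width = len(row)
        # Walk the (name, dept_code) pairs from the top level down.
        for end in range(2, width + 1, 2):
            name = row[end - 2]
            if not name:
                break  # deeper levels are empty for this row
            prefix = tuple(row[:end])
            if prefix in seen:
                continue  # this ancestor was already emitted
            seen.add(prefix)
            # Pad the remaining levels with empty cells, then add the status.
            yield list(prefix) + [""] * (width - end) + [status]


# Sample data copied from the question (assumption: real input comes from a file).
SAMPLE = """\
name_1,dept_code_1,name_2,dept_code_2,name_3,dept_code_3
ABC,CODE1,ABC CHILD 1,CODE1-1,,
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-1,CODE1-1-1
ABC,CODE1,ABC CHILD 1,CODE1-1,ABC CHILD 1-1-2,CODE1-1-2
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-1,CODE1-2-1
ABC,CODE1,ABC CHILD 2,CODE1-2,ABC CHILD 1-2-2,CODE1-2-2
XYZ,CODE2,XYZ CHILD,CODE2-2,,
"""

reader = csv.reader(StringIO(SAMPLE))
next(reader)  # skip the header line
result = list(expand_hierarchy(reader))
for line in result:
    print(",".join(line))
```

Because the loop steps through the row two cells at a time, the same function should handle name_4,dept_code_4 ... name_n,dept_code_n without changes. Note that the last line comes out with CODE2-2, as in the input, whereas the expected output in the question shows CODE2-1.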
