Convert complex XML to CSV

Problem description

My XML is structured like this:

<result>
    <report>
        <id>111</id>
        <user>username1</user>
        <actions_list>
            <action1>
                <id>a_1</id>
            </action1>
            <action1>
                <id>a_2</id>
            </action1>
            <action1>
                <id>a_3</id>
            </action1>
        </actions_list>
    </report>

    <report>
        <id>222</id>
        <user>username2</user>
        <actions_list>
            <action1>
                <id>a_1</id>
            </action1>
            <action2>
                <id>a_2</id>
            </action2>
            <action3>
                <id>a_3</id>
            </action3>
            <action4>
                <id>a_4</id>
            </action4>
            <action5>
                <id>a_5</id>
            </action5>
        </actions_list>
    </report>
</result>

I want to turn it into a CSV file with this structure:

+---+-----+-----------+-----+
| 1 | 111 | username1 | a_1 |
+---+-----+-----------+-----+
| 1 | 111 | username1 | a_2 |
+---+-----+-----------+-----+
| 1 | 111 | username1 | a_3 |
+---+-----+-----------+-----+
| 2 | 222 | username2 | a_1 |
+---+-----+-----------+-----+
| 2 | 222 | username2 | a_2 |
+---+-----+-----------+-----+
| 2 | 222 | username2 | a_3 |
+---+-----+-----------+-----+
| 2 | 222 | username2 | a_4 |
+---+-----+-----------+-----+
| 2 | 222 | username2 | a_5 |
+---+-----+-----------+-----+

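I tried Python's BeautifulSoup and xml.etree, but I couldn't handle fields that share the same name ("id" in my example) or the varying number of actions in different reports. How can I do this? Any help would be greatly appreciated. Thanks in advance.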

Tags: python, xml, parsing

Solution


Try the following:

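import xml.etree.ElementTree as ET
import csv

# Use ONE of the next two lines, depending on whether the XML is
# held in a string or stored in a file on disk.
# root = ET.fromstring(your_xml_str)
root = ET.parse("user_actions.xml").getroot()  # parse() returns a tree, so take its root element

# Open the CSV for writing; newline="" prevents blank lines between rows on Windows
with open("yourfile.csv", "w", newline="") as f:
    writer = csv.writer(f)

    # Number the <report> elements 1, 2, ... for the first column
    for report_index, report in enumerate(root, start=1):
        # find() only searches direct children, so these get the report's own <id>/<user>
        report_id = report.find("id").text
        user = report.find("user").text
        actions_list = report.find("actions_list")

        # Every child of <actions_list> is one action, whatever its tag
        # (action1, action2, ...), so one row is written per action.
        for action in actions_list:
            action_id = action.find("id").text
            writer.writerow([report_index, report_id, user, action_id])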
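Why the repeated "id" tag does not get in the way: when Element.find() is given a plain tag name, it searches only the direct children of the element it is called on, whereas Element.iter() walks the whole subtree. A quick check on a trimmed-down copy of the first report (illustrative only, not part of the answer's code):

```python
import xml.etree.ElementTree as ET

# A shortened copy of one <report> from the question
report = ET.fromstring(
    "<report><id>111</id><user>username1</user>"
    "<actions_list><action1><id>a_1</id></action1>"
    "<action1><id>a_2</id></action1></actions_list></report>"
)

# find("id") looks only at the direct children of <report>,
# so it returns the report id, not one of the nested action ids.
print(report.find("id").text)  # 111

# iter("id") walks every descendant in document order,
# so it collects the <id> elements at all depths.
print([e.text for e in report.iter("id")])  # ['111', 'a_1', 'a_2']
```

So calling find("id") on the report and then on each action, as the answer does, keeps the two levels of "id" apart.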



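A self-contained variant of the same approach, sketched under the assumption that the XML is available as a string (the sample from the question, with the closing actions_list tag corrected). It uses ElementTree's limited XPath support: the path "actions_list/*/id" matches the id of every action regardless of its tag name, which handles the varying number of actions per report. The function name xml_to_csv is hypothetical, and the output goes to an in-memory buffer instead of a file so the result is easy to inspect.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The sample XML from the question (closing </actions_list> fixed)
SAMPLE_XML = """<result>
  <report>
    <id>111</id><user>username1</user>
    <actions_list>
      <action1><id>a_1</id></action1>
      <action1><id>a_2</id></action1>
      <action1><id>a_3</id></action1>
    </actions_list>
  </report>
  <report>
    <id>222</id><user>username2</user>
    <actions_list>
      <action1><id>a_1</id></action1>
      <action2><id>a_2</id></action2>
      <action3><id>a_3</id></action3>
      <action4><id>a_4</id></action4>
      <action5><id>a_5</id></action5>
    </actions_list>
  </report>
</result>"""


def xml_to_csv(xml_text):
    """Hypothetical helper: one CSV row per action, repeating the report's fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    for report_index, report in enumerate(root.findall("report"), start=1):
        report_id = report.findtext("id")  # direct child <id> of <report>
        user = report.findtext("user")
        # "*" matches any child tag, so action1, action2, ... are all covered
        for action_id in report.findall("actions_list/*/id"):
            writer.writerow([report_index, report_id, user, action_id.text])
    return out.getvalue()


print(xml_to_csv(SAMPLE_XML))
```

To write a real file instead, open it with open("yourfile.csv", "w", newline="") and pass that file object to csv.writer in place of the StringIO buffer.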