How to convert XML to a DataFrame using pandas

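Problem description

I'm new to pandas and have only just started learning to code, so any help would be greatly appreciated. I have a simple XML file like the one below, and I want to convert it to a DataFrame with pandas: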

    <products_availability date="2020-01-24 06:32">
        <region id="122">
            <products count="45453242">
                <product id="1000001">0</product>
                <product id="1000002">5</product>
                <product id="1000003">3</product>
            </products>
        </region>
    </products_availability>

I tried the following code, but it didn't get me anywhere:

    import pandas as pd
    import xml.etree.ElementTree as et
    xtree = et.parse("file.xml")
    xroot = xtree.getroot()
    df_cols = ["product"]
    rows = []
    for node in xroot:
        s_product = node.attrib.get("product")
        rows.append({"name": s_product})
    out_df = pd.DataFrame(rows, columns=df_cols)


Tags: python, xml, pandas, dataframe

Solution


If you don't need to break the results down by region, you can use the findall or iterfind method to find all matching product elements, however deeply they are nested. For example, given this file.xml with two regions:

    <products_availability date="2020-01-24 06:32" >
        <region id="122">
            <products count="45453242">
                <product id="1000001">0</product>
                <product id="1000002">5</product>
                <product id="1000003">3</product>
            </products>
        </region>
        <region id="133">
            <products count="45453242">
                <product id="1000004">7</product>
                <product id="1000005">3</product>
                <product id="1000006">1</product>
            </products>
        </region>
    </products_availability>

the following reads each product's id and text into a DataFrame:

    import pandas as pd
    import xml.etree.ElementTree as et

    columns = ["product", "products_availability"]
    xtree = et.parse("file.xml")
    # ".//product" matches <product> elements at any depth in the tree.
    products = ((p.get("id"), p.text) for p in xtree.iterfind(".//product"))
    out_df = pd.DataFrame(products, columns=columns)

    >>> out_df
       product products_availability
    0  1000001                     0
    1  1000002                     5
    2  1000003                     3
    3  1000004                     7
    4  1000005                     3
    5  1000006                     1
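To see why the loop in the question produced nothing useful while the XPath search works, here is a minimal sketch using only the standard library. It parses an inline copy of the question's XML with et.fromstring instead of reading file.xml; the names XML, direct_children, missing, found, lazy and pairs are chosen for illustration only.

```python
import xml.etree.ElementTree as et

# Inline copy of the question's XML so the sketch runs without file.xml.
XML = """
<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
</products_availability>
"""

root = et.fromstring(XML)

# Iterating an element yields only its direct children: here, the single <region>.
direct_children = [child.tag for child in root]

# <region> has an "id" attribute but no attribute named "product",
# so the lookup used in the question returns None.
missing = root[0].attrib.get("product")

# ".//product" matches <product> elements at any depth below the root.
found = root.findall(".//product")   # a list, built eagerly
lazy = root.iterfind(".//product")   # an iterator, consumed on demand

# The product id is an attribute; the availability is the element's text.
pairs = [(p.get("id"), p.text) for p in lazy]
print(direct_children, missing, len(found), pairs)
```

findall and iterfind accept the same path expressions. The practical difference is that findall returns a list you can index or measure with len, while iterfind yields elements one at a time, which is enough when the result is fed straight into a DataFrame.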

If you need the region as well, loop over the region elements and then over the products inside each one:

    import pandas as pd
    import xml.etree.ElementTree as et

    columns = ["product", "products_availability", "region"]
    xtree = et.parse("file.xml")
    # The outer loop visits each <region>; the inner loop visits its <product> elements.
    prds = (
        (p.get("id"), p.text, r.get("id"))
        for r in xtree.iterfind(".//region")
        for p in r.iterfind(".//product")
    )
    out_df = pd.DataFrame(prds, columns=columns)

    >>> out_df
       product products_availability region
    0  1000001                     0    122
    1  1000002                     5    122
    2  1000003                     3    122
    3  1000004                     7    133
    4  1000005                     3    133
    5  1000006                     1    133
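Attribute values and element text come out of ElementTree as strings, so the columns above hold text rather than numbers. The following is a minimal sketch of one way to get integer columns, assuming every id and availability value in the file is a valid integer. It uses an inline copy of the two-region XML, and the names XML, rows, typed_df and totals are hypothetical, not part of the original answer.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Inline copy of the two-region XML so the sketch runs without file.xml.
XML = """
<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
    <region id="133">
        <products count="45453242">
            <product id="1000004">7</product>
            <product id="1000005">3</product>
            <product id="1000006">1</product>
        </products>
    </region>
</products_availability>
"""

root = et.fromstring(XML)

# Build one dict per product, converting each string to int as it is read.
# Assumption: all ids and counts are integers; int() raises ValueError otherwise.
rows = [
    {
        "product": int(p.get("id")),
        "products_availability": int(p.text),
        "region": int(r.get("id")),
    }
    for r in root.iterfind(".//region")
    for p in r.iterfind(".//product")
]
typed_df = pd.DataFrame(rows)

# With numeric columns, aggregations such as a per-region total work directly.
totals = typed_df.groupby("region")["products_availability"].sum()
print(typed_df.dtypes)
print(totals)
```

Building a list of dicts also means the column names travel with the data, so there is no separate columns list to keep in sync with the tuple order.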
