python - BeautifulSoup XML to CSV
Problem description
The code below takes an XML file and parses it into a CSV file.
import openpyxl
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Sheet1"

# The lxml HTML parser lowercases tag names, hence "description" and "setdata".
ws.append(["Description", "num", "text"])
for description in soup.find_all("description"):
    ws.append(["", description['num'], description.text])

ws.append(["SetData", "x", "value", "xin", "xax"])
for setdata in soup.find_all("setdata"):
    ws.append(["", setdata.get('x', ''), setdata.get('value', ''), setdata.get('xin', ''), setdata.get('xax', '')])

# Note: openpyxl writes XLSX data regardless of the ".csv" extension used here.
wb.save(filename="1last.csv")
This is the output
This is the XML file
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
  <FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="START">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <Ycross unit="ce" points="21" name="Thefiles" text_id="54" unit_id="98" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
  </FINAL>
</ProjectData>
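Recently I have been trying to modify the code so that it iterates over all children of START and parses them into columns. If a child element has more than one row, the extra rows should be written as new rows, as in the code above. Unfortunately I have had no success and am stuck at this point.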
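Solution
You can try something like this.
I have only written the code for a few tags. You can easily fill in the remaining tags you need in the same way. Hope this helps!
Edit: added the SetData tag values.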
from collections import defaultdict
from io import StringIO
from xml.etree import ElementTree as ET
import csv

# `data` must hold the XML text; here it is read from the file used in the question.
with open('1last.xml', encoding='utf-8') as f_input:
    data = f_input.read()

tree = ET.parse(StringIO(data))
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    start_nodes = root.findall('.//START')
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    for sn in start_nodes:
        # Missing keys default to '' so absent attributes become empty cells.
        row = defaultdict(str)
        # START attributes (id, service_code) map directly to columns.
        for k, v in sn.attrib.items():
            row[k] = v
        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text
        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text
        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']
        # all other tags except set data must be parsed before this.
        for st in sn.findall('.//SetData'):
            for k, v in st.attrib.items():
                row['set_data_' + str(k)] = v
            # Write one CSV row per SetData element, then start a fresh row.
            row_data = [row[i] for i in headers]
            writer.writerow(row_data)
            row = defaultdict(str)
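One consequence of resetting row after every write is that only the first SetData row of each START block carries the id, rational and description fields, and a START block with no SetData child is not written at all. If the parent fields should instead be repeated on every row, a minimal sketch could look like this. The helper name rows_for_start, the shortened header list and the inline sample are illustrative assumptions, not part of the answer above.

```python
from xml.etree import ElementTree as ET

# Shortened, illustrative header list (a subset of the answer's headers).
HEADERS = ['id', 'service_code', 'description_num', 'description_txt',
           'set_data_x', 'set_data_value']


def rows_for_start(sn, headers):
    """Return one row per SetData child, repeating the START-level fields."""
    base = dict(sn.attrib)  # id, service_code
    for ds in sn.findall('.//Description'):
        base['description_txt'] = ds.text
        base['description_num'] = ds.attrib.get('num', '')
    rows = []
    # Fall back to a single row when START has no SetData child at all.
    for st in sn.findall('.//SetData') or [None]:
        row = dict(base)
        if st is not None:
            for k, v in st.attrib.items():
                row['set_data_' + k] = v
        rows.append([row.get(h, '') for h in headers])
    return rows


# Hypothetical sample: one START block taken from the question's XML.
sample = """<FINAL>
  <START id="DG0003" service_code="0x517B">
    <Description num="3423423f3423">The third</Description>
    <DataFile dg="55" dg_id="big">
      <SetData x="E1" value="21259" />
      <SetData x="E2" value="02" />
    </DataFile>
  </START>
</FINAL>"""

root = ET.fromstring(sample)
all_rows = [r for sn in root.findall('.//START')
            for r in rows_for_start(sn, HEADERS)]
for r in all_rows:
    print(r)
```

Each list in all_rows can be passed to writer.writerow in place of row_data.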
Update
Add the following loops before the SetData loop:

for st in sn.findall('.//DataFile'):
    for k, v in st.attrib.items():
        row['datafile_' + str(k)] = v
for st in sn.findall('.//Ycross'):
    for k, v in st.attrib.items():
        row['ycross_' + str(k)] = v

and add the corresponding keys to the headers list.
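Because the DataFile and Ycross attributes differ between START blocks (dg and dg_id in some, unit and unit_id in others), the extra header names can also be collected from the document instead of being typed by hand. The following is only a sketch of that idea: prefixed_keys is a hypothetical helper, and the inline XML is a shortened version of the sample from the question.

```python
from xml.etree import ElementTree as ET


def prefixed_keys(root, tag, prefix):
    """Collect prefix + attribute name for every <tag> element, first seen first."""
    keys = []
    for node in root.iter(tag):
        for k in node.attrib:
            name = prefix + k
            if name not in keys:
                keys.append(name)
    return keys


# Hypothetical, shortened sample based on the question's XML.
sample = """<FINAL>
  <START id="ID0001">
    <DataFile dg="12" dg_id="let"><SetData value="32" /></DataFile>
  </START>
  <START id="ID0048">
    <DataFile unit="21" unit_id="FEDS">
      <Ycross unit="ce" points="21" />
      <SetData xin="5" xax="233" value="323" />
    </DataFile>
  </START>
</FINAL>"""

root = ET.fromstring(sample)
extra_headers = (prefixed_keys(root, 'DataFile', 'datafile_')
                 + prefixed_keys(root, 'Ycross', 'ycross_'))
print(extra_headers)
```

The resulting names match the keys produced by the two loops above, so extra_headers can be appended to headers before the header row is written.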