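python - How to parse text from an attribute
Problem description
I have a small problem to overcome... I am still very new to parsing XML files in Python, but I managed to do it and convert the result to a CSV file. Everything works fine except for one piece of data that I cannot get.
Here is the XML: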
<Stat Type="matchday">1</Stat>
<Stat Type="season_name">Season 2017/2018</Stat>
<Stat Type="symid">FR_L1</Stat>
Here is my Python code:
import csv
import os
from xml.etree import ElementTree

file_name = "C:/Users/Hp/Desktop/BYG/Angers-Bordeaux.xml"
full_file = os.path.abspath(os.path.join('BYG', file_name))
dom = ElementTree.parse(full_file)
MatchDay = dom.findall('SoccerDocument/Competition/Stat')
TeamData = dom.findall('SoccerDocument/MatchData/TeamData')
for m in MatchDay:
    Match = m.get('Type')  # value of the Type attribute, e.g. "matchday"
    Day = m.text  # text inside the element, e.g. "1"
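It works, but I get all of these values when I only want "matchday" and "1". I don't know how to select just that data; I have tried many approaches and they all failed.
Thanks for your help
Solution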
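I would use Beautiful Soup for this. Beautiful Soup is best known for parsing HTML pages, but it can handle XML like your example as well. Note that the lxml HTML parser used below converts tag and attribute names to lowercase, which is why the code searches for "stat" and "type" rather than "Stat" and "Type".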
Install Beautiful Soup, together with the lxml parser that the code below relies on, with
pip install beautifulsoup4 lxml
Change your code to the following:
import os
from bs4 import BeautifulSoup

file_name = "C:/Users/Hp/Desktop/BYG/Angers-Bordeaux.xml"
full_file_name = os.path.abspath(os.path.join('BYG', file_name))
# Read contents of your file
with open(full_file_name) as f:
    raw_text = f.read()
# Parse XML with Beautiful Soup
soup = BeautifulSoup(raw_text, features="lxml")
# Find all Stat elements
elements = soup.find_all("stat")
# Go through all elements and print them
for element in elements:
    element_type = element["type"]
    element_text = element.text
    print(element_type, element_text)
This produces the following output:
matchday 1
season_name Season 2017/2018
symid FR_L1
If you are only interested in the elements of type matchday, you can select them as follows:
# Only select elements with type 'matchday'
elements = soup.find_all("stat", {"type": "matchday"})
for element in elements:
    element_type = element["type"]
    element_text = element.text
    print(element_type, element_text)
This will produce the following output:
matchday 1
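If you would rather stay with ElementTree, as in the question, the same filtering can probably be done with an attribute predicate in the path expression. The following is only a minimal sketch: the `<Competition>` wrapper around the sample data and the helper name `stats_of_type` are assumptions made for illustration, since the full layout of the real file is not shown.

```python
from xml.etree import ElementTree

# Sample data: the three Stat lines from the question, wrapped in a
# <Competition> element. The wrapper is an assumption; adapt the path
# to the real file's structure.
SAMPLE_XML = """
<Competition>
  <Stat Type="matchday">1</Stat>
  <Stat Type="season_name">Season 2017/2018</Stat>
  <Stat Type="symid">FR_L1</Stat>
</Competition>
"""


def stats_of_type(root, stat_type):
    """Return (Type, text) pairs for every Stat element whose Type matches.

    Hypothetical helper, not part of any library.
    """
    # [@Type='...'] is an attribute predicate from ElementTree's XPath subset.
    # Unlike the lxml HTML parser, ElementTree keeps names case-sensitive.
    matches = root.findall(f".//Stat[@Type='{stat_type}']")
    return [(el.get("Type"), el.text) for el in matches]


root = ElementTree.fromstring(SAMPLE_XML)
matchday = stats_of_type(root, "matchday")
print(matchday)
```

With the real file, the same predicate could be appended to the path already used in the question, for example `dom.findall("SoccerDocument/Competition/Stat[@Type='matchday']")`, assuming that path matches the document.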
Hope this helps :)