How to parse text from an attribute

Problem description

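I have a small problem to overcome... I am still quite new to parsing XML files in Python, but I managed to do it and convert the result to a CSV file. Everything works well, except for one piece of data that I cannot get.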

Here is the XML:

<Stat Type="matchday">1</Stat>
<Stat Type="season_name">Season 2017/2018</Stat>
<Stat Type="symid">FR_L1</Stat>

Here is my Python code:

import csv
import os  # needed for os.path below
from xml.etree import ElementTree


file_name="C:/Users/Hp/Desktop/BYG/Angers-Bordeaux.xml"
full_file=os.path.abspath(os.path.join('BYG',file_name))

dom=ElementTree.parse(full_file)


MatchDay=dom.findall('SoccerDocument/Competition/Stat')
TeamData=dom.findall('SoccerDocument/MatchData/TeamData')

for m in MatchDay:
    Match=m.get('Type')
    Day=m.text

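It works, but I get all of these values when I only want "matchday" and "1". I don't know how to select just that data. I have tried many approaches, but they all failed.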

Thanks for your help.

Tags: python, xml-parsing

Solution


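I would use Beautiful Soup for this. Beautiful Soup is mainly used to parse HTML pages, but it can also handle XML, so it works with your example. Note that with features="lxml" the document is parsed as HTML, so tag and attribute names are lowercased; that is why the code below searches for "stat" and "type" rather than "Stat" and "Type".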

  1. Install Beautiful Soup and the lxml parser it uses below with pip install beautifulsoup4 lxml

  2. Change your code to the following:

import os  # needed for os.path below

from bs4 import BeautifulSoup

file_name="C:/Users/Hp/Desktop/BYG/Angers-Bordeaux.xml"
full_file_name=os.path.abspath(os.path.join('BYG',file_name))

#  Read contents of your file
with open(full_file_name) as f:
    raw_text = f.read()

#  Parse XML with beautiful soup
soup = BeautifulSoup(raw_text, features="lxml")

# Find all Stat Elements
elements = soup.find_all("stat")

# Go through all elements and print them
for element in elements:
    element_type = element["type"]
    element_text = element.text
    print(element_type, element_text)

This produces the following output:

matchday 1
season_name Season 2017/2018
symid FR_L1

Now if you are only interested in the elements of type matchday you can get them as follows:

# Only select elements with type 'matchday'
elements = soup.find_all("stat", {"type":"matchday"})

for element in elements:
    element_type = element["type"]
    element_text = element.text
    print(element_type, element_text)

This will produce the following output:

matchday 1
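
If you would rather stay with ElementTree from the question, the same filtering can probably be done with an attribute predicate in the search path. The following is only a minimal sketch: the sample XML wraps the three Stat elements in an assumed SoccerFeed/SoccerDocument/Competition structure, and find_stat is a hypothetical helper name, not part of any library. Unlike the lxml HTML parser, ElementTree is case-sensitive, so "Stat" and "Type" keep their original capitalization.

```python
from xml.etree import ElementTree

# Assumed sample: the outer elements are a guess at the real file's layout;
# only the three <Stat> lines come from the question.
SAMPLE_XML = """
<SoccerFeed>
  <SoccerDocument>
    <Competition>
      <Stat Type="matchday">1</Stat>
      <Stat Type="season_name">Season 2017/2018</Stat>
      <Stat Type="symid">FR_L1</Stat>
    </Competition>
  </SoccerDocument>
</SoccerFeed>
""".strip()


def find_stat(root, stat_type):
    """Return the text of the first Stat whose Type matches, or None."""
    # The [@Type='...'] predicate keeps only elements with that attribute value.
    element = root.find(f".//Stat[@Type='{stat_type}']")
    return element.text if element is not None else None


root = ElementTree.fromstring(SAMPLE_XML)
matchday = find_stat(root, "matchday")
print("matchday", matchday)
```

For a file on disk, ElementTree.parse(full_file).getroot() would give the root element to pass to find_stat in place of ElementTree.fromstring.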

Hope this helps :)

