python - Getting the content of a specific tag in XML using ElementTree
Problem description
Here is my XML data:
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">1883738</PMID>
<DateCompleted>
<Year>1991</Year>
<Month>10</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2013</Year>
<Month>11</Month>
<Day>21</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0959-9673</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>72</Volume>
<Issue>4</Issue>
<PubDate>
<Year>1991</Year>
<Month>Aug</Month>
</PubDate>
</JournalIssue>
<Title>International journal of experimental pathology</Title>
<ISOAbbreviation>Int J Exp Pathol</ISOAbbreviation>
</Journal>
<ArticleTitle>The effect of HeNe laser radiation on the thyroid gland of the rat.</ArticleTitle>
<Pagination>
<MedlinePgn>379-85</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>Although laser irradiation is becoming common practice in medicine, there is not always a clear understanding of the possible side-effects. The present report is a light and electron microscopic study of the effects of fixed low intensity doses of soft HeNe laser on the thyroid of Wistar rats. The immediate effects are mild multifocal degenerative changes; these lesions recover in less than 3 months. Long-term lesions are identified only by electron microscopy; they consist of an increased number of peroxisomes and free or intramitochondrial crystalline structures. We discuss the laser's hypothetical functions.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Lerma</LastName>
<ForeName>E</ForeName>
<Initials>E</Initials>
<AffiliationInfo>
<Affiliation>Department of Pathology and Radiology, Hospital Universitario Virgen Macarena, University of Seville, Spain.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Hevia</LastName>
<ForeName>A</ForeName>
<Initials>A</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Rodrigo</LastName>
<ForeName>P</ForeName>
<Initials>P</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Gonzalez-Campora</LastName>
<ForeName>R</ForeName>
<Initials>R</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Armas</LastName>
<ForeName>J R</ForeName>
<Initials>JR</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Galera</LastName>
<ForeName>H</ForeName>
<Initials>H</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Int J Exp Pathol</MedlineTA>
<NlmUniqueID>9014042</NlmUniqueID>
<ISSNLinking>0959-9673</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>06LU7C9H1V</RegistryNumber>
<NameOfSubstance UI="D014284">Triiodothyronine</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>Q51BO43MG4</RegistryNumber>
<NameOfSubstance UI="D013974">Thyroxine</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>J Histochem Cytochem. 1969 Oct;17(10):675-80</RefSource>
<PMID Version="1">4194356</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Anat (Basel). 1986;125(1):10-3</RefSource>
<PMID Version="1">3953239</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Anat Anz. 1977;142(3):209-12</RefSource>
<PMID Version="1">603070</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Cell Biol. 1964 Nov;23:383-5</RefSource>
<PMID Version="1">14222822</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Cell Biol. 1967 Jun;33(3):605-23</RefSource>
<PMID Version="1">6036524</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Am J Med. 1983 May;74(5):852-62</RefSource>
<PMID Version="1">6837608</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Exp Eye Res. 1977 Jan;24(1):45-56</RefSource>
<PMID Version="1">402283</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D007834" MajorTopicYN="N">Lasers</DescriptorName>
<QualifierName UI="Q000009" MajorTopicYN="Y">adverse effects</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008830" MajorTopicYN="N">Microbodies</DescriptorName>
<QualifierName UI="Q000528" MajorTopicYN="N">radiation effects</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008854" MajorTopicYN="N">Microscopy, Electron</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D051381" MajorTopicYN="N">Rats</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011919" MajorTopicYN="N">Rats, Inbred Strains</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013961" MajorTopicYN="N">Thyroid Gland</DescriptorName>
<QualifierName UI="Q000528" MajorTopicYN="Y">radiation effects</QualifierName>
<QualifierName UI="Q000648" MajorTopicYN="N">ultrastructure</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013974" MajorTopicYN="N">Thyroxine</DescriptorName>
<QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D014284" MajorTopicYN="N">Triiodothyronine</DescriptorName>
<QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">PMC2001961</OtherID>
</MedlineCitation>
<PubmedData>
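I need to extract all author last names from the document. However, there are multiple such files, each with different author names. How do I parse this file and extract only the author last names into a list in order to create a database?
I have used ElementTree to parse the document. Here is my code: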
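import xml.etree.ElementTree as ET

tree = ET.parse("file path"+file)
doc = tree.getroot()
for LastName in doc.iter('LastName'):
    # Serialize the element, then cut off the XML declaration and opening tag by position
    file1 = (ET.tostring(LastName, encoding='utf8').decode('utf8'))
    file2 = file1[48:(len(file1))]
    # Keep everything up to the closing tag
    author_name_lastname = file2.split("<")[0]
    print(author_name_lastname)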
Here I can only print the first author name, "Lerma".
Solution
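import os
from lxml import etree as ET

# Placeholder directory; a raw string keeps the backslash from being read as an escape
DIR = r"D:\yourfilesdirectory"

for filename in os.listdir(DIR):
    if filename.endswith(".xml"):
        # Read bytes: lxml rejects str input that carries an XML encoding declaration
        with open(os.path.join(DIR, filename), mode='rb') as file:
            _tree = ET.fromstring(file.read())
        # Match every <LastName> element at any depth below the root
        _all_metadata_tags = _tree.xpath('.//LastName')
        for i in _all_metadata_tags:
            print(i.text)
    else:
        print("skipping " + filename)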
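The question asks for the last names in a list rather than printed output. A minimal sketch of that, using only the standard library's xml.etree.ElementTree instead of lxml, could look like the following. The function names last_names_from_xml and last_names_from_dir and the SAMPLE string are hypothetical, introduced here for illustration, and the sketch assumes each file is well-formed XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def last_names_from_xml(xml_text):
    """Return the text of every <LastName> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [el.text for el in root.iter("LastName") if el.text]


def last_names_from_dir(directory):
    """Map each .xml file name in `directory` to its list of last names."""
    result = {}
    for path in sorted(Path(directory).glob("*.xml")):
        # Read bytes so the parser can honor any XML encoding declaration.
        result[path.name] = last_names_from_xml(path.read_bytes())
    return result


# Small hypothetical sample shaped like the question's data.
SAMPLE = """
<PubmedArticle>
  <AuthorList CompleteYN="Y">
    <Author ValidYN="Y"><LastName>Lerma</LastName><ForeName>E</ForeName></Author>
    <Author ValidYN="Y"><LastName>Hevia</LastName><ForeName>A</ForeName></Author>
  </AuthorList>
</PubmedArticle>
"""

print(last_names_from_xml(SAMPLE))  # ['Lerma', 'Hevia']
```

Reading el.text directly avoids serializing each element and slicing the string at a fixed offset, which breaks as soon as the declaration or tag length changes. The per-file lists returned by last_names_from_dir can then be written to whatever database is being built.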