Parsing XML with xmlns

Problem description

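I am having a lot of trouble parsing XML in Python 3.

For example, I only want to get the author name. Even after hours of searching I cannot figure it out. Can you help me?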

from urllib.request import urlopen
import xml.etree.ElementTree as ET

filing_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001326801&type=&dateb=&owner=include&start=0&count=40&output=atom"

tree = ET.parse('countries.xml')
root = tree.getroot()

for child in root.findall('author'):
    print(child.tag, child.attrib)

XML content

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <feed xmlns="http://www.w3.org/2005/Atom">
        <author>
            <email>webmaster@sec.gov</email>
            <name>Webmaster</name>
        </author>
        <company-info>
            <state-location>CA</state-location>
            <state-location-href>http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;State=CA&amp;owner=include&amp;count=40</state-location-href>
            <state-of-incorporation>DE</state-of-incorporation>
        </company-info>
    <entry>
        <category label="form type" scheme="http://www.sec.gov/" term="4" />
        <content type="text/xml">
            <accession-nunber>0001127602-18-034767</accession-nunber>
            <filing-date>2018-11-29</filing-date>
            <filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm</filing-href>
            <filing-type>4</filing-type>
            <form-name>Statement of changes in beneficial ownership of securities</form-name>
            <size>4 KB</size>
        </content>
        <id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034767</id>
        <link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm" rel="alternate" type="text/html" />
        <summary type="html"> &lt;b&gt;Filed:&lt;/b&gt; 2018-11-29 &lt;b&gt;AccNo:&lt;/b&gt; 0001127602-18-034767 &lt;b&gt;Size:&lt;/b&gt; 4 KB</summary>
        <title>4  - Statement of changes in beneficial ownership of securities</title>
        <updated>2018-11-29T18:46:54-05:00</updated>
    </entry>
    <entry>
        <category label="form type" scheme="http://www.sec.gov/" term="4" />
        <content type="text/xml">
            <accession-nunber>0001127602-18-034766</accession-nunber>
            <filing-date>2018-11-29</filing-date>
            <filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm</filing-href>
            <filing-type>4</filing-type>
            <form-name>Statement of changes in beneficial ownership of securities</form-name>
            <size>19 KB</size>
        </content>
        <id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034766</id>
        <link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm" rel="alternate" type="text/html" />
        <summary type="html"> &lt;b&gt;Filed:&lt;/b&gt; 2018-11-29 &lt;b&gt;AccNo:&lt;/b&gt; 0001127602-18-034766 &lt;b&gt;Size:&lt;/b&gt; 19 KB</summary>
        <title>4  - Statement of changes in beneficial ownership of securities</title>
        <updated>2018-11-29T18:44:39-05:00</updated>
    </entry>
    </feed>

Tags: python, xml, python-3.x, xml-parsing

Solution


I'm not 100% sure what your problem is. However, if you can, I recommend using BeautifulSoup.

For example:

from bs4 import BeautifulSoup

# Read the saved feed; the "with" block closes the file automatically.
with open("myxml.xml", "r") as infile:
    contents = infile.read()

# html.parser ignores XML namespaces, so plain tag names match.
soup = BeautifulSoup(contents, 'html.parser')

authors = soup.find_all('author')

for author in authors:
    print(author)

#Output--
#<author>
#<email>webmaster@sec.gov</email>
#<name>Webmaster</name>
#</author>
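
If you would rather stay with xml.etree.ElementTree, the reason root.findall('author') in the question matches nothing is the default namespace on the feed element (xmlns="http://www.w3.org/2005/Atom"): ElementTree stores every tag as {namespace-uri}localname, so the bare name 'author' never matches. Below is a minimal sketch that passes a prefix-to-URI mapping to findall. The prefix "atom", the helper name author_names, and the trimmed ATOM_SAMPLE string (XML declaration and entries left out) are choices made for this illustration, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question; the default namespace is kept.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
    <author>
        <email>webmaster@sec.gov</email>
        <name>Webmaster</name>
    </author>
</feed>"""

# "atom" is an arbitrary prefix chosen here; only the URI has to match the feed.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def author_names(xml_text):
    """Return the text of every <name> inside an <author> directly under the root."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("atom:author/atom:name", NS)]


root = ET.fromstring(ATOM_SAMPLE)
print(root.tag)                   # {http://www.w3.org/2005/Atom}feed
print(root.findall("author"))     # [] because the bare tag name does not match
print(author_names(ATOM_SAMPLE))  # ['Webmaster']
```

For a file on disk, the same findall call works on ET.parse('countries.xml').getroot(). On Python 3.8 and later, root.findall("{*}author/{*}name") should also match regardless of namespace, if you prefer not to spell out the URI.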
