HTML - XML: How to resolve ParseError (text)

Problem description

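I recently started learning web access, web services and XML in a Jupyter Notebook, and I ran into a traceback that I don't know how to resolve. Could someone point me in the right direction?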

Here is the code:

import re
import string
from bs4 import BeautifulSoup as bs
import xml.etree.ElementTree as ET

data = """
<html>
    <head>
        <title>HTML read as XML</title>
    </head>
    <body>
        <header>
            <h1>HTML file</h1>
        </header>
            <person>
                <name> Madderman </name>
                <phone type='local'> 
                            088 043 04 30
                </phone>
            </person>
            <table>
                <tr>
                    <th>Some stuff</th>
                    <td>Value/ Element</td>                   
                </tr>
                <tr>
                    <th><Some stuff with other stuff</th>
                    <td>Value /element</td>
                </tr>
            </table>
        <script></script>
    </body>
</html> """

check = ET.fromstring(data)  # Parses the XML directly from a string and returns the root element of the parsed tree

print('Name', check.find('name').text)
print('Phone', check.find('phone').text)

Here is the traceback:

File "/home/jupyterlab/conda/envs/python/lib/python3.6/xml/etree/ElementTree.py", line 1314, in XML
    parser.feed(text)

Thanks in advance!

Tags: python, xml, jupyter

Solution


Your XML document is not well-formed.

<th><Some stuff with other stuff</th>

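Note the stray < before the word "Some". In XML, < always opens a tag, so the parser reads "<Some stuff with other stuff</th>" as the start of an element named Some followed by invalid tag syntax, and raises ParseError. Remove the <, or, if it is meant to be literal text, escape it as &lt;.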
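A minimal sketch of both points, using a hypothetical one-row fragment rather than the full page: parsing it with the stray < raises ET.ParseError, whose position attribute reports the (line, column) where the parser stopped, and escaping the < (here with xml.sax.saxutils.escape, one of several ways to do it) makes the fragment well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical fragment reproducing the problem row from the question
bad = "<tr><th><Some stuff with other stuff</th></tr>"

try:
    ET.fromstring(bad)
except ET.ParseError as err:
    line, column = err.position  # where the parser gave up
    print("ParseError at line", line, "column", column)

# Escaping the literal "<" as "&lt;" makes the fragment well-formed
text = escape("<Some stuff with other stuff")  # "&lt;Some stuff with other stuff"
good = "<tr><th>" + text + "</th></tr>"
row = ET.fromstring(good)
print(row.find("th").text)  # the parser turns &lt; back into "<"
```

The parsed text contains a plain "<" again, so escaping only affects how the character is written in the document, not the value you read back.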


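Once the document is fixed, you also need the correct XPath. check is the root <html> element, and find('name') only searches its direct children (<head> and <body>), so it returns None and .text fails. Use './/name' to search all descendants: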

import xml.etree.ElementTree as ET

data = """
<html>
    <head>
        <title>HTML read as XML</title>
    </head>
    <body>
        <header>
            <h1>HTML file</h1>
        </header>
            <person>
                <name> Madderman </name>
                <phone type='local'> 
                            088 043 04 30
                </phone>
            </person>
            <table>
                <tr>
                    <th>Some stuff</th>
                    <td>Value/ Element</td>                   
                </tr>
                <tr>
                    <th>Some stuff with other stuff</th>
                    <td>Value /element</td>
                </tr>
            </table>
        <script></script>
    </body>
</html> """

check = ET.fromstring(data)  # root element is <html>


print('Name', check.find('.//name').text)
print('Phone', check.find('.//phone').text)

Output:

Name  Madderman 
Phone  
                            088 043 04 30
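To see the difference between the two paths in isolation, here is a sketch on a hypothetical, trimmed-down copy of the corrected document. It also shows that the extra spaces in the output above come from whitespace inside the elements, which str.strip() removes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the corrected document
root = ET.fromstring(
    "<html><body><person>"
    "<name> Madderman </name>"
    "<phone type='local'> 088 043 04 30 </phone>"
    "</person></body></html>"
)

# find('name') looks only at direct children of <html>, so nothing matches
print(root.find("name"))                # None

# './/name' searches every descendant, at any depth
name = root.find(".//name")
print(repr(name.text))                  # ' Madderman '

# An explicit relative path also works when the structure is known
print(root.find("body/person/name") is name)  # True

# strip() removes the surrounding whitespace seen in the output above
phone = root.find(".//phone")
print(name.text.strip(), phone.get("type"), phone.text.strip())
```

'.//' is more forgiving about nesting depth, while an explicit path like 'body/person/name' documents exactly where the element is expected.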
