python - Validating a MathML XML string against a local MathML DTD
Problem description
I am trying to validate a MathML XML string with lxml like this:
import lxml.etree

mathml = """
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 3.0//EN" "http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<mi>a</mi>
<mo>+</mo>
<mi>b</mi>
</math>
"""

# Validating parser that is allowed to fetch the DTD over the network
lxml_parser = lxml.etree.XMLParser(
    dtd_validation=True,
    no_network=False,
    load_dtd=True,
    ns_clean=True,
    remove_blank_text=True,
)
# Pass the string defined above (the original snippet passed an undefined name, xml)
validated = lxml.etree.fromstring(mathml, lxml_parser)
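This way, the parser reads the DTD named in the DOCTYPE of the MathML string, fetches it over the network, and validates the string against it.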
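To make explicit which DTD the parser is being asked to fetch, the identifiers in the DOCTYPE can be read back without loading anything. The following is a minimal sketch, assuming a default lxml parser (which does not load the DTD) and a shortened version of the document from the question:

```python
from lxml import etree

mathml = (
    '<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 3.0//EN" '
    '"http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">'
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>a</mi><mo>+</mo><mi>b</mi></math>"
)

# The default parser neither loads the DTD nor validates, so this step
# needs no network access.
root = etree.fromstring(mathml)
docinfo = root.getroottree().docinfo

print(docinfo.public_id)   # -//W3C//DTD MathML 3.0//EN
print(docinfo.system_url)  # http://www.w3.org/Math/DTD/mathml3/mathml3.dtd
```

These two values are what a validating parser uses to locate the DTD, so they are also the keys an offline setup would need to redirect to local files.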
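Question
How can I validate a MathML string against a local DTD when no network is available?
What I tried
I downloaded the MathML 3 DTD from https://www.w3.org/Math/DTD/mathml3/mathml3.dtd and the MathML 1 DTD (mathml.dtd) from https://www.w3.org/Math/DTD/mathml1/ and saved both in the current working directory. I changed the DOCTYPE declaration to point to the local DTD, for example <!DOCTYPE math SYSTEM "path/to/mathml_dtd.dtd">, and finally created the lxml.etree.XMLParser with no_network=True. But when I run the following code: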
import lxml.etree

mathml = """
<!DOCTYPE math SYSTEM "path/to/dtd/mathml3.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<mrow>
<mo>&lfloor;</mo>
<mrow>
<mi>a</mi>
</mrow>
<mo>&rfloor;</mo>
</mrow>
</math>
"""

# Validating parser with network access disabled
lxml_parser = lxml.etree.XMLParser(
    dtd_validation=True,
    no_network=True,
    load_dtd=True,
    ns_clean=True,
    remove_blank_text=True,
)
# Pass the string defined above (the original snippet passed an undefined name, xml)
validated = lxml.etree.fromstring(mathml, lxml_parser)
I get this error:
File "/projects/py_asciimath/py_asciimath/parser/parser.py", line 117, in __dtd_validation
return lxml.etree.fromstring(xml, lxml_parser)
File "src/lxml/etree.pyx", line 3235, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
File "/projects/py_asciimath/py_asciimath/translation/dtd/mathml3.dtd", line 36
lxml.etree.XMLSyntaxError: conditional section INCLUDE or IGNORE keyword expected, line 36, column 17
With the local copy of the MathML 1 DTD, I get this instead:
File "/projects/py_asciimath/py_asciimath/parser/parser.py", line 117, in __dtd_validation
return lxml.etree.fromstring(xml, lxml_parser)
File "src/lxml/etree.pyx", line 3235, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
File "<string>", line 1
lxml.etree.XMLSyntaxError: Entity 'lfloor' not defined, line 1, column 135
Solution
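One possible approach is to keep the original DOCTYPE and register a custom lxml.etree.Resolver that redirects the DTD's public ID or system URL to a file on disk. The sketch below is not a confirmed fix for the errors above. It assumes the complete DTD distribution is stored locally, including any module or entity files the main DTD pulls in. The names LocalDTDResolver and make_offline_parser are hypothetical helpers, and the tiny DTD written to a temporary directory is only a stand-in so the example runs without the real mathml3.dtd:

```python
import os
import tempfile

from lxml import etree

MATHML3_PUBLIC_ID = "-//W3C//DTD MathML 3.0//EN"
MATHML3_SYSTEM_URL = "http://www.w3.org/Math/DTD/mathml3/mathml3.dtd"


class LocalDTDResolver(etree.Resolver):
    """Hypothetical helper: serve known DTD identifiers from local files."""

    def __init__(self, local_files):
        super().__init__()
        self.local_files = local_files  # {public ID or system URL: local path}

    def resolve(self, system_url, public_id, context):
        path = self.local_files.get(public_id) or self.local_files.get(system_url)
        if path is None:
            return None  # not a known identifier: fall back to lxml's default lookup
        return self.resolve_filename(path, context)


def make_offline_parser(local_files):
    """Build a validating parser that never touches the network."""
    parser = etree.XMLParser(dtd_validation=True, load_dtd=True, no_network=True)
    parser.resolvers.add(LocalDTDResolver(local_files))
    return parser


# Stand-in DTD (an assumption for this demo, NOT the real MathML DTD).
# In practice, map the identifiers to the real mathml3.dtd instead.
dtd_dir = tempfile.mkdtemp()
dtd_path = os.path.join(dtd_dir, "toy-mathml.dtd")
with open(dtd_path, "w", encoding="utf-8") as handle:
    handle.write(
        "<!ELEMENT math (mi | mo)*>\n"
        '<!ATTLIST math xmlns CDATA #FIXED "http://www.w3.org/1998/Math/MathML">\n'
        "<!ELEMENT mi (#PCDATA)>\n"
        "<!ELEMENT mo (#PCDATA)>\n"
        '<!ENTITY lfloor "&#x230A;">\n'
        '<!ENTITY rfloor "&#x230B;">\n'
    )

mathml = (
    '<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 3.0//EN" '
    '"http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">'
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mo>&lfloor;</mo><mi>a</mi><mo>&rfloor;</mo></math>"
)

local_files = {MATHML3_PUBLIC_ID: dtd_path, MATHML3_SYSTEM_URL: dtd_path}
root = etree.fromstring(mathml, make_offline_parser(local_files))
print(root.tag)
```

With this arrangement the document keeps its standard DOCTYPE, and only the resolver mapping knows where the files live. An invalid document still raises lxml.etree.XMLSyntaxError, as in the tracebacks above, so callers can catch that exception to report validation failures.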