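python - Accessing !ENTITY declarations and references
Problem description
I have some XML files that contain !ENTITY definitions and &file_reference; references.
I can process these files successfully.
However, I would like to preprocess each file and access the !ENTITY definitions to extract the file names, together with the &file_reference; references and the XML sections they appear in.
A sample XML file looks like this: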
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gdml [
<!ENTITY materials SYSTEM "materialsOptical.xml">
<!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml">
<!ENTITY matrices_Mainz_v2 SYSTEM "matrices_Mainz_v2.xml">
]>
<gdml xmlns:gdml="http://cern.ch/2001/Schemas/GDML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema/gdml.xsd">
  <define>
    <constant name="PI" value="1.*pi"/>
    &matrices_Mainz_v2;
  </define>
  &materials;
  &solids_Mainz_v2;
  <structure>
.... continued...
My coding attempt looks like this:
import io
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        # print("Encountered a start tag:", tag)
        # Entity may not be in latest tag so handle ourselves
        global currentSection
        if tag in ["define", "materials", "solids", "structure"]:
            print("Section : " + tag)
            currentSection = tag

    def handle_decl(self, decl):
        # This gets called when the entity is declared
        print("Encountered an Entity declaration ", decl)
        words = decl.split()
        wlen = len(words)
        print(words)
        if words[3] == "<!ENTITY" and wlen == 7:
            # const that refers to a file
            word = words[6].split('"')[1]
            print("Entity " + words[4] + " : " + word)
            filesDict[words[4]] = word

    def handle_entityref(self, name):
        # This gets called when the entity is referenced
        # starttag may not be a section
        global FilesEntity
        print("Entity reference : " + name)
        # tag = self.get_starttag_text()
        print("Current Section : " + currentSection)
        FilesEntity = True
        sectionDict[currentSection] = filesDict[name]

    # def handle_endtag(self, tag):
    #     print("Encountered an end tag :", tag)

    # def handle_data(self, data):
    #     print("Encountered some data :", data)

    def unknown_decl(self, data):
        print("Encountered unknown data :", data)

def preprocessHTML(doc, filename):
    # Add files object so user can change to organise files
    # from GDMLObjects import GDMLFiles, ViewProvider
    print("Preprocessing file for Entities File Definitions")
    global FilesEntity, filesDict, sectionDict
    FilesEntity = False
    sectionDict = {}  # Empty Dict
    filesDict = {}
    with io.open(filename) as fp:
        parser = MyHTMLParser()
        parser.feed(fp.read())
    # myfiles = doc.addObject("App::FeaturePython","Export_Files")
    # GDMLFiles(myfiles,FilesEntity,sectionDict)
    print("End of Preprocessing")
When I run it, it only picks up the first entity:
Preprocessing file for Entities File Definitions
Encountered an Entity declaration DOCTYPE gdml [
<!ENTITY materials SYSTEM "materialsOptical.xml"
['DOCTYPE', 'gdml', '[', '<!ENTITY', 'materials', 'SYSTEM', '"materialsOptical.xml"']
Entity materials : materialsOptical.xml
Section : define
Section : structure
End of Preprocessing
Solution
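A likely cause: Python 3's html.parser is an HTML parser, not an XML one. It appears to end a DOCTYPE declaration at the first '>' it finds, which here is the end of the first !ENTITY line, so handle_decl only ever sees that one entity. In addition, with the default convert_charrefs=True, handle_entityref is never called, which would explain why no "Entity reference" lines are printed.

One possible approach is to use the standard library's xml.parsers.expat instead. It reports each entity declaration in the internal DTD subset and each external entity reference as it is encountered. The following is a minimal sketch, not a definitive implementation. The function name scan_entities and the shape of its return values are hypothetical choices for this example, and it assumes each entity points to a distinct file.

```python
import xml.parsers.expat


def scan_entities(xml_text):
    """Hypothetical helper: collect external entity declarations and where
    they are referenced.

    Returns (files, refs):
      files: dict mapping entity name -> SYSTEM file name
      refs:  list of (enclosing element, entity name, file name) tuples
    """
    files = {}     # entity name -> file name from the SYSTEM identifier
    by_file = {}   # file name -> entity name (assumes one entity per file)
    refs = []      # one tuple per &name; reference, in document order
    stack = []     # names of the currently open elements

    def entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # External general entities have a system id and no literal value.
        if system_id is not None and not is_param:
            files[name] = system_id
            by_file[system_id] = name

    def external_ref(context, base, system_id, public_id):
        # Map the system id back to the entity name; fall back to the
        # context string, which expat documents as opaque.
        name = by_file.get(system_id, context)
        section = stack[-1] if stack else None
        refs.append((section, name, system_id))
        return 1  # non-zero: keep parsing without loading the file

    def start_element(name, attrs):
        stack.append(name)

    def end_element(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = entity_decl
    parser.ExternalEntityRefHandler = external_ref
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)
    return files, refs
```

To use it on a file, read the contents first (for example with open(filename, "rb").read()) and pass them to scan_entities. Unlike the currentSection global in the question, the element stack records the element that actually encloses each reference, so &materials; in the sample would be reported under gdml rather than define. The referenced files themselves are never opened.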