Importing XML into R

Problem description

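I'm trying to import an XML file into R and convert it to a data frame, but I can't get at the different nodes. Many of the nodes contain characters (e.g. "), so I'm having a hard time specifying which ones to pull out. I'm also not entirely clear on how to pull out lower-level nodes as I move down the hierarchy.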
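The quotation marks are less of a problem than they look. In Name lang="en", the lang="en" part is an attribute of a <Name> element, not part of the element's name. A quick check with the XML package shows this. It uses a single made-up inline element, not the real file:

```r
library(XML)

# One <Name> element shaped like those in the question (inline string, not the real file)
doc <- xmlParse('<Name lang="en">Multiple epiphyseal dysplasia</Name>', asText = TRUE)
node <- xmlRoot(doc)

xmlName(node)             # the element name only: "Name"
xmlGetAttr(node, "lang")  # the attribute is stored separately: "en"
xmlValue(node)            # the text content of the element
```

So there is no node literally called Name lang="en". The English name can instead be selected in XPath with a predicate such as Name[@lang='en'].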

I'm using xmlParse and xmlToDataFrame:

library(XML)

doc <- xmlParse("http://www.orphadata.org/data/xml/en_product6.xml")
doc2 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber")]

#this works but when I try to add more nodes with unusual characters or lower levels it fails. 

doc3 <-xmlToDataFrame(nodes=getNodeSet(doc,"//Disorder"))[c("OrphaNumber","Name lang="en"")]

#or when I try to grab a lower node
doc4 <-xmlToDataFrame(nodes=getNodeSet(doc,"//Disorder"))[c("OrphaNumber","/DisorderGeneAssociation")]
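Here is a minimal sketch of one way to build something like doc3. It works on a small inline sample rather than the real file and rests on three assumptions about the layout:

- each <Disorder> has a direct <OrphaNumber> child;
- each <Disorder> has a direct <Name lang="en"> child;
- deeper elements (here a gene) carry their own <Name>.

It evaluates XPath relative to each Disorder node. first_value() is a hypothetical helper that returns NA when nothing matches.

```r
library(XML)

# Small inline sample; this layout is an assumption about en_product6.xml, not a copy of it
xml_txt <- '<DisorderList>
  <Disorder>
    <OrphaNumber>166024</OrphaNumber>
    <Name lang="en">Multiple epiphyseal dysplasia</Name>
    <Gene><Name lang="en">hypothetical gene</Name></Gene>
  </Disorder>
  <Disorder>
    <OrphaNumber>166035</OrphaNumber>
    <Name lang="en">Brachydactyly-short stature-retinitis pigmentosa syndrome</Name>
  </Disorder>
</DisorderList>'

doc <- xmlParse(xml_txt, asText = TRUE)
disorders <- getNodeSet(doc, "//Disorder")

# Hypothetical helper: text of the first node matching `path` relative to `node`, or NA
first_value <- function(node, path) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# "./Name[@lang='en']" matches only a direct child <Name> whose lang attribute is "en",
# so the nested gene <Name> is not picked up
doc3 <- data.frame(
  OrphaNumber = vapply(disorders, first_value, character(1), path = "./OrphaNumber"),
  Name        = vapply(disorders, first_value, character(1), path = "./Name[@lang='en']"),
  stringsAsFactors = FALSE
)
doc3
```

To apply this to the real file, replace the inline sample with the xmlParse call on the orphadata URL above. This only works if the assumed layout matches the real file.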

The expected result is:

head(doc3)
OrphaNumber   Name lang="en"
166024        Multiple epiphyseal dysplasia,
166035        Brachydactyly-short stature-retinitis pigmentosa syndrome


head(doc4)
OrphaNumber   DisorderGeneAssociationStatus

166024        <SourceOfValidation>22587682[PMID]</SourceOfValidation>
166035        <SourceOfValidation>28285769[PMID]</SourceOfValidation>
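For the lower-level nodes, one approach is to start from the deeper element and walk back up, instead of trying to select it from the Disorder level. The sketch below again uses a small inline sample. It assumes the nesting Disorder > DisorderGeneAssociationList > DisorderGeneAssociation > SourceOfValidation. It returns one row per DisorderGeneAssociation and fetches the parent OrphaNumber with the XPath ancestor axis. As before, first_value() is a hypothetical helper.

```r
library(XML)

# Inline sample; this nesting is an assumption and should be checked against the real file
xml_txt <- '<DisorderList>
  <Disorder>
    <OrphaNumber>166024</OrphaNumber>
    <DisorderGeneAssociationList>
      <DisorderGeneAssociation>
        <SourceOfValidation>22587682[PMID]</SourceOfValidation>
      </DisorderGeneAssociation>
    </DisorderGeneAssociationList>
  </Disorder>
  <Disorder>
    <OrphaNumber>166035</OrphaNumber>
    <DisorderGeneAssociationList>
      <DisorderGeneAssociation>
        <SourceOfValidation>28285769[PMID]</SourceOfValidation>
      </DisorderGeneAssociation>
    </DisorderGeneAssociationList>
  </Disorder>
</DisorderList>'

doc <- xmlParse(xml_txt, asText = TRUE)

# Start from the lower-level node: one row per gene association
assocs <- getNodeSet(doc, "//DisorderGeneAssociation")

# Hypothetical helper: text of the first node matching `path` relative to `node`, or NA
first_value <- function(node, path) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

doc4 <- data.frame(
  # walk up to the nearest enclosing <Disorder> to get its OrphaNumber
  OrphaNumber = vapply(assocs, first_value, character(1),
                       path = "ancestor::Disorder[1]/OrphaNumber"),
  SourceOfValidation = vapply(assocs, first_value, character(1),
                              path = "./SourceOfValidation"),
  stringsAsFactors = FALSE
)
doc4
```

Starting from //DisorderGeneAssociation means a disorder with several gene associations produces several rows. A one-row-per-Disorder xmlToDataFrame call cannot represent that.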

Tags: r, xml, import

Solution

