How to use OpenXML to get the relationship between an embedded document and its media file in Word

Problem description

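When a Word document contains an embedded Office document, Word stores two things: the embedded Office document itself, and a media file (an image) that displays the document's name and icon. I can't find a way to associate the media file with the embedded document using OpenXML.

I can get the embedded documents and the media files with the following code, but I can't see any relationship between the two from the class members.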

Private Shared Function ExtractStream(source As Document, Stream As IO.Stream, format As DocumentDataFormat) As DocumentList
    Dim Documents As New DocumentList()
    Const embeddingPartString As String = "/word/embeddings/"
    Const mediaPartString As String = "/word/media/"
    Using WordDoc = WordprocessingDocument.Open(Stream, False)

        Dim intDocumentIndex As Int32 = 1
        ' EmbeddedPackagePart - These are the Office 2007+ type documents
        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of EmbeddedPackagePart)
            If pkgPart.Uri.ToString.StartsWith(embeddingPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, embeddingPartString.Length)
                Dim Doc As Document = ReadOffice(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1

                End If
            End If
        Next

        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of ImagePart)
            ' Media files
            If pkgPart.Uri.ToString.StartsWith(mediaPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, mediaPartString.Length)
                Dim Doc As Document = ReadMedia(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1
                End If

            End If
        Next
    End Using
    Return Documents

End Function

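If I walk the child elements of WordDoc.MainDocumentPart.Document, I can get the relationship between the media file and the embedded Ole object. When I find an Ole object, I look for a sibling Shape element in the parent's ChildElements collection.

This works well, except that I don't know how to get the embedded file from these classes.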

' Start with the Document class.
LookForOleObjects(WordDoc.MainDocumentPart.Document)


Private Shared Sub LookForOleObjects(elem As DocumentFormat.OpenXml.OpenXmlElement)
    If elem Is Nothing Then Return

    Dim ole = TryCast(elem, DocumentFormat.OpenXml.Vml.Office.OleObject)
    If (ole IsNot Nothing) Then
        ' found one.
        Dim img = GetImageFile(ole)
        If img IsNot Nothing Then
            ' found the image for the ole object
        End If
    End If
    If (elem.ChildElements IsNot Nothing) Then
        For Each child In elem.ChildElements
            LookForOleObjects(child)
        Next
    End If

End Sub

Private Shared Function GetImageFile(ole As OleObject) As DocumentFormat.OpenXml.Vml.ImageData
    Dim p As DocumentFormat.OpenXml.OpenXmlElement = ole.Parent
    For Each child In p.ChildElements
        Dim shape As DocumentFormat.OpenXml.Vml.Shape = TryCast(child, DocumentFormat.OpenXml.Vml.Shape)
        If shape IsNot Nothing Then
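            ' Search the shape's children by type rather than assuming ImageData is child 0,
            ' which also avoids indexing into an empty ChildElements list.
            Dim Img As DocumentFormat.OpenXml.Vml.ImageData = shape.GetFirstChild(Of DocumentFormat.OpenXml.Vml.ImageData)()
            If Img IsNot Nothing Then Return Img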
        End If
    Next
    Return Nothing
End Function

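I have two approaches, but each falls short. The first gets the embedded documents but not the relationship; the second gets the relationship but not the embedded documents. What am I missing? How can I use OpenXML to get an embedded document together with its associated media file?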

Tags: .net, vb.net, openxml

Solution
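
One possible way to join the two approaches is to go through relationship IDs rather than URIs. In the markup, the OLEObject element carries an r:id attribute that refers to the embedded part, and the sibling shape's imagedata element carries an r:id that refers to the image part. The SDK exposes these as OleObject.Id and ImageData.RelationshipId, and every part related to the main document part is listed in MainDocumentPart.Parts together with its relationship ID. The following is a minimal sketch under those assumptions, not a tested drop-in: the module name OleRelationshipSketch and the helpers PairEmbeddingsWithImages and ResolvePart are hypothetical names introduced for illustration, and it only looks at the document body (headers and footers are separate parts with their own relationships).

```vb
Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Vml
Imports DocumentFormat.OpenXml.Vml.Office

' Hypothetical helper module; the names are illustrative, not part of the SDK.
Module OleRelationshipSketch

    ' Returns one pair per OLE object found in the document body:
    ' Key   = the embedded part (EmbeddedPackagePart or EmbeddedObjectPart), or Nothing
    ' Value = the image part Word displays for it, or Nothing
    Public Function PairEmbeddingsWithImages(mainPart As MainDocumentPart) _
        As List(Of KeyValuePair(Of OpenXmlPart, ImagePart))

        Dim pairs As New List(Of KeyValuePair(Of OpenXmlPart, ImagePart))()

        For Each ole In mainPart.Document.Descendants(Of OleObject)()
            ' OLEObject's r:id is the relationship ID of the embedded part.
            Dim embedded As OpenXmlPart = ResolvePart(mainPart, ole.Id?.Value)

            ' The image reference lives in an imagedata element under the same parent.
            Dim imageData As ImageData = Nothing
            If ole.Parent IsNot Nothing Then
                imageData = ole.Parent.Descendants(Of ImageData)().FirstOrDefault()
            End If

            Dim image As ImagePart = Nothing
            If imageData IsNot Nothing Then
                image = TryCast(ResolvePart(mainPart, imageData.RelationshipId?.Value), ImagePart)
            End If

            pairs.Add(New KeyValuePair(Of OpenXmlPart, ImagePart)(embedded, image))
        Next

        Return pairs
    End Function

    ' Looks up a related part by relationship ID; returns Nothing when there is no match.
    Private Function ResolvePart(container As OpenXmlPartContainer, relId As String) As OpenXmlPart
        If String.IsNullOrEmpty(relId) Then Return Nothing
        Return container.Parts.
            Where(Function(p) p.RelationshipId = relId).
            Select(Function(p) p.OpenXmlPart).
            FirstOrDefault()
    End Function

End Module
```

With something like this, the Key of each pair can be tested with TryCast(..., EmbeddedPackagePart) and passed to ReadOffice, and the Value passed to ReadMedia, so both documents come from the same OLE object. Scanning Parts is used here instead of GetPartById so that an unknown relationship ID yields Nothing rather than an exception.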

