How do I insert numbering (<w:numPr>) and the associated paragraph-level data when transcribing a .docx file into a new .docx document?

Problem description

from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt

# src_doc, trgt_doc, page_number and printed_time_stamp are assumed to be defined earlier
for src_paragraph in src_doc.paragraphs:
    src_paragraph_format = src_paragraph.paragraph_format
    # print(src_paragraph.text)

    # Handle Headers/Footers (headers not implemented)
    sections = trgt_doc.sections            # there's only 1 section
    section = sections[0]
    footer = section.footer                 # get the footer of the section
    paragraph = footer.paragraphs[0]        # footer has 1 paragraph
    paragraph.text = f'{page_number} \t\t\t {printed_time_stamp}'

    # Transcribe paragraph settings - build the target
    trgt_paragraph = trgt_doc.add_paragraph(style=src_paragraph.style)

    # pPr is None for paragraphs that carry no paragraph properties, so guard it
    src_pPr = src_paragraph._p.pPr
    if src_pPr is not None and src_pPr.numPr is not None:
        print('\n <w:pStyle> :', src_pPr.pStyle)
        print('<w:numPr> :', src_pPr.numPr)
        print('\t<w:ilvl> :', src_pPr.numPr.ilvl)
        print('\t<w:numId> :', src_pPr.numPr.numId)
        print('\n', src_paragraph.text)

    trgt_paragraph_format = trgt_paragraph.paragraph_format
    # note: this renames the style object itself, not just this paragraph's style reference
    trgt_paragraph.style.name = src_paragraph.style.name
    trgt_paragraph_format.left_indent = src_paragraph_format.left_indent  # inherited from style hierarchy
    trgt_paragraph_format.right_indent = src_paragraph_format.right_indent
    # print('S_INDENT -------|', src_paragraph_format.left_indent)
    # print('T_INDENT -------|', trgt_paragraph_format.left_indent)
    trgt_paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
    trgt_paragraph_format.widow_control = True
    # note: this edits the font of the style, so it affects every paragraph using that style
    font = trgt_paragraph.style.font
    font.name = 'Times'
    font.size = Pt(11)

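I am transcribing a Word file into a similar document that carries the same content, but with modifications and additions. I build the target file by iterating over the source paragraphs and creating the corresponding target paragraphs and runs.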

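This is mostly done, except for carrying over the numbered bullets. I can capture the numbering values from the source, but at this point I do not know how to put them into each target paragraph.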

This is my first project working with .docx data, and I am still researching it.

Tags: python, python-docx

Solution


While trying to insert into the target .docx file I am generating, I tried the approach described here:

https://python.developreference.com/article/15889882/How+to+add+line+numbers+to+a+docx+document+section+using+python-docx

https://stackoverflow.com/questions/38400208/how-to-add-line-numbers-to-a-docx-document-section-using-python-docx

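from docx.oxml import OxmlElement

# Generate new Target file from Source File
for src_paragraph in src_doc.paragraphs:
    src_paragraph_format = src_paragraph.paragraph_format
    # Get Target section(s)
    sections = trgt_doc.sections
    section = sections[0]
    sectPr = section._sectPr
    lnNumType = OxmlElement('w:lnNumType')
    # 'fooAttrib' is a placeholder, not a real attribute;
    # <w:lnNumType> expects namespaced attributes such as w:countBy
    lnNumType.set('fooAttrib', '42')
    # this appends one more <w:lnNumType> per source paragraph, always at the end of <w:sectPr>
    sectPr.append(lnNumType)
    print('STUBB')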

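This is line numbering, not an outline-style numbered list. I only wanted to try an initial insertion to see whether it would work; it did not.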

# Add Numbered List to Target paragraphs.
# Isolate the numbered/bulleted paragraphs
src_pPr = src_paragraph._p.pPr
# compare with None: pPr may be missing, and element truthiness depends on child count
if src_pPr is not None and src_pPr.numPr is not None:
    # SOURCE XML paragraphs containing numPr
    print('--------------------------------------------')
    print('TEXT_SRC', src_paragraph.text, '\n')
    print('SRC ParXML \n', src_paragraph._p.xml)
    print('--------------------------------------------')

This way I can find the decimal numbering in the source .docx; the trick is getting it into the target I am generating.
