Python-Docx: how to tell whether a paragraph is italic

Problem description

Suppose I have this code:

import docx

doc = docx.Document(r'C:\example.docx')
run_list = []

for p in doc.paragraphs:
    for run in p.runs:
        # run.italic reflects only formatting set directly on the run
        if not run.italic:
            run_list.append(run.text)

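This correctly adds every non-italic run to run_list and skips every italic run. However, if an entire paragraph is italic rather than just a single run, run.italic is not True for its runs (python-docx reports None when the setting is inherited rather than set on the run itself), so those runs are not skipped. There is no paragraph.italic attribute, so how can I also skip all italic paragraphs?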

Tags: python, python-docx

Solution


Okay, so after a lot of digging, it seems I've run into a limitation of python-docx. There's a difference between explicit styles (i.e. what's applied directly to a run) and effective styles (i.e. what's displayed as a result of the style hierarchy). The paragraph that failed had the 'Subtle Emphasis' style applied to it. That style told Word to italicize the whole paragraph, so the individual runs had no italic setting of their own.

To some degree, it's possible to work around this by checking paragraph.style.name against the names of styles you know to be italic. Beyond that, though, python-docx does not evaluate effective styles for you.

Further reading: https://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/font.html

https://python-docx.readthedocs.io/en/latest/user/styles-using.html

