python - Python-Docx:如何识别段落是否斜体
问题描述
所以,假设,我有这个代码:
import docx
doc = docx.Document(r'C:\example.docx')
run_list = []
for p in doc.paragraphs:
for run in p.runs:
if not run.italic:
run_list.append(run.text)
它现在将准确地将所有非斜体运行添加到 run_list,并跳过所有斜体运行。但是,如果整个段落是斜体,而不仅仅是一个运行,run.italic
将返回False
. 没有paragraph.italic
方法,所以我想知道如何也跳过所有斜体段落?
解决方案
Okay, so after a lot of digging, it seems I've run into a limitation of python-docx. There's a difference between explicit styles (i.e. what's directly applied to a run) and effective styles (i.e. what's being displayed as a result of style hierarchies) The paragraph that failed to work had the 'Subtle Emphasis' style applied to it. This style told Word to apply italics to the whole paragraph; therefore, individual runs did not have it assigned.
To some degree, it's possible to counteract this by comparing paragraph.style.name. Outside of that, though, python-docx is unable to evaluate effective styles.
Further reading: https://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/font.html
https://python-docx.readthedocs.io/en/latest/user/styles-using.html
推荐阅读
- stream - 如何在时间翻转时禁用 VLC 停止录制 m3u8 流
- node.js - 在 catchError 中获取源值 - RxJS
- macos - 无法在 macos 中安装自制软件
- printing - ls 和打印命令
- javascript - 如何将画布保存为图像而不是直接下载?
- json - 如何在打字稿中使某些类型声明全局可用?
- javascript - 如何在 Laravel 8 中使用输入的值在不刷新页面或提交表单的情况下在同一页面上呈现特定内容
- mysql - SQL查询以查找击球手在powerplay中的最高击球率(1-6以上)
- typescript - Typescript参数'key'和'value'的类型不兼容
- json - 如何将 id 附加到 whitelist.json 以便将其添加并且机器人无需重新启动即可读取它