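encoding - How can I add JPEG images containing non-UTF characters to a docx using python-docx?
Problem description
I am trying to insert a series of JPEGs into a Word document with python-docx, but some of them apparently contain non-UTF-8 metadata, which causes docx to raise a Unicode decode error. How can I solve this?
Here is the code: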
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from docx import Document
from docx.shared import Inches
from docx.enum.table import WD_TABLE_ALIGNMENT
from docx.enum.text import WD_ALIGN_PARAGRAPH
from PIL import Image
from PIL.ExifTags import TAGS
document = Document()
table = document.add_table(rows=1, cols=1)
table.alignment = WD_TABLE_ALIGNMENT.CENTER
row_cells = table.add_row().cells
paragraph = row_cells[0].paragraphs[0]
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
run = paragraph.add_run()
run.add_picture("143269.jpg", height=Inches(5))
document.save('demo.docx')
And the error traceback:
runfile('/Users/fred/bin/nimble/dvids/picimporttest.py', wdir='/Users/fred/bin/nimble/dvids')
Traceback (most recent call last):
File "/Users/fred/bin/nimble/dvids/picimporttest.py", line 26, in <module>
run.add_picture("143269.jpg", height=Inches(5))
File "/usr/local/lib/python3.8/site-packages/docx/text/run.py", line 62, in add_picture
inline = self.part.new_pic_inline(image_path_or_stream, width, height)
File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 56, in new_pic_inline
rId, image = self.get_or_add_image(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 29, in get_or_add_image
image_part = self._package.get_or_add_image_part(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 31, in get_or_add_image_part
return self.image_parts.get_or_add_image_part(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 74, in get_or_add_image_part
image = Image.from_file(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 55, in from_file
return cls._from_stream(stream, blob, filename)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 176, in _from_stream
image_header = _ImageHeaderFactory(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 198, in _ImageHeaderFactory
return cls.from_stream(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 68, in from_stream
markers = _JfifMarkers.from_stream(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 111, in from_stream
for marker in marker_parser.iter_markers():
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 176, in iter_markers
marker = _MarkerFactory(
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 271, in _MarkerFactory
return marker_cls.from_stream(stream, marker_code, offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 413, in from_stream
tiff = cls._tiff_from_exif_segment(stream, offset, segment_length)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 455, in _tiff_from_exif_segment
return Tiff.from_stream(substream)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 36, in from_stream
parser = _TiffParser.parse(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 63, in parse
ifd_entries = _IfdEntries.from_stream(stream_rdr, ifd0_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in from_stream
entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in <genexpr>
entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 204, in iter_entries
ifd_entry = _IfdEntryFactory(self._stream_rdr, dir_entry_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 231, in _IfdEntryFactory
return entry_cls.from_stream(stream_rdr, offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 255, in from_stream
value = cls._parse_value(
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 294, in _parse_value
return stream_rdr.read_str(value_count-1, value_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/helpers.py", line 71, in read_str
unicode_str = chars.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 76: invalid start byte
And a sample problem JPEG.
Most JPGs work.
Solution
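One possible workaround, sketched under stated assumptions. The traceback ends in `read_str`, where python-docx decodes an Exif string value as UTF-8 and fails on byte 0x96. That byte is an en dash in Windows-1252, which may be what the software that wrote the metadata used. Removing the Exif segment before handing the image to python-docx should skip that decode step. The helpers `strip_exif` and `add_picture_without_exif` below are hypothetical names, not part of python-docx; the only python-docx call relied on is `run.add_picture`, which accepts a path or a file-like stream. The sketch also assumes python-docx recognises a JPEG by a `JFIF` or `Exif` signature near the start of the file, so it prepends a minimal JFIF header when stripping leaves none.

```python
import io
import struct

SOI = b"\xff\xd8"  # JPEG start-of-image marker

# Minimal JFIF APP0 segment: version 1.01, no density units,
# 1:1 pixel aspect ratio, no thumbnail.
MINIMAL_JFIF_APP0 = b"\xff\xe0" + struct.pack(
    ">H5sBBBHHBB", 16, b"JFIF\x00", 1, 1, 0, 1, 1, 0, 0
)


def strip_exif(jpeg_bytes: bytes) -> bytes:
    """Return a copy of the JPEG data without its Exif APP1 segments."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream: missing SOI marker")
    kept = bytearray()
    pos, size = 2, len(jpeg_bytes)
    while True:
        if pos + 2 > size or jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"malformed JPEG segment at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: copy the rest unchanged
            kept += jpeg_bytes[pos:]
            break
        if pos + 4 > size:
            raise ValueError("truncated JPEG segment header")
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack_from(">H", jpeg_bytes, pos + 2)
        end = pos + 2 + length
        is_exif = (
            marker == 0xE1
            and jpeg_bytes[pos + 4 : pos + 10] == b"Exif\x00\x00"
        )
        if not is_exif:
            kept += jpeg_bytes[pos:end]
        pos = end
    # Assumption: python-docx looks for "JFIF" or "Exif" at file offset 6.
    if kept[4:8] != b"JFIF":
        kept[:0] = MINIMAL_JFIF_APP0
    return SOI + bytes(kept)


def add_picture_without_exif(run, image_path, **size_kwargs):
    """Hypothetical helper: add a JPEG to a python-docx run, minus Exif."""
    with open(image_path, "rb") as f:
        cleaned = strip_exif(f.read())
    return run.add_picture(io.BytesIO(cleaned), **size_kwargs)
```

In the script from the question, `run.add_picture("143269.jpg", height=Inches(5))` would become `add_picture_without_exif(run, "143269.jpg", height=Inches(5))`. Because the Exif resolution data is gone, passing an explicit height or width, as the script already does, keeps the rendered size predictable. The pixel data is copied byte for byte, so the image is not re-encoded.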