How do I add JPEG images containing non-UTF characters to a docx using python-docx?

Problem description

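I'm trying to insert a series of JPEGs into a Word document with python-docx, but some of them apparently contain non-UTF-8 metadata, which makes docx raise a UnicodeDecodeError. How can I work around this?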

Here is the code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from docx import Document
from docx.shared import Inches
from docx.enum.table import WD_TABLE_ALIGNMENT
from docx.enum.text import WD_ALIGN_PARAGRAPH
from PIL import Image
from PIL.ExifTags import TAGS

document = Document()


table = document.add_table(rows=1, cols=1)
table.alignment = WD_TABLE_ALIGNMENT.CENTER
row_cells = table.add_row().cells
paragraph = row_cells[0].paragraphs[0]
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
run = paragraph.add_run()


run.add_picture("143269.jpg", height=Inches(5))
document.save('demo.docx')

And the error traceback:

runfile('/Users/fred/bin/nimble/dvids/picimporttest.py', wdir='/Users/fred/bin/nimble/dvids')
Traceback (most recent call last):

  File "/Users/fred/bin/nimble/dvids/picimporttest.py", line 26, in <module>
    run.add_picture("143269.jpg", height=Inches(5))

  File "/usr/local/lib/python3.8/site-packages/docx/text/run.py", line 62, in add_picture
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)

  File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 56, in new_pic_inline
    rId, image = self.get_or_add_image(image_descriptor)

  File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 29, in get_or_add_image
    image_part = self._package.get_or_add_image_part(image_descriptor)

  File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 31, in get_or_add_image_part
    return self.image_parts.get_or_add_image_part(image_descriptor)

  File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 74, in get_or_add_image_part
    image = Image.from_file(image_descriptor)

  File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 55, in from_file
    return cls._from_stream(stream, blob, filename)

  File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 176, in _from_stream
    image_header = _ImageHeaderFactory(stream)

  File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 198, in _ImageHeaderFactory
    return cls.from_stream(stream)

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 68, in from_stream
    markers = _JfifMarkers.from_stream(stream)

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 111, in from_stream
    for marker in marker_parser.iter_markers():

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 176, in iter_markers
    marker = _MarkerFactory(

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 271, in _MarkerFactory
    return marker_cls.from_stream(stream, marker_code, offset)

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 413, in from_stream
    tiff = cls._tiff_from_exif_segment(stream, offset, segment_length)

  File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 455, in _tiff_from_exif_segment
    return Tiff.from_stream(substream)

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 36, in from_stream
    parser = _TiffParser.parse(stream)

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 63, in parse
    ifd_entries = _IfdEntries.from_stream(stream_rdr, ifd0_offset)

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in from_stream
    entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in <genexpr>
    entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 204, in iter_entries
    ifd_entry = _IfdEntryFactory(self._stream_rdr, dir_entry_offset)

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 231, in _IfdEntryFactory
    return entry_cls.from_stream(stream_rdr, offset)

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 255, in from_stream
    value = cls._parse_value(

  File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 294, in _parse_value
    return stream_rdr.read_str(value_count-1, value_offset)

  File "/usr/local/lib/python3.8/site-packages/docx/image/helpers.py", line 71, in read_str
    unicode_str = chars.decode('UTF-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 76: invalid start byte

And a sample problem JPEG (most JPEGs work fine):

[Image: sample problem JPEG]

Tags: encoding, python-docx

Solution
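
The traceback ends in docx/image/helpers.py, where read_str decodes a string field from the JPEG's Exif (TIFF) metadata as UTF-8 and fails on byte 0x96. The error therefore comes from the image's metadata, not from the pixel data. One possible workaround, sketched below, is to drop the Exif segment from the file's bytes and pass the result to add_picture as an in-memory stream, so that python-docx never reaches the offending field. The names strip_exif and add_picture_without_exif are hypothetical helpers written for this answer and are not part of python-docx. The sketch assumes a well-formed JPEG whose header segments each carry a length field, and it only walks the segments that precede the start-of-scan marker.

```python
import io
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG
SOS = 0xDA         # start of scan: compressed image data follows
APP1 = 0xE1        # application segment used for Exif metadata


def strip_exif(jpeg_bytes: bytes) -> bytes:
    """Return a copy of the JPEG with its Exif APP1 segments removed.

    Hypothetical helper, not part of python-docx. The compressed image
    data is copied verbatim, so the picture is not re-encoded.
    """
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            # Everything from here on is scan data; keep it as-is.
            out += jpeg_bytes[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        payload = jpeg_bytes[pos + 4:end]
        is_exif = marker == APP1 and payload.startswith(b"Exif\x00\x00")
        if not is_exif:
            out += jpeg_bytes[pos:end]
        pos = end
    raise ValueError("no SOS marker found; the file may be truncated")


def add_picture_without_exif(run, path, **kwargs):
    """Insert the image at `path` into `run` after removing its Exif data.

    `run` is assumed to be a python-docx Run, as in the question's code;
    keyword arguments such as height=Inches(5) are passed through.
    """
    with open(path, "rb") as f:
        cleaned = strip_exif(f.read())
    return run.add_picture(io.BytesIO(cleaned), **kwargs)
```

In the question's script this would replace the add_picture line with add_picture_without_exif(run, "143269.jpg", height=Inches(5)). Two caveats apply. Removing Exif also discards any resolution and orientation stored there, so the inserted picture may be sized or rotated differently unless height or width is given explicitly. And if the file has no JFIF header besides the Exif segment, python-docx may not recognise the stripped bytes as a JPEG; in that case re-saving the image with an imaging library such as Pillow is an alternative, at the cost of re-encoding.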

