pypdf2 command to convert PDF to HTML fails with TypeError: write() argument must be str, not bytes (pdfminer)

Problem description

I am trying to convert multiple PDFs to HTML using the following code:

import os

# Convert 1.pdf through 100.pdf with the pdf2txt.py script
for x in range(100):
    pathName = "/Users/supreet/Downloads/PDFLocation/"
    fileName = pathName + str(x+1) + ".pdf"
    command = 'pdf2txt.py -o /Users/supreet/Downloads/HTMLFolder/'+str(x+1)+'.html -t html ' + fileName
    os.system(command)

The command worked fine for a while, but about half a day later it started failing. I am not sure what changed. Here is the error log:

Traceback (most recent call last):
  File "/Users/supreet/.pyenv/versions/3.9.1/bin/pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/Users/supreet/.pyenv/versions/3.9.1/bin/pdf2txt.py", line 96, in main
    device = HTMLConverter(rsrcmgr, outfp, scale=scale,
  File "/Users/supreet/.pyenv/versions/3.9.1/lib/python3.9/site-packages/pdfminer/converter.py", line 277, in __init__
    self.write_header()
  File "/Users/supreet/.pyenv/versions/3.9.1/lib/python3.9/site-packages/pdfminer/converter.py", line 289, in write_header
    self.write('<html><head>\n')
  File "/Users/supreet/.pyenv/versions/3.9.1/lib/python3.9/site-packages/pdfminer/converter.py", line 285, in write
    self.outfp.write(text)
TypeError: write() argument must be str, not bytes

Please help; this happens with every PDF I try.

Tags: python, python-3.x, pypdf2

Solution
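The page records no answer, so what follows is only a sketch of what the traceback suggests, not a confirmed fix. The final frame shows `self.outfp.write(text)` receiving bytes while `outfp` expects `str`, which is what happens when encoded output is written to a file opened in text mode. One plausible cause (an assumption, not something the question confirms) is that the `pdf2txt.py` script and the `pdfminer/converter.py` module come from two different distributions, for example `pdfminer` and `pdfminer.six`, which install into the same package name. The first two functions below reproduce the mismatch using only the standard library. `convert_pdf_to_html` is a hypothetical helper that assumes `pdfminer.six` is installed and uses its `extract_text_to_fp` function with the output file opened in binary mode.

```python
import os
import tempfile


def write_bytes_to_text_file(path):
    """Reproduce the mismatch: bytes written to a file opened in text mode."""
    try:
        with open(path, "w") as fp:      # text mode accepts only str
            fp.write(b"<html><head>\n")  # bytes, as an encoding converter would emit
    except TypeError as exc:
        return str(exc)
    return None


def write_bytes_to_binary_file(path):
    """The same bytes are accepted once the file is opened in binary mode."""
    with open(path, "wb") as fp:
        return fp.write(b"<html><head>\n")  # returns the number of bytes written


def convert_pdf_to_html(pdf_path, html_path):
    """Hypothetical helper (not from the question); assumes pdfminer.six is installed."""
    # Imported lazily so the demonstration above runs without pdfminer.six.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    # HTML output is encoded with `codec`, so the output file is opened as binary.
    with open(pdf_path, "rb") as inf, open(html_path, "wb") as outfp:
        extract_text_to_fp(inf, outfp, output_type="html",
                           codec="utf-8", laparams=LAParams())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.html")
        print(write_bytes_to_text_file(target))
        print(write_bytes_to_binary_file(target))
```

If mixed installations do turn out to be the cause, keeping only one of the two distributions in the environment and rerunning the original loop would be the first thing to try. Calling a helper like `convert_pdf_to_html` in-process also avoids building shell commands by string concatenation.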

