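python-3.x - How to fix - TypeError: int() argument must be a string, a bytes-like object or a number, not 'PSKeyword'?
Problem description
I am trying to extract text from PDF files with pdfminer, but I run into this error for some files only. The code works fine on some PDFs but raises the error below on others. Here is my code (copied from another thread on this forum):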
import io

from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage


def extract_text_from_pdf(pdf_path):
    resource_manager = PDFResourceManager()
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle)
    page_interpreter = PDFPageInterpreter(resource_manager, converter)

    with open(pdf_path, 'rb') as fh:
        for page in PDFPage.get_pages(fh,
                                      caching=True,
                                      check_extractable=True):
            page_interpreter.process_page(page)

        text = fake_file_handle.getvalue()

    # close open handles
    converter.close()
    fake_file_handle.close()

    if text:
        return text


if __name__ == '__main__':
    print(extract_text_from_pdf('test.pdf'))
This is the error I get:
Traceback (most recent call last):
File "pdf.py", line 28, in <module>
print(extract_text_from_pdf('test.pdf'))
File "pdf.py", line 13, in extract_text_from_pdf
for page in PDFPage.get_pages(fh,
File "C:\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdfminer\pdfpage.py", line 129, in get_pages
doc = PDFDocument(parser, password=password, caching=caching)
File "C:\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdfminer\pdfdocument.py", line 566, in __init__
xref.load(parser)
File "C:\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdfminer\pdfdocument.py", line 195, in load
(_, obj) = parser.nextobject()
File "C:\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdfminer\psparser.py", line 616, in nextobject
self.do_keyword(pos, token)
File "C:\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdfminer\pdfparser.py", line 79, in do_keyword
(objid, genno) = (int(objid), int(genno))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PSKeyword'
I have been looking for a fix but have not found one yet. Any help is appreciated. Thanks!
Solution
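No answer was recorded for this question, so what follows is only a hedged workaround sketch, not a confirmed fix. The traceback shows the TypeError being raised while pdfminer loads the document's cross-reference data (`xref.load`), before any page is processed. That suggests, as an assumption, that the failing files are malformed or non-standard PDFs rather than the extraction code being wrong. Under that assumption, one option is to catch the failure per file so that a batch run continues with the remaining PDFs. The helper name `extract_many` and its `extractor` parameter are hypothetical and not part of pdfminer; pass `extract_text_from_pdf` from the question as the extractor.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def extract_many(
    paths: Iterable[str],
    extractor: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Run `extractor` on each path and record failures instead of aborting.

    `extractor` is assumed to be a function such as the question's
    extract_text_from_pdf(pdf_path); it is hypothetical glue, not a pdfminer API.
    """
    texts: Dict[str, Optional[str]] = {}
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            texts[path] = extractor(path)
        except Exception as exc:
            # Deliberately broad: the question shows a TypeError, but other
            # malformed files may fail with different exception types.
            failures[path] = f"{type(exc).__name__}: {exc}"
    return texts, failures
```

Files that end up in `failures` can then be inspected individually or retried with another tool. This does not repair the PDF itself; it only keeps one bad file from stopping the whole run.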