raise TesseractError(proc.returncode, get_errors(error_string))

Problem description

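I am trying to extract text from an image with the pytesseract module in Python, but I keep getting the error below when I run the following code. Someone suggested this answer, https://stackoverflow.com/a/54914105/12642523, but I still get the same error. Any tips?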

import pytesseract as py
from PIL import Image
cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(img)

---------------------------------------------------------------------------
TesseractError                            Traceback (most recent call last)
<ipython-input-86-5e06d7c425c6> in <module>
      3 cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
      4 img=r"C:\Python\Images to text\databases.jpg"
----> 5 py.image_to_string(img)

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
    347         Output.STRING: lambda: run_and_get_output(*args),
--> 348     }[output_type]()
    349 
    350 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    345         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 347         Output.STRING: lambda: run_and_get_output(*args),
    348     }[output_type]()
    349 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    256         }
    257 
--> 258         run_tesseract(**kwargs)
    259         filename = kwargs['output_filename_base'] + extsep + extension
    260         with open(filename, 'rb') as output_file:

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    232     with timeout_manager(proc, timeout) as error_string:
    233         if proc.returncode:
--> 234             raise TesseractError(proc.returncode, get_errors(error_string))
    235 
    236 

TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

Tags: python, image, ocr, tesseract, python-tesseract

Solution


You are passing a string (the file path) as the image instead of an image object. Change the pytesseract call to:

img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(Image.open(img))

Alternatively, you can open the image with OpenCV, which also works.

Install it with pip (the package is named opencv-python):

pip install opencv-python

Once installed, you can read the image like this:

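import cv2
import pytesseract

# cv2.imread returns a NumPy array in BGR order, or None if the file cannot be read
image = cv2.imread('path/to/image.jpg')
# pytesseract assumes RGB channel order, so convert before running OCR
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image_rgb)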
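
One more thing is worth checking with either approach. The message in the traceback, 'Usage: pytesseract [-l lang] input_file', is the usage text of pytesseract's own command-line script, and the question sets tesseract_cmd to ...\Scripts\pytesseract.exe. This suggests tesseract_cmd points at the Python wrapper instead of the Tesseract OCR engine (tesseract.exe), which is installed separately. The sketch below is one way to catch that mistake. The helper names is_wrapper_script and find_tesseract are hypothetical, and C:\Program Files\Tesseract-OCR\tesseract.exe is only an assumed install location, so use the path where Tesseract actually lives on your machine.

```python
import shutil
from pathlib import PureWindowsPath


def is_wrapper_script(cmd_path):
    """Return True if cmd_path names pytesseract's CLI script, not the Tesseract engine.

    Hypothetical helper: it only inspects the file name, it does not run anything.
    """
    return PureWindowsPath(cmd_path).stem.lower() == "pytesseract"


def find_tesseract():
    """Return the full path of a 'tesseract' executable found on PATH, or None."""
    return shutil.which("tesseract")


# Path taken from the question: this is the wrapper script, so OCR cannot work with it.
question_cmd = r"C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe"

# Assumed install location of the Tesseract engine; adjust to your own system.
assumed_engine_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

print(is_wrapper_script(question_cmd))        # wrapper script
print(is_wrapper_script(assumed_engine_cmd))  # engine binary
print(find_tesseract())                       # None if Tesseract is not on PATH

# With the engine installed, the setting would then look like this:
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = assumed_engine_cmd
```

If find_tesseract() returns a path, Tesseract is already on PATH and tesseract_cmd usually does not need to be set at all.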
