python - pytesseract raises PermissionError: [WinError 5] Access is denied
Problem description
I am using pytesseract in Python to read a PDF, but I get a permission error on Windows 10. I installed tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe from https://github.com/UB-Mannheim/tesseract/wiki, and I also have the poppler-20.09.0 files. I am using Python 3.8.0.
import pdf2image
import PyPDF2
import os
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR'


def pdf_to_img(pdf_file):
    print('pdf_file = ', pdf_file)
    return pdf2image.convert_from_path(pdf_file, dpi=200, fmt='jpg',
                                       poppler_path=r'F:\lokesh\resume_script\poppler-20.09.0\bin')


def ocr_core(file):
    text = pytesseract.image_to_string(file,)
    return text


def print_pages(pdf_file):
    images = pdf_to_img(pdf_file)
    for pg, img in enumerate(images):
        print(ocr_core(img))


print_pages("aa.pdf")
When I run this code, it raises the following error.
Traceback (most recent call last):
  File "test.py", line 84, in <module>
    print_pages("aa.pdf")
  File "test.py", line 81, in print_pages
    print(ocr_core(img))
  File "test.py", line 74, in ocr_core
    text = pytesseract.image_to_string(file,)
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in image_to_string
    return {
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 347, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 258, in run_and_get_output
    run_tesseract(**kwargs)
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 229, in run_tesseract
    raise e
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 226, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "F:\python\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "F:\python\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied
How can we fix this error on Windows?
Solution
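The line
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR'
needs to be
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
tesseract_cmd must point to the Tesseract executable itself, not to its installation folder. As the traceback shows, pytesseract hands this value to subprocess.Popen as the program to launch; Windows cannot execute a directory, and that failure surfaces as PermissionError: [WinError 5] Access is denied.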
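To catch this kind of mistake before pytesseract tries to launch the process, the path can be checked up front. The following is only a minimal sketch: `resolve_tesseract_cmd` is a hypothetical helper (not part of pytesseract), and it assumes the executable is named `tesseract.exe`, as in the path above.

```python
import os


def resolve_tesseract_cmd(path, exe_name="tesseract.exe"):
    """Return the path of the Tesseract executable or raise FileNotFoundError.

    Hypothetical helper: `path` may be the executable itself or the
    installation folder that contains it.
    """
    # If the installation folder was given, look for the executable inside it.
    candidate = os.path.join(path, exe_name) if os.path.isdir(path) else path
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"No Tesseract executable found at {candidate!r}")
    return candidate


# Usage (assumes pytesseract is installed and imported):
# pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(
#     r'C:\Program Files\Tesseract-OCR')
```

With a check like this, a wrong path fails early with a FileNotFoundError that names the path, rather than with the less obvious WinError 5 from deep inside subprocess.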