CMake tesseract error: "file DOWNLOAD HASH mismatch" | ocrmypdf

Problem description

System: Windows 7 Professional

I am trying to run ocrmypdf in Python, but I get this error message:

    raise MissingDependencyError(
ocrmypdf.exceptions.MissingDependencyError: 
        ---------------------------------------------------------------------
        This error normally occurs when ocrmypdf can't find the Leptonica
        library, which is usually installed with Tesseract OCR. It could be that
        Tesseract is not installed properly, we can't find the installation
        on your system PATH environment variable.

        The library we are looking for is usually called:
            liblept-5.dll   (Windows)
            liblept*.dylib  (macOS)
            liblept*.so     (Linux/BSD)

        Please review our installation procedures to find a solution:
            https://ocrmypdf.readthedocs.io/en/latest/installation.html

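I decided to install Tesseract, and for that I need to build it from this repository. I am using CMake to do this and have already built Leptonica and TIFF for Tesseract.

But when I try to "Configure" Tesseract in CMake, I get this error: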

 CMake Error at training/CMakeLists.txt:40 (file):
  file DOWNLOAD HASH mismatch

    for file: [tesseract-3.05.01/build_win64/training/icu/icu64.zip]
      expected hash: [480c72491576c048de]
        actual hash: [db340097e390be978d]
             status: [0;"No error"]

I also changed this line in tesseract\training\CMakeLists.txt:

"http://download.icu-project.org/files/icu4c/56.1/icu4c-56_1-Win${ARCH_DIR_NAME}-msvc10.zip"

to:

https://github.com/unicode-org/icu/releases/download/release-68-2/icu4c-68_2-Win64-MSVC2019.zip

because there was a problem with this zip file.

Any ideas how to solve it? Thanks!
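
For context on the error itself: CMake reports "file DOWNLOAD HASH mismatch" when the file fetched by file(DOWNLOAD ...) does not match the hash given through EXPECTED_HASH (or EXPECTED_MD5). If only the URL is changed and the expected hash is left as it was, the check would be expected to fail. A minimal Python sketch for computing a downloaded archive's digest, so it can be compared with the value in CMakeLists.txt, might look like this. The algorithm name is an assumption (the one actually used has to be read from the EXPECTED_HASH argument), and the helper names are hypothetical. The hashes in the log above appear shortened, so the full expected value would have to be taken from CMakeLists.txt.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use.

    `algorithm` is an assumption: use whatever EXPECTED_HASH names in CMakeLists.txt.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path, expected_hex, algorithm="sha256"):
    """Compare digests case-insensitively, ignoring surrounding whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `file_digest("icu64.zip")` on the downloaded archive gives the value to compare against the expected hash; if the two differ, either the download is wrong or the expected hash belongs to a different file.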

Tags: python, cmake, tesseract

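Solution

This may be useful to someone in the future:

To solve this, you don't need to compile these libraries yourself. I found liblept-5.dll when I installed Tesseract from here:

https://github.com/UB-Mannheim/tesseract/wiki

Use the latest installer.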
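
As a quick sanity check after installing, a small Python sketch along these lines can show whether the Tesseract executable and a Leptonica DLL are visible through PATH, which is where the ocrmypdf error message says it looks. This assumes the installer's directory has been added to PATH. The helper name find_on_path and the pattern liblept*.dll are illustrative choices, not part of ocrmypdf.

```python
import glob
import os
import shutil


def find_on_path(pattern, path=None):
    """Return files matching a glob pattern in any directory listed in PATH.

    Hypothetical helper for illustration; `path` defaults to the PATH variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries on Windows may be quoted
        if directory and os.path.isdir(directory):
            # Escape the directory so only `pattern` is treated as a glob
            matches.extend(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return matches


if __name__ == "__main__":
    print("tesseract executable:", shutil.which("tesseract"))
    print("Leptonica DLLs on PATH:", find_on_path("liblept*.dll"))
```

If both lines print nothing useful (None and an empty list), the install directory is probably missing from PATH in the environment where Python runs.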

