What is wrong with Windows Task Scheduler + pytesseract + multiprocessing?

Problem description

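I have Python code that uses pytesseract and multiprocessing. When I start it manually from PyCharm, it works with any number of threads (pool worker processes). When I start it from Windows Task Scheduler with 'threads = 1', it also works fine. But when I start it from Task Scheduler with 'threads = 2' or more, it finishes without processing any images and without reporting any error.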

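The log looks like this. The script starts but does nothing: the 'Files' and 'Stop' entries it writes at the end never appear, and there are no errors in the Windows event log.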

2020-05-24 13:09:31,834;START
2020-05-24 13:09:31,834;threads: 2
2020-05-24 13:10:31,832;START
2020-05-24 13:10:31,832;threads: 2
2020-05-24 13:11:31,851;START
2020-05-24 13:11:31,851;threads: 2

Code

from PIL import Image
import pytesseract
from pytesseract import Output
import datetime
from glob import glob
import os
import multiprocessing
import cv2
import logging

def loggerinit(name, filename, overwrite):

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)


    # create the logging file handler
    fh = logging.FileHandler(filename, encoding = 'UTF-8')

    formatter = logging.Formatter('%(asctime)s;%(message)s')
    fh.setFormatter(formatter)

    # add handler to logger object
    logger.addHandler(fh)

    return logger

def getfiles(dirname, mask):
    return glob(os.path.join(dirname, mask))

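def tess_file(fname):
    # Runs in a worker process; returns (fname, OSD dict) or (fname, None) on failure.
    img = cv2.imread(fname)

    # Convert to grayscale, then back to a 3-channel RGB array for PIL.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
    im_for_T = Image.fromarray(img)

    pytesseract.pytesseract.tesseract_cmd = 'C://Tesseract-OCR//tesseract.exe'
    # Note: this is only a local variable; it does not set the TESSDATA_PREFIX environment variable.
    TESSDATA_PREFIX = 'C:/Program Files/Tesseract-OCR/tessdata'
    try:
        # Limit Tesseract's own OpenMP threads so pool workers do not oversubscribe the CPU.
        os.environ['OMP_THREAD_LIMIT'] = '1'
        tess_data = pytesseract.image_to_osd(im_for_T, output_type=Output.DICT)
        return fname, tess_data
    except Exception:
        return fname, None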

if __name__ == '__main__':
    logger = loggerinit('tess', 'tess.log', False)

    files = getfiles('Croped', '*.jpg')

    t1 = datetime.datetime.now()
    logger.info('START')

    threads = 2
    logger.info('threads: ' + str(threads))
    p = multiprocessing.Pool(threads)

    results = p.map(tess_file,files)
    e = []
    for fname, tess_data in results:
        # tess_file returns (fname, None) when OCR fails.
        if tess_data is None:
            e.append('OCR error: ' + fname)
        else:
            print(fname, ". rotate: ", tess_data['rotate'])

    p.close()
    p.join()

    t2 = datetime.datetime.now()

    delta = (t2 - t1).total_seconds()

    print('Total time: ', delta)
    print('Files: ', len(files))

    logger.info('Files: ' + str(len(files)))
    logger.info('Stop. Total time: ' + str(delta))

    # Print errors, if any
    for ee in e:
        print(ee)

What is wrong? How can I fix this?

Tags: windows, multiprocessing, python-tesseract

Solution
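
The cause is not established by the question itself, so the following is only a diagnostic sketch, not a confirmed fix. It assumes two things that may or may not apply here: that Task Scheduler starts the script in a different working directory than PyCharm does (which would make relative paths such as 'Croped' and 'tess.log' resolve elsewhere), and that exceptions raised inside pool workers are being lost. The helper names `resolve`, `guarded` and `run_pool` are hypothetical and do not come from the original code.

```python
import functools
import multiprocessing
import os
import traceback


def resolve(path, base_dir):
    """Anchor a relative path at base_dir instead of the current working directory."""
    return path if os.path.isabs(path) else os.path.join(base_dir, path)


def guarded(func, item):
    """Call func(item) and return (item, result, error_text).

    Any exception is captured as a traceback string so the parent
    process can log it instead of the failure passing unnoticed.
    """
    try:
        return item, func(item), None
    except Exception:
        return item, None, traceback.format_exc()


def run_pool(func, items, processes):
    """Map func over items in a process pool and collect (item, result, error) triples.

    func must be a module-level function so that it can be pickled.
    """
    with multiprocessing.Pool(processes) as pool:
        return pool.map(functools.partial(guarded, func), items)
```

In the script above, this could be used inside the `if __name__ == '__main__':` block by computing `base_dir = os.path.dirname(os.path.abspath(__file__))`, passing `resolve('Croped', base_dir)` to `getfiles` and `resolve('tess.log', base_dir)` to `loggerinit`, and then calling `run_pool(tess_file, files, threads)` and writing every non-empty error text to the log. If the log then shows zero files or a traceback, that would point at the working directory or at the worker, respectively.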

