Using OCR results to match strings from a CSV file and find patterns in Python

Problem description

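I am new to Python. I want to take the result of OCR (a string) and match it against the first column of my CSV file. Only when the condition is true (the string from OCR matches a string in the CSV) should the code go on to use the picture. As soon as I tried to integrate the code, I got an error.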

For OCR I use pytesseract, and I use Flask to render the web application.

The error I get is: AttributeError: '_io.TextIOWrapper' object has no attribute 'filename'

New error: The view function for 'upload_image' did not return a valid response. The function either returned None or ended without a return statement.

This error only appears when I try to add this code:

    match = extracted_text
    matched_row = None
    with open("/Users/ri/Desktop/DPL/DPL.csv", "r") as file:
        # Read file as a CSV delimited by tabs.
        reader = csv.reader(file, delimiter='\t')
        for row in reader:
            if row[0] == match:
                matched_row = row
                print(matched_row)
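Two details of the snippet above may matter. The CSV handle is named `file`; inside `upload_image` that name would replace the uploaded file object, which is one plausible source of the `AttributeError` on `file.filename`. OCR output also often carries trailing whitespace or line breaks, so an exact `==` comparison can fail even when the text looks identical. The following is a minimal sketch of the lookup under those assumptions; `normalize`, `find_matching_row`, and `csv_file` are hypothetical names, not part of the original code.

```python
import csv


def normalize(text):
    """Collapse runs of whitespace (including line breaks) and trim the ends."""
    return " ".join(text.split())


def find_matching_row(ocr_text, csv_path, delimiter="\t"):
    """Return the first row whose first column equals the OCR text, else None."""
    wanted = normalize(ocr_text)
    # The handle is called csv_file, not file, so it cannot shadow
    # an uploaded file object that is also named file.
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.reader(csv_file, delimiter=delimiter):
            # Skip empty rows, then compare normalized first columns.
            if row and normalize(row[0]) == wanted:
                return row
    return None
```

Called as `find_matching_row(extracted_text, "/Users/ri/Desktop/DPL/DPL.csv")`, it would return either the matching row or `None`, which the caller can then test explicitly.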

app.py

@app.route('/', methods=['POST'])
def upload_image():
    if request.method == 'POST':
        # checks whether or not the post request has the file part
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)

        file = request.files['file']

        # if the user does not select a file, the browser
        # submits an empty part without a filename
        if file.filename == '':
            flash('No file selected for uploading')
            return redirect(request.url)

        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file.save(os.path.join(os.getcwd() +
                                   UPLOAD_INPUT_IMAGES_FOLDER, file.filename))

            flash('File successfully uploaded')

            # calls the ocr_processing function to perform text extraction
            extracted_text = ocr_processing(file)
            print(extracted_text)

            match = extracted_text
            matched_row = None
            with open("/Users/ri/Desktop/DPL/DPL.csv", "r") as f:
                # Read file as a CSV delimited by tabs.
                reader = csv.reader(f, delimiter='\t')
                for row in reader:
                    if row[0] == match:
                        matched_row = row
                        print(matched_row)

                        loaded_vec = CountVectorizer(
                            vocabulary=pickle.load(open("./tfidf_vector.pkl", "rb")))
                        loaded_tfidf = pickle.load(open("./tfidf_transformer.pkl", "rb"))
                        model_pattern_type = pickle.load(
                            open("./clf_svm_Pattern_Category.pkl", "rb"))
                        model_pattern_category = pickle.load(
                            open("./clf_svm_Pattern_Type.pkl", "rb"))
                        match = [match]
                        X_new_counts = loaded_vec.transform(
                            match)
                        # .values.astype('U')
                        X_new_tfidf = loaded_tfidf.transform(X_new_counts)

                        predicted_pattern_type = model_pattern_type.predict(X_new_tfidf)
                        your_predicted_pattern_type = predicted_pattern_type[0]

                        predicted_pattern_category = model_pattern_category.predict(
                            X_new_tfidf)
                        your_predicted_pattern_category = predicted_pattern_category[0]
                        return render_template('uploads/results.html',
                                               msg='Processed successfully!',
                                               match=match,
                                               your_predicted_pattern_category=your_predicted_pattern_category,
                                               your_predicted_pattern_type=your_predicted_pattern_type,
                                               img_src=UPLOAD_INPUT_IMAGES_FOLDER + file.filename)
                        # break
                    else:
                        print("no pattern found")


        else:
            flash('Allowed file types are txt, pdf, png, jpg, jpeg, gif')
            return redirect(request.url)

Tags: python, flask, python-tesseract

Solution
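The "did not return a valid response" message is consistent with the loop in `upload_image` finishing without ever reaching `return render_template(...)`: when no CSV row matches the OCR text, the function falls off the end and returns `None`. A minimal sketch of the control flow, written as plain Python so it runs without Flask, is shown here. `choose_response` and the no-match message are hypothetical; the template name and the `msg` and `match` keys are taken from the question.

```python
def choose_response(ocr_text, rows):
    """Return a (template, context) pair for every possible outcome."""
    # Normalize whitespace so OCR line breaks do not defeat the comparison.
    wanted = " ".join(ocr_text.split())
    for row in rows:
        if row and " ".join(row[0].split()) == wanted:
            # Match found: the classification step would run here.
            context = {"msg": "Processed successfully!", "match": row[0]}
            return "uploads/results.html", context
    # The loop ended without a match: still return a response.
    context = {"msg": "No matching pattern found", "match": None}
    return "uploads/results.html", context
```

Inside the Flask view this could be used as `template, context = choose_response(extracted_text, rows)` followed by `return render_template(template, **context)`, so that every path through the function ends in a return.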

