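python - Using string matching against a CSV to look up a pattern from OCR results in Python
Problem description
I am new to Python. I want to take the OCR result (a string), match it against the first column of my CSV file, and continue only when the condition is true (the string from OCR matches a string in the CSV), in which case it should use the image. As soon as I try to integrate the code, I get an error message.
For OCR I use pytesseract, and I use Flask to render the web application.
The error I get is: AttributeError: '_io.TextIOWrapper' object has no attribute 'filename'
New error: The view function for 'upload_image' did not return a valid response. The function either returned None or ended without a return statement.
This error only persists when I try to add this code: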
match = extracted_text
matched_row = None
with open("/Users/ri/Desktop/DPL/DPL.csv", "r") as file:
    # Read file as a CSV delimited by tabs.
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        if row[0] == match:
            matched_row = row
            print(matched_row)
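For reference, the lookup described above can be pulled into a standalone function. This is only a minimal sketch: the names `normalize` and `find_matching_row` are hypothetical, UTF-8 encoding is assumed, and the whitespace cleanup rests on the assumption that the OCR string may carry a trailing newline or similar characters that would make a plain `==` comparison fail.

```python
import csv


def normalize(text):
    """Collapse runs of whitespace and trim both ends (assumed OCR cleanup)."""
    return " ".join(text.split())


def find_matching_row(csv_path, ocr_text, delimiter="\t"):
    """Return the first row whose first column equals ocr_text, else None."""
    wanted = normalize(ocr_text)
    # newline="" is what the csv module documentation recommends for files
    # handed to csv.reader; the utf-8 encoding is an assumption.
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.reader(csv_file, delimiter=delimiter):
            # csv.reader yields an empty list for a blank line, so guard
            # before indexing the first column.
            if row and normalize(row[0]) == wanted:
                return row
    return None
```

Because the function returns either a row or `None`, the caller can decide what to do in each case instead of acting inside the loop.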
app.py
@app.route('/', methods=['POST'])
def upload_image():
    if request.method == 'POST':
        # checks whether or not the post request has the file part
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)
        file = request.files['file']
        # if user does not select file, browser also
        # submit a empty part without filename
        if file.filename == '':
            flash('No file selected for uploading')
            return redirect(request.url)
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file.save(os.path.join(os.getcwd() +
                                   UPLOAD_INPUT_IMAGES_FOLDER, file.filename))
            flash('File successfully uploaded')
            # calls the ocr_processing function to perform text extraction
            extracted_text = ocr_processing(file)
            print(extracted_text)
            match = extracted_text
            matched_row = None
            with open("/Users/ri/Desktop/DPL/DPL.csv", "r") as f:
                # Read file as a CSV delimited by tabs.
                reader = csv.reader(f, delimiter='\t')
                for row in reader:
                    if row[0] == match:
                        matched_row = row
                        print(matched_row)
                        loaded_vec = CountVectorizer(
                            vocabulary=pickle.load(open("./tfidf_vector.pkl", "rb")))
                        loaded_tfidf = pickle.load(open("./tfidf_transformer.pkl", "rb"))
                        model_pattern_type = pickle.load(
                            open("./clf_svm_Pattern_Category.pkl", "rb"))
                        model_pattern_category = pickle.load(
                            open("./clf_svm_Pattern_Type.pkl", "rb"))
                        match = [match]
                        X_new_counts = loaded_vec.transform(
                            match)
                        # .values.astype('U')
                        X_new_tfidf = loaded_tfidf.transform(X_new_counts)
                        predicted_pattern_type = model_pattern_type.predict(X_new_tfidf)
                        your_predicted_pattern_type = predicted_pattern_type[0]
                        predicted_pattern_category = model_pattern_category.predict(
                            X_new_tfidf)
                        your_predicted_pattern_category = predicted_pattern_category[0]
                        return render_template('uploads/results.html',
                                               msg='Processed successfully!',
                                               match=match,
                                               your_predicted_pattern_category=your_predicted_pattern_category,
                                               your_predicted_pattern_type=your_predicted_pattern_type,
                                               img_src=UPLOAD_INPUT_IMAGES_FOLDER + file.filename)
                        # break
                    else:
                        print("no mattern found")
        else:
            flash('Allowed file types are txt, pdf, png, jpg, jpeg, gif')
            return redirect(request.url)
Solution
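Both errors look consistent with two details in the code, though this is a reading of the posted snippets rather than a confirmed diagnosis. First, the standalone snippet opens the CSV with `as file`, which rebinds the name `file` that previously held the upload from `request.files['file']`; a later `file.filename` would then be looked up on a text file object. Renaming the handle, as app.py already does with `f`, avoids that. Second, if no CSV row equals the OCR text, the loop finishes without reaching a `return`, so the view returns None. The sketch below shows one way to structure the decision so that every path returns something. It is plain Python so it can run on its own: `choose_response`, the `(action, payload)` tuples, and the sample rows are all hypothetical, and in the real view the two returns would become `render_template(...)` and `redirect(...)` calls.

```python
from types import SimpleNamespace


def choose_response(upload, rows, extracted_text):
    """Return an (action, payload) pair on every path (hypothetical helper)."""
    # Assumption: the OCR text may end in whitespace such as a newline.
    wanted = extracted_text.strip()
    # Search first, decide afterwards; next() falls back to None if no row matches.
    matched_row = next((row for row in rows if row and row[0] == wanted), None)
    if matched_row is None:
        # The branch that would otherwise end without a return.
        return ("redirect", "No matching pattern found")
    # `upload` is still the uploaded object here because no later
    # `with ... as file` statement rebound the name.
    return ("render", {"match": matched_row[0], "img": upload.filename})


# Stand-in for request.files['file']; the rows are made-up sample data.
upload = SimpleNamespace(filename="scan.png")
rows = [["Singleton", "Creational"], ["Observer", "Behavioral"]]

print(choose_response(upload, rows, "Observer\n"))
print(choose_response(upload, rows, "unknown"))
```

Separating the search from the response keeps the model loading and prediction code out of the loop body, and the no-match case gets an explicit return rather than a `print`.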