How do I add multiple PDFs to be converted to Excel?

Problem description

I have a program that converts a PDF to Excel. Now I want it to accept multiple inputs, so that several PDFs are converted one after another.

My code is as follows:

from PIL import Image
import io
import pytesseract
from wand.image import Image as wi
import os
import cv2
import pandas as pd
import re
import numpy as np

# Load the PDF at a resolution of 300 and convert its pages to JPEG.
pdf = wi(filename="pdfs/jaalna.pdf", resolution=300)
pdfImage = pdf.convert("jpg")

# Collect one JPEG blob per page.
imageBlobs = []
for img in pdfImage.sequence:
    imgPage = wi(image=img)
    #img.filter(ImageFilter.EDGE_ENHANCE_MORE )
    imageBlobs.append(imgPage.make_blob('jpg'))

# Run OCR on each page image.
recognized_text = []
for imgBlob in imageBlobs:
    im = Image.open(io.BytesIO(imgBlob))
    text = pytesseract.image_to_string(im, lang='eng1+mar1')
    recognized_text.append(text)

# Write the text of all pages to one file; "with" closes it afterwards.
with open('aama.txt', 'w') as newfile:
    newfile.write(",".join(recognized_text))

#add a folder as input.

Tags: python, python-3.x, pdf

Solution


You can use a loop:

for name in ["pdfs/jaalna.pdf", "other/file.pdf"]:
    pdf = wi(filename=name, resolution=300)
    # rest of code
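
To make "rest of code" concrete, here is a minimal sketch of how the loop could wrap the steps from the question. The function names `ocr_pdf`, `output_path`, and `convert_all`, and the rule of writing one `.txt` file per PDF, are assumptions for illustration and are not part of the original code. The OCR calls and the `lang` value are taken unchanged from the question.

```python
import io
from pathlib import Path


def ocr_pdf(name, lang="eng1+mar1"):
    """OCR one PDF and return a list with the recognized text of each page.

    Same steps as in the question. The third-party imports sit inside the
    function so the rest of this sketch loads even without them installed.
    """
    import pytesseract
    from PIL import Image
    from wand.image import Image as wi

    pdf = wi(filename=name, resolution=300)
    pdf_image = pdf.convert("jpg")
    pages = []
    for img in pdf_image.sequence:
        blob = wi(image=img).make_blob("jpg")
        page = Image.open(io.BytesIO(blob))
        pages.append(pytesseract.image_to_string(page, lang=lang))
    return pages


def output_path(name, out_dir="."):
    """Hypothetical naming rule: pdfs/jaalna.pdf -> <out_dir>/jaalna.txt."""
    return Path(out_dir) / (Path(name).stem + ".txt")


def convert_all(names, out_dir=".", ocr=ocr_pdf):
    """Convert each PDF in turn and write one text file per input."""
    written = []
    for name in names:
        pages = ocr(name)
        target = output_path(name, out_dir)
        target.write_text(",".join(pages), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_all(["pdfs/jaalna.pdf", "other/file.pdf"])` would then process the files one by one. Giving each PDF its own output file avoids every iteration overwriting `aama.txt`. Note that two PDFs with the same base name in different folders would still collide under this naming rule.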

Alternatively, you can use sys.argv to read the file names from the command line:

python script.py pdfs/jaalna.pdf other/file.pdf other/third.pdf

with this code:

import sys

for name in sys.argv[1:]:
    pdf = wi(filename=name, resolution=300)
    # rest of code
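
The question also mentions using a folder as input. One possible way to support that alongside sys.argv is sketched below. The helper name `expand_inputs` is hypothetical, and the sketch assumes that only `*.pdf` files directly inside a given folder should be picked up (no subfolders, and the pattern's case matching depends on the platform).

```python
import sys
from pathlib import Path


def expand_inputs(args):
    """Turn command-line arguments into a list of PDF paths.

    A directory argument is replaced by the *.pdf files directly inside it,
    sorted by name; any other argument is kept as given.
    """
    names = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            names.extend(str(p) for p in sorted(path.glob("*.pdf")))
        else:
            names.append(arg)
    return names


if __name__ == "__main__":
    for name in expand_inputs(sys.argv[1:]):
        # Placeholder: run the conversion from the question on each name here.
        print(name)
```

With this, `python script.py pdfs other/file.pdf` would process every PDF in the `pdfs` folder followed by `other/file.pdf`.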
