Reading a movie script in PDF format and displaying it with the correct formatting

Problem description

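I'm trying to read a movie script from a PDF and print it in the terminal with the correct line breaks, but reading it line by line doesn't seem to work.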

import os.path
import sys

# PyPDF2 1.x/2.x API; these names were removed in PyPDF2 3.0 (use PdfReader there)
from PyPDF2 import PdfFileReader

PDFInput = "Pulp-Fiction.pdf"

# checking that the input looks like a PDF
if not PDFInput.lower().endswith(".pdf"):
    print("This isn't a pdf...")
    sys.exit()

if os.path.isfile(PDFInput):
    print("it exists!")
else:
    print("This doesn't exist...")
    sys.exit()


# opening file
with open(PDFInput, "rb") as PDFOpened:  # this is my file object
    pdf = PdfFileReader(PDFOpened)  # pdf is my reader object
    NumOfPage = pdf.getNumPages()
    print("Number of pages: ", NumOfPage)
    print("Document info: ", pdf.getDocumentInfo())

    # eventually NumOfPage would go here, but 2 pages keeps the output small
    for i in range(2):
        pdfPage = pdf.getPage(i)  # page object
        print("page no: ", i)
        text = pdfPage.extractText().split('  ')  # extract the page text and split it

        # print each piece on its own line while still on this page,
        # so earlier pages are not overwritten by the next iteration
        for line in text:
            print(line)
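A possible rewrite, sketched under a few assumptions: it uses the `pypdf` package (the maintained successor to PyPDF2, where `PdfReader`, `reader.pages` and `page.extract_text()` take the place of `PdfFileReader`, `getPage()` and `extractText()`), and the helper names `looks_like_pdf`, `script_lines` and `read_script` are hypothetical. It checks the file's `%PDF-` signature rather than its extension, collects the text of every requested page instead of overwriting `text` on each pass, and splits on real line breaks with `splitlines()` rather than on double spaces. The optional `layout=True` path relies on `extract_text(extraction_mode="layout")`, which only exists in newer pypdf releases (3.17 and later) and tries to keep the horizontal positioning that screenplays depend on. Whether any of this recovers the original line breaks still depends on how the PDF was produced.

```python
from __future__ import annotations

from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Return True if `path` is an existing file that starts with the PDF signature."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(5) == b"%PDF-"


def script_lines(page_texts: list[str]) -> list[str]:
    """Split the text of each page on real line breaks, keeping page order."""
    lines: list[str] = []
    for page_text in page_texts:
        # splitlines() handles \n, \r\n and \r. rstrip() drops trailing padding
        # but keeps leading indentation, which screenplays use for layout.
        lines.extend(line.rstrip() for line in page_text.splitlines())
    return lines


def read_script(path: Path, max_pages: int | None = None, layout: bool = False) -> list[str]:
    """Extract the first `max_pages` pages (all pages if None) and return their lines."""
    # Assumption: the third-party `pypdf` package is installed (pip install pypdf).
    # It is imported here so the helpers above work without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    total = len(reader.pages)
    count = total if max_pages is None else min(max_pages, total)
    page_texts = []
    for i in range(count):
        page = reader.pages[i]
        if layout:
            # Assumption: pypdf >= 3.17, where layout-mode extraction was added.
            page_texts.append(page.extract_text(extraction_mode="layout"))
        else:
            page_texts.append(page.extract_text())
    return script_lines(page_texts)
```

After a `looks_like_pdf(Path("Pulp-Fiction.pdf"))` check, looping over `read_script(Path("Pulp-Fiction.pdf"), max_pages=2)` and printing each item would replace the original page loop. Passing `layout=True` is worth trying if character names and dialogue lose their indentation in the default mode.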

Tags: python, pdf, pypdf2
