python - Reading a movie script in PDF format and displaying it with the correct formatting
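Problem description
I am trying to read a movie script from a PDF and display it in the terminal with the correct line breaks, but reading it line by line doesn't seem to work...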
import PyPDF2
from PyPDF2 import PdfFileReader
import os.path

PDFInput = "Pulp-Fiction.pdf"

# checking if good
if "pdf" not in (PDFInput[-3:]):
    print("This isn't a pdf...")
    exit()
if os.path.isfile(PDFInput):
    print("it exists!")
else:
    print("This doesn't exist...")
    exit()

# opening file
with open(PDFInput, "rb") as PDFOpened:  # this is my file object
    pdf = PdfFileReader(PDFOpened)  # pdf is my reader object
    print("This is the document info: ", pdf.getNumPages())
    NumOfPage = pdf.getNumPages()
    print("This is the document info: ", pdf.getDocumentInfo())
    # eventually NumOfPage would go here, but 2 keeps the test run small
    for i in range(2):
        pdfPage = pdf.getPage(i)  # page object
        print("page no: ", i)
        text = pdfPage.extractText().split(' ')  # extracting and splitting text from page
        for i in range(len(text)):  # lines stored in a list and printed separately
            print(text[0], end="\n")
Solution
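Two things in the code above stand out. First, split(' ') splits the page text on spaces, so the list holds words rather than lines. Second, the inner loop reuses the name i and always prints text[0], so the same first item is printed over and over. The following is a minimal sketch of one possible fix, not a definitive answer. It keeps the legacy PyPDF2 names used in the question (PdfFileReader, getNumPages, getPage, extractText); the helper names page_lines and print_script are hypothetical and introduced only for illustration. Whether the extracted text actually contains line breaks depends on the PDF and on the PyPDF2 version, so treat the output as best effort.

```python
import os.path


def page_lines(page_text):
    """Split extracted page text on line breaks and strip trailing whitespace.

    Hypothetical helper: assumes the extractor kept newline characters.
    """
    return [line.rstrip() for line in page_text.splitlines()]


def print_script(pdf_path, max_pages=2):
    """Print the first max_pages pages of a PDF, one extracted line per row.

    Hypothetical helper built on the legacy PyPDF2 API from the question.
    """
    # Imported here so page_lines can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    if not pdf_path.lower().endswith(".pdf"):
        print("This isn't a pdf...")
        return
    if not os.path.isfile(pdf_path):
        print("This doesn't exist...")
        return

    with open(pdf_path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        page_count = min(max_pages, reader.getNumPages())
        for page_no in range(page_count):
            text = reader.getPage(page_no).extractText()
            print("page no:", page_no)
            # Iterate over the lines themselves instead of indexing with text[0].
            for line in page_lines(text):
                print(line)
```

Calling print_script("Pulp-Fiction.pdf") would print the first two pages. If the output still arrives as one long line, the extractor most likely dropped the line breaks for that file, and no amount of splitting afterwards will bring them back.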