Merging PDF pages with PyPDF2 fails

Problem description

Given these demo files:

test.pdf: "Hello"
tomerge1.pdf: "1"
tomerge2.pdf: "2"

In output.pdf, I would like to have "Hello 1" on the first page and "Hello 2" on the second.

This is what I used:

from PyPDF2 import PdfFileWriter, PdfFileReader

outputpdf = PdfFileWriter()
inputpdf = PdfFileReader(open("test.pdf", "rb"))
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page = inputpdf.getPage(0)
page.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page)

# exit()
# if we stop here, the output is "Hello 1", which is good
# Why isn't "Hello 1" remembered here?
# del page    # doesn't change anything

page = inputpdf.getPage(0)
page.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)

Unfortunately, it does not work: the output is not "Hello 1" / "Hello 2" but "Hello 2" / "Hello 2".

Question: how can I get the expected behavior? (Ideally without the output file size growing quickly when there are 10 or 20 pages.)

Tags: python, pdf, pypdf, pypdf2

Solution


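While working on a similar exercise, I found that each page you read should be merged only once. The workaround is to create two readers for the input file that gets merged twice ("test.pdf"), one reader per merge. Example code from my own case: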

from PyPDF2 import PdfFileWriter, PdfFileReader

addressfile = open("Documents/addresses.pdf","rb")
xwfile = "Downloads/input.pdf"
crosswordfile = open(xwfile,"rb")
# two readers over the same input file, one per merge
xword = PdfFileReader(crosswordfile)
xw2 = PdfFileReader(crosswordfile)
addr = PdfFileReader(addressfile)
xwpage = xword.getPage(0)
addpage1 = addr.getPage(1)
addpage2 = addr.getPage(2)
pdfWriter = PdfFileWriter()
xp2 = xw2.getPage(0)
# each copy of the crossword page receives a different address page
xwpage.mergePage(addpage1)
xp2.mergePage(addpage2)
res = open("/home/paula/xw.pdf",'wb')
pdfWriter.addPage(xwpage)
pdfWriter.addPage(xp2)
pdfWriter.write(res)
res.close()
crosswordfile.close()
addressfile.close()

So in your code it would be:

from PyPDF2 import PdfFileWriter, PdfFileReader

testfile = open("test.pdf", "rb")
outputpdf = PdfFileWriter()
inputpdf1 = PdfFileReader(testfile)
inputpdf2 = PdfFileReader(testfile)
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page1 = inputpdf1.getPage(0)
page1.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page1)

# exit()
# No need to stop here, the output will have both "Hello 1" and "Hello 2"
# Using two readers for the same file fools PyPDF2 into thinking they
# are two different files, i.e. that we are merging from two separate sources

page2 = inputpdf2.getPage(0)
page2.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page2)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)
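
Why two readers help can be illustrated with a toy model that does not use PyPDF2 at all. This sketch assumes, as the behavior in the question suggests, that a reader hands back the same page object each time getPage(0) is called and that mergePage changes that object in place. ToyPage, ToyReader and build are hypothetical stand-ins for illustration only, not PyPDF2 classes or functions.

```python
class ToyPage:
    """Hypothetical stand-in for a PDF page: just a list of text fragments."""

    def __init__(self, text):
        self.fragments = [text]

    def merge(self, other):
        # Assumption: like mergePage, this mutates the page in place.
        self.fragments.extend(other.fragments)


class ToyReader:
    """Hypothetical stand-in for a reader that caches its page objects."""

    def __init__(self, texts):
        self._pages = [ToyPage(t) for t in texts]

    def get_page(self, index):
        # Assumption: returns the cached object, not a fresh copy.
        return self._pages[index]


def build(readers):
    """Merge "1" into page 0 of readers[0] and "2" into page 0 of readers[1]."""
    output = []
    for reader, extra in zip(readers, ["1", "2"]):
        page = reader.get_page(0)
        page.merge(ToyPage(extra))
        output.append(page)
    return [" ".join(p.fragments) for p in output]


shared = ToyReader(["Hello"])
one_reader = build([shared, shared])
two_readers = build([ToyReader(["Hello"]), ToyReader(["Hello"])])

print(one_reader)
print(two_readers)
```

With a single reader, both output entries refer to the same page object, so they end up identical. With two readers, each page is merged once and the two results stay distinct.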
