How do I add all Wikipedia images to my docx file?

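Question

I am using the wikipedia API and want to put all the photos from a page into a docx document. At the moment I can only put one picture in the document, which is not good enough. For some Wikipedia pages I get no photos at all, even though I can see photos on the page when I look it up online. Here is my code: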

import wikipedia
import re
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt
from docx.shared import Mm
import requests
import io
from docx.shared import Inches

name = input("Introdu numele tau: ")
wikipedia.set_lang("ro")
hs = input("La ce liceu esti?\n")
cls = input("In ce clasa esti?\n")
date = input("Pe ce data trebuie facut proiectul?\n")
title = input("Despre ce vrei sa fie proiectul tau?\n")
while True:
    try:
        wiki = wikipedia.page(title)
        break
    except:
        print("Nume proiect invalid")
        title = input("Introdu alt nume de proiect: \n")
text = wiki.content
text = re.sub(r'==', '', text)
text = re.sub(r'=', '', text)
text = re.sub(r'\n', '\n    ', text)
split = text.split('Vezi și', 1)
text = split[0]
print(text)

document = Document()

section = document.sections[0]
section.page_height = Mm(297)
section.page_width = Mm(210)
section.left_margin = Mm(25.4)
section.right_margin = Mm(25.4)
section.top_margin = Mm(25.4)
section.bottom_margin = Mm(25.4)
section.header_distance = Mm(12.7)
section.footer_distance = Mm(12.7)

style = document.styles['Normal']
font = style.font
font.name = 'Times New Roman'
font.size = Pt(12)

url = wiki.images[1]
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)
try:
    document.add_picture(image, width=Inches(1.5))
except:
    pass


paragraph = document.add_paragraph(date)
paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
paragraph = document.add_paragraph(name)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_paragraph('Clasa '+cls)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_paragraph(hs)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_heading(title, 0)
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
paragraph = document.add_paragraph('    ' + text)
paragraph.style = document.styles['Normal']
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT


document.save(title + ".docx")
input()

I think the problem is here:

url = wiki.images[1]
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)
try:
    document.add_picture(image, width=Inches(1.5))
except:
    pass

because only one picture shows up in the docx document.

Tags: python, wikipedia, python-docx

Answer


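I suggest you explore loops and functions in Python. A loop lets you execute some code zero or more times, while a function lets you group a chunk of code together and refer to it by name. In more formal terms, this is called abstraction.

A loop for this Wikipedia task would look something like: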

for image in wiki.images:
    document.add_picture(image, ...)

Then, if wiki.images is empty, no pictures are added. If it contains 5 images, all 5 are added.
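Each entry in wiki.images is a URL string, and the list often includes formats such as SVG that python-docx is not expected to embed. As a minimal sketch (not part of the original answer), the list could be filtered by file extension before looping. The helper name embeddable_image_urls, the extension set, and the sample URLs are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Assumption: raster extensions python-docx is expected to accept; adjust as needed.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def embeddable_image_urls(urls):
    """Return only the URLs whose path ends in a supported raster extension."""
    kept = []
    for url in urls:
        # Look at the path only, so query strings do not hide the extension.
        path = urlparse(url).path
        extension = os.path.splitext(path)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            kept.append(url)
    return kept


# Hypothetical URLs in the shape that wiki.images typically returns.
sample_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/a/a1/Example.JPG",
    "https://upload.wikimedia.org/wikipedia/commons/4/4a/Commons-logo.svg",
    "https://upload.wikimedia.org/wikipedia/commons/b/b2/Example.png",
]
print(embeddable_image_urls(sample_urls))
```

In the script from the question, the loop would then iterate over embeddable_image_urls(wiki.images) instead of wiki.images.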

A function might look like:

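def add_wiki_image(document, image_url):
    # Download the image and wrap its bytes in a file-like object for python-docx.
    response = requests.get(image_url, stream=True)
    image = io.BytesIO(response.content)
    document.add_picture(image, width=Inches(1.5))

It can be called like this: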

for image_url in wiki.images:
    add_wiki_image(document, image_url)

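As a function, that code can be referenced ("called") concisely as add_wiki_image() wherever it is needed, and the details of how the image is added are neatly encapsulated in the function definition.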
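The code in the question wraps add_picture in a bare try/except with pass, so any image that fails is skipped silently. The sketch below shows one possible way to loop over every URL while recording which ones failed and why. The name add_all_images and its fetch and add_picture parameters are hypothetical; they are passed in as callables so the example runs without network access, using stand-in functions.

```python
import io


def add_all_images(image_urls, fetch, add_picture):
    """Try to add every image; return a list of (url, error) for the failures."""
    failed = []
    for url in image_urls:
        try:
            data = fetch(url)               # download step, returns bytes
            add_picture(io.BytesIO(data))   # embed step, takes a file-like object
        except Exception as error:
            # Record the failure instead of hiding it, then keep going.
            failed.append((url, error))
    return failed


# Stand-ins for the real download and embed steps (illustration only).
added = []


def fake_fetch(url):
    if url.endswith("missing.png"):
        raise IOError("download failed")
    return b"fake image bytes"


def fake_add_picture(stream):
    added.append(stream.read())


failures = add_all_images(["a.png", "missing.png", "b.png"], fake_fetch, fake_add_picture)
for url, error in failures:
    print("skipped", url, "because", error)
```

In the real script, fetch could be lambda url: requests.get(url).content and add_picture could be lambda stream: document.add_picture(stream, width=Inches(1.5)), so the loop stays the same while the failures become visible.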

