How do I save my web scraping results to a text file with Beautiful Soup?

Question

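I wrote this code to do some web scraping. It runs fine, but I can't figure out how to save the scraped results to a .txt file. I want to write the output of "print(div.text)" to a .txt file.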

import bs4 as bs
import urllib.request

for pg in range(1, 100 + 1):
    source = urllib.request.urlopen('https://dsalsrv04.uchicago.edu/cgi-bin/app/hayyim_query.py?page='+ str(pg)).read()
    soup = bs.BeautifulSoup(source,'lxml')
    for div in soup.find_all('div', class_='hw_result'):
        print(div.text)

Tags: python, web-scraping, beautifulsoup, save, file-handling

Answer


One option is to collect the text in a string and then write it out with open() and f.write():

import bs4 as bs
import urllib.request
import re

output = ''
for pg in range(1, 100 + 1):
    source = urllib.request.urlopen('https://dsalsrv04.uchicago.edu/cgi-bin/app/hayyim_query.py?page='+ str(pg)).read()
    soup = bs.BeautifulSoup(source,'lxml')
    for div in soup.find_all('div', class_='hw_result'):
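        # Add a line break after each entry so results do not run together.
        output += div.text + '\n'

# Collapse runs of line breaks into a single newline.
output = re.sub(r"[\r\n]+", "\n", output)

# Write as UTF-8 so non-ASCII text is saved correctly on any platform;
# the with block closes the file even if write() raises.
with open('/any/directory_you_like/any_name_that_you_like_with_any_extension.txt', 'w', encoding='utf-8') as f:
    f.write(output)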
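
As a variation on the approach above, the entries can be written to the file one at a time instead of building one large string first. The following is a minimal sketch, not a definitive implementation: the helper names iter_entries and save_entries are hypothetical, the built-in 'html.parser' is swapped in for 'lxml' to avoid an extra dependency, and whitespace inside each entry is collapsed so that every entry lands on its own line.

```python
import urllib.request

# URL taken from the question; the helper names below are illustrative only.
BASE_URL = 'https://dsalsrv04.uchicago.edu/cgi-bin/app/hayyim_query.py?page='


def iter_entries(pages):
    """Yield the text of every div with class 'hw_result' on the given pages."""
    # Imported here so save_entries() can be used without bs4 installed.
    import bs4 as bs

    for pg in pages:
        source = urllib.request.urlopen(BASE_URL + str(pg)).read()
        # Assumption: 'html.parser' is used instead of 'lxml' (no extra install).
        soup = bs.BeautifulSoup(source, 'html.parser')
        for div in soup.find_all('div', class_='hw_result'):
            yield div.text


def save_entries(entries, path):
    """Write one entry per line to path and return how many were written."""
    count = 0
    with open(path, 'w', encoding='utf-8') as f:
        for text in entries:
            # Collapse all internal whitespace, including line breaks.
            line = ' '.join(text.split())
            if not line:
                continue  # skip entries that contain only whitespace
            f.write(line + '\n')
            count += 1
    return count
```

Under these assumptions, calling save_entries(iter_entries(range(1, 100 + 1)), 'results.txt') would fetch the same 100 pages as the original loop and write each result as it is scraped, so an error on a later page does not lose the entries already written.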
