python - Extracting text from multiple websites
Question
from bs4 import BeautifulSoup
import urllib2

# Read the list of URLs, one per line
list_open = open("weblist.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

for url in line_in_list:
    # Download the page and parse it
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    # Print the text content of the page
    print soup.getText()
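The code above lets me extract text from multiple websites (listed in weblist.txt).
However, when the list contains a link or website that the code cannot open, it stops immediately and does not check the remaining links. For example, if I have 10 links and the second one fails to open or cannot be parsed, the script raises an error and stops at that link without checking the rest. I want it to check every link in weblist.txt, from first to last, and extract text from all the links that are valid and can be parsed.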
Solution
Just add a try/except statement around the body of the loop, like this:
for url in line_in_list:
    try:
        html = urllib2.urlopen(url).read()
        soup = BeautifulSoup(html)
        print soup.getText()
    except Exception as e:
        # Error handling: report the failure and continue with the next URL
        print(e)
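For readers on Python 3, where urllib2 no longer exists, the same skip-and-continue idea might look roughly like the sketch below. It is only an illustration, not the original poster's code: the helper names fetch_html, html_to_text and extract_all, the 10-second timeout, and the choice of the "html.parser" backend are all assumptions made for this example. The sketch also skips blank lines, because splitting the file on "\n" usually leaves an empty string at the end that urlopen would reject.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download one page and return its raw bytes (timeout is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def html_to_text(html):
    """Strip the markup with BeautifulSoup and return the text content."""
    # Imported here so the loop below can be exercised without bs4 installed;
    # requires the beautifulsoup4 package when actually called.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, "html.parser").get_text()


def extract_all(urls, fetch=fetch_html, to_text=html_to_text):
    """Return (texts, failures) so that one bad URL never stops the loop."""
    texts, failures = {}, {}
    for url in urls:
        url = url.strip()
        if not url:
            # Skip blank lines, e.g. the one produced by a trailing newline.
            continue
        try:
            texts[url] = to_text(fetch(url))
        except Exception as exc:
            # Deliberately broad: record the problem and move on to the next URL.
            failures[url] = exc
    return texts, failures
```

A file object can be passed directly, for example extract_all(open("weblist.txt")), since iterating over a file yields one line at a time. Returning the failures instead of only printing them makes it easy to retry or log the links that could not be opened or parsed.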