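python - Why can't my code scrape anything from this web page?
Problem description
I am trying to scrape https://journals.sagepub.com/toc/CPS/current with Python.
My main goal is to scrape the titles of all the papers listed there. After inspecting the structure of the page, I came up with the following code: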
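from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "https://journals.sagepub.com/toc/CPS/current"
# Send a browser-like User-Agent with the request
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(req).read()
page_soup = BeautifulSoup(webpage, "html.parser")
# Collect every <h3 class="heading-title"> element
nameList = page_soup.findAll("h3", {"class": "heading-title"})
List = []
for name in nameList:
    List.append(name.get_text())
nameList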
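However, for some reason my new list is always empty. I have used this approach on other pages and got good results, so I am not sure what is missing here.
Any ideas?
Solution
It seems that urllib has trouble getting the correct result from this server. Try the requests module, which is more powerful: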
import requests
from bs4 import BeautifulSoup

url = "https://journals.sagepub.com/toc/CPS/current"
req = requests.get(url)
page_soup = BeautifulSoup(req.content, "html.parser")
# Collect every <h3 class="heading-title"> element
nameList = page_soup.findAll("h3", {"class": "heading-title"})
List = []
for name in nameList:
    List.append(name.get_text())
print(List)
Prints:
[
"When Does the Public Get It Right? The Information Environment and the Accuracy of Economic Sentiment",
"Does Affirmative Action Work? Evaluating India’s Quota System",
"Legacies of Resistance: Mobilization Against Organized Crime in Mexico",
"Political Institutions and Coups in Dictatorships",
"Generous to Workers ≠ Generous to All: Implications of European Unemployment Benefit Systems for the Social Protection of Immigrants",
"Drinking Alone: Local Socio-Cultural Degradation and Radical Right Support—The Case of British Pub Closures",
]
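If the list comes back empty again, it helps to separate fetching from parsing so you can tell whether the request failed or the selector simply did not match. The following is only a sketch under assumptions: the function names `fetch_html` and `extract_titles`, the `User-Agent` value, and the 10-second timeout are illustrative choices, not part of the original answer, and whether the site returns the expected markup to a given client is not guaranteed. Unlike the code above, it also strips surrounding whitespace from each title.

```python
import requests
from bs4 import BeautifulSoup

# The URL and the "heading-title" class come from the question.
# Everything else here (names, header value, timeout) is an assumption.
URL = "https://journals.sagepub.com/toc/CPS/current"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10):
    """Download a page and raise if the server returns an HTTP error."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise on 4xx/5xx instead of silently parsing an error page
    response.raise_for_status()
    return response.text


def extract_titles(html):
    """Return the stripped text of every <h3 class="heading-title"> element."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all("h3", class_="heading-title")
    return [h.get_text(strip=True) for h in headings]


# Usage (needs network access):
# titles = extract_titles(fetch_html(URL))
# print(titles)
```

Because `extract_titles` takes a plain string, you can also save the downloaded HTML to a file and run the parser against it offline to check whether the `h3.heading-title` elements are present in what the server actually sent.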