IndexError when web scraping a news website

Problem description

I have been trying to scrape the headlines of news articles, but the following code raises an IndexError. Only the last line of the code fails.

import requests
from bs4 import BeautifulSoup
URL= 'https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation'
r1 = requests.get(URL)
coverpage = r1.content
soup1 = BeautifulSoup(coverpage, 'html5lib')
coverpage_news = soup1.find_all('h3', class_='item-title')
coverpage_news[4].get_text()

Here is the error:

IndexError                                Traceback (most recent call last)
<ipython-input-10-f7f1f6fab81c> in <module>
      6 soup1 = BeautifulSoup(coverpage, 'html5lib')
      7 coverpage_news = soup1.find_all('h3', class_='item-title')
----> 8 coverpage_news[4].get_text()

IndexError: list index out of range
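find_all() always returns a list-like ResultSet, which may be shorter than expected or empty, so indexing [4] raises IndexError whenever fewer than five tags matched. The following is a minimal sketch of checking the match count before indexing. It uses hypothetical inline HTML with only two matching tags instead of the live NDTV page, and html.parser instead of html5lib so it needs no extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: only two matching <h3> tags.
HTML = "<h3 class='item-title'>First</h3><h3 class='item-title'>Second</h3>"
soup = BeautifulSoup(HTML, "html.parser")

matches = soup.find_all("h3", class_="item-title")
print(len(matches))  # 2, so matches[4] would raise IndexError

index = 4
# Guard the lookup instead of indexing blindly.
if index < len(matches):
    print(matches[index].get_text())
else:
    print(f"only {len(matches)} match(es); nothing at index {index}")
```

Printing len(coverpage_news) on the real page is a quick way to see how many elements the selector actually matched.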

Tags: python, web-scraping, beautifulsoup

Solution


Use soup1.select() to search for nested elements that match a CSS selector:

coverpage_news = soup1.select("h3 a.item-title")

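This finds every a element with class="item-title" that is a descendant of an h3 element. The original find_all('h3', class_='item-title') call only matches h3 tags that carry the class themselves, which is why it returned fewer than five results and coverpage_news[4] raised IndexError.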
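To make the difference concrete, here is a minimal sketch comparing the two calls. The markup is hypothetical: it only mimics the structure the answer implies (class="item-title" on the a inside each h3), not the real NDTV page, and it uses html.parser instead of html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class sits on the <a>, not on the enclosing <h3>.
HTML = """
<div>
  <h3><a class="item-title" href="/a">Headline one</a></h3>
  <h3><a class="item-title" href="/b">Headline two</a></h3>
  <h3><a class="other" href="/c">Not a headline link</a></h3>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() filters on the <h3> tag's own class attribute, so nothing matches.
by_find_all = soup.find_all("h3", class_="item-title")

# The space in the selector is the descendant combinator:
# it matches <a class="item-title"> anywhere inside an <h3>.
by_select = soup.select("h3 a.item-title")

titles = [a.get_text(strip=True) for a in by_select]
print(len(by_find_all))  # 0
print(titles)  # ['Headline one', 'Headline two']
```

Iterating over the select() result, rather than indexing a fixed position, also avoids IndexError when the page lists fewer headlines than expected.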

