Why am I running into this web scraping problem in Python?

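Problem description

**I'm currently working on my second web scraping project in Python. The problem is that I can't extract the flight prices from the website (the site is in the code). A pointer in the right direction would be great.**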

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


from bs4 import BeautifulSoup
from selenium import webdriver
chromedriver_path = "/usr/bin/chromedriver"

browser = webdriver.Chrome(chromedriver_path)

sats = 'https://www.google.com/travel/explore?tfs=CBsQAxooagwIAhIIL20vMGZyMHQSCjIwMjAtMDQtMjByDAgEEggvbS8wMTYwdxooagwIBBIIL20vMDE2MHcSCjIwMjAtMDQtMjdyDAgCEggvbS8wZnIwdHABQAFIAQ&curr=USD&gl=us&hl=en&authuser=0&origin=https%3A%2F%2Fwww.google.com'
browser.get(sats)
browser.title

browser.save_screenshot('/home/UrbanGuide/Desktop/test_flights.png')


soup = BeautifulSoup(browser.page_source, "html5lib")


cards = soup.select('div[class*=tsAU4e]')
cards[0]

print(card.select('h3')[0].text)
print(card.select('span[class*=price]')[0].text)
# The line of code above gives me this error message:
# IndexError                                Traceback (most recent call last)
# <ipython-input-173-c949e249e30b> in <module>
#       2 for card in cards:
#       3     print(card.select('h3')[0].text)
# ----> 4     print(card.select('span[class*=price]')[0].text)
#
# IndexError: list index out of range

Tags: python, selenium, web-scraping, beautifulsoup, selenium-chromedriver

Solution


Here is cards[0] (I assume you meant to assign it to card):

card = """
<div class="tsAU4e ">
 <div class="wIuJz">
  <h3 class="W6bZuc YMlIz">Nassau</h3>
  <div class="ZjDced CQYfx">
   <img alt="American" class="C5fbBf" height="16" 
src="//www.gstatic.com/flights/airline_logos/70px/AA.png" width="16"/>
   <span class="nx0jzf">1 stop</span>
   <span class="qeoz6e U325Rc"></span>
   <span class="Xq1DAb">6 hr 25 min</span>
  </div>
 </div>
 <div class="Q70fcd sSHqwe">
  <div class="MJg7fb">
   <span class="QB2Jof xLPuCe">
    <span aria-label="293 US dollars" data-gs="CidHSUNLQUJHLS0tLS0tLS0tcGZleTIxQUFBQUFGNTFhcTRONExSQUESATAaCwif5AEQAhoDVVNEKgoyMDIwLTA0LTIwMgoyMDIwLTA0LTI3OAhKBAgBEAE=">$293
    </span>
   </span>
  </div>
 </div>
</div>
"""

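Given this markup, card.select('span[class*=price]') returns an empty list, because no span in the card has a class containing "price". Indexing element 0 of that empty list is what raises the IndexError.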
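The failure can be reproduced in isolation. This is a minimal sketch that assumes a reduced copy of the price markup from the card above and uses the built-in "html.parser" instead of html5lib so it has no extra dependency; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Reduced copy of the price markup from the card above (data-gs attribute omitted).
html = (
    '<div class="MJg7fb"><span class="QB2Jof xLPuCe">'
    '<span aria-label="293 US dollars">$293</span></span></div>'
)
card = BeautifulSoup(html, "html.parser")

# No span here has a class containing "price", so nothing matches.
matches = card.select("span[class*=price]")
print(matches)

# Indexing the empty result is what raises the error from the question.
try:
    matches[0]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# select_one returns None instead of raising when nothing matches,
# which makes the "no match" case easy to test for.
missing = card.select_one("span[class*=price]")
print(missing)
```

Checking for an empty result (or for None from select_one) before indexing turns a confusing traceback into an explicit "selector matched nothing" signal.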

Before trying to extract information, inspect each element more carefully. That way you can see what you actually need to search for.
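One way to do that inspection is to print every span in a card together with its attributes, then pick a selector based on what is actually there. The sketch below assumes the card markup shown above (with the img tag and data-gs value left out for brevity) and chooses the aria-label attribute as the hook for the price. That choice is an assumption drawn from this one sample; the class names look generated and the live page may differ.

```python
from bs4 import BeautifulSoup

# Copy of the sample card above, without the <img> tag and data-gs attribute.
card_html = """
<div class="tsAU4e ">
 <div class="wIuJz">
  <h3 class="W6bZuc YMlIz">Nassau</h3>
  <div class="ZjDced CQYfx">
   <span class="nx0jzf">1 stop</span>
   <span class="qeoz6e U325Rc"></span>
   <span class="Xq1DAb">6 hr 25 min</span>
  </div>
 </div>
 <div class="Q70fcd sSHqwe">
  <div class="MJg7fb">
   <span class="QB2Jof xLPuCe">
    <span aria-label="293 US dollars">$293
    </span>
   </span>
  </div>
 </div>
</div>
"""

card = BeautifulSoup(card_html, "html.parser")

# Step 1: list each span's attributes and text to see what can be searched for.
span_info = [(span.attrs, span.get_text(strip=True)) for span in card.find_all("span")]
for attrs, text in span_info:
    print(attrs, repr(text))

# Step 2: select on something that exists in the markup.
# Only the price span carries an aria-label in this sample.
title = card.select_one("h3")
price = card.select_one("span[aria-label]")

destination = title.get_text(strip=True) if title else None
price_text = price.get_text(strip=True) if price else None
price_label = price["aria-label"] if price else None
print(destination, price_text, price_label)
```

In the original loop, the same two select_one calls would replace the h3 and span[class*=price] lookups for each card in cards.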

