BeautifulSoup web scraping in Python: unknown error when using the click() method

Problem description

I want to scrape 10 pages of search results from the booking.com website:

Here is my code:

from bs4 import BeautifulSoup 
from selenium import webdriver
import pandas as pd

url= 'https://www.booking.com/searchresults.en-gb.html?label=gen173nr-1FCAEoggJCAlhYSDNYBGgOiAEBmAEuwgEKd2luZG93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM&lang=en-gb&sid=aebeca0be36c2e9975167200426f126a&sb=1&src=searchresults&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Fsearchresults.en-gb.html%3Flabel%3Dgen173nr-1FCAEoggJCAlhYSDNYBGgOiAEBmAEuwgEKd2luZG93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM%3Bsid%3Daebeca0be36c2e9975167200426f126a%3Btmpl%3Dsearchresults%3Bclass_interval%3D1%3Bdest_id%3D-390625%3Bdest_type%3Dcity%3Bdtdisc%3D0%3Bfrom_sf%3D1%3Bgroup_adults%3D2%3Bgroup_children%3D0%3Binac%3D0%3Bindex_postcard%3D0%3Blabel_click%3Dundef%3Bno_rooms%3D1%3Boffset%3D0%3Bpostcard%3D0%3Braw_dest_type%3Dcity%3Broom1%3DA%252CA%3Bsb_price_type%3Dtotal%3Bshw_aparth%3D1%3Bslp_r_match%3D0%3Bsrc%3Dindex%3Bsrc_elem%3Dsb%3Bsrpvid%3D9fe26fd0b0e90202%3Bss%3DMadrid%3Bss_all%3D0%3Bssb%3Dempty%3Bsshis%3D0%3Bssne%3DMadrid%3Bssne_untouched%3DMadrid%26%3B&ss=Madrid&is_ski_area=0&ssne=Madrid&ssne_untouched=Madrid&city=-390625&checkin_monthday=5&checkin_month=12&checkin_year=2018&checkout_monthday=6&checkout_month=12&checkout_year=2018&group_adults=2&group_children=0&no_rooms=1&from_sf=1'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
# pass the ChromeOptions created above, otherwise --no-sandbox is never applied
driver = webdriver.Chrome(r"C:\Users\yefida\Desktop\Study_folder\Online_Courses\1DONE\python mega course\Project 6 - Web Scraping\chromedriver.exe",
                          options=chrome_options)
driver.get(url)


html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

pages = [str(i) for i in range(10)]
df = []
for page in pages:
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    data = soup.find_all('div',{'class':'sr_item_content sr_item_content_slider_wrapper '})
    list_data = []
    for item in data:
        temp = {}
        temp['Title'] = item.find('span',{'class':'sr-hotel__name'}).text.replace('\n','')
        temp['Address'] = item.find('div',{'class':'address'}).text.replace('\n','').lstrip(' ').partition(',')[0]

        list_data.append(temp)
    #next page:
    df.append(pd.DataFrame(list_data,columns = list_data[0].keys()))
    driver.find_element_by_xpath('//*[@id="search_results_table"]/div[4]/div[1]/ul/li[3]/a').click()

data_frame_new = pd.concat(df)
data_frame_new.reset_index(drop=True, inplace=True)

But at the end I get an error, which is related to Selenium's click() method:

EDIT: WebDriverException: Message: unknown error: Element ... is not clickable at point (670, 662). Other element would receive the click: ... (Session info: chrome=70.0.3538.102) (Driver info: chromedriver=2.42.591088 (7b2b2dca23cca0862f674758c9a3933e685c27d5), platform=Windows NT 10.0.17134 x86_64)

As other users suggested, I am using find_element_by_xpath to locate the link before calling click(). But it is hard to tell from the error message above what is actually going wrong. How can I fix this?
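
For reference, this kind of message usually means the pagination link was located, but at click time another element (for example a sticky header, a map overlay, or a cookie banner) sits on top of it at the reported coordinates, so the browser refuses the click. A small diagnostic sketch, reusing the driver from the code above and the coordinates taken from the error message, asks the page which element actually covers that point (the 200-character truncation is only for readability):

# Diagnostic sketch: report which element covers the point named in the error message.
covering = driver.execute_script(
    "var el = document.elementFromPoint(arguments[0], arguments[1]);"
    "return el ? el.outerHTML.slice(0, 200) : null;",
    670, 662)
print(covering)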

Tags: python, selenium-webdriver, web-scraping, beautifulsoup

Solution
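
The "Element ... is not clickable at point" error is typically raised when the target element is outside the viewport or hidden behind another element at the moment Selenium tries to click it. One common way past it is to scroll the link into view and wait until it is reported as clickable before calling click(), falling back to a JavaScript click if something still intercepts it. Below is a minimal sketch along those lines; it reuses the XPath from the question, while the 10-second timeout and the idea of folding this into the existing page loop are assumptions, not part of the original code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException

NEXT_XPATH = '//*[@id="search_results_table"]/div[4]/div[1]/ul/li[3]/a'

# Scroll the pagination link into view so fixed headers/footers no longer cover it.
next_link = driver.find_element_by_xpath(NEXT_XPATH)
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", next_link)

try:
    # Wait until Selenium considers the link clickable, then click it normally.
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, NEXT_XPATH))
    ).click()
except (TimeoutException, WebDriverException):
    # Fallback: a JavaScript click ignores elements overlapping the link.
    driver.execute_script("arguments[0].click();", next_link)

After the click it also helps to wait for the next page to render (for example with WebDriverWait and a presence_of_all_elements_located condition on the result blocks) before reading driver.page_source again, because the loop in the question parses the page immediately after clicking and can otherwise capture the previous page's results.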

