Selenium only scrapes the first item it finds

Problem description

I am using the following code block to scrape a website:

from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:/Users/USER/Downloads/chromedriver_win32/chromedriver.exe')
url = 'https://mamikos.com/cari/ugm/all/bulanan/0-15000000'
driver.get(url)

kamar = driver.find_elements_by_class_name('kost-rc__content')

for desc in kamar :
    nama = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)

After running it, the output appears to print only the first result on the page. I tried changing the XPath to this:

for desc in kamar :
    nama = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)

But that only raises an error. Please help.

Side note: Google Chrome is version 95.0.4638.69 (Official Build) (64-bit), and the driver used is ChromeDriver 95.0.4638.69.

Tags: python-3.x, selenium-webdriver, web-scraping, xpath, webdriverwait

Solution


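An XPath that begins with // is evaluated from the document root even when it is called on an element, so every iteration of the loop matched the same first listing. The second attempt most likely fails because .//*[@id="app"] looks for the app container among each card's descendants, where it does not exist. To scrape the names, locations, and prices, you can instead use the following locator strategies: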

Code block:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is the webdriver.Chrome instance created in the question.
driver.get("https://mamikos.com/cari/ugm/all/bulanan/0-15000000")
names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__name')]")))]
infos = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__location')]")))]
prices = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='rc-price__real']//span[contains(@class, 'rc-price__text')]")))]
# zip pairs the three lists by position, one row per listing.
for i,j,k in zip(names, infos, prices):
    print(f"Name:{i} Title:{j} Price:{k}")
driver.quit()

Console output:

Name:Kost Singgahsini Sakura Karanggayam Sleman Yogyakarta Title:Kecamatan Depok Price:Rp1.370.000
Name:Kost Singgahsini Granada UGM Yogyakarta Title:Kecamatan Depok Price:Rp1.790.000
Name:Kost Kurnia Terban Tipe A UGM Yogyakarta RMZ Title:Kecamatan Gondokusuman Price:Rp606.000
Name:Kost Singgahsini Maleo UGM Kaliurang Yogyakarta Title:Kecamatan Depok Price:Rp1.973.000
Name:Kost AB-AE Tipe B Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.710.000
Name:Kost AB-AE Tipe A Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.425.000
Name:Kost Pogung Familia Tipe C Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.900.000
Name:Kost Pogung Familia Tipe B Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.710.000
Name:Kost Pogung Familia Tipe A Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.425.000
Name:Kost Hanung Tipe B UGM Yogyakarta RMZ Title:Mlati Price:Rp736.000
Name:Kost Apik Tapak Dara Tipe B Deresan Yogyakarta Title:Depok Price:Rp1.620.000
Name:Kost Singgahsini Putri Maoni Tipe A Gejayan Yogyakarta Title:Depok Price:Rp1.520.000
Name:Kost Singgahsini Omah Khiar Tipe F Karang Gayam Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Apik Tapak Dara Tipe C Deresan Yogyakarta Title:Kecamatan Depok Price:Rp2.205.000
Name:Kost Singgahsini Putri Maoni Tipe B Gejayan Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Wisma Yudhistira Tipe C Mlati Sleman Yogyakarta Title:Mlati Price:Rp2.250.000
Name:Kost Pondok Bugenvil 3 Caturtunggal Depok Sleman Title:Depok Price:Rp1.800.000
Name:Kost Pranasmara 34C Tipe B Depok Sleman Title:Depok Price:Rp1.200.000
Name:Kost Pondok Bugenvil 2 Caturtunggal Depok Sleman Yogyakarta Title:Depok Price:Rp1.800.000
Name:Kost Rahayu Residence Tipe C Depok Sleman Yogyakarta Title:Depok Price:Rp1.150.000
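
If you would rather keep the per-card loop from the question, a rough sketch along the following lines may work. It searches inside each card with short relative XPaths (note the leading dot) instead of absolute ones. The helper name scrape_cards and the three XPath constants are made up for illustration. The class names come from the code above, and whether those spans really sit inside each kost-rc__content element is an assumption about the page's markup, not something verified here.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Fallback so the helper can still be tried with stub objects when
    # Selenium is not installed; the values mirror Selenium's By constants.
    class By:
        CLASS_NAME = "class name"
        XPATH = "xpath"

# Relative XPaths: the leading "." anchors each search at the current card.
# Class names are assumptions taken from the locators used above.
NAME_XPATH = ".//span[contains(@class, 'rc-info__name')]"
LOCATION_XPATH = ".//span[contains(@class, 'rc-info__location')]"
PRICE_XPATH = ".//span[contains(@class, 'rc-price__text')]"


def scrape_cards(driver, card_class="kost-rc__content"):
    """Return one (name, location, price) tuple per listing card."""
    rows = []
    for card in driver.find_elements(By.CLASS_NAME, card_class):
        # Each lookup is scoped to `card`, so fields from different
        # listings cannot get mixed up.
        rows.append(tuple(
            card.find_element(By.XPATH, xpath).text
            for xpath in (NAME_XPATH, LOCATION_XPATH, PRICE_XPATH)
        ))
    return rows
```

With a live session this would be called as rows = scrape_cards(driver) after driver.get(url), ideally once a WebDriverWait has confirmed that the cards are visible. A card that lacks one of the spans would raise NoSuchElementException, which at least makes the gap visible.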

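One caveat with collecting names, locations, and prices as three separate lists: zip stops at the shortest list, so if a single listing is missing a field, the remaining rows can shift or disappear without any error. A minimal guard, assuming you want a hard failure rather than silent truncation, could look like this. The helper name pair_rows is hypothetical, and the sample lists are shortened, made-up inputs.

```python
def pair_rows(names, locations, prices):
    """Zip three columns into rows, refusing lists of unequal length."""
    lengths = (len(names), len(locations), len(prices))
    if len(set(lengths)) != 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(names, locations, prices))


# Made-up sample data: the second price is missing on purpose.
names = ["Kost Singgahsini Granada UGM Yogyakarta",
         "Kost Hanung Tipe B UGM Yogyakarta RMZ"]
locations = ["Kecamatan Depok", "Mlati"]
prices = ["Rp1.790.000"]

# Plain zip silently keeps only as many rows as the shortest list has.
truncated = list(zip(names, locations, prices))
print(len(names), "names but only", len(truncated), "paired row(s)")
```

On Python 3.10 or newer, zip(names, locations, prices, strict=True) gives a similar check without a helper.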