python-3.x - Selenium only scrapes the first item it finds
Problem description
I am using the following code block to scrape a website:
driver = webdriver.Chrome(executable_path=r'C:/Users/USER/Downloads/chromedriver_win32/chromedriver.exe')
url = 'https://mamikos.com/cari/ugm/all/bulanan/0-15000000'
driver.get(url)
kamar = driver.find_elements_by_class_name('kost-rc__content')
for desc in kamar:
    nama = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)
When I run it, the output only seems to print the first result on the page. I tried changing the XPaths to this:
for desc in kamar:
    nama = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)
But that only produces an error. Please help.
Side note: Google Chrome is version 95.0.4638.69 (Official Build) (64-bit), and the driver used is ChromeDriver 95.0.4638.69.
Solution
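Some background on why the loop in the question repeats the first listing: an XPath that starts with `//` is evaluated from the document root, even when it is called on a child element such as `desc`, so every iteration resolves to the same first match. Prefixing that same absolute path with `.` does not help either, because the search then looks for an element with `id="app"` inside the card, where none exists; that is presumably where the error comes from. The sketch below reproduces both effects offline with the standard library's `xml.etree.ElementTree`, which is only a stand-in for the browser DOM. The markup is hypothetical and heavily simplified: the class names come from the locators in this thread, while the nesting and the listing names are made up. ElementTree rejects absolute paths on elements, so the root-level search is written as `root.find(".//...")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two result cards using class names that
# appear in this thread. The real page is nested much more deeply.
PAGE = """
<div id="app">
  <div class="kost-rc__content">
    <span class="rc-info__name">Kost A</span>
    <span class="rc-price__text">Rp1.000.000</span>
  </div>
  <div class="kost-rc__content">
    <span class="rc-info__name">Kost B</span>
    <span class="rc-price__text">Rp2.000.000</span>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='kost-rc__content']")

# Searching from the document root on every iteration (the effect of a
# leading "//" in Selenium) yields the first match each time.
from_root = [root.find(".//span[@class='rc-info__name']").text for _ in cards]

# Searching relative to each card yields that card's own name.
per_card = [card.find(".//span[@class='rc-info__name']").text for card in cards]

# Looking for the page-level container inside a card finds nothing.
inside_card = cards[0].find(".//*[@id='app']")

print(from_root)    # ['Kost A', 'Kost A']
print(per_card)     # ['Kost A', 'Kost B']
print(inside_card)  # None
```

In Selenium terms, the fix is a short locator that starts with `.//` and describes only the path from the card down to the field, rather than the full path from `#app`.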
To scrape the name, location info, and price of each listing, you can use the following locator strategy:
Code block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is the webdriver.Chrome instance created in the question.
driver.get("https://mamikos.com/cari/ugm/all/bulanan/0-15000000")
names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__name')]")))]
infos = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__location')]")))]
prices = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='rc-price__real']//span[contains(@class, 'rc-price__text')]")))]
for i, j, k in zip(names, infos, prices):
    print(f"Name:{i} Title:{j} Price:{k}")
driver.quit()
Console output:
Name:Kost Singgahsini Sakura Karanggayam Sleman Yogyakarta Title:Kecamatan Depok Price:Rp1.370.000
Name:Kost Singgahsini Granada UGM Yogyakarta Title:Kecamatan Depok Price:Rp1.790.000
Name:Kost Kurnia Terban Tipe A UGM Yogyakarta RMZ Title:Kecamatan Gondokusuman Price:Rp606.000
Name:Kost Singgahsini Maleo UGM Kaliurang Yogyakarta Title:Kecamatan Depok Price:Rp1.973.000
Name:Kost AB-AE Tipe B Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.710.000
Name:Kost AB-AE Tipe A Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.425.000
Name:Kost Pogung Familia Tipe C Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.900.000
Name:Kost Pogung Familia Tipe B Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.710.000
Name:Kost Pogung Familia Tipe A Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.425.000
Name:Kost Hanung Tipe B UGM Yogyakarta RMZ Title:Mlati Price:Rp736.000
Name:Kost Apik Tapak Dara Tipe B Deresan Yogyakarta Title:Depok Price:Rp1.620.000
Name:Kost Singgahsini Putri Maoni Tipe A Gejayan Yogyakarta Title:Depok Price:Rp1.520.000
Name:Kost Singgahsini Omah Khiar Tipe F Karang Gayam Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Apik Tapak Dara Tipe C Deresan Yogyakarta Title:Kecamatan Depok Price:Rp2.205.000
Name:Kost Singgahsini Putri Maoni Tipe B Gejayan Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Wisma Yudhistira Tipe C Mlati Sleman Yogyakarta Title:Mlati Price:Rp2.250.000
Name:Kost Pondok Bugenvil 3 Caturtunggal Depok Sleman Title:Depok Price:Rp1.800.000
Name:Kost Pranasmara 34C Tipe B Depok Sleman Title:Depok Price:Rp1.200.000
Name:Kost Pondok Bugenvil 2 Caturtunggal Depok Sleman Yogyakarta Title:Depok Price:Rp1.800.000
Name:Kost Rahayu Residence Tipe C Depok Sleman Yogyakarta Title:Depok Price:Rp1.150.000
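One caveat with collecting three separate lists: `zip` pairs items purely by position and stops at the shortest list, so a listing that lacks one field would shift or drop the rows after it. A possible alternative, sketched below, is to loop over the cards and use relative locators inside each one. The helper name `scrape_cards` and the `FIELDS` mapping are hypothetical. The locators reuse the class names from the code above, and it is an assumption (not verified against the live page) that each card element contains all three spans.

```python
# "xpath" is the string value of selenium.webdriver.common.by.By.XPATH, so
# this helper accepts Selenium WebElements without importing Selenium here.
XPATH = "xpath"

# Relative locators (note the leading "."), reusing the answer's class names.
FIELDS = {
    "name": ".//span[contains(@class, 'rc-info__name')]",
    "location": ".//span[contains(@class, 'rc-info__location')]",
    "price": ".//span[contains(@class, 'rc-price__text')]",
}


def scrape_cards(cards):
    """Return one dict per card; a field missing from a card becomes None."""
    rows = []
    for card in cards:
        row = {}
        for field, locator in FIELDS.items():
            # find_elements returns an empty list instead of raising
            # when nothing matches, so missing fields are easy to detect.
            matches = card.find_elements(XPATH, locator)
            row[field] = matches[0].text if matches else None
        rows.append(row)
    return rows
```

With a live session, this would be called on the card elements once they have loaded, for example `scrape_cards(driver.find_elements(By.CLASS_NAME, 'kost-rc__content'))`. Whether `kost-rc__content` wraps both the info and the price block is also an assumption, taken from the class name used in the question.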