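python - How to scrape the second of several same-class tags that have no unique identifier
Question
Using Python 3, I am trying to read the content of the second div with class "eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped", the one containing "Starts at RM15.75", from this HTML: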
<div class="eds-event-card-content__sub-content">
  <div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">
    <div class="card-text--truncated__one">Found8 KL Sentral • Kuala Lumpur, Kuala Lumpur</div>
  </div>
  <div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Starts at RM15.75</div>
</div>
My Python code:
url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=2'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Select all the 20 event containers from a single page
event_containers = html_soup.find_all('div', class_='search-event-card-square-image')
# Getting price of ticket
price = container.find_all('div', class_= "eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped").text
print("price: ", price[1])
But my code doesn't work. It gives me this output:
IndexError: list index out of range
But I want:
Starts at RM15.75
Can anyone help me with this? Thanks.
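Answer
I can't see any prices in the HTML source returned by a plain request. My guess is that they are generated by a JavaScript script after the page loads.
So in this case you need a tool that renders the page in a real browser, such as Selenium.
Code: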
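# import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Run Chrome without opening a visible window
chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 style: the driver path goes through Service and the options
# through `options=` (older releases took a positional path and `chrome_options=`)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.set_window_size(1024, 600)
driver.maximize_window()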
url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=2'
# response = requests.get(url)
driver.get(url)
time.sleep(4)
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
# Select all the 20 event containers from a single page
event_containers = html_soup.find('ul', class_='search-main-content__events-list')
for event in event_containers.find_all('li'):
    event_time = event.find('div', class_= "eds-text-color--primary-brand eds-l-pad-bot-1 eds-text-weight--heavy eds-text-bs").text
    event_name = event.find('div', class_= "eds-event-card__formatted-name--is-clamped eds-event-card__formatted-name--is-clamped-three eds-text-weight--heavy").text
    event_price_place = event.find('div', class_ = "eds-event-card-content__sub-content")
    # find_all('div') also returns nested divs:
    # [0] is the place wrapper, [1] its inner div, [2] the price
    event_pp = event_price_place.find_all('div')
    event_place = event_pp[0].text
    try:
        event_price = event_pp[2].text
    except IndexError:
        # Some events have no price div
        event_price = None
    print(f"{event_name}\n{event_time}\n{event_place}\n{event_price}\n\n")
Result:
KL International Flea Market 2020 / Bazaar Antarabangsa Kuala Lumpur
Mon, Oct 5, 10:00 AM
VIVA Shopping Mall • Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur
Free
FGTSD Physical Church Service
Sun, Jul 19, 9:30 AM + 105 more events
Full Gospel Tabernacle Sri Damansara • Kuala Lumpur
Free
EFE 2020 - 16th Export Furniture Exhibition Malaysia
Thu, Aug 27, 9:00 AM
Kuala Lumpur Convention Centre • Kuala Lumpur, Kuala Lumpur
Free
International Beauty Expo (IBE) 2020
Sat, Sep 12, 11:00 AM
Malaysia International Trade and Exhibition Centre • Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur
Free
Learn How To Earn USD3500 In 4 Week Using Your SmartPhone
Today at 8:00 PM + 2 more events
KL Online Event • Kuala Lumpur, Bangkok
None
Turn Customers into Raving Fans of Your Brand via Equity Crowdfunding
Thu, Aug 27, 4:00 PM
Found8 KL Sentral • Kuala Lumpur, Kuala Lumpur
Starts at RM15.75
...
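The `event_pp[2]` index in the code above works because `find_all('div')` also counts the nested `card-text--truncated__one` div. As a minimal sketch of an alternative, the snippet below parses the HTML fragment from the question and asks only for the direct child divs, so the price becomes the second item. It assumes the live page keeps the same nesting as that fragment, which may not hold.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question.
html = """
<div class="eds-event-card-content__sub-content">
  <div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">
    <div class="card-text--truncated__one">Found8 KL Sentral • Kuala Lumpur, Kuala Lumpur</div>
  </div>
  <div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Starts at RM15.75</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
sub_content = soup.find("div", class_="eds-event-card-content__sub-content")

# recursive=False keeps only the direct child divs, so the nested
# "card-text--truncated__one" div no longer shifts the indexes.
children = sub_content.find_all("div", recursive=False)

place = children[0].get_text(strip=True)
# The price div is missing for some events, so guard the second index.
price = children[1].get_text(strip=True) if len(children) > 1 else None

print(place)
print(price)
```

Matching on the single class `eds-event-card-content__sub-content` is also less brittle than passing a full multi-class string to `class_`, because BeautifulSoup compares such a string against the exact value of the `class` attribute.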
Edit:
I added the option to run the browser headless.
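from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 style; older releases took a positional driver path and `chrome_options=`
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)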