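python - Problem scraping an infinite-scroll page with Selenium
Problem description
I am unable to scrape the name, price, and color of each handbag. The site is: https://www.coach.com/shop/women-handbags
I have tried different scraping tools, and I have tried placing the scraping code in different parts of the while loop.
The code below runs after a while loop has scrolled through the entire page and then returned to the top.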
products = driver.find_elements_by_xpath('/html/body/div[1]/div[8]/div[4]/div/div/div/div[1]/div[1]/div')
for product in products:
    bag_dict = {}
    try:
        name = product.find_element_by_tag_name('a').text
        price = thing.find_element_by_xpath('.//span[@class="price-sales"]').text
        bag_dict['name'] = name
        bag_dict['price'] = price
    except:
        continue
print(bag_dict)
I get either an empty dictionary or an error message stating that bag_dict was not found.
Solution
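A note on the symptoms first. The loop in the question refers to `thing`, which is never defined, and the bare `except` silently swallows the resulting `NameError`, so no entry is ever filled in. If `products` is empty, `bag_dict` is never assigned at all, which would likely explain the second error. The sketch below reproduces the effect without Selenium. Plain dicts stand in for page elements and `KeyError` stands in for Selenium's `NoSuchElementException`; both substitutions, and the function names, are assumptions made for illustration.

```python
# Plain dicts stand in for Selenium elements (an assumption for illustration).
sample_products = [{"name": "TROUPE TOTE", "price": "695.0"}]


def scrape_with_bare_except(products):
    """Mimics the question's loop, including the undefined name `thing`."""
    bags = []
    for product in products:
        try:
            name = product["name"]
            price = thing["price"]  # typo: `thing` is not defined anywhere
        except:  # a bare except also hides the NameError raised above
            continue
        bags.append({"name": name, "price": price})
    return bags


def scrape_with_narrow_except(products):
    """Same loop with the typo fixed and only the expected error caught."""
    bags = []
    for product in products:
        try:
            name = product["name"]
            price = product["price"]
        except KeyError:  # skip only products that lack a field
            continue
        bags.append({"name": name, "price": price})
    return bags


print(scrape_with_bare_except(sample_products))    # []
print(scrape_with_narrow_except(sample_products))  # [{'name': 'TROUPE TOTE', 'price': '695.0'}]
```

Catching only the exception you expect means that typos like this surface immediately instead of producing empty results.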
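I found that the site loads the handbags through requests that return them in sets of 24. The code below iterates over all the sets and stores the name and price of each handbag in a DataFrame. Selenium is not required; I used requests and BeautifulSoup instead.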
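To make the paging concrete, here is a small sketch using only the standard library. It lists the URLs such a loop would request, one per set of 24. The query parameters `start` and `format=page-element` come from the answer's code; the helper name `page_urls` and the default upper bound of 480 are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.coach.com/shop/women-handbags"
PAGE_SIZE = 24  # the site returns products in sets of 24


def page_urls(last_start=480):
    """Build one URL per set, with offsets 0, 24, 48, ... up to last_start."""
    urls = []
    for start in range(0, last_start + 1, PAGE_SIZE):
        query = urlencode({"start": start, "format": "page-element"})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = page_urls()
print(len(urls))  # 21
print(urls[1])    # https://www.coach.com/shop/women-handbags?start=24&format=page-element
```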
Code
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

frames = []
# Products load in sets of 24; `start` is the offset of each set.
for next_set in range(0, 481, 24):
    payload = {'start': next_set, 'format': 'page-element'}
    r = requests.get('https://www.coach.com/shop/women-handbags', params=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Each product name is read from the `content` attribute of a <meta> tag.
    names = [name.meta['content'] for name in soup.find_all(class_="product-name")]
    # The price is read from the `data-sales-price` attribute of a <span>.
    prices = [
        price.find('span', {'data-sales-price': re.compile(r'\d+\.\d+')})['data-sales-price']
        for price in soup.find_all(class_="product-price")
    ]
    frames.append(pd.DataFrame({'Names': names, 'Prices': prices}))
    print("Appended next set")
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
handbags = pd.concat(frames, ignore_index=True)
print(handbags)
Output
Names Prices
0 TROUPE TOTE IN COLORBLOCK 695.0
1 TROUPE TOTE IN COLORBLOCK WITH SNAKESKIN DETAIL 750.0
2 TROUPE TOTE 695.0
3 TROUPE TOTE WITH KAFFE FASSETT PRINT 795.0
4 TROUPE TOTE IN SIGNATURE CANVAS WITH PATCHWORK... 895.0
5 TROUPE TOTE IN SIGNATURE CANVAS WITH KAFFE FAS... 795.0
6 TROUPE TOTE IN SIGNATURE CANVAS 695.0
7 TROUPE CARRYALL WITH CROCODILE DETAIL 1100.0
8 TROUPE CARRYALL 595.0
9 TROUPE CARRYALL IN SIGNATURE CANVAS 595.0
10 TROUPE CARRYALL 35 IN COLORBLOCK WITH SNAKESKI... 850.0
11 TROUPE CARRYALL 35 IN SIGNATURE CANVAS WITH KA... 995.0
12 TROUPE SHOULDER BAG WITH KAFFE FASSETT PRINT 550.0
13 TROUPE CROSSBODY WITH KAFFE FASSETT PRINT 595.0
14 TROUPE CROSSBODY 495.0
15 TROUPE CROSSBODY IN SIGNATURE CANVAS 495.0
16 TABBY TOP HANDLE IN COLORBLOCK SNAKESKIN 695.0
17 TABBY TOP HANDLE IN COLORBLOCK 550.0
18 TABBY TOP HANDLE IN COLORBLOCK 550.0
19 TABBY TOP HANDLE 550.0
20 TABBY TOP HANDLE IN SIGNATURE CANVAS WITH KAFF... 650.0
21 TABBY SHOULDER BAG 26 IN SIGNATURE CANVAS WITH... 450.0
22 TABBY SHOULDER BAG 26 IN SNAKESKIN 650.0
23 TABBY SHOULDER BAG 26 IN COLORBLOCK WITH SNAKE... 450.0
24 TABBY SHOULDER BAG 26 IN COLORBLOCK 350.0
25 TABBY SHOULDER BAG 26 IN COLORBLOCK WITH SNAKE... 450.0
26 TABBY SHOULDER BAG 26 350.0
27 TABBY SHOULDER BAG 26 350.0
28 TABBY SHOULDER BAG WITH KAFFE FASSETT PRINT 550.0
29 TABBY SHOULDER BAG IN SNAKESKIN 595.0
.. ... ...
439 DINKY CHAIN STRAP 35.0
440 NOVELTY STRAP 95.0
441 NOVELTY STRAP 50.0
442 STRAP IN SIGNATURE CANVAS 95.0
443 STRAP IN SNAKESKIN 150.0
444 STRAP WITH CHAIN 150.0
445 STRAP WITH WAVE PATCHWORK AND SNAKESKIN DETAIL 150.0
446 CASSIE CROSSBODY 350.0
447 NOVELTY STRAP WITH TEA ROSE AND TOOLING 150.0
448 CENTRAL TOTE WITH ZIP 295.0
449 DREAMER WRISTLET 175.0
450 DREAMER WRISTLET IN COLORBLOCK 175.0
451 DREAMER WRISTLET IN SIGNATURE CANVAS 175.0
452 DREAMER WRISTLET WITH SNAKESKIN DETAIL 225.0
453 RIVINGTON CONVERTIBLE POUCH 250.0
454 RIVINGTON CONVERTIBLE POUCH IN SIGNATURE CANVAS 250.0
455 ROGUE POUCH 325.0
456 ROGUE POUCH 325.0
457 CHARLIE POUCH 175.0
458 CHARLIE POUCH IN COLORBLOCK SIGNATURE CANVAS 175.0
459 CHARLIE POUCH WITH MEADOW PRAIRIE PRINT 195.0
460 CHARLIE POUCH WITH SCATTERED RIVETS 195.0
461 CHARLIE POUCH WITH SIGNATURE CANVAS BLOCKING 175.0
462 LARGE CHARLIE POUCH 225.0
463 LARGE CHARLIE POUCH WITH PATCHWORK STRIPES 275.0
464 LARGE CHARLIE POUCH WITH SCATTERED RIVETS 275.0
465 LARGE WRISTLET 30 IN SIGNATURE CANVAS WITH STA... 195.0
466 LARGE WRISTLET 30 WITH REXY AND CARRIAGE 195.0
467 KISSLOCK CLUTCH 225.0
468 KISSLOCK CLUTCH IN COLORBLOCK 225.0
[469 rows x 2 columns]
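The upper bound of 481 in the loop is tied to the size of the catalogue when the answer was written. One possible alternative is to keep requesting sets until one comes back empty. The sketch below shows that pattern with a stand-in fetch function. `iter_all_items`, `fetch_page`, and `fake_fetch` are hypothetical names, and whether the site really returns an empty set past the last product is an assumption that would need checking.

```python
from typing import Callable, Iterator, List


def iter_all_items(fetch_page: Callable[[int], List[str]],
                   page_size: int = 24) -> Iterator[str]:
    """Yield items set by set until a set comes back empty."""
    start = 0
    while True:
        items = fetch_page(start)
        if not items:  # an empty set is taken to mean the end of the catalogue
            break
        yield from items
        start += page_size


# Stand-in for the real HTTP request: a fake catalogue of 50 items.
catalogue = [f"bag-{i}" for i in range(50)]


def fake_fetch(start: int) -> List[str]:
    return catalogue[start:start + 24]


all_items = list(iter_all_items(fake_fetch))
print(len(all_items))  # 50
```

In real use, `fetch_page` would wrap the request and parsing steps from the code above and return the parsed names for one offset.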