Scraping eBay sold items with Selenium returns []

Problem description

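I have very little web-scraping experience and couldn't solve this with BeautifulSoup, so I'm trying Selenium (installed it today). I want to scrape sold items on eBay, specifically this search: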

https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720
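To see which filters this search URL applies, it can be taken apart with Python's standard urllib.parse. This is only a sketch: the meanings written in the comments for LH_Sold, LH_Complete and _ipg are assumptions based on how eBay's search page behaves, not a documented API, and sold_search_url is a hypothetical helper for building similar URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ebay_url = ('https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses'
            '&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200'
            '&_oaa=1&_fsrp=1&_dcat=79720')

# parse_qs decodes '+' as a space and maps each key to a list of values
params = parse_qs(urlsplit(ebay_url).query)
print(params['_nkw'])     # ['oakley sunglasses']
print(params['LH_Sold'])  # ['1']
print(params['_ipg'])     # ['200']


def sold_search_url(keywords, per_page=200):
    """Hypothetical helper: build a sold-listings search URL for other keywords."""
    query = {
        '_nkw': keywords,   # search keywords
        'LH_Sold': 1,       # assumed: only sold listings
        'LH_Complete': 1,   # assumed: only completed listings
        '_ipg': per_page,   # assumed: items per page
    }
    # urlencode quotes values with quote_plus, so spaces become '+'
    return 'https://www.ebay.com/sch/i.html?' + urlencode(query)


print(sold_search_url('oakley sunglasses'))
```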

Here is the code I use to load the HTML and open the page in Selenium:

    ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'

    html = requests.get(ebay_url)
    #print(html.text)

    driver = wd.Chrome(executable_path=r'/Users/mburley/Downloads/chromedriver')
    driver.get(ebay_url)

This correctly opens a new Chrome session at the right URL. For each item I want the title, price, and sold date, which I then write to a CSV file. Here is my code:

    # Find all div tags and set equal to main_data
    all_items = driver.find_elements_by_class_name("s-item__info clearfix")[1:]
    #print(main_data)

    # Loop over main_data to extract div classes for title, price, and date
    for item in all_items:
        date = item.find_element_by_xpath("//span[contains(@class, 'POSITIVE']").text.strip()
        title = item.find_element_by_xpath("//h3[contains(@class, 's-item__title s-item__title--has-tags']").text.strip()
        price = item.find_element_by_xpath("//span[contains(@class, 's-item__price']").text.strip()

        print('title:', title)
        print('price:', price)
        print('date:', date)
        print('---')
        data.append( [title, price, date] )

It only returns []. I thought eBay might have blocked my IP, but the HTML loads and looks correct. I hope someone can help! Thanks!
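A likely reason the loop never runs is the locator on its first line: find_elements_by_class_name expects a single class name, and "s-item__info clearfix" contains two. Selenium's Python bindings send a class-name lookup to the browser as the CSS selector "." + name, so this one becomes ".s-item__info clearfix", meaning "a <clearfix> element inside an .s-item__info element". That matches nothing, so the result is [] rather than an error. The sketch below shows the difference with plain strings; compound_class_selector is a hypothetical helper that assumes the class names need no CSS escaping.

```python
def compound_class_selector(class_attr):
    """Hypothetical helper: turn a space-separated class attribute into a CSS
    selector requiring every class on the same element ('a b' -> '.a.b').
    Assumes the class names contain no characters that need CSS escaping."""
    return ''.join('.' + name for name in class_attr.split())


class_attr = 's-item__info clearfix'

# Roughly what the class-name locator sends: a descendant selector that
# looks for a <clearfix> element inside .s-item__info
naive_selector = '.' + class_attr
print(naive_selector)  # .s-item__info clearfix

# Both classes on the same element
fixed_selector = compound_class_selector(class_attr)
print(fixed_selector)  # .s-item__info.clearfix

# In the question's script this would be used as (needs a live driver):
# from selenium.webdriver.common.by import By
# all_items = driver.find_elements(By.CSS_SELECTOR, fixed_selector)
```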

Tags: python, pandas, selenium, web-scraping

Solution


You can use the following code to scrape the sold date, title, and price of each sold listing, and pandas to store the data in a CSV file.

Code:

    ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'

    # Selenium 4 style; on Selenium 3 use wd.Chrome(executable_path=...) instead
    driver = wd.Chrome(service=Service(r'/Users/mburley/Downloads/chromedriver'))
    driver.maximize_window()
    # Lookups below wait up to 30 s for matching elements to appear
    driver.implicitly_wait(30)
    driver.get(ebay_url)

    # The "Sold <date>" tag of every sold listing
    sold_xpath = "//div[contains(@class,'title--tagblock')]/span[@class='POSITIVE']"

    sold_date = []
    title = []
    price = []
    # XPath indices are 1-based: the i-th sold tag is paired with the i-th title
    # and the i-th price among the listings that carry a sold tag
    for i, item in enumerate(driver.find_elements(By.XPATH, sold_xpath), start=1):
        sold_date.append(item.text)
        title.append(driver.find_element(By.XPATH, f"({sold_xpath}/ancestor::div[contains(@class,'tag')]/following-sibling::a/h3)[{i}]").text)
        price.append(driver.find_element(By.XPATH, f"({sold_xpath}/ancestor::div[contains(@class,'tag')]/following-sibling::div[contains(@class,'details')]/descendant::span[@class='POSITIVE'])[{i}]").text)

    driver.quit()

    print(sold_date)
    print(title)
    print(price)

    # One column per field; write the CSV without the DataFrame index
    data = {
        'Sold_date': sold_date,
        'title': title,
        'price': price,
    }
    df = pd.DataFrame.from_dict(data)
    df.to_csv('out.csv', index=False)

Imports:

    import pandas as pd
    from selenium import webdriver as wd
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
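The code above pairs the three fields by position: the i-th sold tag with the i-th title and the i-th price. That holds as long as every sold listing has all three; if one listing is missing a field, later values shift onto the wrong listing and the last indexed lookup finds nothing. An alternative is to look each field up relative to its own listing. In Selenium that means XPaths starting with ".//" (for example item.find_element(By.XPATH, ".//h3")), because a path starting with "//", like those in the question's loop, searches the whole page even when called on an element. The sketch below shows this per-item pattern offline, using the standard library's xml.etree.ElementTree (which supports only a subset of XPath) on hypothetical, simplified markup whose class names are borrowed from the question; eBay's real HTML differs and changes over time. It also swaps pandas for the standard csv module so it runs without extra packages.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for eBay's result list
SAMPLE = """<ul>
  <li><div class="s-item__info clearfix">
    <div class="s-item__title--tagblock"><span class="POSITIVE">Sold Jan 3, 2021</span></div>
    <a href="#"><h3 class="s-item__title">Oakley Holbrook</h3></a>
    <div class="s-item__details"><span class="s-item__price">$80.00</span></div>
  </div></li>
  <li><div class="s-item__info clearfix">
    <a href="#"><h3 class="s-item__title">Oakley Frogskins</h3></a>
    <div class="s-item__details"><span class="s-item__price">$55.00</span></div>
  </div></li>
</ul>"""


def has_classes(el, wanted):
    """True if the element's class attribute contains every class in `wanted`."""
    return set(wanted.split()) <= set(el.get('class', '').split())


def text_of(item, path):
    """Stripped text of the first match of `path` inside `item`, or '' if absent."""
    found = item.find(path)
    return found.text.strip() if found is not None and found.text else ''


def scrape_rows(markup):
    root = ET.fromstring(markup)
    rows = []
    for item in root.iter('div'):
        if not has_classes(item, 's-item__info clearfix'):
            continue
        # Every path starts with './/', so a lookup never leaks into another item
        rows.append({
            'title': text_of(item, ".//h3[@class='s-item__title']"),
            'price': text_of(item, ".//span[@class='s-item__price']"),
            'sold_date': text_of(item, ".//span[@class='POSITIVE']"),
        })
    return rows


def write_csv(rows, fh):
    """Write one CSV row per listing, with a header line."""
    writer = csv.DictWriter(fh, fieldnames=['title', 'price', 'sold_date'])
    writer.writeheader()
    writer.writerows(rows)


rows = scrape_rows(SAMPLE)
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

Because each row is built from one listing, a missing field leaves an empty cell in that row instead of shifting values between listings.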
