python - Web scraping: extracting hrefs from a product grid

Problem description

I'm building a web scraper with cloudscraper and Beautiful Soup (I'm new to this). For one web page ( https://www.feelunique.com/makeup?filter=fh_location=//c1/en_GB/categories%3C{c1_c1c6}/!exclude_countries%3E{gb}/!site_exclude%3E{1}/!brand={a70}/%26special-page=dept_home%26customer-country=GB%26site_id=1%26gender=female%26device=desktop%26site_area=department%26date_time=20210429T060257%26fh_view_size=40%26fh_start_index=0%26fh_view_size=40 ) I'm trying to scrape the link to each product in the product grid.

I wrote this code:

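    import cloudscraper
    from bs4 import BeautifulSoup

    baseurl = 'https://www.feelunique.com/'
    url = baseurl + 'makeup'  # plus the filter query string from the URL above
    productlinks = []
    productlinks2 = []

    scraper = cloudscraper.create_scraper()  # session that handles Cloudflare checks
    r = scraper.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    for a in soup.select("#fullcolumn > div.eba-component.eba-product-listing"):
        print(a)
        if a.has_attr('href'):
            productlinks.append(baseurl + a['href'])
            print(len(productlinks))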

When I print a, I get the relevant HTML and tags, but I can't seem to get the href out of it. Any help would be appreciated.

Tags: python html web-scraping beautifulsoup cloudflare

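Solution

Using a better selector will solve your problem. Your current selector matches the product-listing container div, which has no href attribute of its own, so a.has_attr('href') is always false. The links are on the a tags inside that container.

Just replace the loop line with the one below.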

    for a in soup.select("a[class='Product-link thumb']"):
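
To make the difference concrete, here is a minimal sketch that runs offline against a small HTML string instead of the live page. SAMPLE_HTML, the /p/example-product-... paths, and the helper name extract_product_links are all hypothetical stand-ins; only the two selectors and the base URL come from the question and answer. It uses the built-in html.parser rather than lxml so that nothing beyond Beautiful Soup is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.feelunique.com/"

# Hypothetical, simplified stand-in for the real product grid markup.
SAMPLE_HTML = """
<div id="fullcolumn">
  <div class="eba-component eba-product-listing">
    <div class="Product">
      <a class="Product-link thumb" href="/p/example-product-one">One</a>
    </div>
    <div class="Product">
      <a class="Product-link thumb" href="/p/example-product-two">Two</a>
    </div>
  </div>
</div>
"""


def extract_product_links(html, base_url=BASE_URL):
    """Return absolute URLs for every product anchor found in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Select the <a> tags themselves, not the <div> that wraps them.
    for a in soup.select("a[class='Product-link thumb']"):
        href = a.get("href")
        if href:
            # urljoin copes with both relative and absolute href values.
            links.append(urljoin(base_url, href))
    return links


# The selector from the question matches the container <div> only.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.select_one("#fullcolumn > div.eba-component.eba-product-listing")
print(container.has_attr("href"))

product_links = extract_product_links(SAMPLE_HTML)
print(product_links)
```

With the live site you would pass r.content from scraper.get(url) instead of SAMPLE_HTML. Using urljoin rather than baseurl + a['href'] is a judgment call: it avoids a doubled slash if the hrefs start with "/", which is assumed here and not confirmed for the real page.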

Regards,
