How do I select this element by its CSS selector string?

Problem description

From this URL, https://www.collinsdictionary.com/dictionary/french-english/conjugation/aimer, I am trying to extract the link

<a class="link-right verbtable" href="https://www.collinsdictionary.com/dictionary/french-english/conjugation/aimer">Full verb table</a>

Its CSS selector is div.content.definitions.dictionary.biling > div.hom > span > span.xr > a. I followed the instructions in the book Automate the Boring Stuff with Python:


from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
soup = BeautifulSoup(url, 'html.parser')

soup.select('div.content.definitions.dictionary.biling > div.hom > span > span.xr > a')

Could you please explain why the result is []?

Tags: python-3.x, beautifulsoup, css-selectors

Solution


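This is because Collins Dictionary uses Cloudflare to improve the security and performance of its website and services. As a result, when a script sends a request to its server, the server does not return the page's HTML. Instead, you get a block page with a title like this: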

<title>Access denied | www.collinsdictionary.com used Cloudflare to restrict access</title>
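One way to notice this situation in a script is to look at the page title before running any selector. The following is only a minimal sketch: `looks_blocked` is a hypothetical helper (not part of BeautifulSoup), the "access denied" match is an assumption based on the title shown above, and the second sample page is made up for illustration.

```python
from bs4 import BeautifulSoup


def looks_blocked(html: str) -> bool:
    """Return True if the page title suggests an 'Access denied' block page."""
    soup = BeautifulSoup(html, 'html.parser')
    # soup.title is None when the document has no <title> tag.
    title = soup.title.get_text() if soup.title else ''
    return 'access denied' in title.lower()


# Sample markup: the first title is the one quoted above, the second is invented.
blocked_page = ('<html><head><title>Access denied | www.collinsdictionary.com '
                'used Cloudflare to restrict access</title></head><body></body></html>')
normal_page = '<html><head><title>Sample dictionary page</title></head><body></body></html>'

print(looks_blocked(blocked_page))  # True
print(looks_blocked(normal_page))   # False
```

A check like this could run on the downloaded text before calling `soup.select`, so that an empty result caused by a block page is not mistaken for a wrong selector.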

To get past Cloudflare's check, you have to set a User-Agent header in the request.

from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent header; without one the server may return the block page.
user_agent = {'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"}

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
# Download the page text first, then hand the markup to BeautifulSoup.
doc = requests.get(url, headers=user_agent).text
soup = BeautifulSoup(doc, 'html.parser')
result = soup.select('div.content.definitions.dictionary.biling > div.hom > span > span.xr > a')
print(result)

This will give you the result:

[<a class="link-right verbtable" href="https://www.collinsdictionary.com/dictionary/french-english/conjugation/aimer">Full verb table</a>]
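A related point about the snippet in the question: it passes the URL string itself to BeautifulSoup, and BeautifulSoup only parses the text it is given; it does not download anything. The short sketch below (offline, using the same URL and selector as the question) illustrates that parsing a bare URL string produces a document with no tags at all, so any selector returns an empty list. The `warnings` handling is an addition for the demo, since bs4 may warn that the input looks like a URL rather than markup.

```python
import warnings

from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'

# Silence the "input looks like a URL" warning bs4 may emit, for this demo only.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    soup = BeautifulSoup(url, 'html.parser')

tags = soup.find_all(True)  # every tag in the parsed document
matches = soup.select('div.content.definitions.dictionary.biling > div.hom > span > span.xr > a')

print(tags)                    # []
print(matches)                 # []
print(soup.get_text() == url)  # True: the "document" is just the URL text
```

This is why the page has to be fetched with `requests.get` (or a similar HTTP client) before parsing.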
