python - Scraping Google search results page data with Python
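Problem description
I want to scrape e-mail addresses from the results of a search query. But when I use the CSS selector method `select` to access the class and print the result, it always shows an empty list. How can I access the .r class or "class=g"?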
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?sxsrf=ACYBGNQA4leQETe0psVZPu7daLWbdsc9Ow%3A1579194494737&ei=fpggXpvRLMakwQKkqpSICg&q=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&oq=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&gs_l=psy-ab.12...0.0..7407...0.0..0.0.0.......0......gws-wiz.82okhpdJLYg&ved=0ahUKEwibiI_3zYjnAhVGUlAKHSQVBaEQ4dUDCAs"
responce = requests.get(url)
soup = BeautifulSoup(responce.text, "html.parser")
test = soup.select('.r')
print(test)
Solution
Your program is correct, but to get the expected response from Google you need to specify the User-Agent header:
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?sxsrf=ACYBGNQA4leQETe0psVZPu7daLWbdsc9Ow%3A1579194494737&ei=fpggXpvRLMakwQKkqpSICg&q=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&oq=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&gs_l=psy-ab.12...0.0..7407...0.0..0.0.0.......0......gws-wiz.82okhpdJLYg&ved=0ahUKEwibiI_3zYjnAhVGUlAKHSQVBaEQ4dUDCAs"
headers = {'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0'}
responce = requests.get(url, headers=headers) # <-- specify custom header
soup = BeautifulSoup(responce.text, "html.parser")
test = soup.select('.r')
print(test)
Prints:
[<div class="r"><a href="https://www.yahoo.com/news/11-course-complete-computer-science-171322233.html" onmousedown="return rwt(this,'','','','1','AOvVaw2wM4TUxc_4V7s9GjeWTNAG','','2ahUKEwjt17Kk-YjnAhW2R0EAHcnsC3QQFjAAegQIAxAB','','',event)"><div class="TbwUpd"><img alt="https://...
...
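Since the stated goal is to collect e-mail addresses rather than the raw result elements, the following is a minimal sketch of that next step. It is not part of the original answer: the function name `extract_emails`, the regular expression, and the sample text are all assumptions chosen for illustration, and the pattern is deliberately simple, so it will not match every valid address.

```python
import re

# Illustrative pattern only (an assumption, not from the original answer);
# it covers common addresses but not every form allowed by the e-mail RFCs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique e-mail addresses found in text, in order of first appearance."""
    seen = set()
    found = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # treat addresses differing only in case as duplicates
        if key not in seen:
            seen.add(key)
            found.append(match)
    return found


# Hypothetical sample text standing in for the text of a results page.
sample = "Contact jane.doe@yahoo.com or JOHN@yahoo.com. Again: jane.doe@yahoo.com"
print(extract_emails(sample))
```

With the code from the answer, the same function could be applied to the page text, for example `extract_emails(soup.get_text(" "))`. Note that Google's markup changes over time, so class names such as `.r` may no longer be present in the returned HTML.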