Unable to scrape Google results

Question

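I'm new to Python and I'm working through Automate the Boring Stuff with Python; I'm currently on the web scraping chapter. I just want to scrape the titles of the search results. Here is my code: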

import requests
from bs4 import BeautifulSoup
import webbrowser

term = 'python'
req = requests.get('https://www.google.com/search?q=' + term)
req.raise_for_status()

soup = BeautifulSoup(req.text, 'lxml')
title = soup.find('div', class_ = 'r')

print(title)

The problem is that this always returns None. I've even attached a screenshot of the inspect element tool so you can see the div class name I'm using.

(Screenshot of the inspect element page)

Any help is appreciated. Thanks.

Tags: python, web-scraping, beautifulsoup, python-requests

Answer


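By default, requests identifies itself as python-requests/<version>, and Google appears to answer such clients with different markup that does not contain the class shown in the browser's inspector, so soup.find returns None. To get the correct response from the server, specify the User-Agent HTTP header: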

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent string (Firefox 79 on Ubuntu)
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}

term = 'python'
req = requests.get('https://www.google.com/search?q=' + term, headers=headers)
req.raise_for_status()

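soup = BeautifulSoup(req.content, 'lxml')
# First result block; the class name 'r' is taken from the inspected page
title = soup.find('div', class_='r')

# Join the text of all nested elements, separated by single spaces
print(title.get_text(strip=True, separator=' '))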

Prints:

Welcome to Python.org www.python.org www.python.org ...
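
The code above appends the search term to the URL as-is, which works for 'python' but could break for terms containing spaces, '&' or '#'. As a minimal sketch, assuming only the standard library, the request can be built (without sending it) so that the encoded URL and the header can be inspected. The helper name build_search_request is hypothetical and not part of the original answer.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same browser-like User-Agent string as in the answer above.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) '
                  'Gecko/20100101 Firefox/79.0'
}


def build_search_request(term):
    """Hypothetical helper: build, but do not send, a Google search request."""
    # urlencode escapes spaces, '&', '#' and '+' so the whole term
    # stays inside the single 'q' query parameter.
    url = 'https://www.google.com/search?' + urlencode({'q': term})
    return Request(url, headers=HEADERS)


req = build_search_request('c# & c++')
print(req.full_url)  # https://www.google.com/search?q=c%23+%26+c%2B%2B
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))
```

With requests, the same encoding is obtained by passing the term through the params argument, for example requests.get('https://www.google.com/search', params={'q': term}, headers=headers), rather than concatenating it into the URL.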
