python-3.x - Unable to scrape Google search results with BeautifulSoup
Question
I want to scrape Google search results, but whenever I try, the program returns an empty list.
from bs4 import BeautifulSoup
import requests
keyWord = input("Input Your KeyWord :")
url = f'https://www.google.com/search?q={keyWord}'
src = requests.get(url).text
soup = BeautifulSoup(src, 'lxml')
container = soup.findAll('div', class_='g')
print(container)
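Answer
To add to Andrej Kesely's answer: if you get empty results, you can always move up or down one div in the page structure, test the selector there, and work from that point.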
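As a minimal sketch of what moving up or down a div can look like in practice, the snippet below counts matches for a few candidate class names and then steps from a matching element to its parent. The markup, the class names outer and inner, and the helper count_matches are all made up for illustration. html.parser is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div id="search">
  <div class="outer">
    <div class="inner">
      <h3>Ice cream - Wikipedia</h3>
      <a href="https://en.wikipedia.org/wiki/Ice_cream">link</a>
    </div>
  </div>
</div>
"""


def count_matches(html, class_names):
    """Return how many <div> elements match each candidate class name."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.find_all("div", class_=name)) for name in class_names}


# 'g' plays the role of a class that is absent from the returned markup.
counts = count_matches(SAMPLE_HTML, ["g", "outer", "inner"])
print(counts)

# Step up one level from an element that does match.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
inner = soup.find("div", class_="inner")
parent = inner.find_parent("div")
parent_classes = parent.get("class")
print(parent_classes)
```

A count of zero for a class means the selector does not exist in the HTML you actually received, so it is worth printing part of the response and picking a container from there.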
Code (assuming you want to scrape the title, summary, and link):
from bs4 import BeautifulSoup
import requests
import json

# Send a browser-like User-Agent header with the request.
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.google.com/search?q=ice cream',
                    headers=headers).text
soup = BeautifulSoup(html, 'lxml')

summary = []

# Loop over each result container and pull out the title, summary, and link.
for container in soup.findAll('div', class_='tF2Cxc'):
    heading = container.find('h3', class_='LC20lb DKV0Md').text
    article_summary = container.find('span', class_='aCOpRe').text
    link = container.find('a')['href']

    summary.append({
        'Heading': heading,
        'Article Summary': article_summary,
        'Link': link,
    })

print(json.dumps(summary, indent=2, ensure_ascii=False))
Partial output:
[
  {
    "Heading": "Ice cream - Wikipedia",
    "Article Summary": "Ice cream (derived from earlier iced cream or cream ice) is a sweetened frozen food typically eaten as a snack or dessert. It may be made from dairy milk or cream and is flavoured with a sweetener, either sugar or an alternative, and any spice, such as cocoa or vanilla.",
    "Link": "https://en.wikipedia.org/wiki/Ice_cream"
  },
  {
    "Heading": "Jeni's Splendid Ice Creams",
    "Article Summary": "Jeni's Splendid Ice Cream, built from the ground up with superlative ingredients. Order online, visit a scoop shop, or find the closest place to buy Jeni's near you.",
    "Link": "https://jenis.com/"
  }
]
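One caveat with the loop above: find() returns None when nothing matches, so calling .text on the result raises AttributeError as soon as a single container lacks one of the elements. A hedged sketch of a more forgiving version follows. The helper name extract_results and the sample markup are hypothetical, and the class names are simply the ones used in the code above, which may not match what Google serves today.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second container has no summary span.
SAMPLE_HTML = """
<div class="tF2Cxc">
  <a href="https://example.com/a"><h3 class="LC20lb DKV0Md">First</h3></a>
  <span class="aCOpRe">Summary A</span>
</div>
<div class="tF2Cxc">
  <a href="https://example.com/b"><h3 class="LC20lb DKV0Md">Second</h3></a>
</div>
"""


def extract_results(html, container_class="tF2Cxc"):
    """Collect heading, summary, and link, using None for missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for container in soup.find_all("div", class_=container_class):
        heading = container.find("h3")
        snippet = container.find("span", class_="aCOpRe")
        link = container.find("a", href=True)
        results.append({
            "Heading": heading.get_text(strip=True) if heading else None,
            "Article Summary": snippet.get_text(strip=True) if snippet else None,
            "Link": link["href"] if link else None,
        })
    return results


parsed = extract_results(SAMPLE_HTML)
for item in parsed:
    print(item)
```

Entries with a None field can then be filtered out or logged, rather than stopping the whole run.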
Alternatively, you can use the Google Search Engine Results API from SerpApi to do this. It's a paid API with a free trial of 5,000 searches. Check out the playground.
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "ice cream",
    "api_key": os.getenv("API_KEY"),  # read the key from the API_KEY environment variable
}

search = GoogleSearch(params)
results = search.get_dict()

# Print the title, snippet, and link of each organic result.
for result in results["organic_results"]:
    print(f"Title: {result['title']}\nSummary: {result['snippet']}\nLink: {result['link']}\n")
Partial output:
Title: Ice cream - Wikipedia
Summary: Ice cream (derived from earlier iced cream or cream ice) is a sweetened frozen food typically eaten as a snack or dessert. It may be made from dairy milk or cream and is flavoured with a sweetener, either sugar or an alternative, and any spice, such as cocoa or vanilla.
Link: https://en.wikipedia.org/wiki/Ice_cream

Title: 6 Ice Cream Shops to Try in Salem, Massachusetts ...
Summary: 6 Ice Cream Shops to Try in Salem, Massachusetts · Maria's Sweet Somethings, 26 Front Street · Kakawa Chocolate House, 173 Essex Street · Melt ...
Link: https://www.salem.org/icecream/

Title: Melt Ice Cream - Salem
Summary: Homemade ice cream made on-site in Salem, MA. Bold innovative flavors, exceptional customer service, local ingredients.
Link: https://meltsalem.com/
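If you would rather not have the loop stop with a KeyError when an entry lacks one of the three keys, dict.get with a fallback is one option. This is only a sketch on a hand-written dictionary shaped like the output above. The helper format_result, the placeholder strings, and the assumption that an entry might omit a field are mine, and no request is made here.

```python
def format_result(result):
    """Format one result entry, substituting placeholders for missing keys."""
    title = result.get("title", "(no title)")
    snippet = result.get("snippet", "(no snippet)")
    link = result.get("link", "(no link)")
    return f"Title: {title}\nSummary: {snippet}\nLink: {link}\n"


# Hand-written sample data; the second entry has no "snippet" key.
sample_results = {
    "organic_results": [
        {
            "title": "Ice cream - Wikipedia",
            "snippet": "Ice cream is a sweetened frozen food.",
            "link": "https://en.wikipedia.org/wiki/Ice_cream",
        },
        {
            "title": "Melt Ice Cream - Salem",
            "link": "https://meltsalem.com/",
        },
    ]
}

# .get() with an empty list also covers a response without organic results.
formatted = [format_result(r) for r in sample_results.get("organic_results", [])]
for text in formatted:
    print(text)
```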
Disclaimer: I work for SerpApi.