Automate the Boring Stuff (can't get BeautifulSoup to parse HTML)

Problem description

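The script takes a keyword, searches for it on Google, and then opens the results in browser tabs. The select method returns an empty list, and I'm confused as to why. I inspected the HTML of the search results, and the CSS selector looks like it should work.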

#! /usr/bin/env python3

import webbrowser, sys, requests, bs4, pyperclip

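# Take the search keyword from the command line, else from the clipboard
if len(sys.argv) > 1:
    address = ' '.join(sys.argv[1:])
else:
    address = pyperclip.paste()

# Fetch the search results page (no User-Agent header is set here)
res = requests.get('https://www.google.com/search?q=' + address)

soup = bs4.BeautifulSoup(res.text, "lxml")

# This is where the problem shows up: select() returns an empty list
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))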

Tags: python, beautifulsoup

Solution


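Try setting a User-Agent in the request headers. Without one, requests identifies itself as python-requests, and Google may return different markup from what you see in the browser, so the selector matches nothing: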

from bs4 import BeautifulSoup
import requests

url = "https://www.google.com/search?q=python"

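# Send a browser-like User-Agent instead of the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0"
}

response = requests.get(url, headers=headers)
# Raise an exception on a 4xx/5xx response (unlike assert, not skipped under python -O)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Print every link inside an element with class "r"
for element in soup.select(".r a"):
    print(element)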
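
Applied to the script from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the function names extract_hrefs and open_top_results are made up for this example, the 10-second timeout is arbitrary, and the ".r a" selector is simply the one used in this thread (Google's markup changes over time, so it may not match what is served today). The sketch also passes the keyword through params so that requests URL-encodes it, and uses urljoin so that both relative and absolute hrefs open correctly.

```python
import webbrowser
from urllib.parse import urljoin

import bs4
import requests

SEARCH_URL = "https://www.google.com/search"
# Any browser-like string should do; this one is copied from the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0"
}


def extract_hrefs(html, selector=".r a", limit=5):
    """Return up to `limit` href values from elements matching `selector`."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    hrefs = [a.get("href") for a in soup.select(selector)]
    # Skip anchors that have no href attribute.
    return [h for h in hrefs if h][:limit]


def open_top_results(query, limit=5):
    """Search for `query` and open the first few result links in browser tabs."""
    # `params` lets requests URL-encode the query (spaces, '&', and so on).
    response = requests.get(
        SEARCH_URL, params={"q": query}, headers=HEADERS, timeout=10
    )
    response.raise_for_status()
    for href in extract_hrefs(response.text, limit=limit):
        # Relative hrefs are resolved against the final response URL.
        webbrowser.open(urljoin(response.url, href))
```

Calling open_top_results("python") would then open up to five tabs. If extract_hrefs still returns an empty list, writing response.text to a file and opening it shows which markup was actually served, which is usually the quickest way to find a selector that matches.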
