Google search HTML doesn't contain div id='resultStats'

Problem description

I'm trying to get the number of results for a Google search. When I save the page from the browser, the count appears in the HTML like this:

<div id="resultStats">About 8,660,000,000 results<nobr> (0.49 seconds)&nbsp;</nobr></div>

But the HTML retrieved by Python looks like the mobile version of the site when I open it in a browser, and it doesn't contain 'resultStats'.

I already tried (1) adding parameters to the URL, like https://www.google.com/search?client=firefox-b-d&q=test, and (2) copying a complete URL from a browser, but neither helped.

import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))

Error:

Traceback: line 11, in google_results
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))
AttributeError: 'NoneType' object has no attribute 'text'

Tags: python, python-3.x, python-requests

Solution


The solution is to add headers, specifically a browser-like User-Agent (thanks, John):

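import re

import requests
from bs4 import BeautifulSoup

def google_results(query):
    # With a browser-like User-Agent, Google returned the desktop page,
    # which is the one that contains the resultStats div.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    # Passing the query via params lets requests URL-encode it.
    response = requests.get('https://www.google.com/search', params={'q': query}, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    div = soup.find('div', id='resultStats')
    # div.text looks like "About 8,660,000,000 results (0.49 seconds)";
    # take the second word and keep only its digits.
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))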

Output:

9280000000
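
The AttributeError in the question comes from soup.find returning None when the div is missing, so it may help to separate the parsing step and let it fail softly. Below is a minimal sketch of that idea using only the standard library, so it can be tried on a saved page without any network access. The names ResultStatsParser and parse_result_count are hypothetical helpers, not part of requests or BeautifulSoup, and the sketch assumes the count is the first number inside the div, as in the snippet from the question.

```python
import re
from html.parser import HTMLParser


class ResultStatsParser(HTMLParser):
    """Collects the text inside the first <div id="resultStats"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside the target (0 = outside)
        self.found = False  # True once the target div has been seen
        self.chunks = []    # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Only nested divs change the depth; other tags are ignored.
            if tag == 'div':
                self.depth += 1
        elif not self.found and tag == 'div' and dict(attrs).get('id') == 'resultStats':
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == 'div':
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def parse_result_count(html):
    """Return the result count as an int, or None if it cannot be found."""
    parser = ResultStatsParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    if not parser.found:
        return None
    text = ''.join(parser.chunks)
    # First run of digits, allowing ',' or '.' as thousands separators.
    match = re.search(r'\d[\d,.]*', text)
    if match is None:
        return None
    return int(re.sub(r'\D', '', match.group()))


sample = ('<div id="resultStats">About 8,660,000,000 results'
          '<nobr> (0.49 seconds)&nbsp;</nobr></div>')
print(parse_result_count(sample))
print(parse_result_count('<div id="main">no stats here</div>'))
```

Returning None instead of raising lets the caller decide what to do when the page that came back is not the expected desktop version, for example logging the HTML or retrying with different headers.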
