How to web scrape a Google results page?

Problem description

I need to get the contents of a Google results page.

I tried XPath with the following code, but it did not find the element:

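import lxml.html
import requests

# Fetch the raw HTML of the results page (no JavaScript is executed)
response = requests.get("https://www.google.com/search?q=curitiba")
# Use a separate name so the lxml module is not shadowed
tree = lxml.html.fromstring(response.content)

# Absolute XPath copied from Chrome
test = tree.xpath('/html/body/div[7]/div[2]/div[9]/div[3]/div/div/div[1]/div[2]/div/div/div/div[1]/div/div/div/div[1]/div/div/div/div/span/text()')

print(test)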

This is the XPath that Chrome itself provides.

How can I get the contents of this page?

Tags: python, web, screen-scraping

Solution


Use BeautifulSoup:

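import bs4
import requests

html = requests.get("https://www.google.com/search?q=curitiba")
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = bs4.BeautifulSoup(html.content, "html.parser")

# Find the h3 tag that sits just above the target span
targeth3 = soup.find("h3", string="Descrição")
# find() returns None when the heading is not in the fetched HTML
if targeth3 is not None:
    targetspantext = targeth3.next_sibling.text  # text of the target span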
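The idea in the snippet above (locate a heading by its text, then read the element right after it) can be tried offline before pointing it at a live page. The following is only a rough sketch using the standard library's html.parser; the names HeadingSiblingParser, text_after_heading and SAMPLE_HTML are made up for illustration, and the sample markup is a hypothetical stand-in, not what Google actually serves.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class HeadingSiblingParser(HTMLParser):
    """Collect the text of the first element that follows a matching <h3>."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        # States: search -> in_h3 -> after_h3 -> in_target -> done
        self.state = "search"
        self.buffer = []
        self.depth = 0
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.state == "search" and tag == "h3":
            self.state, self.buffer = "in_h3", []
        elif self.state == "after_h3":
            # First element after the matching heading: start collecting.
            self.state, self.buffer, self.depth = "in_target", [], 1
        elif self.state == "in_target":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.state == "in_h3" and tag == "h3":
            matched = "".join(self.buffer).strip() == self.heading
            self.state = "after_h3" if matched else "search"
        elif self.state == "in_target":
            self.depth -= 1
            if self.depth == 0:
                self.result = "".join(self.buffer).strip()
                self.state = "done"
        elif self.state == "after_h3":
            # The parent closed before any sibling element appeared.
            self.state = "done"

    def handle_data(self, data):
        if self.state in ("in_h3", "in_target"):
            self.buffer.append(data)


def text_after_heading(html, heading):
    """Return the text of the element after <h3>heading</h3>, or None."""
    parser = HeadingSiblingParser(heading)
    parser.feed(html)
    parser.close()
    return parser.result


# Hypothetical markup used only to exercise the function.
SAMPLE_HTML = """
<div>
  <h3>Other</h3><span>ignored</span>
  <h3>Descrição</h3><span>Sample <b>description</b> text.</span>
</div>
"""

print(text_after_heading(SAMPLE_HTML, "Descrição"))
```

Getting None back from a page fetched with requests is a hint that the heading is simply not present in the raw HTML.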

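Edit: You cannot retrieve that box with requests, because it is loaded by JavaScript. You can use Selenium or https://serpapi.com/ instead. The API can retrieve the box, which is called the "Knowledge Graph".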
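A minimal sketch of the Selenium route might look like the following. It assumes Selenium 4 with a recent local Chrome (the "--headless=new" flag needs a reasonably current version), and it assumes the rendered page keeps the structure used in the BeautifulSoup snippet: an h3 with the text "Descrição" followed directly by the element holding the description. The function names build_search_url and fetch_box_text are made up for illustration. Google may also answer with a consent or CAPTCHA page, in which case the wait raises a TimeoutException.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build a Google search URL with the query properly percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def fetch_box_text(query, heading="Descrição", timeout=10):
    """Render the results page in headless Chrome and return the text of the
    element that follows the <h3> whose text equals `heading`."""
    # Imported here so the URL helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumed page structure (not verified against the live page):
    # an <h3> heading immediately followed by the element with the text.
    xpath = f"//h3[normalize-space()='{heading}']/following-sibling::*[1]"

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_search_url(query))
        # Wait until JavaScript has inserted the element into the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return element.text
    finally:
        driver.quit()


# Usage (needs Chrome and network access):
# print(fetch_box_text("curitiba"))
```

A relative XPath anchored on the heading text is used here instead of the absolute path copied from Chrome, since absolute paths tend to break whenever the page layout shifts.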

