How to scrape a rating with Python requests-html?

Problem description

I'm having trouble using requests-html to grab the rating information from a website. This is the code I wrote:

from requests_html import HTMLSession
import requests
from bs4 import BeautifulSoup
import re
url = "https://www.immobilienscout24.de/expose/107160613/"
session = HTMLSession()
r = session.get(url)
r.html.render()
rating = r.html.find("div#style__truncateChild___2Z9XG is24-rating", first=False)
print(rating)

Here is the site's HTML for the rating information:

[Image: screenshot of the rating element's HTML]

However, all I get is this error message:

Traceback (most recent call last):
  File "D:/Program Files/python/draft.py", line 8, in <module>
    r.html.render()
  File "E:\master\thesis\thesis\venv\lib\site-packages\requests_html.py", line 583, in render
    content, result, page = self.session.loop.run_until_complete(_async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "D:\Program Files\python\lib\asyncio\base_events.py", line 568, in run_until_complete
    return future.result()
  File "E:\master\thesis\thesis\venv\lib\site-packages\requests_html.py", line 545, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "E:\master\thesis\thesis\venv\lib\site-packages\pyppeteer\page.py", line 854, in goto
    result = await self._navigate(url, referrer)
  File "E:\master\thesis\thesis\venv\lib\site-packages\pyppeteer\page.py", line 869, in _navigate
    'Page.navigate', {'url': url, 'referrer': referrer})
pyppeteer.errors.NetworkError: Protocol error Page.navigate: Target closed.

What I expect to get is the corresponding rating: 3 Sterne.
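Independent of the render error, the selector in the code above may not match what is intended. In CSS, `#name` matches an element by id and a space means "descendant", so `div#style__truncateChild___2Z9XG is24-rating` looks for an `<is24-rating>` element inside a `<div>` whose id is `style__truncateChild___2Z9XG`. If both names are actually classes on the same `<div>` (an assumption, since the HTML screenshot is not available), the compound selector would be `div.style__truncateChild___2Z9XG.is24-rating`. The stdlib-only sketch below mimics how such a compound class selector matches; the markup and the `ClassMatcher` helper are hypothetical.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Hypothetical helper: collect attributes of every `tag` element whose
    class list contains all names in `required`.

    This imitates the CSS compound selector `tag.cls1.cls2`; it is a teaching
    sketch, not a CSS engine.
    """

    def __init__(self, tag, required):
        super().__init__()
        self.tag = tag
        self.required = set(required)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute is a whitespace-separated list of class names;
        # an attribute without a value is reported as None, hence the `or ""`.
        classes = set((attributes.get("class") or "").split())
        if tag == self.tag and self.required <= classes:
            self.matches.append(attributes)


# Hypothetical markup: assumes both names are classes on one <div>.
SNIPPET = (
    '<div class="style__truncateChild___2Z9XG is24-rating">3 Sterne</div>'
    '<div class="is24-rating">only one of the two classes</div>'
)

matcher = ClassMatcher("div", ["style__truncateChild___2Z9XG", "is24-rating"])
matcher.feed(SNIPPET)
# Only the first <div> carries both classes, so exactly one match is expected.
print(len(matcher.matches))
```

With requests-html, the corresponding call would then be `r.html.find("div.style__truncateChild___2Z9XG.is24-rating")`. Hashed class names such as `style__truncateChild___2Z9XG` may also change when the site is rebuilt, so a looser attribute selector can be more robust.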

Tags: javascript python html python-requests

Solution


I realize this is quite old, but I was able to get something working by using the async session and setting a longer timeout:

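from requests_html import AsyncHTMLSession

s = AsyncHTMLSession()

async def main():
    r = await s.get('https://www.immobilienscout24.de/expose/107160613/')
    # Render the page's JavaScript asynchronously, allowing up to 60 seconds
    await r.html.arender(timeout=60)
    # Match every <span> whose class attribute contains the substring "rating"
    print(r.html.find('span[class*=rating]'))

s.run(main)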

[<Element 'span' class=('overall-rating', 'margin-right-s') title='4,2 Sterne'>, 
 <Element 'span' class=('overall-rating', 'margin-right-s') title='4,2 Sterne'>, 
 <Element 'span' class=('overall-rating', 'margin-right-s') title='4,2 Sterne'>, 
 <Element 'span' class=('overall-rating', 'margin-right-s') title='4,2 Sterne'>]
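The matched elements carry the rating in their `title` attribute. In requests-html each `Element` exposes its attributes as a dict via `.attrs`, so something like `[el.attrs.get('title') for el in r.html.find('span[class*=rating]')]` should return label strings such as `'4,2 Sterne'`. The stdlib-only sketch below turns such labels into numbers; `parse_sterne` is a hypothetical helper, and it assumes each label starts with a number written with a German decimal comma.

```python
import re


def parse_sterne(title):
    """Hypothetical helper: convert a label such as '4,2 Sterne' to a float.

    Assumes the label starts with a number that may use a decimal comma.
    Returns None if no leading number is found.
    """
    match = re.match(r"\s*(\d+(?:,\d+)?)", title)
    if match is None:
        return None
    # German notation uses a comma as the decimal separator.
    return float(match.group(1).replace(",", "."))


# Title values as printed in the output above (the same label four times).
titles = ["4,2 Sterne"] * 4

# The page repeats the rating element, so collect the distinct values.
ratings = {parse_sterne(t) for t in titles}
ratings.discard(None)
print(sorted(ratings))
```

Because the same rating appears several times on the page, collecting the values into a set keeps one number per distinct rating.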
