Why is the text missing in BeautifulSoup even though it is visible on the website?

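Problem description

I have to scrape 3 elements from this website:

http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland

I need the latitude, longitude and elevation, so my code is: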

import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland'
r = requests.get(url)
soup = bs(r.content, features="html.parser")

latitude = soup.find('span', attrs={'id': 'curLat'}).get_text()
longitude = soup.find('span', attrs={'id': 'curLng'}).get_text()
elevation1 = soup.find('span', attrs={'id': 'altitude'}).get_text()  # from the text in the center
elevation2 = soup.find('span', attrs={'id': 'curElevation'}).get_text()  # from the box on the left

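It finds the latitude and longitude values, but not the elevation (in either case). Instead of getting '80.33 m (263.55 ft)' and '80.33 m', I get a whitespace string and an empty str, respectively.

Comparison of the BS output and the website's HTML: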

BS_elevation1 = soup.find('span', attrs={'id': 'altitude'}) 
#  BS_elevation1: <span id="altitude" style="font-size: 1.5em;"> </span>
#  This part on the website: <span id="altitude" style="font-size: 1.5em;">80.33 m (263.55 ft)</span>

BS_elevation2 = soup.find('span', attrs={'id': 'curElevation'})
#  BS_elevation2: <span id="curElevation" style=""></span>
#  This part on the website: <span id="curElevation" style>80.33 m</span>

It seems the text is available on the website but not in BeautifulSoup. I don't understand why this happens. How can I work around it?

Tags: python, html, web-scraping, beautifulsoup

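Solution

The elevation spans are empty in the HTML that the server returns. Most likely the browser fills them in afterwards with JavaScript, which requests and BeautifulSoup do not execute. The values still appear as quoted strings in the raw page source, so they can be pulled out of the response text with a regular expression: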

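import re

import httpx
import trio


async def main():
    # timeout=None disables httpx's default 5-second timeout.
    async with httpx.AsyncClient(timeout=None) as client:
        r = await client.get('http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland')
        # Search the raw page text instead of the parsed spans; each match is a
        # (prefix, value) pair for latitude, longitude and elevation.
        goal = re.findall(r"(lati|long|elev).*?'(.+)'", r.text)
        print(goal)

if __name__ == "__main__":
    trio.run(main)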

Output:

[('lati', '52.4063740'), ('long', '16.9251681'), ('elev', '80.329216003418')]
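
The matches above are strings, and the elevation is unrounded. As a rough sketch of the post-processing step, the snippet below turns such matches into floats and formats the elevation the way the site displays it. It runs on a made-up sample instead of the live page: SAMPLE_HTML, the variable names inside it, and the helpers extract_values and format_elevation are illustrative assumptions, not taken from the actual page source or from the answer above. The regex is also a slightly stricter variant that assumes a `name = 'value'` layout.

```python
import re

# Hypothetical stand-in for the page source. The real page's inline script
# may use different variable names or a different layout.
SAMPLE_HTML = """
<span id="altitude" style="font-size: 1.5em;"> </span>
<script>
var latitude = '52.4063740';
var longitude = '16.9251681';
var elevation = '80.329216003418';
</script>
"""

# Stricter variant of the answer's pattern: a name starting with one of the
# three prefixes, an equals sign, then a single-quoted value.
PATTERN = re.compile(r"(lati|long|elev)\w*\s*=\s*'([^']+)'")


def extract_values(page_text):
    """Return a dict mapping each prefix ('lati', 'long', 'elev') to a float."""
    return {key: float(value) for key, value in PATTERN.findall(page_text)}


def format_elevation(meters):
    """Format an elevation in meters as 'X m (Y ft)', using 1 ft = 0.3048 m."""
    feet = meters / 0.3048
    return f"{meters:.2f} m ({feet:.2f} ft)"


values = extract_values(SAMPLE_HTML)
print(values)
print(format_elevation(values["elev"]))
```

To use this against the real site, pass r.text from the request above to extract_values, and adjust the pattern if the page's script is laid out differently.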
