Webscraping an interactive chart in Python using Beautiful Soup with a loop

Problem description

The code below pulls the values for every number label on the page. Can I use a filter to extract them once per region?

For example, on https://opensignal.com/reports/2019/04/uk/mobile-network-experience I am only interested in the numbers under the Regional Analysis tab, for all of the regions.

import requests
from bs4 import BeautifulSoup

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')

# every bar element in the charts on the page
items = soup.find_all('div', class_='c-ru-graph__rect')

for item in items:
    # operator label and the number shown next to the bar
    provider = item.find('span', class_='c-ru-graph__label').text
    prodvalue = item.find_next_sibling('span').find('span', class_='c-ru-graph__number').text
    print(provider + " : " + prodvalue)

I would like a table or a DataFrame like the one below for the Eastern region:

                         O2     Vodafone    3      EE
4G Availability          82     76.9        73.0   89.2
Upload Speed Experience  5.6    5.9         6.8    9.5

Any pointers that would help me get this result?

Tags: python, python-3.x, web-scraping, charts, beautifulsoup

Solution


Assuming a fixed order of companies (which is the case here), you can simply narrow the search down to the divs that contain the information you need.

import requests
from bs4 import BeautifulSoup
import pandas as pd

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')

# restrict the search to the Eastern region tab
res = soup.find_all('div', {'id': 'eastern'})

# 4G Availability chart: metric name, operator labels and values
aval = res[0].find_all('div', {'data-chart-name': '4g-availability'})
avalname = aval[0].find('span', {'class': 'js-metric-name'}).text

# Upload Speed Experience chart: metric name and values
upload = res[0].find_all('div', {'data-chart-name': 'upload-speed'})
uploadname = upload[0].find('span', {'class': 'js-metric-name'}).text

# the operator order is the same in every chart, so take it once
companies = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__label')]

row1 = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__number')]
row2 = [i.text for i in upload[0].find_all('span', class_='c-ru-graph__number')]

# build the frame with metrics as rows and operators as columns
df = pd.DataFrame({avalname: row1,
                   uploadname: row2})
df.index = companies
df = df.T

Output

                          O2    Vodafone      3      EE
4G Availability         82.0        76.9   73.0    89.2
Upload Speed Experience  5.6         5.9    6.8     9.5
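The same markup used above (the data-chart-name attribute, the js-metric-name span and the c-ru-graph__* spans) can also be used to collect every metric in the Eastern tab in one pass instead of naming each chart. Below is a minimal sketch of that idea, assuming every chart in the tab lists the same four operators in the same order and that all values are numeric; the page only confirms this for the two charts used above, and the pd.to_numeric conversion is an extra step not shown in the answer.

import requests
from bs4 import BeautifulSoup
import pandas as pd

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')

eastern = soup.find('div', {'id': 'eastern'})

rows = {}
companies = None
# every chart in the tab carries a data-chart-name attribute, so iterate over
# all of them rather than hard-coding '4g-availability' and 'upload-speed'
for chart in eastern.find_all('div', attrs={'data-chart-name': True}):
    name = chart.find('span', {'class': 'js-metric-name'}).text
    labels = [s.text for s in chart.find_all('span', class_='c-ru-graph__label')]
    numbers = [s.text for s in chart.find_all('span', class_='c-ru-graph__number')]
    if companies is None:
        companies = labels          # assumed to be identical in every chart
    rows[name] = numbers

# metrics as rows, operators as columns; values converted from text to numbers
df = pd.DataFrame(rows, index=companies).T.apply(pd.to_numeric)
print(df)

The same loop could be repeated per region once the ids of the other regional tabs are known; only 'eastern' appears in the answer above, so the remaining ids would have to be checked in the page source.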
