How can I extract a single piece of data from a Wikipedia table using BeautifulSoup?

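Question

I am trying to find out from a website how many coronavirus cases are currently confirmed in Colombia. I only need to display the number of cases, and I am using bs4. I know the basics of programming, but I don't know Python. This is what I have so far: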

import bs4
import requests

response = requests.get("https://es.wikipedia.org/wiki/Pandemia_de_enfermedad_por_coronavirus_de_2020_en_Colombia")

# requests.get() never returns None, so check the HTTP status instead
if response.ok:
    html = bs4.BeautifulSoup(response.text, 'html.parser')

    title = html.select(".infobox")[0].text
    paragraphs = html.select("tr")
    #for para in paragraphs:
        #print (para.text)

    # find_all() returns a list of tags, so print the text of each cell
    mylist = html.find_all('td')
    for cell in mylist:
        print(cell.text)

Tags: python-3.x, api, beautifulsoup, wikipedia

Answer


Here is another example that uses an API instead of scraping Wikipedia; in this case, the free covid19 API.

import requests

class Covid19ApiHelper:
    URL_API = 'https://api.covid19api.com/summary'

    def __init__(self):
        self._global_info = None
        self._countries = None

    def refresh(self):
        """Request data from the API and save it."""
        response = requests.get(self.URL_API)
        response.raise_for_status()  # fail early on an HTTP error status
        data = response.json()

        self._global_info = data['Global']   
        self._countries = {item['CountryCode']: item for item in data['Countries']}

    def get_global_info(self):
        return self._global_info

    def get_country_info(self, country_code):
        """Return the information for a country, given its two-letter country code (e.g. 'CO')."""
        return self._countries[country_code]

if __name__ == '__main__':
    covid_helper = Covid19ApiHelper()
    covid_helper.refresh()

    print(covid_helper.get_global_info())
    print(covid_helper.get_country_info('CO'))

Global output:

{'NewConfirmed': 86850, 'TotalConfirmed': 2894581, 'NewDeaths': 5839, 'TotalDeaths': 202795, 'NewRecovered': 27616, 'TotalRecovered': 815948}

Output for Colombia:

{'Country': 'Colombia', 'CountryCode': 'CO', 'Slug': 'colombia', 'NewConfirmed': 261, 'TotalConfirmed': 5142, 'NewDeaths': 8, 'TotalDeaths': 233, 'NewRecovered': 64, 'TotalRecovered': 1067, 'Date': '2020-04-26T09:16:56Z'}
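
Since the question only asks for the number of confirmed cases, that value can be read from the 'TotalConfirmed' key of the Colombia record shown above. The following is a minimal sketch that works on a hard-coded copy of that record, so it needs no network access; the helper name confirmed_cases is hypothetical and not part of the original answer.

```python
# Sample record copied from the Colombia output above.
colombia = {
    'Country': 'Colombia', 'CountryCode': 'CO', 'Slug': 'colombia',
    'NewConfirmed': 261, 'TotalConfirmed': 5142, 'NewDeaths': 8,
    'TotalDeaths': 233, 'NewRecovered': 64, 'TotalRecovered': 1067,
    'Date': '2020-04-26T09:16:56Z',
}


def confirmed_cases(country_info):
    """Return the total confirmed cases from one country record, or None if absent."""
    # dict.get() avoids a KeyError if the key is missing from the record
    return country_info.get('TotalConfirmed')


print(confirmed_cases(colombia))  # prints 5142
```

With the class from the answer, the same helper could presumably be applied to the result of covid_helper.get_country_info('CO').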

Data source: https://covid19api.com/#details
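
For comparison with the API approach, the scraping route that the question asks about could look roughly like the sketch below. It parses a small, made-up HTML fragment rather than the live page: the row label "Casos confirmados", the table layout, and the helper name infobox_value are all assumptions, and the real Wikipedia infobox may be structured differently and change over time.

```python
import bs4

# Hypothetical, simplified infobox markup (an assumption, not the real page).
SAMPLE_HTML = """
<table class="infobox">
  <tr><th>País</th><td>Colombia</td></tr>
  <tr><th>Casos confirmados</th><td>5142</td></tr>
  <tr><th>Fallecidos</th><td>233</td></tr>
</table>
"""


def infobox_value(html_text, label):
    """Return the <td> text of the first infobox row whose <th> contains label."""
    soup = bs4.BeautifulSoup(html_text, "html.parser")
    infobox = soup.select_one(".infobox")
    if infobox is None:
        return None  # no element with class "infobox" in the document
    for row in infobox.find_all("tr"):
        header = row.find("th")
        cell = row.find("td")
        if header is not None and cell is not None and label in header.get_text():
            return cell.get_text(strip=True)
    return None  # no row matched the label


print(infobox_value(SAMPLE_HTML, "Casos confirmados"))  # prints 5142
```

To try this against the live page, response.text from the question's requests.get() call could be passed in place of SAMPLE_HTML, with the label adjusted to whatever the page actually uses.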

