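python - How can I parse the coronavirus data from this site?
Question
I am a beginner in Python and know very little about fetching data from the Internet. The approach I use here works for fetching and printing the IMDB Top 250 movies, so I wanted to do the same with this coronavirus data. Unlike with the IMDB data, however, the program does not treat the items as a list, and I cannot see much difference from the IMDB page. How can I print at least the country names using plain requests and BeautifulSoup like this?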
import requests
from bs4 import BeautifulSoup

url = requests.get("https://www.worldometers.info/coronavirus/")
soup = BeautifulSoup(url.content, "html.parser")
new_soup = soup.find_all("table", {"id": "main_table_countries_today"})
country_table = new_soup[0].contents[3]
country_table = country_table.find_all("tr")
for country in country_table:
    country_name = country.find_all("td", {"style": "font-weight: bold; font-size:15px; text-align:left;"})
    print(country_name[0].text)
Answer
I have been getting the data from the Johns Hopkins University GitHub repository, which is considered a reputable source:
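names = ('confirmed', 'deaths', 'recovered')
src_base = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_{name}_global.csv'
# Build one URL per data set; the loops below iterate over this dict
src = {name: src_base.format(name=name) for name in names}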
The data can be fetched with requests:
import requests

for name, url in src.items():
    response = requests.get(url)
and conveniently converted to Pandas data frames:
import io
import pandas as pd  # the code below refers to pandas as pd

dfs = {}
for name, url in src.items():
    response = requests.get(url)
    # Parse the downloaded CSV bytes into a data frame
    dfs[name] = pd.read_csv(io.BytesIO(response.content))
    print(name, url)
    print(dfs[name])
confirmed https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
Province/State Country/Region ... 4/13/20 4/14/20
0 NaN Afghanistan ... 665 714
1 NaN Albania ... 467 475
2 NaN Algeria ... 1983 2070
3 NaN Andorra ... 646 659
4 NaN Angola ... 19 19
.. ... ... ... ... ...
259 Saint Pierre and Miquelon France ... 1 1
260 NaN South Sudan ... 4 4
261 NaN Western Sahara ... 6 6
262 NaN Sao Tome and Principe ... 4 4
263 NaN Yemen ... 1 1
[264 rows x 88 columns]
deaths https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv
Province/State Country/Region ... 4/13/20 4/14/20
0 NaN Afghanistan ... 21 23
1 NaN Albania ... 23 24
2 NaN Algeria ... 313 326
3 NaN Andorra ... 29 31
4 NaN Angola ... 2 2
.. ... ... ... ... ...
259 Saint Pierre and Miquelon France ... 0 0
260 NaN South Sudan ... 0 0
261 NaN Western Sahara ... 0 0
262 NaN Sao Tome and Principe ... 0 0
263 NaN Yemen ... 0 0
[264 rows x 88 columns]
recovered https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv
Province/State Country/Region ... 4/13/20 4/14/20
0 NaN Afghanistan ... 32 40
1 NaN Albania ... 232 248
2 NaN Algeria ... 601 691
3 NaN Andorra ... 128 128
4 NaN Angola ... 4 5
.. ... ... ... ... ...
245 Saint Pierre and Miquelon France ... 0 0
246 NaN South Sudan ... 0 0
247 NaN Western Sahara ... 0 0
248 NaN Sao Tome and Principe ... 0 0
249 NaN Yemen ... 0 0
[250 rows x 88 columns]
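To tie this back to the question, the country names and per-country totals can be read directly from one of these data frames. The following is only a minimal sketch, assuming the wide layout shown in the output above (one row per province or country, one column per date). The inline CSV sample, its numbers, and the helper name `latest_by_country` are made up for illustration so that the snippet runs without network access.

```python
import io

import pandas as pd

# Hypothetical sample in the same wide layout as the CSV files above;
# the numbers are invented for illustration only.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,4/13/20,4/14/20
,Afghanistan,33.0,65.0,665,714
,Albania,41.15,20.17,467,475
Alberta,Canada,53.93,-116.58,1732,1870
Ontario,Canada,51.25,-85.32,7953,8447
"""


def latest_by_country(df):
    """Sum the most recent date column over all provinces of each country."""
    latest = df.columns[-1]  # assumes the last column is the newest date
    return df.groupby("Country/Region")[latest].sum()


df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Country names, without duplicates, in order of first appearance.
countries = df["Country/Region"].unique().tolist()
print(countries)

# One total per country for the latest date.
totals = latest_by_country(df)
print(totals)
```

With the downloaded data, the same calls should apply to `dfs['confirmed']` in place of the sample frame.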
In the end, you can make some quick plots.
The full code is available here.
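A quick plot of this kind could be sketched as follows. This is not the code from the answer, only an illustration: it assumes matplotlib is installed, and the sample values and the helper name `to_time_series` are hypothetical. The frame is transposed so that dates become the index, then one line is drawn per country.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the wide layout of the CSV files; values are invented.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,4/12/20,4/13/20,4/14/20
,Afghanistan,33.0,65.0,607,665,714
,Albania,41.15,20.17,446,467,475
"""


def to_time_series(df):
    """Return a frame indexed by date with one column per country."""
    by_country = (
        df.drop(columns=["Province/State", "Lat", "Long"])
        .groupby("Country/Region")
        .sum()
    )
    series = by_country.T  # date labels become the index
    series.index = pd.to_datetime(series.index, format="%m/%d/%y")
    return series


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
series = to_time_series(df)

fig, ax = plt.subplots()
series.plot(ax=ax, logy=True, title="Confirmed cases (sample data)")
ax.set_xlabel("date")
ax.set_ylabel("cases")
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called at the end; otherwise `fig.savefig(...)` writes the figure to a file.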