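javascript - Scraping HTML data with JavaScript or Python
Problem description
I want to scrape data from an HTML page and save it to a text file. I have the URL. Can you tell me whether JavaScript or Python is preferable? I can try either.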
# Import libraries
import requests
import time
from bs4 import BeautifulSoup
# Set the URL you want to webscrape from (requests needs the scheme)
url = 'https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/london'
# Connect to the URL
response = requests.get(url)
# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(response.text, "html.parser")
# To download the whole data set, let's do a for loop through all a tags
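Solution
You can get the data from the API (found by looking at DevTools -> XHR). However, if you want the HTML itself, you will need something like Selenium to render the page first and then grab the HTML source.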
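For the Selenium route, a minimal sketch might look like the following. It assumes the `selenium` package and a Chrome driver are installed; the function name `fetch_rendered_html` and its `driver` and `timeout` parameters are made up for this illustration. Content that the page loads some time after the initial load may additionally need an explicit wait, which is not shown here.

```python
def fetch_rendered_html(url, driver=None, timeout=30):
    """Load `url` in a browser and return the HTML source after rendering.

    `driver` is a hypothetical convenience parameter: pass an existing
    WebDriver to reuse it, or leave it as None to start headless Chrome.
    """
    own_driver = driver is None
    if own_driver:
        # Imported lazily so the function can also be used with a stand-in driver.
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        driver = webdriver.Chrome(options=options)
    try:
        driver.set_page_load_timeout(timeout)
        driver.get(url)
        # page_source reflects the DOM as the browser currently sees it.
        return driver.page_source
    finally:
        # Only close a browser that this function started itself.
        if own_driver:
            driver.quit()
```

The returned string could then be passed to BeautifulSoup in the same way as `response.text` in the question.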
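In any case, I'm not sure whether this is what you want, but the data is available from the API and you can pull whatever you need from it. Here is an example for the hourly data.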
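import requests
import pandas as pd
# JSON endpoint the page calls in the background (found under DevTools -> XHR)
url = 'https://www.theweathernetwork.com/api/data/caon0383/hourly/cm/ci?ts=1012'
# Fetch the response and decode the JSON body
jsonData = requests.get(url).json()
# One row per hourly forecast period
df = pd.DataFrame(jsonData['hourly']['periods'])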
Output:
print(df.to_string())
b cc_class cdate cloud_coverage dayname_alt dewpt_unit dn f fc feelsLikeNight_unit fu hour ic icon ii it ms n pop_class pp r rain_bar_height rain_unit_language rain_value rr ru s sd showrainunit showsnowunit sky_tenths snow_bar_height snow_unit_language snow_value sr su t tc tmau tmu tsg tsl tu w wd wg wgk wgu wk wu wx
0 default cc9 Tuesday, November 5 90 Tue C Tue 0 0 C C 6 am sunny 8 chart-sun Mainly cloudy Nov 1 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 9 0 NaN - 0 cm 4 4 C C 1572951600000 1572933600000 C 14 W 21 21 km/h 14 km/h O-N
1 default cc7 Tuesday, November 5 70 Tue C Tue 0 0 C C 7 am sunny 3 chart-stormy A mix of sun and clouds Nov 2 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 4 4 C C 1572955200000 1572937200000 C 16 W 24 24 km/h 16 km/h B
2 default cc7 Tuesday, November 5 70 Tue C Tue 0 0 C C 8 am sunny 3 chart-stormy A mix of sun and clouds Nov 3 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 4 4 C C 1572958800000 1572940800000 C 20 W 29 29 km/h 20 km/h B
3 default cc7 Tuesday, November 5 70 Tue C Tue 0 0 C C 9 am sunny 3 chart-stormy A mix of sun and clouds Nov 4 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 4 4 C C 1572962400000 1572944400000 C 22 W 33 33 km/h 22 km/h B
4 default cc7 Tuesday, November 5 70 Tue C Tue 1 1 C C 10 am sunny 3 chart-stormy A mix of sun and clouds Nov 5 pop2 20 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 5 5 C C 1572966000000 1572948000000 C 24 W 37 37 km/h 24 km/h B
5 default cc6 Tuesday, November 5 60 Tue C Tue 0 0 C C 11 am sunny 3 chart-stormy A mix of sun and clouds Nov 6 pop2 20 - 0 NaN - 0 mm - Tue Nov 5 False False 6 0 NaN - 0 cm 5 5 C C 1572969600000 1572951600000 C 28 W 42 42 km/h 28 km/h B
6 default cc6 Tuesday, November 5 60 Tue C Tue 1 1 C C 12 pm sunny 3 chart-stormy A mix of sun and clouds Nov 7 pop2 20 - 0 NaN - 0 mm - Tue Nov 5 False False 6 0 NaN - 0 cm 6 6 C C 1572973200000 1572955200000 C 29 W 44 44 km/h 29 km/h B
7 default cc6 Tuesday, November 5 60 Tue C Tue 3 3 C C 1 pm sunny 3 chart-stormy A mix of sun and clouds Nov 8 pop2 20 - 0 NaN - 0 mm - Tue Nov 5 False False 6 0 NaN - 0 cm 7 7 C C 1572976800000 1572958800000 C 29 W 44 44 km/h 29 km/h B
8 default cc6 Tuesday, November 5 60 Tue C Tue 2 2 C C 2 pm sunny 3 chart-stormy A mix of sun and clouds Nov 9 pop2 20 - 0 NaN - 0 mm - Tue Nov 5 False False 6 0 NaN - 0 cm 6 6 C C 1572980400000 1572962400000 C 27 W 41 41 km/h 27 km/h B
9 default cc7 Tuesday, November 5 70 Tue C Tue 2 2 C C 3 pm sunny 3 chart-stormy A mix of sun and clouds Nov 10 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 6 6 C C 1572984000000 1572966000000 C 24 W 36 36 km/h 24 km/h B
10 default cc8 Tuesday, November 5 80 Tue C Tue 1 1 C C 4 pm sunny 4 chart-overcast Cloudy with sunny breaks Nov 11 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 8 0 NaN - 0 cm 5 5 C C 1572987600000 1572969600000 C 23 W 35 35 km/h 23 km/h B+
11 default cc7 Tuesday, November 5 70 Tue C Tue 1 1 C C 5 pm sunny 20 chart-overcast Partly cloudy Nov 12 pop3 30 - 0 NaN - 0 mm - Tue Nov 5 False False 7 0 NaN - 0 cm 5 5 C C 1572991200000 1572973200000 C 23 W 35 35 km/h 23 km/h BN
12 ...
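The question also asks about saving the result to a text file, which the answer does not show. One possible sketch, using only the standard library, writes selected fields of each period as tab-separated lines. The function name `save_periods_to_text` and the choice of columns are assumptions for illustration; the field names themselves (`hour`, `it`, `t`) are taken from the column header printed above.

```python
import csv


def save_periods_to_text(periods, path, columns):
    """Write the chosen fields of each period to a tab-separated text file.

    `periods` is an iterable of dicts, such as the list found under
    jsonData['hourly']['periods'] in the answer's code.
    Returns the number of data rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=columns,
            delimiter="\t",
            extrasaction="ignore",  # skip fields that were not requested
            restval="",             # leave missing fields empty
        )
        writer.writeheader()
        for period in periods:
            writer.writerow(period)
            count += 1
    return count
```

With the answer's code, a call such as `save_periods_to_text(jsonData['hourly']['periods'], 'hourly.txt', ['hour', 'it', 't'])` would produce one line per forecast hour. If pandas is already in use, `df.to_csv('hourly.txt', sep='\t', index=False)` is a shorter alternative.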