Web scraping - empty list

Problem description

I am trying to get historical weather data from a web page (the URL is in the code below). However, it returns empty lists. I have checked and cannot find the reason. I have done similar web scraping on other pages and it worked. Thanks in advance for your help. Please try the code and see if you can understand why it returns empty lists. Because of the empty lists, I commented out the end of each extraction line.

from bs4 import BeautifulSoup
from requests import get
import numpy as np
import pandas as pd
# Backslash line continuation inside the string literal pulled the
# indentation into the User-Agent value; adjacent string literals avoid that.
headers = {'User-Agent':
           'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit'
           '/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
year = []
date = []
time = []
maxtemp = []
mintemp = []
wind = []
years = list(range(2010, 2021, 1))
months = list(range(1, 13, 1))
urls = []
for yr in years:
    for month in months:
        # The original concatenation was split across lines without parentheses
        # (a SyntaxError), and it passed str(year) — the list — instead of the
        # loop variable yr into the URL.
        base_url = ("https://www.timeanddate.com/weather/germany/berlin/historic"
                    "?month=" + str(month) + "&year=" + str(yr))
        response = get(base_url, headers=headers)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        year.append(soup.find_all('h1', attrs={'class': 'time'}))#[0].text.split()))
        date.append(soup.find_all('div', attrs={'class': 'date'}))#[0].text.split(',')[1].split())
        time.append(soup.find_all('div', attrs={'class': 'time'}))#[0].text.split())
        maxtemp.append(soup.find_all('div', attrs={'class': 'temp low'}))#[0].text.split(':')[1])
        mintemp.append(soup.find_all('div', attrs={'class': 'tempLow low'}))#[0].text.split(':')[1])
        wind.append(soup.find_all('div', attrs={'class': 'wstext'}))#[0].text.split())
print(year)
print(date)
print(time)
print(maxtemp)
print(mintemp)
print(wind)
print(len(year), len(date), len(time), len(maxtemp), len(mintemp), len(wind))
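A quick way to narrow down why `find_all` returns empty lists is to check what HTML `requests` actually received: if the site fills in the weather table with JavaScript after the page loads, the raw HTML contains none of the expected elements, and every `find_all` comes back empty. The sketch below uses a small hypothetical HTML snippet (not the real timeanddate.com markup) to show the symptom:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: the data container exists,
# but its contents are injected by JavaScript after page load, so the
# HTML that requests receives is effectively empty.
html = """
<html><body>
  <h1 class="title">Berlin Weather</h1>
  <div id="wt-his"></div>  <!-- table filled in by JavaScript -->
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# A class that does not appear in the served HTML yields an empty list —
# exactly the symptom described above.
print(soup.find_all("div", attrs={"class": "temp low"}))  # []

# A class that does appear is found normally.
print(soup.find_all("h1", attrs={"class": "title"}))
```

If `response.text` turns out not to contain the data, scraping the raw HTML cannot work, regardless of which selectors are used; inspecting the page's network requests in browser developer tools, or rendering the page with a browser-automation tool, would be the usual next steps.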

Tags: python, web-scraping

Solution

