Iterating over multiple months to fetch different data

Problem description

I need to go through every day from 1 January 2015 to 5 February 2020.

The following script gives me the month-end date of every month up to 5 February 2020:

import pandas as pd
date = pd.Timestamp.now().strftime("%Y%m%d")  # pd.datetime was removed in pandas 2.0


dates = pd.date_range(start='20150101', end='20200205', freq = "M").strftime("%Y%m%d")

print(dates)

Result:

Index(['20150131', '20150228', '20150331', '20150430', '20150531', '20150630',
       '20150731', '20150831', '20150930', '20151031', '20151130', '20151231',
       '20160131', '20160229', '20160331', '20160430', '20160531', '20160630',
       '20160731', '20160831', '20160930', '20161031', '20161130', '20161231',
       '20170131', '20170228', '20170331', '20170430', '20170531', '20170630',
       '20170731', '20170831', '20170930', '20171031', '20171130', '20171231',
       '20180131', '20180228', '20180331', '20180430', '20180531', '20180630',
       '20180731', '20180831', '20180930', '20181031', '20181130', '20181231',
       '20190131', '20190228', '20190331', '20190430', '20190531', '20190630',
       '20190731', '20190831', '20190930', '20191031', '20191130', '20191231',
       '20200131'],
      dtype='object')
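
Note that freq="M" yields month-end dates only, so the partial month of February 2020 is missing from the output above. A minimal sketch of one way to get a (start, end) pair per month, with the last window clipped to the end date, is shown here. The helper name month_windows is hypothetical and not part of the original scripts.

```python
import pandas as pd

def month_windows(start, end):
    """Return (start, end) strings in YYYYMMDD form, one pair per calendar month."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    windows = []
    # A period range covers every month touched by [start, end], partial ones included.
    for period in pd.period_range(start=start, end=end, freq="M"):
        first = max(period.start_time.normalize(), start)
        last = min(period.end_time.normalize(), end)
        windows.append((first.strftime("%Y%m%d"), last.strftime("%Y%m%d")))
    return windows

windows = month_windows("20150101", "20200205")
print(windows[0], windows[-1], len(windows))
```

Each pair can then be used as the startDate and endDate of one request.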

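The following script fetches the wind speed for every day in January 2015. In my main block I specify the API key, start date and end date that are used in the URL. I believe this is where the two scripts can be combined.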

import pandas as pd
import requests
import warnings

from urllib3.exceptions import InsecureRequestWarning

headers = {
    'scheme': 'https',
    'accept': 'application/json, text/plain, */*',
    'accept-encoding' : 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,da;q=0.7',
    'origin': 'https://www.wunderground.com',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
}

# Extract the dates and wind speeds and return the daily mean wind speed as a separate Series called dkk
def get_data(response):
    df = response.json()
    df = pd.DataFrame(df["observations"])#[1]["valid_time_gmt", "wspd"]
    df["time"] = pd.to_datetime(df["valid_time_gmt"],unit='s')
    dkk = df.groupby(df["time"].dt.date)["wspd"].mean()
    return dkk 


if __name__ == "__main__":
    date = pd.Timestamp.now().strftime("%d-%m-%Y")  # pd.datetime was removed in pandas 2.0

    api_key = "xxxxxx"
    start_date = "20150101"
    end_date = "20150131"

    urls = [
    "https://api.weather.com/v1/location/EGNV:9:GB/observations/historical.json?apiKey="+api_key+"&units=e&startDate="+start_date+"&endDate="+end_date+""
    ]

    # Collect one row per response, then transpose so that the dates become
    # the index; the result is stored in df_transposed and shown below.
    warnings.simplefilter('ignore', InsecureRequestWarning)
    rows = []
    for url in urls:
        res = requests.get(url, headers=headers, verify=False)
        rows.append(get_data(res))
    # DataFrame.append was removed in pandas 2.0, so build the frame from the list.
    df = pd.DataFrame(rows)
    df_transposed = df.T
    print(df_transposed)

Result:

                 wspd
2015-01-01  24.333333
2015-01-02  18.696970
...
2015-01-30  12.121212
2015-01-31  21.575758
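
To see what get_data does without calling the API, the grouping step can be run on a hand-made payload. The sample values below are invented for illustration and only mimic the two fields the function reads (valid_time_gmt and wspd); they are not real observations.

```python
import pandas as pd

# Hypothetical payload: epoch seconds and wind speeds, invented for illustration.
sample = {
    "observations": [
        {"valid_time_gmt": 1420070400, "wspd": 20},  # 2015-01-01 00:00 UTC
        {"valid_time_gmt": 1420113600, "wspd": 30},  # 2015-01-01 12:00 UTC
        {"valid_time_gmt": 1420156800, "wspd": 10},  # 2015-01-02 00:00 UTC
    ]
}

df = pd.DataFrame(sample["observations"])
# Convert epoch seconds to timestamps, then average wspd per calendar day.
df["time"] = pd.to_datetime(df["valid_time_gmt"], unit="s")
daily = df.groupby(df["time"].dt.date)["wspd"].mean()
print(daily)
```

The two readings on 1 January are averaged into a single value, which is why the real output has one row per day.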

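The question is: I need to get the wind speed from 1 January 2015 to 5 February 2020. How can I best combine my scripts to get the desired output, which would be a two-column dataframe with the date in the first column and the wind speed (wspd) in the second?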

Desired output:

                 wspd
2015-01-01  24.333333
2015-01-02  18.696970
2015-01-03   8.454545
2015-01-04  10.363636
2015-01-05  11.333333
...
2020-02-04  13.5
2020-02-05  7.1

The wspd values for the last two dates can be seen here:

https://www.wunderground.com/history/monthly/gb/darlington/EGNV/date/2020-2

Tags: python, pandas, dataframe, web-scraping

Solution


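Use Series.where (the same method exists on DataFrame, which is what df_transposed is) to replace rows outside the date range with 'XXX':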

s = df_transposed.index.to_series()
df_transposed= df_transposed.where((s >='2015-01-01') &(s<='2020-02-05'),'XXX')

Edit

s = df_transposed.index.to_series()
df_transposed= df_transposed.where((s >=pd.to_datetime('2015-01-01')) &
                                   (s<=pd.to_datetime('2020-02-05')),'XXX')
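
The where call masks rows that are already in the frame; it does not fetch the missing months. A minimal sketch of how the two scripts from the question might be combined follows: build one URL per month, fetch each, concatenate the daily means, then trim to the requested range. The helpers build_url, collect_wind_speed and fake_fetch are hypothetical, the fetch argument stands in for the question's requests.get(...) followed by get_data, and whether the API accepts a long series of monthly requests is an assumption.

```python
import pandas as pd

BASE = "https://api.weather.com/v1/location/EGNV:9:GB/observations/historical.json"

def build_url(api_key, start_date, end_date):
    # Same query string as in the question, with the dates filled in per month.
    return f"{BASE}?apiKey={api_key}&units=e&startDate={start_date}&endDate={end_date}"

def collect_wind_speed(fetch, api_key, start, end):
    """Call fetch(url) once per month and return one DataFrame with a wspd column.

    fetch is a hypothetical callable that returns a Series of daily mean wind
    speeds indexed by date, like get_data(requests.get(url, ...)) in the question.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    parts = []
    for period in pd.period_range(start=start, end=end, freq="M"):
        first = max(period.start_time.normalize(), start)
        last = min(period.end_time.normalize(), end)
        url = build_url(api_key, first.strftime("%Y%m%d"), last.strftime("%Y%m%d"))
        parts.append(fetch(url))
    out = pd.concat(parts).to_frame("wspd")
    out.index = pd.to_datetime(out.index)   # date objects -> DatetimeIndex
    return out.sort_index().loc[start:end]  # keep only the requested range

# Stand-in fetcher so the sketch runs offline: it returns a constant value for
# each day in the window encoded in the URL.
def fake_fetch(url):
    query = dict(pair.split("=") for pair in url.split("?")[1].split("&"))
    days = pd.date_range(query["startDate"], query["endDate"], freq="D")
    return pd.Series(1.0, index=days.date, name="wspd")

result = collect_wind_speed(fake_fetch, "xxxxxx", "20150101", "20200205")
print(result.head())
```

With the question's code, fetch could be something like lambda url: get_data(requests.get(url, headers=headers)). Converting the index to a DatetimeIndex also lets .loc slice by date directly, so the where mask is not needed for trimming.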
