首页 > 解决方案 > 如何重新采样 csv 以便它可以与我的其他 csv 完美对齐

问题描述

我正在研究项目的另一个方面,但使用从 Mesowest 下载的 csv 文件进行实验。新代码是

df_pirates_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
    parse_dates=['date', 'datetime'], 
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))

df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')


temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']

wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()


print (wxtest)

但是我得到了一个

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

关于天气数据的重新采样,因为 wxdata 在列中有值:

Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
       'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
       'precip_accum_since_local_midnight_set_1'],
      dtype='object')

当我想将其设置为每 10 分钟甚至 15 分钟以与骑行数据很好地对齐时,每 5 分钟一次。

以下是 wx 的前几行: Excel 示例 但是,我知道 wx 中的时代与海盗的时代并不一致。

标签: pythonpandaspandas-groupbypandas-resample

解决方案


  • 出现此问题是因为.resample需要日期时间索引
    • .resample最后还需要某种类型的聚合函数,例如.sum()or .mean()
    • .mean()对于此数据,由于您通过从 5 分钟到 10 分钟间隔重新采样而失去了分辨率,因此将其用作聚合函数会更有意义
  • 可能还有其他问题,但这会导致TypeError.
import pandas as pd

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])

# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)

# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean()  # there needs to be an agg function

# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)

# this is an example of the wxtest dataframe
# display(wxtest.head())

                     SPOSTMIN     ride  open  air_temp_set_1
2017-01-01 09:10:00       5.0  pirates  True       51.539172
2017-01-01 09:20:00       5.0  pirates  True       42.690527
2017-01-01 09:30:00       5.0  pirates  True       46.879581
2017-01-01 09:40:00       5.0  pirates  True       41.588848
2017-01-01 09:50:00       5.0  pirates  True       45.575068

样本数据

import pandas as pd
import numpy as np
from datetime import datetime

# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}

wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)

# display(wx.head())
                     air_temp_set_1
Date_Time                          
2017-01-01 09:10:00       54.835173
2017-01-01 09:15:00       48.243171
2017-01-01 09:20:00       44.221710
2017-01-01 09:25:00       41.159343
2017-01-01 09:30:00       46.667042

# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}

df_pirates = pd.DataFrame.from_dict(pirates, orient='index')

# display(df_pirates)
                     SPOSTMIN     ride  open
2017-01-01 09:10:00       5.0  pirates  True
2017-01-01 09:20:00       5.0  pirates  True
2017-01-01 09:30:00       5.0  pirates  True
2017-01-01 09:40:00       5.0  pirates  True
2017-01-01 09:50:00       5.0  pirates  True

推荐阅读