python - 如何重新采样 csv 以便它可以与我的其他 csv 完美对齐
问题描述
我正在研究项目的另一个方面,但使用从 Mesowest 下载的 csv 文件进行实验。新代码是
df_pirates_all = pd.read_csv(
"https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
parse_dates=['date', 'datetime'],
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))
df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')
temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']
wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()
print (wxtest)
但是我得到了一个
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
关于天气数据的重新采样,因为 wxdata 在列中有值:
Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
'precip_accum_since_local_midnight_set_1'],
dtype='object')
当我想将其设置为每 10 分钟甚至 15 分钟以与骑行数据很好地对齐时,每 5 分钟一次。
解决方案
- 出现此问题是因为
.resample
需要日期时间索引.resample
最后还需要某种类型的聚合函数,例如.sum()
or.mean()
。.mean()
对于此数据,由于您通过从 5 分钟到 10 分钟间隔重新采样而失去了分辨率,因此将其用作聚合函数会更有意义
- 可能还有其他问题,但这会导致
TypeError
.
import pandas as pd
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])
# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)
# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean() # there needs to be an agg function
# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)
# this is an example of the wxtest dataframe
# display(wxtest.head())
SPOSTMIN ride open air_temp_set_1
2017-01-01 09:10:00 5.0 pirates True 51.539172
2017-01-01 09:20:00 5.0 pirates True 42.690527
2017-01-01 09:30:00 5.0 pirates True 46.879581
2017-01-01 09:40:00 5.0 pirates True 41.588848
2017-01-01 09:50:00 5.0 pirates True 45.575068
样本数据
import pandas as pd
import numpy as np
from datetime import datetime
# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}
wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)
# display(wx.head())
air_temp_set_1
Date_Time
2017-01-01 09:10:00 54.835173
2017-01-01 09:15:00 48.243171
2017-01-01 09:20:00 44.221710
2017-01-01 09:25:00 41.159343
2017-01-01 09:30:00 46.667042
# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}
df_pirates = pd.DataFrame.from_dict(pirates, orient='index')
# display(df_pirates)
SPOSTMIN ride open
2017-01-01 09:10:00 5.0 pirates True
2017-01-01 09:20:00 5.0 pirates True
2017-01-01 09:30:00 5.0 pirates True
2017-01-01 09:40:00 5.0 pirates True
2017-01-01 09:50:00 5.0 pirates True
推荐阅读
- kotlin - 初始化后如何更改 Kotlin 中字符串的值?
- python - 从批处理文件启动时“ImportError: DLL load failed while importing _ssl”
- powershell - Powershell:指定电子邮件:密码与随机数据之间
- arrays - 为什么 Array.filter 的谓词返回类型为未知(而不是布尔值)?
- javascript - 反应钩子 | useEffect :为什么这两个钩子都有一个依赖 lint 错误,但不是两个?
- c# - 显示数量来自数据库且价格由用户输入的每一行的总价
- c++ - C++ 长整数显示
- html - 即使加载了所有内容,Angular 应用程序也会在 iframe 中显示白页
- r - 将前导零添加到特定行
- python - 列表在被要求这样做之前附加字典