python - Pandas: average data between timestamps
Problem description
Suppose I have some data (a 24-hour time series) read into pandas:
import pandas as pd
import numpy as np
#read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Building-Demand-Electrical-Load-Profiles/master/july15.csv',
index_col='Date', parse_dates=True)
How can I average the kW column of df between these timestamps into a new, separate pandas df?
bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']
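The new pandas df could have column names something like avg_kw1, avg_kw2, avg_kw3, representing the intervals defined by bkps_timestamps_kW.
Any help/tips appreciated.
Solution
I think you need cut to bin the index by the list converted to datetimes, then aggregate with mean: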
d = [
'2013-06-19 00:00:00',
'2013-06-19 00:15:00',
'2013-06-19 01:15:00',
'2013-06-19 05:15:00',
'2013-06-19 07:15:00',
'2013-06-19 16:30:00',
'2013-06-20 16:30:00',
'2014-06-18 00:00:00',
'2015-06-18 00:00:00']
df = pd.DataFrame({'Date':range(len(d))}, index=pd.to_datetime(d))
print (df)
                     Date
2013-06-19 00:00:00     0
2013-06-19 00:15:00     1
2013-06-19 01:15:00     2
2013-06-19 05:15:00     3
2013-06-19 07:15:00     4
2013-06-19 16:30:00     5
2013-06-20 16:30:00     6
2014-06-18 00:00:00     7
2015-06-18 00:00:00     8
bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']
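# convert the breakpoint strings to datetimes so they can be used as bin edges
b = pd.to_datetime(bkps_timestamps_kW)
# one label per interval, in the form 'start-end'
labels = [f'{i}-{j}' for i, j in zip(bkps_timestamps_kW[:-1], bkps_timestamps_kW[1:])]
# intervals are right-closed (left, right] by default; df itself is left unchanged
df1 = df.groupby(pd.cut(df.index, bins=b, labels=labels)).mean()
print (df1)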
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   2.5
2013-06-19 05:15:00-2013-06-19 16:30:00   4.5
2013-06-19 16:30:00-2014-06-18 00:00:00   6.5
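To get closer to the layout asked for in the question (one column per interval, named avg_kw1, avg_kw2, ...), the same cut/groupby result can be reshaped into a single row. This is only a minimal sketch: the kW column name comes from the question, but the sample values, the names sample and avg_df, and the avg_kw labels are assumptions made up for illustration, not data from the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: only the 'kW' column name
# comes from the question, the values are made up.
idx = pd.to_datetime([
    '2013-06-19 00:00:00',
    '2013-06-19 01:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 07:15:00',
    '2013-06-19 16:30:00',
    '2013-06-20 16:30:00',
])
sample = pd.DataFrame({'kW': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}, index=idx)

bkps_timestamps_kW = [
    '2013-06-19 00:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 16:30:00',
    '2014-06-18 00:00:00']
b = pd.to_datetime(bkps_timestamps_kW)

# One label per interval: avg_kw1, avg_kw2, avg_kw3
labels = [f'avg_kw{i}' for i in range(1, len(b))]

# Assign each timestamp to a right-closed interval and average kW per interval;
# rows outside all intervals (here 00:00:00) are left out of the result
bins = pd.cut(sample.index, bins=b, labels=labels)
means = sample['kW'].groupby(bins, observed=False).mean()

# Reshape the Series into a one-row DataFrame with one column per interval
avg_df = means.to_frame().T.reset_index(drop=True)
avg_df.columns = labels
print(avg_df)
```

Passing observed=False keeps intervals that contain no rows in the result (as NaN), so the number of columns always matches the number of intervals.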
If you need left-closed intervals, pass right=False to cut:
df = df.groupby(pd.cut(df.index, bins=b, labels=labels, right=False)).mean()
print (df)
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   1.5
2013-06-19 05:15:00-2013-06-19 16:30:00   3.5
2013-06-19 16:30:00-2014-06-18 00:00:00   5.5
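The two results differ only in where rows that fall exactly on a breakpoint end up. A small sketch of that boundary behaviour, using a single timestamp that sits on the middle breakpoint; the labels 'first' and 'second' and the variable names are made up for illustration:

```python
import pandas as pd

b = pd.to_datetime(['2013-06-19 00:15:00',
                    '2013-06-19 05:15:00',
                    '2013-06-19 16:30:00'])

# A timestamp that sits exactly on the middle breakpoint
ts = pd.to_datetime(['2013-06-19 05:15:00'])

# right=True (default): intervals are (left, right],
# so the breakpoint belongs to the interval it closes
right_closed = pd.cut(ts, bins=b, labels=['first', 'second'])

# right=False: intervals are [left, right),
# so the breakpoint belongs to the interval it opens
left_closed = pd.cut(ts, bins=b, labels=['first', 'second'], right=False)

print(right_closed[0], left_closed[0])
```

So with the default, a reading at 05:15:00 counts toward the 00:15-05:15 average; with right=False it counts toward the 05:15-16:30 average instead.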