Pandas: averaging data between timestamps

Question

Suppose I have some data (a 24-hour time series) read into pandas:

import pandas as pd
import numpy as np


#read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Building-Demand-Electrical-Load-Profiles/master/july15.csv', 
                 index_col='Date', parse_dates=True)

How can I average the kW column of df between these timestamps into a new, separate pandas DataFrame?

bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']

The new pandas DataFrame could have column names like avg_kw1, avg_kw2, avg_kw3, one for each interval between consecutive timestamps in bkps_timestamps_kW.

Thanks for any help or tips.

Tags: python, pandas

Answer


I think you need cut to bin the index by the list of breakpoints converted to datetimes, then aggregate with mean:

d = [
'2013-06-19 00:00:00',
'2013-06-19 00:15:00',
'2013-06-19 01:15:00',
'2013-06-19 05:15:00',
'2013-06-19 07:15:00',
'2013-06-19 16:30:00',
'2013-06-20 16:30:00',
'2014-06-18 00:00:00',
'2015-06-18 00:00:00']
df = pd.DataFrame({'Date':range(len(d))}, index=pd.to_datetime(d))
print (df)
                     Date
2013-06-19 00:00:00     0
2013-06-19 00:15:00     1
2013-06-19 01:15:00     2
2013-06-19 05:15:00     3
2013-06-19 07:15:00     4
2013-06-19 16:30:00     5
2013-06-20 16:30:00     6
2014-06-18 00:00:00     7
2015-06-18 00:00:00     8

bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']


b = pd.to_datetime(bkps_timestamps_kW)
labels = [f'{i}-{j}' for i, j in zip(bkps_timestamps_kW[:-1], bkps_timestamps_kW[1:])] 

# store the result under a new name so df stays available for the next example
out = df.groupby(pd.cut(df.index, bins=b, labels=labels)).mean()
print (out)
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   2.5
2013-06-19 05:15:00-2013-06-19 16:30:00   4.5
2013-06-19 16:30:00-2014-06-18 00:00:00   6.5
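
To see why the row at 00:15 does not contribute to the first mean, it can help to look at the bin each timestamp falls into. The following is a minimal sketch that rebuilds the toy frame under the hypothetical name toy (so it does not touch df) and stores the cut result in a Series called assigned, a name chosen here for illustration only. By default cut uses right-closed intervals, so a timestamp equal to the first breakpoint falls outside every bin.

```python
import pandas as pd

d = ['2013-06-19 00:00:00', '2013-06-19 00:15:00', '2013-06-19 01:15:00',
     '2013-06-19 05:15:00', '2013-06-19 07:15:00', '2013-06-19 16:30:00',
     '2013-06-20 16:30:00', '2014-06-18 00:00:00', '2015-06-18 00:00:00']
# "toy" is a hypothetical name for the same toy frame used in the answer
toy = pd.DataFrame({'Date': range(len(d))}, index=pd.to_datetime(d))

bkps_timestamps_kW = ['2013-06-19 00:15:00', '2013-06-19 05:15:00',
                      '2013-06-19 16:30:00', '2014-06-18 00:00:00']
b = pd.to_datetime(bkps_timestamps_kW)
labels = [f'{i}-{j}' for i, j in zip(bkps_timestamps_kW[:-1], bkps_timestamps_kW[1:])]

# One label per row; rows outside every bin get NaN and are dropped by groupby
assigned = pd.Series(pd.cut(toy.index, bins=b, labels=labels),
                     index=toy.index, name='bin')
print(assigned)
```

Here the rows at 2013-06-19 00:00:00, 2013-06-19 00:15:00 and 2015-06-18 00:00:00 come out as NaN, which is why only the values 2 and 3 are averaged in the first bin.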

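If you need left-closed intervals (each bin includes its left edge and excludes its right edge), pass right=False to cut:

out = df.groupby(pd.cut(df.index, bins=b, labels=labels, right=False)).mean()
print (out)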
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   1.5
2013-06-19 05:15:00-2013-06-19 16:30:00   3.5
2013-06-19 16:30:00-2014-06-18 00:00:00   5.5
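
Applied to the question itself, the same idea might look like the sketch below. It does not download the CSV; instead it assumes a synthetic stand-in with a kW column sampled every 15 minutes over one day, and the values are made up purely so the result is easy to check. The column names avg_kw1, avg_kw2, avg_kw3 follow the wording of the question, and right=False is one possible choice so that each breakpoint belongs to the interval it starts.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 15-minute kW readings over one day
idx = pd.date_range('2013-06-19 00:00', '2013-06-19 23:45', freq='15min')
df = pd.DataFrame({'kW': np.arange(len(idx), dtype=float)}, index=idx)

bkps_timestamps_kW = ['2013-06-19 00:15:00', '2013-06-19 05:15:00',
                      '2013-06-19 16:30:00', '2014-06-18 00:00:00']
b = pd.to_datetime(bkps_timestamps_kW)

# Bin the index into left-closed intervals and average kW within each bin
bins = pd.cut(df.index, bins=b, right=False)
means = df.groupby(bins, observed=False)['kW'].mean()

# Reshape into a one-row frame with the column names from the question
avg = pd.DataFrame([means.to_numpy()],
                   columns=[f'avg_kw{i}' for i in range(1, len(means) + 1)])
print(avg)
```

With the real data, df would come from the read_csv call in the question and the rest should stay the same, provided the index is a DatetimeIndex and the column is actually named kW.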
