python - Computing the average of a column's values within a time interval
Problem description
I have a DataFrame:
id timestamp data gradient Start
timestamp
2020-01-15 06:12:49.213 40250 2020-01-15 06:12:49.213 20.0 0.00373 NaN
2020-01-15 06:12:49.313 40251 2020-01-15 06:12:49.313 19.5 0.00354 0.0
2020-01-15 08:05:10.083 40256 2020-01-15 08:05:10.083 20.0 0.00020 1.0
2020-01-15 08:05:10.183 40257 2020-01-15 08:05:10.183 20.5 -0.00440 0.0
...
2020-01-31 09:01:50.993 40310 2020-01-31 09:01:50.993 21.0 0.55473 1.0
2020-01-31 09:01:51.093 40311 2020-01-31 09:01:51.093 21.5 0.00589 0.0
...
I want to find the average of data within the 30 seconds following each row where Start == 1.
Reproducible example:
import numpy as np
import pandas as pd

d = {'timestamp': ["2020-01-15 06:12:49.213", "2020-01-15 06:12:49.313", "2020-01-15 08:05:10.083", "2020-01-15 08:05:10.183", "2020-01-15 09:01:50.993", "2020-01-15 09:01:51.093", "2020-01-15 09:51:01.890", "2020-01-15 09:51:01.990", "2020-01-15 10:40:59.657", "2020-01-15 10:40:59.757", "2020-01-15 10:42:55.693", "2020-01-15 10:42:55.793", "2020-01-15 10:45:35.767", "2020-01-15 10:45:35.867", "2020-01-15 10:45:46.770", "2020-01-15 10:45:46.870", "2020-01-15 10:47:19.783", "2020-01-15 10:47:19.883", "2020-01-15 10:47:22.787"],
     'data': [20.0, 19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5, 23.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26],
     # NaN is not a Python builtin, so use np.nan for the missing gradients
     'gradient': [np.nan, np.nan, 0.000000, 0.000148, 0.000294, 0.000294, 0.000339, 0.000339, 0.000334, 0.000334, 0.000000, -0.008618, 0.000000, 0.006247, 0.090884, 0.090884, 0.010751, 0.010751, 0.332889],
     'Start': [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
     }
df = pd.DataFrame(d)
Expected output:
start_time end_time Average
2020-01-15 08:05:10.083 2020-01-15 09:01:51.093 20.25 = average of (20.0, 20.5)
2020-01-15 10:45:35.767 2020-01-15 10:45:35.767 23.75 = average of (23.0, 23.5, 24.0, 24.5)
Edit:
Using @jezrael's code:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['g'] = df['Start'].cumsum()
df1 = df[df['g'].ne(0)].copy()

s = df1.groupby('g')['timestamp'].transform('first')
df1 = df1[df1['timestamp'].between(s, s + pd.Timedelta(30, 's'))]

df2 = df1.groupby('g').agg(start_time=('timestamp','first'),
                           end_time=('timestamp','last'),
                           Average=('data','mean')).reset_index(drop=True)
print(df2)
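It seems that some of the start and end times are very close together, only about 0.1 seconds apart. This is a fault of the data acquisition device, which records 2 data points each time instead of 1, and the two points differ by 0.5 in data. In addition, there are very few data points, so the start and end times inside a 30 seconds interval end up very close to each other. My question is: would it be possible to forward fill the samples, so that there is more data to average over?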
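Solution
Label each block of rows with a cumulative sum of Start, get the first timestamp of each group with GroupBy.transform and GroupBy.first, then keep only the rows inside the 30 second window by comparing with Series.between: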
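df['timestamp'] = pd.to_datetime(df['timestamp'])
# Each Start == 1 row opens a new group; rows before the first start get g == 0
df['g'] = df['Start'].cumsum()
df1 = df[df['g'].ne(0)].copy()

# First timestamp of each group, broadcast to every row of that group
s = df1.groupby('g')['timestamp'].transform('first')
# Keep only rows within 30 seconds of their group's start (both ends inclusive)
df1 = df1[df1['timestamp'].between(s, s + pd.Timedelta(30, 's'))]

# One row per group: first and last timestamp in the window, and the mean of data
df2 = df1.groupby('g').agg(start_time=('timestamp','first'),
                           end_time=('timestamp','last'),
                           Average=('data','mean')).reset_index(drop=True)
print(df2)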
start_time end_time Average
0 2020-01-15 08:05:10.083 2020-01-15 08:05:10.183 20.25
1 2020-01-15 10:45:35.767 2020-01-15 10:45:46.870 23.75
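The edit to the question also asks whether the samples could be forward filled so that more points go into each average. This is not part of the answer above; what follows is only a minimal sketch of one possible approach, assuming a regular 100 ms grid over each 30 second window. The grid spacing and the names WINDOW, STEP and result are my own assumptions for illustration, and the sample rows are a subset of the question's data.

```python
import pandas as pd

# Subset of the question's rows covering the two Start == 1 groups.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-15 08:05:10.083", "2020-01-15 08:05:10.183",
        "2020-01-15 09:01:50.993",
        "2020-01-15 10:45:35.767", "2020-01-15 10:45:35.867",
        "2020-01-15 10:45:46.770", "2020-01-15 10:45:46.870",
        "2020-01-15 10:47:19.783",
    ]),
    "data": [20.0, 20.5, 21.0, 23.0, 23.5, 24.0, 24.5, 25.0],
    "Start": [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
})

WINDOW = pd.Timedelta(30, "s")  # window length taken from the question
STEP = "100ms"                  # assumed grid spacing (hypothetical choice)

# Same grouping idea as the answer: each Start == 1 row opens a new group.
df["g"] = df["Start"].cumsum()

rows = []
for _, group in df[df["g"].ne(0)].groupby("g"):
    start = group["timestamp"].iloc[0]
    # Regular grid from the group's start to start + 30 s, both ends included.
    grid = pd.date_range(start, start + WINDOW, freq=STEP)
    # For every grid point, take the last observed value at or before it.
    filled = group.set_index("timestamp")["data"].reindex(grid, method="ffill")
    rows.append({"start_time": start,
                 "end_time": grid[-1],
                 "Average": filled.mean()})

result = pd.DataFrame(rows)
print(result)
```

Because every value is held until the next sample arrives, this behaves like a time-weighted average, so it will generally differ from the plain mean of the raw points (20.25 and 23.75 in the output above). Whether that is the right quantity depends on whether holding the last reading is a reasonable model for the sensor.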