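python - Pandas: converting timestamps and producing a monthly summary
Problem description
I have several .csv files that I import with Pandas, and then I compute summaries of the data (min, max, mean), ideally as weekly and monthly reports. I have the code below, but I can't get the monthly summary to work, and I'm sure the problem is in the timestamp conversion.
What am I doing wrong?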
import pandas as pd
import numpy as np
# Format of the data being imported:
#2017-05-11 18:29:14+00:00,264.0,987.99,26.5,23.70,512.0,11.763,52.31
df = pd.read_csv('data.csv')
df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
print('month info')
print([g for n, g in df.groupby(pd.Grouper(key='timestamp', freq='M'))])
print(df.groupby('timestamp')['light'].mean())
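Solution
IIUC, you almost have it, and your datetime conversion is fine. Here is an example.
Start from a dataframe like this (your sample row, repeated with slight modifications):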
>>> df
time x y z a b c d
0 2017-05-11 18:29:14+00:00 264.0 947.99 24.5 53.7 511.0 11.463 12.31
1 2017-05-15 18:29:14+00:00 265.0 957.99 25.5 43.7 512.0 11.563 22.31
2 2017-05-21 18:29:14+00:00 266.0 967.99 26.5 33.7 513.0 11.663 32.31
3 2017-06-11 18:29:14+00:00 267.0 977.99 26.5 23.7 514.0 11.763 42.31
4 2017-06-22 18:29:14+00:00 268.0 997.99 27.5 13.7 515.0 11.800 52.31
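To reproduce the frame above without the original data.csv, one option is to feed the sample rows to read_csv through io.StringIO. This is a sketch: the header row is an assumption, using the question's time column plus the answer's placeholder names x through d, since the question does not show the file's real header.

```python
import io

import pandas as pd

# The five sample rows from the table above as CSV text with an assumed
# header row; "x".."d" are placeholder column names, not the real ones.
csv_text = """time,x,y,z,a,b,c,d
2017-05-11 18:29:14+00:00,264.0,947.99,24.5,53.7,511.0,11.463,12.31
2017-05-15 18:29:14+00:00,265.0,957.99,25.5,43.7,512.0,11.563,22.31
2017-05-21 18:29:14+00:00,266.0,967.99,26.5,33.7,513.0,11.663,32.31
2017-06-11 18:29:14+00:00,267.0,977.99,26.5,23.7,514.0,11.763,42.31
2017-06-22 18:29:14+00:00,268.0,997.99,27.5,13.7,515.0,11.800,52.31
"""

# read_csv accepts any file-like object, so StringIO stands in for data.csv.
# Without parse_dates, the "time" column is read as plain strings.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```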
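You can convert your timestamps exactly as you did before:
df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
df = df.drop(columns='time')  # pandas >= 2.0: groupby mean() raises TypeError on string columns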
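Depending on your pandas version, the explicit format string may reject the trailing +00:00 offset in these timestamps. A minimal sketch that sidesteps this, swapping in utc=True for the format argument, lets pandas parse the offset itself. The single-row CSV and the x column are illustrative assumptions.

```python
import io

import pandas as pd

# One row in the question's format; "x" is a placeholder value column.
csv_text = "time,x\n2017-05-11 18:29:14+00:00,264.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse the "+00:00" offset instead of describing it in a format string,
# convert every value to UTC, then drop the timezone so the column holds
# naive UTC timestamps like the ones shown in the answer's output.
df["timestamp"] = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None)

print(df["timestamp"].iloc[0])  # 2017-05-11 18:29:14
```

If every row really is +00:00, dropping the timezone loses nothing; if offsets vary between files, utc=True first brings them onto a common clock.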
Then compute each summary separately:
monthly_mean = df.groupby(pd.Grouper(key='timestamp',freq='M')).mean()
monthly_max = df.groupby(pd.Grouper(key='timestamp',freq='M')).max()
monthly_min = df.groupby(pd.Grouper(key='timestamp',freq='M')).min()
weekly_mean = df.groupby(pd.Grouper(key='timestamp',freq='W')).mean()
weekly_min = df.groupby(pd.Grouper(key='timestamp',freq='W')).min()
weekly_max = df.groupby(pd.Grouper(key='timestamp',freq='W')).max()
# Examples:
>>> monthly_mean
x y z a b c d
timestamp
2017-05-31 265.0 957.99 25.5 43.7 512.0 11.5630 22.31
2017-06-30 267.5 987.99 27.0 18.7 514.5 11.7815 47.31
>>> weekly_mean
x y z a b c d
timestamp
2017-05-14 264.0 947.99 24.5 53.7 511.0 11.463 12.31
2017-05-21 265.5 962.99 26.0 38.7 512.5 11.613 27.31
2017-05-28 NaN NaN NaN NaN NaN NaN NaN
2017-06-04 NaN NaN NaN NaN NaN NaN NaN
2017-06-11 267.0 977.99 26.5 23.7 514.0 11.763 42.31
2017-06-18 NaN NaN NaN NaN NaN NaN NaN
2017-06-25 268.0 997.99 27.5 13.7 515.0 11.800 52.31
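The same monthly and weekly means can also be computed with resample after moving the timestamps into the index; this is an alternative to pd.Grouper, not something the answer above uses. The sketch keeps only the x column of the sample data for brevity and passes offset objects instead of the "M"/"W" strings.

```python
import io

import pandas as pd

# Only the "x" column of the sample data, to keep the sketch short.
csv_text = """time,x
2017-05-11 18:29:14+00:00,264.0
2017-05-15 18:29:14+00:00,265.0
2017-05-21 18:29:14+00:00,266.0
2017-06-11 18:29:14+00:00,267.0
2017-06-22 18:29:14+00:00,268.0
"""
df = pd.read_csv(io.StringIO(csv_text))
df["timestamp"] = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None)

# Index by timestamp and drop the raw strings so only numeric columns remain.
numeric = df.set_index("timestamp").drop(columns="time")

# MonthEnd() and Week(weekday=6) are the offsets behind the "M" and "W"
# aliases; passing offset objects avoids the "M" -> "ME" alias rename
# introduced in pandas 2.2.
monthly_mean = numeric.resample(pd.offsets.MonthEnd()).mean()
weekly_mean = numeric.resample(pd.offsets.Week(weekday=6)).mean()

print(monthly_mean)
print(weekly_mean)
```

As with the Grouper version, weeks without any rows appear as NaN rather than being skipped.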
Or aggregate them all at once to get a summary DataFrame with MultiIndex columns:
monthly_summary = df.groupby(pd.Grouper(key='timestamp',freq='M')).agg(['mean', 'min', 'max'])
weekly_summary = df.groupby(pd.Grouper(key='timestamp',freq='W')).agg(['mean', 'min', 'max'])
# Example: summary of column 'x':
>>> monthly_summary['x']
mean min max
timestamp
2017-05-31 265.0 264.0 266.0
2017-06-30 267.5 267.0 268.0
>>> weekly_summary['x']
mean min max
timestamp
2017-05-14 264.0 264.0 264.0
2017-05-21 265.5 265.0 266.0
2017-05-28 NaN NaN NaN
2017-06-04 NaN NaN NaN
2017-06-11 267.0 267.0 267.0
2017-06-18 NaN NaN NaN
2017-06-25 268.0 268.0 268.0
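The question's last line tries to summarize a single column (light). A sketch of that for one column, reusing the answer's Grouper approach and dropping the all-NaN rows for empty weeks, might look like this; x again stands in for whichever column you actually need.

```python
import io

import pandas as pd

# Only the "x" column of the sample data; substitute your own column name.
csv_text = """time,x
2017-05-11 18:29:14+00:00,264.0
2017-05-15 18:29:14+00:00,265.0
2017-05-21 18:29:14+00:00,266.0
2017-06-11 18:29:14+00:00,267.0
2017-06-22 18:29:14+00:00,268.0
"""
df = pd.read_csv(io.StringIO(csv_text))
df["timestamp"] = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None)

# Group by week (Week(weekday=6) is the offset behind freq="W"), select a
# single column, and compute several statistics in one pass.
weekly_x = (
    df.groupby(pd.Grouper(key="timestamp", freq=pd.offsets.Week(weekday=6)))["x"]
    .agg(["mean", "min", "max"])
    # Weeks with no rows come back as all-NaN; drop them from the report.
    .dropna(how="all")
)
print(weekly_x)
```

Selecting the column before agg gives flat mean/min/max columns instead of the MultiIndex produced when the whole frame is aggregated.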