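Incorrect number of records within groups when using df.groupby

Problem description

I am following modified code taken from here to split rows into 5-second groups based on their timestamp.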

df = pd.read_csv(file_name, delimiter=',')
df['dt'] = pd.to_datetime(df['datetime'], unit='s')
for g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {g[0]} has {len(g)} records within 5 secs')

But the number of records I get within each group is incorrect.

Output

Start time 2017-05-02 16:00:45 has 2 records within 5 secs
...

The sample CSV looks like this

datetime,x,y,z,label
1493740845,0.0004,-0.0001,0.0045,bad
1493740846,0.0004,0.0006,0.0049,bad
1493740847,0.0002,0.0013,0.0044,bad
1493740848,0.0002,0.0005,0.0046,bad
1493740849,0.0006,0.0006,0.0038,bad
1493740850,0.0009,0.0002,0.0038,bad
...

Tags: python, pandas

Solution


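Iterating over a groupby yields tuples of 2 values (the group key and the group's DataFrame), so g is a tuple and len(g) is always 2.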
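To make this visible, here is a minimal sketch that inspects a single item yielded by the groupby. The small inline DataFrame is an assumption for illustration and stands in for the CSV from the question.

```python
import pandas as pd

# Hypothetical inline data standing in for the CSV in the question
df = pd.DataFrame({'datetime': [1493740845, 1493740846, 1493740850]})
df['dt'] = pd.to_datetime(df['datetime'], unit='s')

# Take the first item produced when iterating over the groupby
item = next(iter(df.groupby(pd.Grouper(freq='5s', key='dt'))))

# The item is a (key, DataFrame) tuple, so its length is 2
# no matter how many rows the group contains
print(type(item).__name__, len(item))

# Unpacking gives the group's start time and its rows
key, frame = item
print(key, len(frame))
```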

I think you can unpack the tuple into the variables name and g, and then work with them as you need:

for name, g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {name} has {len(g)} records within 5 secs')

Start time 2017-05-02 16:00:45 has 5 records within 5 secs
Start time 2017-05-02 16:00:50 has 1 records within 5 secs

Alternatively, keep your loop as it is and take the length of g[1], the group's DataFrame:

for g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {g[0]} has {len(g[1])} records within 5 secs')
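If only the per-group counts are needed, the loop may not be necessary at all. The following is a sketch rather than part of the original answer: it loads the six sample rows from the question through an in-memory string (the csv_text variable is an assumption for illustration) and calls size() on the groupby.

```python
import io
import pandas as pd

# The six sample rows shown in the question, loaded from a string
csv_text = """datetime,x,y,z,label
1493740845,0.0004,-0.0001,0.0045,bad
1493740846,0.0004,0.0006,0.0049,bad
1493740847,0.0002,0.0013,0.0044,bad
1493740848,0.0002,0.0005,0.0046,bad
1493740849,0.0006,0.0006,0.0038,bad
1493740850,0.0009,0.0002,0.0038,bad
"""

df = pd.read_csv(io.StringIO(csv_text))
df['dt'] = pd.to_datetime(df['datetime'], unit='s')

# size() returns a Series of row counts indexed by each bin's start time
counts = df.groupby(pd.Grouper(freq='5s', key='dt')).size()

for start, n in counts.items():
    print(f'Start time {start} has {n} records within 5 secs')
```

For these rows the counts agree with the output shown above: 5 records in the bin starting at 16:00:45 and 1 in the bin starting at 16:00:50.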
