Duration between two timestamps

Problem description

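I have a dataframe with several timestamps per user, and I want to calculate the duration between consecutive timestamps. I import my CSV files with this code: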

import pandas as pd
import glob

path = r'C:\Users\...\Desktop' 
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, encoding='ISO-8859-1')
    li.append(df)

df = pd.concat(li, axis=0, ignore_index=True)

df.head()

     ID            timestamp
1828765  31-05-2021 22:27:03
1828765  31-05-2021 22:27:12
1828765  31-05-2021 22:27:13
1828765  31-05-2021 22:27:34
2056557  21-07-2021 10:27:12
2056557  21-07-2021 10:27:20
2056557  21-07-2021 10:27:22

I want to get something like this:

     ID            timestamp  duration(s)
1828765  31-05-2021 22:27:03          NaN
1828765  31-05-2021 22:27:12            9
1828765  31-05-2021 22:27:13            1
1828765  31-05-2021 22:27:34           21
2056557  21-07-2021 10:27:12          NaN
2056557  21-07-2021 10:27:20            8
2056557  21-07-2021 10:27:22            2

I tried this code, but it doesn't work for me:

import datetime
df['timestamp'] =  pd.to_datetime(df['timestamp'], format = "%d-%m-%Y %H:%M:%S") 
df['time_diff'] = 0
for i in range(df.shape[0] - 1):
    df['time_diff'][i+1] = (datetime.datetime.min +  (df['timestamp'][i+1] - df['timestamp'][i])).time()

Tags: python pandas dataframe

Solution


In pandas, operations performed over groups of values are GroupBy operations.

pandas natively supports arithmetic on timestamps, so subtracting one timestamp from another gives the correct duration between them.
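To illustrate the point about timestamp arithmetic, here is a minimal sketch using two of the example timestamps (the variable names are only for illustration). Subtracting two pandas Timestamps yields a Timedelta object rather than a plain number, which is why the seconds are extracted in a separate step later on.

```python
import pandas as pd

# Two of the example timestamps, parsed with the same day-first format
start = pd.to_datetime("31-05-2021 22:27:03", format="%d-%m-%Y %H:%M:%S")
end = pd.to_datetime("31-05-2021 22:27:12", format="%d-%m-%Y %H:%M:%S")

# Timestamp - Timestamp produces a Timedelta
delta = end - start
print(delta)                  # 0 days 00:00:09
print(delta.total_seconds())  # 9.0
```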

The timestamp column has already been correctly converted to datetime64[ns] with:

df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")

Now we can use GroupBy.diff:

df['duration'] = df.groupby('ID')['timestamp'].diff()

df

        ID           timestamp        duration
0  1828765 2021-05-31 22:27:03             NaT
1  1828765 2021-05-31 22:27:12 0 days 00:00:09
2  1828765 2021-05-31 22:27:13 0 days 00:00:01
3  1828765 2021-05-31 22:27:34 0 days 00:00:21
4  2056557 2021-07-21 10:27:12             NaT
5  2056557 2021-07-21 10:27:20 0 days 00:00:08
6  2056557 2021-07-21 10:27:22 0 days 00:00:02
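The loop in the question subtracts each row from the previous one regardless of ID, which is essentially what a plain Series.diff does. The sketch below, built from a subset of the example rows, contrasts that with the grouped version: without grouping, the first row of the second user is compared with the last row of the first user and gets a difference of roughly 50 days, while the grouped diff restarts at NaT for each ID.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1828765, 1828765, 2056557, 2056557],
    'timestamp': pd.to_datetime(['2021-05-31 22:27:13', '2021-05-31 22:27:34',
                                 '2021-07-21 10:27:12', '2021-07-21 10:27:20']),
})

# Plain diff: row 2 is compared with the last row of the previous ID
plain = df['timestamp'].diff()

# Grouped diff: the first row of each ID has no predecessor, so it gets NaT
grouped = df.groupby('ID')['timestamp'].diff()

print(pd.DataFrame({'plain': plain, 'grouped': grouped}))
```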

To get the duration in seconds, extract the total seconds with Series.dt.total_seconds:

df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()

df

        ID           timestamp  duration (s)
0  1828765 2021-05-31 22:27:03           NaN
1  1828765 2021-05-31 22:27:12           9.0
2  1828765 2021-05-31 22:27:13           1.0
3  1828765 2021-05-31 22:27:34          21.0
4  2056557 2021-07-21 10:27:12           NaN
5  2056557 2021-07-21 10:27:20           8.0
6  2056557 2021-07-21 10:27:22           2.0
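The desired output in the question shows whole numbers (9, 1, 21) rather than floats. total_seconds returns float64 because that dtype can hold NaN. One possible approach, assuming pandas 1.0 or later and timestamps with whole-second resolution (a fractional value would make the cast raise), is to cast to the nullable "Int64" dtype, which keeps the missing first value of each ID as <NA>:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1828765, 1828765, 1828765, 2056557, 2056557],
    'timestamp': pd.to_datetime(['31-05-2021 22:27:03', '31-05-2021 22:27:12',
                                 '31-05-2021 22:27:13', '21-07-2021 10:27:12',
                                 '21-07-2021 10:27:20'],
                                format="%d-%m-%Y %H:%M:%S"),
})

# float64 seconds, with NaN for the first row of each ID
seconds = df.groupby('ID')['timestamp'].diff().dt.total_seconds()

# The nullable Int64 dtype stores integers and shows missing values as <NA>
df['duration (s)'] = seconds.astype('Int64')
print(df)
```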

Complete working example:

import pandas as pd

df = pd.DataFrame({
    'ID': [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    'timestamp': ['31-05-2021 22:27:03', '31-05-2021 22:27:12',
                  '31-05-2021 22:27:13', '31-05-2021 22:27:34',
                  '21-07-2021 10:27:12', '21-07-2021 10:27:20',
                  '21-07-2021 10:27:22']
})

df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
print(df)
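GroupBy.diff compares each row with the previous row of the same ID in the current row order, so it assumes each user's rows are already in chronological order. After concatenating several CSV files with pd.concat, that may not hold, and unsorted rows would produce negative or wrong durations. A cautious variant sorts by ID and timestamp first; the shuffled rows below are made up for illustration.

```python
import pandas as pd

# Rows deliberately out of order, as can happen after concatenating several CSVs
df = pd.DataFrame({
    'ID': [2056557, 1828765, 2056557, 1828765],
    'timestamp': ['21-07-2021 10:27:20', '31-05-2021 22:27:12',
                  '21-07-2021 10:27:12', '31-05-2021 22:27:03'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")

# Sort by user, then by time, so each diff compares a row with its true predecessor
df = df.sort_values(['ID', 'timestamp'], ignore_index=True)
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
print(df)
```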
