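python - Duration between two timestamps
Problem description
I have a dataframe with several timestamps per user, and I want to calculate the duration between consecutive timestamps. I import my CSV files with this code: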
import pandas as pd
import glob
path = r'C:\Users\...\Desktop'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, encoding='ISO-8859-1')
    li.append(df)
df = pd.concat(li, axis=0, ignore_index=True)
df.head()
ID timestamp
1828765 31-05-2021 22:27:03
1828765 31-05-2021 22:27:12
1828765 31-05-2021 22:27:13
1828765 31-05-2021 22:27:34
2056557 21-07-2021 10:27:12
2056557 21-07-2021 10:27:20
2056557 21-07-2021 10:27:22
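A minimal, self-contained sketch of the same loading step, assuming every CSV has the header ID,timestamp. The file contents below are hypothetical and held in io.StringIO buffers instead of files on disk, so the encoding argument is left out; with real files you would keep passing the glob results and encoding='ISO-8859-1' as above. The helper name load_frames is also hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two CSV files on disk (same header in each).
csv_texts = [
    "ID,timestamp\n1828765,31-05-2021 22:27:03\n1828765,31-05-2021 22:27:12\n",
    "ID,timestamp\n2056557,21-07-2021 10:27:12\n2056557,21-07-2021 10:27:20\n",
]


def load_frames(sources):
    """Read each CSV source and stack the results into one DataFrame."""
    frames = [pd.read_csv(src, header=0) for src in sources]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, axis=0, ignore_index=True)


df = load_frames(io.StringIO(text) for text in csv_texts)
# Parse the day-first strings once, right after loading.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")
print(df)
```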
I would like to get something like this:
ID timestamp duration(s)
1828765 31-05-2021 22:27:03 NAN
1828765 31-05-2021 22:27:12 9
1828765 31-05-2021 22:27:13 1
1828765 31-05-2021 22:27:34 21
2056557 21-07-2021 10:27:12 NAN
2056557 21-07-2021 10:27:20 8
2056557 21-07-2021 10:27:22 2
I tried this code, but it doesn't work for me:
import datetime
df['timestamp'] = pd.to_datetime(df['timestamp'], format = "%d-%m-%Y %H:%M:%S")
df['time_diff'] = 0
for i in range(df.shape[0] - 1):
    df['time_diff'][i+1] = (datetime.datetime.min + (df['timestamp'][i+1] - df['timestamp'][i])).time()
Solution
In pandas, operations that run over groups of values are GroupBy operations.
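To see why grouping matters here, a minimal sketch comparing a plain diff with a grouped one on the question's data. An ungrouped diff, like the loop in the question, subtracts the last timestamp of the first user from the first timestamp of the second user. (The loop also writes through chained indexing, df['time_diff'][i+1] = ..., which depending on the pandas version may warn or fail to update df at all, and it stores datetime.time objects rather than a duration.)

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    "timestamp": pd.to_datetime([
        "31-05-2021 22:27:03", "31-05-2021 22:27:12",
        "31-05-2021 22:27:13", "31-05-2021 22:27:34",
        "21-07-2021 10:27:12", "21-07-2021 10:27:20",
        "21-07-2021 10:27:22",
    ], format="%d-%m-%Y %H:%M:%S"),
})

# Ungrouped: row 4 is diffed against row 3, which belongs to a different ID.
ungrouped = df["timestamp"].diff()
# Grouped: the first row of every ID starts fresh with NaT.
grouped = df.groupby("ID")["timestamp"].diff()

print(ungrouped.iloc[4])  # spans the gap between the two users
print(grouped.iloc[4])    # NaT
```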
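pandas natively supports arithmetic on timestamps, so subtracting one timestamp from another gives the correct duration (a Timedelta) between them.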
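A minimal sketch of that arithmetic on two timestamps taken from the question: the difference of two pd.Timestamp values is a pd.Timedelta, which can report its length in seconds.

```python
import pandas as pd

start = pd.Timestamp("2021-05-31 22:27:03")
end = pd.Timestamp("2021-05-31 22:27:12")

delta = end - start           # Timestamp - Timestamp -> Timedelta
print(delta)                  # 0 days 00:00:09
print(delta.total_seconds())  # 9.0
```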
We have already successfully converted the timestamp column to datetime64[ns]:
df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
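The explicit format matters because strings like 31-05-2021 are day-first. As a hedged aside, if some rows in the real files might not match that format, the errors="coerce" option of pd.to_datetime turns them into NaT instead of raising. The malformed "not a date" entry below is hypothetical.

```python
import pandas as pd

raw = pd.Series(["31-05-2021 22:27:03", "31-05-2021 22:27:12", "not a date"])

# The explicit format reads day, then month; errors="coerce" maps
# unparseable entries to NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")
print(parsed)
```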
Now we can use GroupBy.diff to subtract the previous timestamp within each ID:
df['duration'] = df.groupby('ID')['timestamp'].diff()
df
ID timestamp duration
0 1828765 2021-05-31 22:27:03 NaT
1 1828765 2021-05-31 22:27:12 0 days 00:00:09
2 1828765 2021-05-31 22:27:13 0 days 00:00:01
3 1828765 2021-05-31 22:27:34 0 days 00:00:21
4 2056557 2021-07-21 10:27:12 NaT
5 2056557 2021-07-21 10:27:20 0 days 00:00:08
6 2056557 2021-07-21 10:27:22 0 days 00:00:02
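GroupBy.diff compares each row with the previous row of the same group in the current row order, so if the combined CSV data might not be ordered by time, sorting first is a reasonable precaution. A minimal sketch, with rows deliberately shuffled, also showing that the grouped diff matches subtracting a grouped shift():

```python
import pandas as pd

# Rows deliberately out of order to show why sorting matters.
df = pd.DataFrame({
    "ID": [2056557, 1828765, 2056557, 1828765],
    "timestamp": pd.to_datetime([
        "21-07-2021 10:27:20", "31-05-2021 22:27:12",
        "21-07-2021 10:27:12", "31-05-2021 22:27:03",
    ], format="%d-%m-%Y %H:%M:%S"),
})

# diff() follows row order within each group, so sort by ID and time first.
df = df.sort_values(["ID", "timestamp"], ignore_index=True)

via_diff = df.groupby("ID")["timestamp"].diff()
# Same result spelled out: subtract the previous timestamp of the same ID.
via_shift = df["timestamp"] - df.groupby("ID")["timestamp"].shift()

print(df.assign(duration=via_diff))
```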
If we want the duration in seconds, we can extract the total number of seconds with Series.dt.total_seconds:
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
df
ID timestamp duration (s)
0 1828765 2021-05-31 22:27:03 NaN
1 1828765 2021-05-31 22:27:12 9.0
2 1828765 2021-05-31 22:27:13 1.0
3 1828765 2021-05-31 22:27:34 21.0
4 2056557 2021-07-21 10:27:12 NaN
5 2056557 2021-07-21 10:27:20 8.0
6 2056557 2021-07-21 10:27:22 2.0
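total_seconds returns floats (9.0 rather than the 9 in the desired output) because NaN is a float. A minimal sketch of two related options, assuming whole seconds are wanted: dividing by a one-second Timedelta gives the same floats, and the nullable Int64 dtype keeps integers alongside a missing-value marker. The sample durations are taken from the output above.

```python
import pandas as pd

durations = pd.Series(pd.to_timedelta([None, "9s", "1s", "21s"]))

# Float seconds, NaN where there is no previous timestamp.
seconds_float = durations.dt.total_seconds()
# Equivalent: divide by a one-second Timedelta.
seconds_div = durations / pd.Timedelta(seconds=1)
# Whole seconds with <NA> for missing values; round() first so any
# sub-second durations can still be cast to an integer dtype.
seconds_int = seconds_float.round().astype("Int64")
print(seconds_int.tolist())
```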
Complete working example:
import pandas as pd
df = pd.DataFrame({
    'ID': [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    'timestamp': ['31-05-2021 22:27:03', '31-05-2021 22:27:12',
                  '31-05-2021 22:27:13', '31-05-2021 22:27:34',
                  '21-07-2021 10:27:12', '21-07-2021 10:27:20',
                  '21-07-2021 10:27:22']
})
df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
print(df)