首页 > 解决方案 > Python:两列之间的时间差(以小时为单位)

问题描述

我在这个数据框中有以下两列。

      DATE1                       DATE2

2020-07-08 23:54:17.0   2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0   2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0   2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0   2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0   2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0   2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0   2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0   2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0   2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0   2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0   2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0   2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0   2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0   2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0   2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0   2020-07-08 13:20:54.9810000

我想找出这些时间戳之间的差异并添加一个布尔列,说明这两个日期之间的差异是否大于 24 小时。

我尝试了以下代码段,但得到了错误:'unsupported operand type(s) for -:'str' and 'str''

df['diff_hours'] = df['DATE2'] - df['DATE1']
df['diff_hours']= df['diff_hours']/np.timedelta64(1,'h')

有人可以帮我解决这个片段或有其他方法来轻松解决这个问题吗?提前致谢!

标签: python-3.xpandasdataframedatetimepython-datetime

解决方案


The sample data doesn't have the time difference greater than 24hrs

In [26]: df = pd.read_csv("a.csv", parse_dates=["DATE1","DATE2"])

In [27]: df
Out[27]:
                 DATE1                   DATE2
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981

In [28]: df["diff_hours"] = (df.DATE2-df.DATE1).astype('timedelta64[h]')

In [29]: df
Out[29]:
                 DATE1                   DATE2  diff_hours
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0

In [30]: df["status"] = df["diff_hours"] > 24

In [31]: df
Out[31]:
                 DATE1                   DATE2  diff_hours  status
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0   False
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0   False
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0   False
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0   False
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0   False
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0   False
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0   False
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0   False
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0   False
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0   False
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0   False
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0   False
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0   False
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0   False
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0   False
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0   False

推荐阅读