首页 > 解决方案 > 优化计算日期时间差的查询

问题描述

我有一个 SQL 表:

+---------+----------+---------------------+---------------------+---------+
| id      | party_id | begintime           | endtime             | to_meas |
+---------+----------+---------------------+---------------------+---------+
| 1395035 |     9255 | 2010-09-26 00:34:02 | 2010-09-26 03:56:20 |       0 |
| 1395036 |     8974 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 |       0 |
| 1395037 |     8974 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 |       0 |
| 1395038 |     8975 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 |       0 |
| 1395039 |     8975 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 |       0 |
| 1395040 |     8974 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 |       0 |
| 1395041 |     8975 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 |       0 |
| 1395042 |     8974 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 |       0 |
| 1395043 |     8975 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 |       0 |
| 1395044 |     8974 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 |       0 |
| 1395045 |     8975 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 |       0 |
| 1395046 |     8974 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 |       0 |
| 1395047 |     8975 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 |       0 |
| 1395048 |     8974 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 |       0 |
| 1395049 |     8975 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 |       0 |
| 1395050 |     8976 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |
| 1395051 |     8977 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |
| 1395052 |     8978 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |
| 1395053 |     8979 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |
| 1395054 |     8976 | 2009-07-25 10:47:14 | 2009-07-26 09:41:44 |       0 |
+---------+----------+---------------------+---------------------+---------+
...

如果此差异> 30 分钟,我需要计算两者之间的时间begintimeendtime设置为 1。to_meas这是我在 MySQL 中的尝试:

update doses d set to_meas=1 where d.id in 
  (select a.id from party join (select * from doses) a 
  on party_id=a.party_id 
  left join (select * from doses) b 
  on party.id=b.party_id 
  and b.begintime=(select min(begintime) 
  from (select * from doses) c 
  where c.begintime > a.endtime) 
  and timestampdiff(minute, a.endtime, b.begintime) > 30 
  group by party.id);

该命令永远(准)运行。我试图在 python 中做到这一点pandas

conn = engine.connect()
sql =
    '''
    select doses.id, party_id, party.ml, begintime, endtime
    from doses join party on party.id=doses.party_id
'''
df = pd.read_sql(con=conn, sql=sql,
measure = df.groupby('party_id', as_index=False).apply(
    lambda x: x[pd.to_datetime(x['begintime']) - 
    pd.to_datetime(x.shift()['endtime']) > pd.to_timedelta('30 minutes')])
measure_ids = measure['id'].to_list()
measure_list = ','.join([str(x) for x in measure_ids])
conn.execute(
    'update doses set to_meas=true where id in(%s)' % measure_list)

最后一条语句运行大约 10 秒。pandas有没有一种方法可以优化 SQL 代码,使其运行速度与`一样快?

标签: mysqlsql

解决方案


您可以使用以下方式更新数据:

Update doses d
   Set meas = 1
 Where begintime > (select max(dd.endtime) + interval '30' minute 
From doses dd where dd.begintime < d.begintime
  And dd.party_id = d.party_id)

推荐阅读