SQL: how to create a column to record the length of a user's session time?

Question

I have a table which looks like the following (it has been ordered by user_id and timestamp):

[image: before]

I'm looking to create a new feature called session_time, which records how long it has been since the start of a user's session. If there is a gap of more than 20 from one timestamp to the next for a particular user, I would like the session_time to reset to 0. Otherwise, I would like session_time to increase by the gap between the two timestamps.

Here is an example of what I would like the table with the new feature to look like:

[image: after]

I'm struggling to implement this in SQL and would really appreciate any help.

The following creates the sample table to save transcription:

CREATE TABLE t (row_id int,user_id int,timestamp int);
INSERT INTO t (row_id,user_id,timestamp) VALUES (1, 115, 0);
INSERT INTO t (row_id,user_id,timestamp) VALUES (2, 115, 1);
INSERT INTO t (row_id,user_id,timestamp) VALUES (3, 115, 3);
INSERT INTO t (row_id,user_id,timestamp) VALUES (4, 115, 28);
INSERT INTO t (row_id,user_id,timestamp) VALUES (5, 115, 29);
INSERT INTO t (row_id,user_id,timestamp) VALUES (6, 115, 0);
INSERT INTO t (row_id,user_id,timestamp) VALUES (7, 115, 2);
INSERT INTO t (row_id,user_id,timestamp) VALUES (8, 115, 45);
INSERT INTO t (row_id,user_id,timestamp) VALUES (9, 115, 47);

Tags: sql, google-bigquery

Answer


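You can use lag() to determine where a session starts. Then use arithmetic and a cumulative max to get the time at which the current session started: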

select t.*,
       (timestamp -
        max(case when timestamp - prev_timestamp > 20 or prev_timestamp is null
                 then timestamp
            end) over (partition by user_id order by timestamp)
       ) as session_time
from (select t.*,
             lag(timestamp) over (partition by user_id order by timestamp) as prev_timestamp
      from t
     ) t;
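
One way to sanity-check the query is to run it against a handful of sample rows. The sketch below is a test harness under stated assumptions, not part of the original answer: it uses Python's built-in sqlite3 module in place of BigQuery (window functions need SQLite 3.25 or later), it names the computed column session_time, and it gives the second group of rows the hypothetical user_id 116 so that the partitioning has two users to separate.

```python
import sqlite3

# Same logic as the query above; the subquery alias "s" and the
# final ORDER BY are additions for readability of the output.
QUERY = """
SELECT row_id, user_id, timestamp,
       timestamp - MAX(CASE WHEN prev_timestamp IS NULL
                              OR timestamp - prev_timestamp > 20
                            THEN timestamp
                       END) OVER (PARTITION BY user_id ORDER BY timestamp)
           AS session_time
FROM (SELECT t.*,
             LAG(timestamp) OVER (PARTITION BY user_id ORDER BY timestamp)
                 AS prev_timestamp
      FROM t) AS s
ORDER BY user_id, timestamp
"""


def session_times(rows):
    """Load (row_id, user_id, timestamp) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (row_id int, user_id int, timestamp int)")
    conn.executemany("INSERT INTO t (row_id, user_id, timestamp) VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Rows 1-5 follow the sample data; user_id 116 is a hypothetical second user.
SAMPLE_ROWS = [
    (1, 115, 0), (2, 115, 1), (3, 115, 3), (4, 115, 28), (5, 115, 29),
    (6, 116, 0), (7, 116, 2), (8, 116, 45), (9, 116, 47),
]

if __name__ == "__main__":
    for row in session_times(SAMPLE_ROWS):
        print(row)
```

For these rows the session_time column comes out as 0, 1, 3, 0, 1 for user 115 and 0, 2, 0, 2 for user 116: the jumps from 3 to 28 and from 2 to 45 exceed 20, so each starts a new session.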
