In SQL how do I find the first record per-user if it's within a time slice, without scanning the entire DB


I've got a table, user_requests, that basically looks like this:

  user_id  |    request_timestamp    | request_type | other_metadata
  user1    |    2018-11-01:04:04:41  |    type1     | opaquedata_A
  user2    |    2018-11-01:04:03:41  |    type2     | opaquedata_B
  user1    |    2018-11-01:04:01:41  |    type1     | opaquedata_C
  user3    |    2018-11-01:04:05:41  |    type3     | opaquedata_D
  user4    |    2018-11-01:04:01:41  |    type4     | opaquedata_E

And it is huge. Doing any operation over the entire thing is absolutely untenable; everything needs to be scoped, like "which queries were most common this month". No one ever checks it overall.

What I'm trying to do is some analysis on the first requests for several users. I absolutely do not need the first requests of every user or over all time, as long as it's a representative sample.

However, I'm running into a problem where all my usual attempts to restrict this find "the first request within bounds", not "the first request if it's within bounds":

SELECT user_id,
              first_value(request_type) over (PARTITION BY user_id ORDER BY request_timestamp
                rows BETWEEN unbounded preceding and unbounded following) requestType,
              first_value(other_metadata) over (PARTITION BY user_id ORDER BY request_timestamp
                rows BETWEEN unbounded preceding and unbounded following) otherMetadata,
              first_value(request_timestamp) over (PARTITION BY user_id ORDER BY request_timestamp
                rows BETWEEN unbounded preceding and unbounded following) utteranceTimestamp
FROM user_requests
WHERE request_timestamp >= '2018-11-01' AND request_timestamp < '2018-12-01'

This finds the earliest request from a user in November, when what I want is the earliest request from a user overall, if that request is in November.

Any idea how I can get what I want while still writing queries that don't take hours to complete?

Tags: sql, amazon-redshift



SELECT Curr.user_id, Curr.request_type, Curr.other_metadata, Curr.request_timestamp
FROM User_Requests Curr
WHERE  Curr.request_timestamp >= '2018-11-01'
       AND Curr.request_timestamp < '2018-12-01'
       -- keep a row only if the same user has no earlier request at all
       AND NOT EXISTS (SELECT 1
                       FROM User_Requests Prev
                       WHERE Prev.user_id = Curr.user_id
                             AND Prev.request_timestamp < Curr.request_timestamp)


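For best results, you'd want an index on (user_id, request_timestamp). Redshift has no conventional indexes, so there the nearest equivalent is probably a sort key covering those columns.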

Bonus: the LEFT JOIN exclusion form, in case it performs better.

SELECT Curr.user_id, Curr.request_type, Curr.other_metadata, Curr.request_timestamp
FROM User_Requests Curr
LEFT JOIN User_Requests Prev
       ON Prev.user_id = Curr.user_id
          AND Prev.request_timestamp < Curr.request_timestamp
WHERE  Curr.request_timestamp >='2018-11-01' 
       AND Curr.request_timestamp < '2018-12-01'
       AND Prev.user_id IS NULL
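
If you want to sanity-check that both forms return the same rows before running them against the real table, here is a minimal sketch using Python's built-in sqlite3 module. It is not Redshift, so it says nothing about performance. The extra October row for user2 is hypothetical (it is not in the question's sample data) and is only there so that one user's first-ever request falls outside November. Timestamps are stored as ISO-style text, which is an assumption made so that plain string comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question, with timestamps written as ISO-style text.
# The last row is hypothetical: it gives user2 an earlier request in October.
ROWS = [
    ("user1", "2018-11-01 04:04:41", "type1", "opaquedata_A"),
    ("user2", "2018-11-01 04:03:41", "type2", "opaquedata_B"),
    ("user1", "2018-11-01 04:01:41", "type1", "opaquedata_C"),
    ("user3", "2018-11-01 04:05:41", "type3", "opaquedata_D"),
    ("user4", "2018-11-01 04:01:41", "type4", "opaquedata_E"),
    ("user2", "2018-10-15 09:00:00", "type2", "opaquedata_X"),  # hypothetical
]

# NOT EXISTS form: a November row survives only if the user has nothing earlier.
NOT_EXISTS_SQL = """
SELECT Curr.user_id, Curr.request_type, Curr.other_metadata, Curr.request_timestamp
FROM user_requests Curr
WHERE Curr.request_timestamp >= '2018-11-01'
  AND Curr.request_timestamp <  '2018-12-01'
  AND NOT EXISTS (SELECT 1
                  FROM user_requests Prev
                  WHERE Prev.user_id = Curr.user_id
                    AND Prev.request_timestamp < Curr.request_timestamp)
ORDER BY Curr.user_id
"""

# LEFT JOIN exclusion form: same idea, keeping rows with no matching earlier row.
LEFT_JOIN_SQL = """
SELECT Curr.user_id, Curr.request_type, Curr.other_metadata, Curr.request_timestamp
FROM user_requests Curr
LEFT JOIN user_requests Prev
       ON Prev.user_id = Curr.user_id
      AND Prev.request_timestamp < Curr.request_timestamp
WHERE Curr.request_timestamp >= '2018-11-01'
  AND Curr.request_timestamp <  '2018-12-01'
  AND Prev.user_id IS NULL
ORDER BY Curr.user_id
"""


def first_requests(sql):
    """Load the sample rows into an in-memory table and run one of the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_requests ("
        "user_id TEXT, request_timestamp TEXT, request_type TEXT, other_metadata TEXT)"
    )
    conn.executemany("INSERT INTO user_requests VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_requests(NOT_EXISTS_SQL):
        print(row)
```

With these rows, both queries return user1 (the 04:01:41 request, not the later one), user3 and user4. user2 is dropped because its first request is in October, which is the "first request if it's within bounds" behaviour the question asks for.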
