Computing time_diff between rows with the same id in Google BigQuery

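Problem description

I am using BigQuery to practice my SQL skills, and I am trying to compute the time difference between consecutive rentals of each bike. In other words, for every distinct bikeid I want a time_diff for each pair of consecutive rows that share that bikeid. My goal is to find the median of the time_diff distribution for each bikeid. Right now I have: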

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time, 
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` 
     ) 
ORDER BY bikeid, start_time 

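I am using the public BigQuery dataset bigquery-public-data.austin_bikeshare.bikeshare_trips, and my result looks strange because it does not show any bike id. (I already expected many zeros as date_diff, since the dataset records timestamps and a bike is sometimes rented many times within one day.)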

    | Linha | bikeid | Tempo | OrderCount |
    |   1   |  null  | null  |     1      |
    |   2   |  null  |  57   |     2      |
    |   3   |  null  |  1    |     3      |

Tags: sql, database, statistics, google-bigquery

Solution


The bikeid column contains many null values. You are seeing nulls because an ascending sort returns null values first. You have a few options:

• You can change your ORDER BY clause to DESC on bikeid, which pushes the null rows to the end:

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time,
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
     )
ORDER BY bikeid desc, start_time

• You can remove the rows with a null bikeid by adding the where clause "where bikeid is not null":

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time,
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
       where bikeid is not null
     )
ORDER BY OrderCount desc, bikeid desc, start_time
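The question's stated goal is the median time_diff per bikeid, which neither query above computes. A minimal sketch of one way to get there follows. It assumes start_time is a TIMESTAMP column, measures gaps in minutes instead of days so that same-day rentals do not all collapse to 0, and uses APPROX_QUANTILES, which returns an approximate median. The aliases gap_minutes, gap_count, and approx_median_gap_minutes are illustrative names, not part of the original queries.

```sql
-- Sketch: approximate median gap between consecutive rentals, per bike.
-- Assumption: start_time is a TIMESTAMP, so TIMESTAMP_DIFF applies.
SELECT
  bikeid,
  COUNT(*) AS gap_count,
  -- Splitting into 2 quantiles yields [min, median, max]; take the middle one.
  APPROX_QUANTILES(gap_minutes, 2)[OFFSET(1)] AS approx_median_gap_minutes
FROM (
  SELECT
    bikeid,
    TIMESTAMP_DIFF(
      start_time,
      LAG(start_time) OVER (PARTITION BY bikeid ORDER BY start_time ASC),
      MINUTE) AS gap_minutes
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
  WHERE bikeid IS NOT NULL
)
-- Each bike's first rental has no predecessor, so its gap is NULL; drop it.
WHERE gap_minutes IS NOT NULL
GROUP BY bikeid
ORDER BY bikeid;
```

If an exact median is required, PERCENTILE_CONT(gap_minutes, 0.5) OVER (PARTITION BY bikeid) is an alternative, but it is an analytic function in BigQuery, so the result would need a DISTINCT or a further aggregation to get one row per bike.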

