Summing page views over arrays

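Problem description

I have been trying to solve a problem for a few days now, but I am completely stuck:

This is what the table should look like (a count distinct on the page view ID).

[image: correct data]

This is what my sessions table SQL produces:

[image: wrong data]

I am pretty sure I am misunderstanding how summing over arrays and array_length work, but I cannot figure out what is going wrong...

SQL for the sessions table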

with all_page_views as (
  select
    *
  from
    `page_views_table`

),
sessions_agg as (
  select
    pv.session_id,
    array_agg(
      pv
      order by
        pv.page_view_in_session_index
    ) as all_pageviews
  from
    all_page_views as pv
  group by
    1
),
sessions_agg_xf as (
  select
    session_id,
    all_pageviews,
    (
      select
        struct(
          min(page_view_start) as session_start,
          max(page_view_end) as session_end
        )
      from
        unnest(all_pageviews)
    ) as timing
  from
    sessions_agg
),
sessions as (
  select
    timing.session_start,
    timing.session_end,
    array_length(all_pageviews) as count_page_views
  from
    sessions_agg_xf
)
select
  sum(count_page_views)
from
  sessions
where
  date(session_start) = "2020-02-01"

Tags: google-bigquery

Solution

I believe I found the problem elsewhere. There is a bug in Snowplow where the session ID is not reset, so my sessions were wrong...

https://github.com/snowplow/snowplow-javascript-tracker/issues/718
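
For anyone hitting the same symptom, here is a minimal diagnostic sketch, not part of the original answer. It assumes the table and column names from the question (`page_views_table`, session_id, page_view_start, page_view_end) and that the two timestamp columns can be passed to date(). The one-day threshold is an arbitrary assumption. If a session ID is never reset, the session can span several calendar days, and the query in the question then counts all of its page views under the date of session_start, which would inflate the sum of array_length for that day.

```sql
-- Diagnostic sketch: list sessions whose page views span more than one calendar day.
-- Table and column names are taken from the question; the threshold (>= 1 day) is an assumption.
select
  session_id,
  count(*) as page_views,
  min(page_view_start) as session_start,
  max(page_view_end) as session_end,
  date_diff(
    date(max(page_view_end)),
    date(min(page_view_start)),
    day
  ) as days_spanned
from
  `page_views_table`
group by
  session_id
having
  days_spanned >= 1
order by
  days_spanned desc
```

If this returns rows with a large days_spanned, the session boundaries themselves are suspect, and the mismatch is not caused by array_agg or array_length.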

