Extracting user journey data between two pages in BigQuery Google Analytics data

Problem description

How can I extract user journey data between two specific pages from Google Analytics BigQuery Export data?

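Example:

The website has 100 pages: hits.page.pagePath=/page_1 to hits.page.pagePath=/page_100.

The goal is to extract user journey data from /page_13 to /page_22, including all intermediate pages.

The challenge is that the journey is not necessarily sequential, such as /page_13 -> /page_14 -> ... -> /page_22

but could be /page_13 -> /page_5 -> /page_41 -> /page_99 -> /page_22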

Tags: sql, google-analytics, google-bigquery

Solution


You can use array_agg(). If I understand correctly, you want a group that starts the first time page_13 is reached and ends when page_22 is reached.
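As a minimal sketch of what array_agg() does here, the query below collects each user's pages into an ordered array. The inline hits rows and the column names user, hit_time, and page are made-up assumptions for illustration; they mirror the table shape used in the rest of this answer, not the raw export schema.

```sql
-- Hypothetical sample data; replace the CTE with your own hits table.
with hits as (
  select * from unnest([
    struct('u1' as user, 1 as hit_time, '/page_13' as page),
    ('u1', 2, '/page_5'),
    ('u1', 3, '/page_22')
  ])
)
select user,
       -- order by inside array_agg() fixes the element order within each array
       array_agg(page order by hit_time) as pages
from hits
group by user;
```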

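Let me assume that, for each user, you want everything from the first hit on 13 to the first hit on 22. You can identify the group using two running counts per user, ordered by hit time: the number of /page_13 hits so far and the number of /page_22 hits so far: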

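select h.*
from (select h.*,
             -- running count of /page_13 hits up to and including this hit
             countif(page like '%/page_13') over (partition by user order by hit_time) as hit_13,
             -- running count of /page_22 hits up to and including this hit
             countif(page like '%/page_22') over (partition by user order by hit_time) as hit_22,
             -- total number of /page_22 hits for the user
             countif(page like '%/page_22') over (partition by user) as has_22
      from hits h
     ) h
where has_22 > 0 and  -- countif() returns an integer, so compare explicitly
      hit_13 > 0 and
      -- keep hits before the first /page_22, plus the first /page_22 hit itself
      (hit_22 = 0 or (hit_22 = 1 and page like '%/page_22'));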

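This returns the pages starting at 13 and ending at 22, and ensures that the user has both. Note that it assumes the user does not hit /page_22 before the first hit on /page_13.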
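The queries in this answer read from a flat table called hits with the columns user, hit_time, and page, whereas the export stores hits as a repeated field inside each session row. The following is a rough sketch of one way to derive such a table, assuming the standard Universal Analytics export layout (fullVisitorId, visitStartTime, hits.time, hits.type, hits.page.pagePath). The project name, dataset name, and date range are placeholders.

```sql
-- Placeholder project/dataset; point this at your own ga_sessions_* tables.
with hits as (
  select
    s.fullVisitorId as user,
    -- visitStartTime is in seconds; hits.time is milliseconds since visit start
    timestamp_millis(s.visitStartTime * 1000 + h.time) as hit_time,
    h.page.pagePath as page
  from `my-project.my_dataset.ga_sessions_*` as s,
       unnest(s.hits) as h
  where h.type = 'PAGE'  -- keep pageviews only, drop events and other hit types
    and _table_suffix between '20240101' and '20240131'  -- placeholder date range
)
select *
from hits
order by user, hit_time;
```

Using fullVisitorId as the user key makes a journey span sessions. If journeys should stay within one visit, adding visitId to the partition key would be one option.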

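Now, for the journey, just use aggregation. But, alas, BQ does not permit aggregation on arrays, which matters if you then want to summarize by journey. So I will use string_agg():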

select h.user,
       -- the delimiter comes before order by inside string_agg()
       string_agg(page, ' -> ' order by hit_time) as journey
from (select h.*
      from (select h.*,
                   countif(page like '%/page_13') over (partition by user order by hit_time) as hit_13,
                   countif(page like '%/page_22') over (partition by user order by hit_time) as hit_22,
                   countif(page like '%/page_22') over (partition by user) as has_22
            from hits h
           ) h
      where has_22 > 0 and  -- countif() returns an integer, so compare explicitly
            hit_13 > 0 and
            -- keep hits before the first /page_22, plus the first /page_22 hit itself
            (hit_22 = 0 or (hit_22 = 1 and page like '%/page_22'))
     ) h
group by user;
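Because the journey is a plain string, it can be used as a grouping key. The sketch below uses the same filtering approach on a few made-up rows and then counts users per journey. The sample data, the CTE names (flagged, journeys), and the output column num_users are illustrative assumptions.

```sql
with hits as (
  -- Made-up sample rows for illustration only.
  select * from unnest([
    struct('u1' as user, 1 as hit_time, '/page_13' as page),
    ('u1', 2, '/page_5'),
    ('u1', 3, '/page_22'),
    ('u2', 1, '/page_1'),
    ('u2', 2, '/page_13'),
    ('u2', 3, '/page_5'),
    ('u2', 4, '/page_22'),
    ('u2', 5, '/page_7'),
    ('u3', 1, '/page_13'),
    ('u3', 2, '/page_99')
  ])
),
flagged as (
  select h.*,
         countif(page like '%/page_13') over (partition by user order by hit_time) as hit_13,
         countif(page like '%/page_22') over (partition by user order by hit_time) as hit_22,
         countif(page like '%/page_22') over (partition by user) as has_22
  from hits h
),
journeys as (
  select user,
         string_agg(page, ' -> ' order by hit_time) as journey
  from flagged
  where has_22 > 0 and
        hit_13 > 0 and
        (hit_22 = 0 or (hit_22 = 1 and page like '%/page_22'))
  group by user
)
-- One row per distinct journey, with the number of users who took it.
select journey,
       count(*) as num_users
from journeys
group by journey
order by num_users desc;
```

With these sample rows, u1 and u2 should both reduce to /page_13 -> /page_5 -> /page_22, while u3 is dropped because it never reaches /page_22.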
