首页 > 解决方案 > 在 Bigquery 中访问主页后找出用户的第二页

问题描述

我想知道用户登陆 BigQuery 主页后访问了哪个页面。

以下是我到目前为止想到的查询,但是当我查看第二页时返回的结果与 Google Analytics 不一致(来自 [行为] --> [站点内容] --> [所有页面] --> 主要维度:着陆页,二级维度:第二页)。

但是,当我查看 Next Pagepath 时,查询结果匹配(从 [Site Content] --> [All Pages] --> Primary Dimension: Landing Page, Secondary Dimension: Next Page path)。

多篇文章说使用 [Second page] 比 [Next pagepath] 更合适。

-- Total pageviews by pagepath after landing homepage
#standardSQL
SELECT
  next_page.pagePath AS pagePath,
  COUNT(*) as pageviews
FROM
  (
    SELECT
      CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS session_id,
      hits.page.pagePath AS pagePath,
      hits.hitNumber AS hitNumber,
      hits.type AS type
    FROM
      `GA_data.ga_sessions_*`,
      UNNEST(hits) as hits
    WHERE
      _TABLE_SUFFIX BETWEEN '20190814'
      AND '20191008'
      AND hits.type = 'PAGE'
      AND hits.page.pagePath = '/***/' -- Landing page URL
      AND hits.isEntrance = TRUE
      AND totals.visits = 1
  ) AS landing_hp
  INNER JOIN (
    SELECT
      CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS session_id,
      hits.page.pagePath AS pagePath,
      hits.hitNumber AS hitNumber,
      hits.type AS type
    FROM
      `GA_data.ga_sessions_*`,
      UNNEST(hits) as hits
    WHERE
      _TABLE_SUFFIX BETWEEN '20190814'
      AND '20191008'
      AND hits.type = 'PAGE'
      AND hits.isEntrance IS NULL
      AND totals.visits = 1
  ) AS next_page ON landing_hp.session_id = next_page.session_id
WHERE
  landing_hp.hitNumber < next_page.hitNumber
GROUP BY
  pagePath
ORDER BY
  pageviews DESC 

谁能告诉我为什么会发生这种情况以及我应该使用什么查询?

标签: google-analyticsgoogle-bigquery

解决方案


您的第二张表不一定返回第二页,不是吗?但是,在涉及所有连接的情况下这样做也有点效率低下。

最好切断所有连接并使用子查询:

SELECT 
  fullvisitorid -- identify user
  ,visitstarttime -- identify session per user
  -- visitid is timestamp of pre-midnight session
  ,visitstarttime<>visitid AS isMidnightSplitSession 

  -- get hitnumber and pagepath from hits where the type is not event 
  -- limit to one while sorting by hitnumber - offset 1 to get second page
  ,(SELECT AS STRUCT hitnumber, page.pagePath 
     FROM UNNEST(hits) 
     WHERE type<>'EVENT' 
     ORDER BY hitnumber ASC 
     LIMIT 1 OFFSET 1) AS secondPage

  ,(SELECT AS STRUCT hitnumber, page.pagePath FROM UNNEST(hits) 
     WHERE type<>'EVENT' ORDER BY hitnumber ASC LIMIT 1 OFFSET 2) AS thirdPage

  -- no need to left join with all those arrays and bloat up the table
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t 

-- check that first page = '/home'
WHERE (SELECT page.pagePath FROM UNNEST(hits) WHERE isEntrance=true) = '/home'

  and totals.pageviews>1 -- for testing purpose
LIMIT 1000

推荐阅读