首页 > 解决方案 > BigQuery:根据着陆页重新创建 GA 细分 - 指标不一致

问题描述

我想在 BigQuery 上创建一个表格,其中包含我们的主要Google Analytics 细分及其指标(会话、用户、跳出率、交易、收入、网站停留时间),按维度(设备、新闻/返回)细分。

以下查询可以正常工作,并且可以通过 Google Analytics 找到结果:

SELECT
  SUBSTR(date,1,6) AS date,
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic' THEN 'organic'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  COALESCE(totals.newVisits, 0) = 1 AS first_visit,
  device.deviceCategory AS device,
  COUNT(DISTINCT fullVisitorId) AS users,
  SUM(totals.visits) AS sessions,
  SUM(totals.pageviews) AS pageviews,
  COUNT(totals.bounces) AS bounces,
  SUM(totals.timeOnSite) AS session_duration,
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
  `xxxxxxxxxxx.ga_sessions_20180930`
GROUP BY
  1,2,3,4

如您所见,该查询仅适用于基于trafficSource.medium的所有细分。我的问题是我还需要根据登录页面创建细分。

我用一个子查询加入了原始表,以获取未嵌套的登录页面(请检查下面的查询 -> 为简单起见,删除了其他维度和指标)。

该策略不起作用:现在我的所有指标(例如会话)都与 Google Analytics 中的指标不匹配。

SELECT
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium,  
r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'not blog' THEN 'organic not blog'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  SUM(totals.visits) AS sessions
FROM
  `xxxxxxxxxxxx.ga_sessions_20180930` AS ga
JOIN (
  SELECT
    visitId,
    CASE
      WHEN REGEXP_CONTAINS(hits.page.pagePath,  r'^/blog/') THEN 'blog'
      ELSE 'not blog'
    END AS landing_blog
    -- Uses the knowledge that hits are stored in chronological order
  FROM
    `xxxxxxxxxxxx.ga_sessions_20180930`,
    UNNEST (hits) AS hits
  WHERE
    hits.hitNumber = 1
    AND hits.type LIKE 'PAGE') AS landing
ON
  ga.visitId = landing.visitId
GROUP BY 1

在更正第二个查询或建议不同的策略方面,我将不胜感激。

标签: google-bigquery

解决方案


工作人员帮我找到了错误。这是最终的查询:

SELECT
  SUBSTR(date,1,6) AS date,
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic'
  AND landing_blog LIKE 'not blog' THEN 'organic not blog'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  COALESCE(totals.newVisits, 0) = 1 AS first_visit,
  device.deviceCategory AS device,
  COUNT(DISTINCT ga.fullVisitorId) AS users,
  SUM(totals.visits) AS sessions,
  SUM(totals.pageviews) AS pageviews,
  COUNT(totals.bounces) AS bounces,
  SUM(totals.timeOnSite) AS session_duration,
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
  `xxxx.ga_sessions_2018*` AS ga
LEFT JOIN (
  SELECT
    visitId,
    fullVisitorId,
    CASE
      WHEN REGEXP_CONTAINS(hits.page.pagePath, r'^/blog/|^/lab/') THEN 'blog'
      ELSE 'not blog'
    END AS landing_blog
  FROM
    `xxxxx.ga_sessions_2018*`,
    UNNEST (hits) AS hits
  WHERE
    hits.isEntrance = TRUE) AS landing
ON
  ga.visitId = landing.visitId
  AND ga.fullVisitorId = landing.fullVisitorId
GROUP BY
  1, 2, 3, 4

推荐阅读