google-bigquery - BigQuery:根据着陆页重新创建 GA 细分 - 指标不一致
问题描述
我想在 BigQuery 上创建一个表格,其中包含我们的主要Google Analytics 细分及其指标(会话、用户、跳出率、交易、收入、网站停留时间),按维度(设备、新闻/返回)细分。
以下查询可以正常工作,并且可以通过 Google Analytics 找到结果:
SELECT
SUBSTR(date,1,6) AS date,
CASE
WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
WHEN trafficSource.medium LIKE 'organic' THEN 'organic'
WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
WHEN trafficSource.medium LIKE 'social' THEN 'social'
ELSE 'other'
END AS segment,
COALESCE(totals.newVisits, 0) = 1 AS first_visit,
device.deviceCategory AS device,
COUNT(DISTINCT fullVisitorId) AS users,
SUM(totals.visits) AS sessions,
SUM(totals.pageviews) AS pageviews,
COUNT(totals.bounces) AS bounces,
SUM(totals.timeOnSite) AS session_duration,
SUM(totals.transactions) AS transactions,
SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
`xxxxxxxxxxx.ga_sessions_20180930`
GROUP BY
1,2,3,4
如您所见,该查询仅适用于基于trafficSource.medium的所有细分。我的问题是我还需要根据登录页面创建细分。
我用一个子查询加入了原始表,以获取未嵌套的登录页面(请检查下面的查询 -> 为简单起见,删除了其他维度和指标)。
该策略不起作用:现在我的所有指标(例如会话)都与 Google Analytics 中的指标不匹配。
SELECT
CASE
WHEN REGEXP_CONTAINS(trafficSource.medium,
r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'not blog' THEN 'organic not blog'
WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
WHEN trafficSource.medium LIKE 'social' THEN 'social'
ELSE 'other'
END AS segment,
SUM(totals.visits) AS sessions
FROM
`xxxxxxxxxxxx.ga_sessions_20180930` AS ga
JOIN (
SELECT
visitId,
CASE
WHEN REGEXP_CONTAINS(hits.page.pagePath, r'^/blog/') THEN 'blog'
ELSE 'not blog'
END AS landing_blog
-- Uses the knowledge that hits are stored in chronological order
FROM
`xxxxxxxxxxxx.ga_sessions_20180930`,
UNNEST (hits) AS hits
WHERE
hits.hitNumber = 1
AND hits.type LIKE 'PAGE') AS landing
ON
ga.visitId = landing.visitId
GROUP BY 1
在更正第二个查询或建议不同的策略方面,我将不胜感激。
解决方案
工作人员帮我找到了错误。这是最终的查询:
SELECT
SUBSTR(date,1,6) AS date,
CASE
WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
WHEN trafficSource.medium LIKE 'organic'
AND landing_blog LIKE 'not blog' THEN 'organic not blog'
WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
WHEN trafficSource.medium LIKE 'social' THEN 'social'
ELSE 'other'
END AS segment,
COALESCE(totals.newVisits, 0) = 1 AS first_visit,
device.deviceCategory AS device,
COUNT(DISTINCT ga.fullVisitorId) AS users,
SUM(totals.visits) AS sessions,
SUM(totals.pageviews) AS pageviews,
COUNT(totals.bounces) AS bounces,
SUM(totals.timeOnSite) AS session_duration,
SUM(totals.transactions) AS transactions,
SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
`xxxx.ga_sessions_2018*` AS ga
LEFT JOIN (
SELECT
visitId,
fullVisitorId,
CASE
WHEN REGEXP_CONTAINS(hits.page.pagePath, r'^/blog/|^/lab/') THEN 'blog'
ELSE 'not blog'
END AS landing_blog
FROM
`xxxxx.ga_sessions_2018*`,
UNNEST (hits) AS hits
WHERE
hits.isEntrance = TRUE) AS landing
ON
ga.visitId = landing.visitId
AND ga.fullVisitorId = landing.fullVisitorId
GROUP BY
1, 2, 3, 4
推荐阅读
- reactjs - 使用 axios 将数据发布到 Mongo 数据库(React 前端)
- google-oauth - Google Home“无法更新设置。请检查您的连接。”
- python - 无法在 pandas 数据框中使用适当的值填充缺失的小时数
- javascript - Highchart的gridLineDashStyle在底部只有一项时缺少网格线作为波纹管演示
- php - php 使用 curl_multi 多次卷曲到同一个 API
- api - 使用 Python 的 Bittrex API 无效签名响应
- python-3.x - 如果 list_of_list[0] 中的元素不起作用,则在非 python 模式下将列表列表传递给 numba 函数
- java - 油漆组件不会运行
- node.js - 发布请求的“正文”始终为 unicode
- altbeacon - Altbeacon 和 bootstrap 始终默认为默认信标格式