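sql - Determining the duplication rate per group
Problem description
Edit: The earlier sample data included the duplicate visits column, which I need to have calculated in the solution.
I'm trying to determine: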
total_visits = total visits per website per sub_group
duplicate_visits = sum of (visits - 1) over the users in each website and sub_group
duplication_rate = duplicate_visits/ total_visits
distinct_users_subgroup = distinct users per website per sub_group
distinct_users_total = distinct users per website
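The sample data below is, I hope, suitably representative. The table is grouped at the website and sub_group level and gives the number of visits for each user. How would I write a query to get the results below?
Data: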
+---------+---------+-----------+--------+
| website | user_id | sub_group | visits |
+---------+---------+-----------+--------+
| web1 | 1 | A | 1 |
| web1 | 1 | B | 3 |
| web1 | 1 | C | 4 |
| web1 | 1 | D | 1 |
| web1 | 2 | A | 3 |
| web1 | 2 | B | 4 |
| web1 | 3 | B | 2 |
| web1 | 3 | C | 1 |
| web1 | 3 | D | 1 |
| web2 | 1 | A | 3 |
| web2 | 1 | B | 1 |
| web2 | 1 | C | 2 |
| web2 | 4 | B | 1 |
| web2 | 4 | D | 2 |
| web2 | 5 | C | 3 |
| web2 | 5 | D | 4 |
+---------+---------+-----------+--------+
Results:
+---------+-----------+--------------+------------------+----------------+-------------------------+----------------------+
| website | sub_group | total_visits | duplicate_visits | duplicate_rate | distinct_users_subgroup | distinct_users_total |
+---------+-----------+--------------+------------------+----------------+-------------------------+----------------------+
| web1 | A | 4 | 2 | 0.50 | 2 | 3 |
| web1 | B | 9 | 6 | 0.67 | 3 | 3 |
| web1 | C | 5 | 3 | 0.60 | 2 | 3 |
| web1 | D | 2 | 0 | 0.00 | 2 | 3 |
| web2 | A | 3 | 2 | 0.67 | 1 | 3 |
| web2 | B | 2 | 0 | 0.00 | 2 | 3 |
| web2 | C | 5 | 3 | 0.60 | 2 | 3 |
| web2 | D | 6 | 4 | 0.67 | 2 | 3 |
+---------+-----------+--------------+------------------+----------------+-------------------------+----------------------+
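As a quick check of the definitions against the expected results, here is a minimal sketch in plain Python. The function name group_metrics is made up for illustration; it takes the per-user visit counts of one website and sub_group and assumes each user appears once in that list.

```python
def group_metrics(visits_per_user):
    """Return (total_visits, duplicate_visits, duplicate_rate, distinct_users)
    for one website/sub_group, given one visit count per user."""
    total = sum(visits_per_user)
    # Every visit after a user's first one counts as a duplicate.
    duplicates = sum(v - 1 for v in visits_per_user)
    rate = round(duplicates / total, 2)
    return total, duplicates, rate, len(visits_per_user)


if __name__ == "__main__":
    # web1 / sub_group B: users 1, 2 and 3 made 3, 4 and 2 visits.
    print(group_metrics([3, 4, 2]))
    # web1 / sub_group D: users 1 and 3 made 1 visit each.
    print(group_metrics([1, 1]))
```

Under these assumptions, duplicate_visits is the same as total visits minus the number of users in the group.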
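Solution
Hi statq, getting everything back in one table can get a bit messy, so I'll do it in parts. At the moment I only have an Access database to run queries on, so I'll keep the SQL very simple (depending on the system you use, there may be other SQL verbs that make the code more compact):
=> The table is called "teste"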
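First SQL, to get all columns except the last two. Each row holds one user's visits for a website and sub_group, so the duplicate visits are the total visits minus the number of rows:
SELECT website,
       sub_group,
       Sum(visits) AS total_visits,
       Sum(visits) - Count(*) AS duplicate_visits,
       1 - Count(*) / Sum(visits) AS duplicate_rate
FROM teste
GROUP BY website, sub_group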
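To try the per-group query outside Access, here is a rough sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The harness, the function name visit_stats, the ROUND call and the 1.0 multiplier are my additions, not part of the Access query; SQLite divides integers by truncation, so the multiplier forces a decimal result there.

```python
import sqlite3

# Sample rows from the question: (website, user_id, sub_group, visits).
ROWS = [
    ("web1", 1, "A", 1), ("web1", 1, "B", 3), ("web1", 1, "C", 4),
    ("web1", 1, "D", 1), ("web1", 2, "A", 3), ("web1", 2, "B", 4),
    ("web1", 3, "B", 2), ("web1", 3, "C", 1), ("web1", 3, "D", 1),
    ("web2", 1, "A", 3), ("web2", 1, "B", 1), ("web2", 1, "C", 2),
    ("web2", 4, "B", 1), ("web2", 4, "D", 2),
    ("web2", 5, "C", 3), ("web2", 5, "D", 4),
]

QUERY = """
SELECT website,
       sub_group,
       SUM(visits) AS total_visits,
       -- Assumes one row per user within a website and sub_group.
       SUM(visits) - COUNT(*) AS duplicate_visits,
       -- 1.0 forces floating-point division in SQLite.
       ROUND(1.0 * (SUM(visits) - COUNT(*)) / SUM(visits), 2) AS duplicate_rate
FROM teste
GROUP BY website, sub_group
ORDER BY website, sub_group
"""


def visit_stats(rows):
    """Load rows into a throwaway table named teste and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE teste "
        "(website TEXT, user_id INTEGER, sub_group TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO teste VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in visit_stats(ROWS):
        print(row)
```

If a user could appear on more than one row for the same website and sub_group, COUNT(*) would need to become a distinct user count instead.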
Second SQL, to get the data for the second-to-last column:
SELECT website,
       sub_group,
       Count(user_id) AS distinct_users_subgroup
FROM (SELECT website,
             sub_group,
             user_id
      FROM teste
      GROUP BY website, sub_group, user_id) AS u
GROUP BY website, sub_group
Third SQL, for the last column:
SELECT website,
       Count(user_id) AS distinct_users_total
FROM (SELECT website,
             user_id
      FROM teste
      GROUP BY website, user_id) AS u
GROUP BY website
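The two nested GROUP BY queries above de-duplicate users before counting them. As far as I know Access SQL has no COUNT(DISTINCT ...), but on engines that do support it the same counts can be written more directly. Here is a sketch against SQLite via Python's sqlite3 module; the harness and the function names are assumptions for illustration only.

```python
import sqlite3

# Sample rows from the question: (website, user_id, sub_group, visits).
ROWS = [
    ("web1", 1, "A", 1), ("web1", 1, "B", 3), ("web1", 1, "C", 4),
    ("web1", 1, "D", 1), ("web1", 2, "A", 3), ("web1", 2, "B", 4),
    ("web1", 3, "B", 2), ("web1", 3, "C", 1), ("web1", 3, "D", 1),
    ("web2", 1, "A", 3), ("web2", 1, "B", 1), ("web2", 1, "C", 2),
    ("web2", 4, "B", 1), ("web2", 4, "D", 2),
    ("web2", 5, "C", 3), ("web2", 5, "D", 4),
]

PER_SUBGROUP = """
SELECT website, sub_group, COUNT(DISTINCT user_id) AS distinct_users_subgroup
FROM teste
GROUP BY website, sub_group
ORDER BY website, sub_group
"""

PER_WEBSITE = """
SELECT website, COUNT(DISTINCT user_id) AS distinct_users_total
FROM teste
GROUP BY website
ORDER BY website
"""


def run_query(rows, query):
    """Load rows into a throwaway table named teste and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE teste "
        "(website TEXT, user_id INTEGER, sub_group TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO teste VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def users_per_subgroup(rows):
    return run_query(rows, PER_SUBGROUP)


def users_per_website(rows):
    return run_query(rows, PER_WEBSITE)


if __name__ == "__main__":
    print(users_per_subgroup(ROWS))
    print(users_per_website(ROWS))
```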
These 3 results can be combined into a single result set, although it looks very messy in the simplified SQL database I have at hand (I couldn't test this part, so there may be typos):
SELECT a.*,
       b.distinct_users_subgroup,
       c.distinct_users_total
FROM (SELECT website,
             sub_group,
             Sum(visits) AS total_visits,
             Sum(visits) - Count(*) AS duplicate_visits,
             1 - Count(*) / Sum(visits) AS duplicate_rate
      FROM teste
      GROUP BY website, sub_group) AS a
INNER JOIN (SELECT website,
                   sub_group,
                   Count(user_id) AS distinct_users_subgroup
            FROM (SELECT website,
                         sub_group,
                         user_id
                  FROM teste
                  GROUP BY website, sub_group, user_id) AS u
            GROUP BY website, sub_group) AS b
        ON a.website = b.website AND a.sub_group = b.sub_group
INNER JOIN (SELECT website,
                   Count(user_id) AS distinct_users_total
            FROM (SELECT website,
                         user_id
                  FROM teste
                  GROUP BY website, user_id) AS u
            GROUP BY website) AS c
        ON a.website = c.website
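On a system with COUNT(DISTINCT ...), the combined query can probably be made more compact. The sketch below is one possible version, run against SQLite through Python's sqlite3 module and compared with the expected results from the question. It is not the Access query above; the aliases t and w, the function name full_report and the ROUND call are my own choices, and it again assumes one row per user within a website and sub_group.

```python
import sqlite3

# Sample rows from the question: (website, user_id, sub_group, visits).
ROWS = [
    ("web1", 1, "A", 1), ("web1", 1, "B", 3), ("web1", 1, "C", 4),
    ("web1", 1, "D", 1), ("web1", 2, "A", 3), ("web1", 2, "B", 4),
    ("web1", 3, "B", 2), ("web1", 3, "C", 1), ("web1", 3, "D", 1),
    ("web2", 1, "A", 3), ("web2", 1, "B", 1), ("web2", 1, "C", 2),
    ("web2", 4, "B", 1), ("web2", 4, "D", 2),
    ("web2", 5, "C", 3), ("web2", 5, "D", 4),
]

QUERY = """
SELECT t.website,
       t.sub_group,
       SUM(t.visits) AS total_visits,
       SUM(t.visits) - COUNT(*) AS duplicate_visits,
       -- 1.0 forces floating-point division in SQLite.
       ROUND(1.0 * (SUM(t.visits) - COUNT(*)) / SUM(t.visits), 2) AS duplicate_rate,
       COUNT(DISTINCT t.user_id) AS distinct_users_subgroup,
       w.distinct_users_total
FROM teste AS t
-- w has exactly one row per website, so the join adds no extra rows.
INNER JOIN (SELECT website,
                   COUNT(DISTINCT user_id) AS distinct_users_total
            FROM teste
            GROUP BY website) AS w
        ON w.website = t.website
GROUP BY t.website, t.sub_group, w.distinct_users_total
ORDER BY t.website, t.sub_group
"""


def full_report(rows):
    """Load rows into a throwaway table named teste and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE teste "
        "(website TEXT, user_id INTEGER, sub_group TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO teste VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in full_report(ROWS):
        print(row)
```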