首页 > 解决方案 > 自联接和效率与子查询

问题描述

假设我们有一个名为 的表 actions(date, uid, pid, action, description) 。该表的示例如下所示:

Table: actions
date         uid        pid         action            description
'2018-10-19' 1234        12           'view'         
'2018-10-19' 1234        12           'report'        'SPAM' 
'2018-10-19' 5678        23           'reaction'      'LOVE'

有一个表也叫 reviewers(date, rid, pid) . 审阅者是删除帖子的人。审阅者不是用户。该表的示例如下所示:

Table: reviewers
 date         rid                    pid
'2018-10-19'  567                    12
'2018-10-19'  890                    45

用户查看(采取任何行动)的日常内容中有多少实际上是垃圾邮件?

以下工作是否可行:

案例一:“看”是指任何动作

select u.date, count(distinct r.pid)/count(distinct uu.pid))*100
from actions u join actions uu
on u.pid = uu.pid
inner join reviewers r
on u.pid = r.pid
where u.description = 'SPAM'
group by 1

案例 2:“看”意味着 action = 'view'

 select u.date, count(distinct r.pid)/count(distinct uu.pid))*100
    from actions u join actions uu       
    on u.pid = uu.pid
    inner join reviewers r
    on u.pid = r.pid
    where u.description = 'SPAM'
    and uu.action = 'VIEW'
    group by 1

标签: sql

解决方案


我不确定为什么reviewers需要考虑,或者是否可以在该表中重复 pid,但我认为这将满足您的需求(样本中的 50.0%)

select 
    count(distinct (case when description = 'SPAM' and r.pid IS NOT NULL then pid end)) * 100.0
     /
    count(distinct pid) 
from actions a
left join (
     select distinct pid from reviewers 
     ) r on r.pid = a.pid
;

推荐阅读