sql - BigQuery 不存在...不等于 (!=) 并且存在...等于给出不同的结果
问题描述
我迷路了,并试图找出我为相同逻辑获得不同输出的原因。我想找出每个调用集的所有调用(变体和非变体),并省略任何使用非 PASS 过滤器的调用。
查询在链接中给出:
#standardSQL
SELECT
call.name AS call_name,
COUNT(1) AS number_of_calls
FROM
`bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823` v, v.call
WHERE
NOT EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter != 'PASS')
GROUP BY
call_name
ORDER BY
call_name
排 | 呼叫名称 | number_of_calls |
---|---|---|
1 | NA12877 | 29795946 |
2 | NA12878 | 26118774 |
3 | NA12889 | 29044992 |
4 | NA12890 | 28717437 |
5 | NA12891 | 31395995 |
6 | NA12892 | 25349974 |
这将返回过滤器为 PASS 的行数。
但是,当我尝试类似的逻辑时,我会得到不同的结果。
#standardSQL
SELECT
call.name AS call_name,
COUNT(1) AS number_of_calls
FROM
`bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823` v, v.call
WHERE
EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter = 'PASS')
GROUP BY
call_name
ORDER BY
call_name
排 | 呼叫名称 | number_of_calls |
---|---|---|
1 | NA12877 | 4488086 |
2 | NA12878 | 4503443 |
3 | NA12889 | 4423974 |
4 | NA12890 | 4529950 |
5 | NA12891 | 4425316 |
6 | NA12892 | 4497085 |
为什么是这样?我的逻辑错了吗?有人可以解释为什么在这种情况下空数组是“PASS”吗?感谢您花时间帮助我!
解决方案
感谢@shawnt00 提供解决方案。我将用一些较小的例子来证明额外的计数确实是空数组。
查询 1
with smalltable as (
select ["PASS"] as filter, 'NA12877' as name union all
select ["PASS"], 'NA12877' union all
select ["PASS"], 'NA12879' union all
select ["PASS",'RefCall'], 'NA12877' union all
select ["RefCall"], 'NA12877' union all
select ["RefCall"], 'NA12877' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879', union all
select [], 'NA12879'
)
select name, count(1) as count
from smalltable
where not exists (select 1 from unnest(filter) as f where f != 'PASS')
group by name
输出:
排 | 姓名 | 数数 |
---|---|---|
1 | NA12877 | 2 |
2 | NA12879 | 2 |
正如我们所见,查询将 [] 视为“PASS”。我不确定为什么?
查询 2
with smalltable as (
select ["PASS"] as filter, 'NA12877' as name union all
select ["PASS"], 'NA12877' union all
select ["PASS"], 'NA12879' union all
select ["PASS",'RefCall'], 'NA12877' union all
select ["RefCall"], 'NA12877' union all
select ["RefCall"], 'NA12877' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879' union all
select ["RefCall"], 'NA12879', union all
select [], 'NA12879'
)
select name, count(1) as count
from smalltable
where exists (select 1 from unnest(filter) as f where f = 'PASS')
group by name
输出:
排 | 姓名 | 数数 |
---|---|---|
1 | NA12877 | 3 |
2 | NA12879 | 1 |
正如预期的那样,这不计算空数组。
推荐阅读
- python - tkinter.messagebox 之后如何进入
- python - 运行 python kafka 生产者时“self.async”上的 SyntaxError
- javascript - 在 Div 上使用 onclick 事件不起作用
- r - R中rbinom()的参数
- python - Python Pandas:如果没有数据则跳过然后继续写入excel
- reactjs - 反应模态组件接收相同的类
- python - 使用python从招聘广告中提取技能和职责等特征
- ios - 错误:仍在上传屏幕截图。关于 Apple App 提交
- sql - 计算相似子字符串 SQL 查询
- r - 在 R 中设置工作目录的最佳实践