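sql - PostgreSQL query: counting each base in a set of DNA sequences
Question
I have 5 random DNA sequences, each 20 bases long, and I want to count the bases in each one.
In the first part, I wrote a prepared statement, dna_length, that generates a random sequence of a given length, and a query that produces 5 sequences of 20 bases each. Now I want the base counts: how many "A", how many "C", how many "G", and how many "T" each sequence contains.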
prepare dna_length(int) as
with t1 as (select chr(65) as s union select chr(67) union select chr(71) union select chr(84))  -- A, C, G, T
, t2 as (select s, row_number() over() as rn from t1)  -- number the bases 1..4
-- floor(random() * 4) + 1 always lands in 1..4; round(random() * 4 + 0.5) can give 0 when random() returns 0
, t3 as (select generate_series(1, $1) as i, floor(random() * 4) + 1 as rn)
, t4 as (select t2.s from t2 join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4), '') as dna;
with t1 as (
    select 1 as rn, 'A' as s
    union select 2, 'C'
    union select 3, 'T'
    union select 4, 'G'
), t2 as (
    select generate_series(1, 5) as sample
), t3 as (
    -- floor(random() * 4) + 1 always lands in 1..4
    select t2.sample, generate_series(1, 20) as i,
           floor(random() * 4) + 1 as rn
    from t2
), t4 as (
    select t3.sample, t3.i, t3.rn, t1.s
    from t3
    join t1 on t1.rn = t3.rn
)
select sample, string_agg(s, '' order by i)
from t4
group by sample
order by sample;
The output currently looks like this:
id DNA
1 ACTGCTGCAGTCGTACGTAC
2 TGCAGTCGTAGCTGACGTAG
3 GCAGTGACCAACGTGTGACA
4 TGACGTGTCGAGACGAAGAG
5 CGTGTGAGAGTCGTAGAGTG
The result should look like this:
id DNA A C G T
1 ACTGCTGCAGTCGTACGTAC 4 6 5 5
2 TGCAGTCGTAGCTGACGTAG 4 4 7 5
3 GCAGTGACCAACGTGTGACA 6 5 6 3
4 TGACGTGTCGAGACGAAGAG 6 3 8 3
5 CGTGTGAGAGTCGTAGAGTG 4 2 9 5
Answer
You can do conditional counting in the final query, using an aggregate filter clause per base:
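with ...  -- same CTEs (t1 to t4) as in the question
select
    sample,
    string_agg(s, '' order by i) as dna,
    -- each count only sees the rows of the group that match its filter
    count(*) filter (where s = 'A') as a,
    count(*) filter (where s = 'C') as c,
    count(*) filter (where s = 'G') as g,
    count(*) filter (where s = 'T') as t
from t4
group by sample
order by sample;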
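To try the conditional count without the random generator, here is a minimal self-contained sketch. It assumes the five sequences from the question are supplied as literal values; the CTE name seqs and the alias b are hypothetical and not part of the original queries. Each string is split into single characters with regexp_split_to_table, and then the same count(*) filter (...) aggregates are applied.

```sql
-- Sample sequences copied from the question.
-- "seqs" and "b" are hypothetical names used only for this illustration.
with seqs(id, dna) as (
    values
        (1, 'ACTGCTGCAGTCGTACGTAC'),
        (2, 'TGCAGTCGTAGCTGACGTAG'),
        (3, 'GCAGTGACCAACGTGTGACA'),
        (4, 'TGACGTGTCGAGACGAAGAG'),
        (5, 'CGTGTGAGAGTCGTAGAGTG')
)
select
    seqs.id,
    seqs.dna,
    -- one conditional count per base, evaluated over the characters of each sequence
    count(*) filter (where b.s = 'A') as a,
    count(*) filter (where b.s = 'C') as c,
    count(*) filter (where b.s = 'G') as g,
    count(*) filter (where b.s = 'T') as t
from seqs
-- an empty pattern splits the string into one row per character
cross join lateral regexp_split_to_table(seqs.dna, '') as b(s)
group by seqs.id, seqs.dna
order by seqs.id;
```

Because the input is fixed, the counts can be checked by hand against each string, and the four counts in every row should add up to 20.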