首页 > 解决方案 > 为分组数据集选择随机值

问题描述

我不是 SQL 专家。但是我正在使用以下查询:

select count(*) as countis, avclassfamily
from malwarehashesandstrings
where behaviouralbinary IS true and
       avclassfamily != 'SINGLETON'
group by avclassfamily
ORDER BY countis desc
LIMIT 50;

我想从按 avclassfamily 列分组的malwarehashsha256 列中选择 3 个随机散列。

以下查询有效,问题结束:

select count(*) as countis,avclassfamily from malwarehashesandstrings where behaviouralbinary IS true and avclassfamily != 'SINGLETON' group by avclassfamily ORDER BY countis desc LIMIT 50;

virustotal=# select m.avclassfamily, m.cnt,
        array_agg(malwarehashsha256)
 from (select malwarehashesandstrings.*,
              count(*) over (partition by avclassfamily) as cnt,
              row_number() over (partition by avclassfamily order by random()) as seqnum
       from malwarehashesandstrings
       where behaviouralbinary and
             avclassfamily <> 'SINGLETON'
      ) as m
 where seqnum <= 3
 group by m.avclassfamily, m.cnt ORDER BY m.cnt DESC LIMIT 50;

标签: sqlpostgresqlselectgroup-by

解决方案


如果我理解正确,您可以使用row_number()

select m.*
from (select m.*,
             row_number() over (partition by m.avclassfamily order by random()) as seqnum
      from malwarehashesandstrings m
      where m.behaviouralbinary and
            m.avclassfamily <> 'SINGLETON'
     ) m
where seqnum <= 3;

如果您希望在现有查询的列中使用它,一种方法是:

select m.avgclassfamily, m.cnt,
       array_agg(m.malwarehashsha256)
from (select m.*,
             count(*) over (partition by m.avgclassfamily) as cnt,
             row_number() over (partition by m.avclassfamily order by random()) as seqnum
      from malwarehashesandstrings m
      where m.behaviouralbinary and
            m.avclassfamily <> 'SINGLETON'
     ) m
where seqnum <= 3
group by m.avgclassfamily, m.cnt;

推荐阅读