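sql - Getting multiple rows after using max() and group by
Problem description
I am trying to assign a segment to each xml_id, based on whichever segment has the most active properties for that xml_id. I have written a query for this.
I am creating a table from a select like this: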
create table schema.table4 as
select
    yyyy_mm_dd,
    xml_id,
    segment as xml_segment,
    max(property_count)
from (
    select
        t1.yyyy_mm_dd,
        t2.xml_id,
        t3.segment,
        count(t1.hotel_id) as property_count
    from (
        select
            yyyy_mm_dd,
            hotel_id
        from
            schema.table1
        where
            is_active = 1
            and yyyy_mm_dd = "2020-10-01"
    ) t1
    left join (
        select
            yyyy_mm_dd,
            hotel_id,
            xml_id
        from
            schema.table2
        where
            yyyy_mm_dd = "2020-10-01"
            and xml_id is not null
    ) t2 on t2.hotel_id = t1.hotel_id and t2.yyyy_mm_dd = t1.yyyy_mm_dd
    inner join
        schema.table3 t3 on t3.hotel_id = t1.hotel_id
    group by
        1,2,3
) x
group by
    1,2,3
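However, when I query this table, I can see more than one row per xml_id. I expected only the segment with the MAX() property_count to be inserted. Why is that not the case?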
select
    xml_id, count(*)
from
    schema.table4
group by
    xml_id
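This shows multiple rows, with some xml_id values having two or three rows. I need exactly one row per xml_id, and its segment should be the one with the highest property_count.
Some sample data is below. Each xml_id should inherit the segment in which it has the most properties.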
t1:
| yyyy_mm_dd | hotel_id | is_active |
|------------|----------|-----------|
| 2020-10-01 | 1 | 1 |
| 2020-10-01 | 2 | 1 |
| 2020-10-01 | 3 | 1 |
| 2020-10-01 | 4 | 1 |
| 2020-10-01 | 5 | 1 |
| 2020-10-01 | 6 | 1 |
| 2020-10-01 | 7 | 0 |
t2:
| yyyy_mm_dd | hotel_id | xml_id |
|------------|----------|--------|
| 2020-10-01 | 1 | 444 |
| 2020-10-01 | 2 | 444 |
| 2020-10-01 | 3 | 444 |
| 2020-10-01 | 4 | 920 |
| 2020-10-01 | 5 | 920 |
| 2020-10-01 | 6 | 920 |
| 2020-10-01 | 7 | null |
t3:
| hotel_id | segment |
|----------|---------|
| 1 | Home |
| 2 | Core |
| 3 | Core |
| 4 | Core |
| 5 | Home |
| 6 | Home |
| 7 | Chain |
Expected output:
| yyyy_mm_dd | xml_id | segment |
|------------|--------|---------|
| 2020-10-01 | 444 | Core |
| 2020-10-01 | 920 | Home |
Solution
If I understand correctly, you want to count the number of actives by date, xml_id, and segment. This is the basic query for that count:
select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives
from t2
join t1
    on t2.hotel_id = t1.hotel_id and
       t2.yyyy_mm_dd = t1.yyyy_mm_dd
join t3
    on t3.hotel_id = t2.hotel_id
where t1.is_active = 1
group by t2.yyyy_mm_dd, t2.xml_id, t3.segment;
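To see what this count returns on the sample data from the question, here is a minimal sketch that loads the three sample tables into an in-memory SQLite database through Python's standard `sqlite3` module. The use of SQLite and the variable names `conn` and `counts` are assumptions made for illustration; the question's `"2020-10-01"` double-quoted literals suggest a different engine. Note that every xml_id still gets one row per segment, which is why grouping by segment in the outer query of the question could not reduce the result to one row per xml_id.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (yyyy_mm_dd text, hotel_id integer, is_active integer);
    create table t2 (yyyy_mm_dd text, hotel_id integer, xml_id integer);
    create table t3 (hotel_id integer, segment text);

    insert into t1 values
        ('2020-10-01', 1, 1), ('2020-10-01', 2, 1), ('2020-10-01', 3, 1),
        ('2020-10-01', 4, 1), ('2020-10-01', 5, 1), ('2020-10-01', 6, 1),
        ('2020-10-01', 7, 0);
    insert into t2 values
        ('2020-10-01', 1, 444), ('2020-10-01', 2, 444), ('2020-10-01', 3, 444),
        ('2020-10-01', 4, 920), ('2020-10-01', 5, 920), ('2020-10-01', 6, 920),
        ('2020-10-01', 7, null);
    insert into t3 values
        (1, 'Home'), (2, 'Core'), (3, 'Core'), (4, 'Core'),
        (5, 'Home'), (6, 'Home'), (7, 'Chain');
""")

# Count active hotels per (date, xml_id, segment); the order by is only
# added here to make the printed output deterministic.
counts = conn.execute("""
    select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives
    from t2
    join t1 on t2.hotel_id = t1.hotel_id and t2.yyyy_mm_dd = t1.yyyy_mm_dd
    join t3 on t3.hotel_id = t2.hotel_id
    where t1.is_active = 1
    group by t2.yyyy_mm_dd, t2.xml_id, t3.segment
    order by t2.xml_id, t3.segment
""").fetchall()

for row in counts:
    print(row)
```

On the sample data this prints two rows for xml_id 444 (Core with 2, Home with 1) and two rows for xml_id 920 (Core with 1, Home with 2).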
To keep only the segment with the highest count for each xml_id, you can use a window function:
select t.*
from (select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives,
             row_number() over (partition by t2.yyyy_mm_dd, t2.xml_id order by count(*) desc) as seqnum
      from t2
      join t1
          on t2.hotel_id = t1.hotel_id and
             t2.yyyy_mm_dd = t1.yyyy_mm_dd
      join t3
          on t3.hotel_id = t2.hotel_id
      where t1.is_active = 1
      group by t2.yyyy_mm_dd, t2.xml_id, t3.segment
     ) t
where seqnum = 1;
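The row_number() approach can be checked against the expected output from the question with a small sketch, again assuming an in-memory SQLite database driven from Python's `sqlite3` module (window functions need SQLite 3.25 or newer). Here the count is moved into a CTE named `counts` and the window function is applied to that CTE; this is a restructuring chosen for the illustration, not the exact form of the query above, and the names `conn` and `top_segments` are hypothetical.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (yyyy_mm_dd text, hotel_id integer, is_active integer);
    create table t2 (yyyy_mm_dd text, hotel_id integer, xml_id integer);
    create table t3 (hotel_id integer, segment text);

    insert into t1 values
        ('2020-10-01', 1, 1), ('2020-10-01', 2, 1), ('2020-10-01', 3, 1),
        ('2020-10-01', 4, 1), ('2020-10-01', 5, 1), ('2020-10-01', 6, 1),
        ('2020-10-01', 7, 0);
    insert into t2 values
        ('2020-10-01', 1, 444), ('2020-10-01', 2, 444), ('2020-10-01', 3, 444),
        ('2020-10-01', 4, 920), ('2020-10-01', 5, 920), ('2020-10-01', 6, 920),
        ('2020-10-01', 7, null);
    insert into t3 values
        (1, 'Home'), (2, 'Core'), (3, 'Core'), (4, 'Core'),
        (5, 'Home'), (6, 'Home'), (7, 'Chain');
""")

# Step 1 (CTE): count active hotels per (date, xml_id, segment).
# Step 2: number the segments within each (date, xml_id) by descending count.
# Step 3: keep only the first segment of each partition.
top_segments = conn.execute("""
    with counts as (
        select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives
        from t2
        join t1 on t2.hotel_id = t1.hotel_id and t2.yyyy_mm_dd = t1.yyyy_mm_dd
        join t3 on t3.hotel_id = t2.hotel_id
        where t1.is_active = 1
        group by t2.yyyy_mm_dd, t2.xml_id, t3.segment
    )
    select yyyy_mm_dd, xml_id, segment
    from (
        select counts.*,
               row_number() over (
                   partition by yyyy_mm_dd, xml_id
                   order by num_actives desc
               ) as seqnum
        from counts
    ) ranked
    where seqnum = 1
    order by xml_id
""").fetchall()

for row in top_segments:
    print(row)
```

One design point to keep in mind: row_number() always returns exactly one row per xml_id, so when two segments tie for the highest count, the one that is kept is arbitrary. Using rank() instead would return all tied segments, and adding a second sort key to the order by would make the choice deterministic.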