SQL query for removing duplicates

Problem description

So I have to write a query that finds, for every pair of movie genres, the number of directors who have directed movies in both genres. I thought I should take two instances of the genre id and two instances of the director id, and keep the rows where the genre ids are different but the director ids are the same. So I tried something like this:

select distinct g1.genre_id as genre1,
       g2.genre_id as genre2,
       count(distinct mhd1.director_id) as directors_count
from genre g1, genre g2,
     movie_has_genre mhg1, movie_has_genre mhg2,
     movie_has_director mhd1, movie_has_director mhd2
where (g2.genre_id <> g1.genre_id)
  and (mhg1.genre_id = g1.genre_id)
  and (mhg2.genre_id = g2.genre_id)
  and (mhd1.movie_id = mhg1.movie_id)
  and (mhd2.movie_id = mhg2.movie_id)
  and (mhd1.director_id = mhd2.director_id)
group by g1.genre_id, g2.genre_id;

The database schema is:

actor(actor_id,first_name,last_name,gender)

director(director_id,first_name,last_name)

role(movie_id,actor_id,role)

genre(genre_id,name)

movie(movie_id,title,year,rank)

movie_has_director(movie_id,director_id)

movie_has_genre(genre_id,movie_id)

But it is not working. What am I missing? Thank you.
EDIT: the problem seems to be that I get both (a,b) and (b,a) pairs, while I should get only (a,b) with a < b.
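To make the duplicate-pair effect concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, with a hypothetical three-row genre table. It runs the bare genre self-join once with <> and once with >: the first returns every pair in both orders, the second returns each pair once, with the smaller genre_id first.

```python
import sqlite3

# In-memory database with a hypothetical genre table (ids 1, 2, 3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO genre (genre_id, name) VALUES (?, ?)",
    [(1, "Drama"), (2, "Comedy"), (3, "Horror")],
)


def genre_pairs(op):
    """Return the (g1, g2) genre_id pairs produced by the self-join using `op`.

    `op` is always one of the fixed strings below, never user input,
    so formatting it into the SQL text is safe here.
    """
    sql = f"""
        SELECT g1.genre_id, g2.genre_id
        FROM genre g1
        JOIN genre g2 ON g2.genre_id {op} g1.genre_id
        ORDER BY g1.genre_id, g2.genre_id
    """
    return conn.execute(sql).fetchall()


unordered = genre_pairs("<>")  # every pair appears twice: (a, b) and (b, a)
ordered = genre_pairs(">")     # every pair appears once: (a, b) with a < b

print(unordered)
print(ordered)
```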

Tags: mysql, sql, mysql-workbench

Solution


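First, you need a kind of cross join to get every combination of two genres: genre g1 join genre g2 on g2.genre_id > g1.genre_id. Using > instead of <> keeps each pair only once, as (a, b) with a < b. Then you need to join movie_has_genre and movie_has_director for both g1 and g2, and keep only the rows where the director is the same (md2.director_id = md1.director_id). The rest is a basic GROUP BY and COUNT: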

select g1.name as genre1,
       g2.name as genre2,
       count(distinct md1.director_id) as directors_count
from genre g1
-- ">" instead of "<>" keeps each genre pair once, as (a, b) with a < b
join genre g2 on g2.genre_id > g1.genre_id
join movie_has_genre mg1 on mg1.genre_id = g1.genre_id
join movie_has_genre mg2 on mg2.genre_id = g2.genre_id
join movie_has_director md1 on md1.movie_id = mg1.movie_id
-- keep only rows where the same director has a movie in each genre
join movie_has_director md2 on md2.movie_id = mg2.movie_id
                           and md2.director_id = md1.director_id
group by g1.genre_id, g1.name, g2.genre_id, g2.name;
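One way to sanity-check the query is to run it on a few rows of toy data. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query only uses plain joins, GROUP BY and COUNT(DISTINCT ...)), creates just the three tables the query touches, and fills them with hypothetical genres, movies and directors: director 100 has movies in Drama, Comedy and Horror, and director 200 has a single movie tagged both Drama and Comedy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie_has_genre (genre_id INTEGER, movie_id INTEGER);
    CREATE TABLE movie_has_director (movie_id INTEGER, director_id INTEGER);

    -- Hypothetical sample data
    INSERT INTO genre VALUES (1, 'Drama'), (2, 'Comedy'), (3, 'Horror');
    INSERT INTO movie_has_genre VALUES (1, 10), (2, 11), (1, 12), (2, 12), (3, 13);
    INSERT INTO movie_has_director VALUES (10, 100), (11, 100), (13, 100), (12, 200);
""")

QUERY = """
SELECT g1.name AS genre1,
       g2.name AS genre2,
       COUNT(DISTINCT md1.director_id) AS directors_count
FROM genre g1
JOIN genre g2 ON g2.genre_id > g1.genre_id          -- each pair once
JOIN movie_has_genre mg1 ON mg1.genre_id = g1.genre_id
JOIN movie_has_genre mg2 ON mg2.genre_id = g2.genre_id
JOIN movie_has_director md1 ON md1.movie_id = mg1.movie_id
JOIN movie_has_director md2 ON md2.movie_id = mg2.movie_id
                           AND md2.director_id = md1.director_id  -- same director
GROUP BY g1.genre_id, g1.name, g2.genre_id, g2.name
ORDER BY g1.genre_id, g2.genre_id                   -- deterministic output order
"""

rows = conn.execute(QUERY).fetchall()
for genre1, genre2, directors_count in rows:
    print(genre1, genre2, directors_count)
```

Genre pairs that share no director do not appear in the result at all, because the inner joins drop them, and a director whose single movie carries both genres (director 200 here) is counted for that pair.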
