Computing a Set Difference in SQL

boss-he  2015-07-24 14:52

Database environment: SQL Server 2008 R2

SQL Server provides an operator for computing the difference of two sets: EXCEPT. Let's first look at its syntax:

{ <query_specification> | ( <query_expression> ) } 
{ EXCEPT }
{ <query_specification> | ( <query_expression> ) }

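EXCEPT returns the distinct values from the query on its left that are not also returned by the query on its right.
The description above is taken from the MSDN documentation for EXCEPT.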
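"Distinct" is the key word here. The following minimal sketch applies plain EXCEPT to the small tables A and B used in the requirement below. Both tables are built as inline CTEs, and the lowercase column name `name` stands in for the article's `NAME`. It should run as-is on SQL Server 2008 R2.

```sql
-- Plain EXCEPT: distinct rows of a that do not appear in b.
WITH a AS ( SELECT 1 AS id, 'a' AS name
            UNION ALL SELECT 1, 'a'
            UNION ALL SELECT 2, 'b' ),
     b AS ( SELECT 1 AS id, 'a' AS name
            UNION ALL SELECT 3, 'c' )
SELECT id, name FROM a
EXCEPT
SELECT id, name FROM b;
-- Returns a single row: 2 | b
-- Both copies of (1, 'a') are gone, because EXCEPT compares distinct rows.
```

Both copies of (1, a) disappear even though B holds only one of them. This is why plain EXCEPT does not fit the requirement that follows.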

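Our requirement here is a little different. If table B contains a record that also appears in table A, then A should lose as many copies of that record as B contains (or all of them, if B has more). In other words, we want a multiset difference rather than a set difference.
Suppose table A contains the following data: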
id    name
1     a
1     a
2     b

Table B contains:
id    name
1     a
3     c

Under this rule, one record in B duplicates a record in A, so one copy of that record is removed from A.
The result is:
id    name
1     a
2     b
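With the requirement clear, here is how to implement it. First, number the duplicate records in table a and in table b separately (1, 2, 3, ... within each group of identical rows). Then filter out every row of a for which b contains a row with the same number, id and name.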
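The numbering idea can be sketched on the three-row A and two-row B from the requirement above. The assumptions are the same as before: the tables are inline CTEs, and the target is SQL Server 2008 R2 or later, where ROW_NUMBER() is available.

```sql
-- Number the copies within each (id, name) group on both sides, then EXCEPT.
-- a becomes (1,'a',1), (1,'a',2), (2,'b',1)
-- b becomes (1,'a',1), (3,'c',1)
WITH a AS ( SELECT 1 AS id, 'a' AS name
            UNION ALL SELECT 1, 'a'
            UNION ALL SELECT 2, 'b' ),
     b AS ( SELECT 1 AS id, 'a' AS name
            UNION ALL SELECT 3, 'c' )
SELECT id, name
FROM ( SELECT id, name,
              ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id) AS nid
       FROM a
       EXCEPT
       SELECT id, name,
              ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id) AS nid
       FROM b ) AS d;
-- Returns the rows 1 | a and 2 | b (order not guaranteed):
-- (1,'a',1) is cancelled by b, while (1,'a',2) has no partner in b.
```

Adding the copy number `nid` makes otherwise identical rows distinct. EXCEPT can then cancel them one copy at a time.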
First, prepare the sample data:
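-- Sample tables a and b as CTEs. Run this together with one of the queries below:
-- a CTE must be followed immediately by a statement in the same batch.
WITH    a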
          AS ( SELECT   1 AS id ,
                        'a' AS NAME
               UNION ALL
               SELECT   1 AS id ,
                        'a' AS NAME
               UNION ALL
               SELECT   2 AS id ,
                        'b' AS NAME
               UNION ALL
               SELECT   3 AS id ,
                        'c' AS NAME
               UNION ALL
               SELECT   3 AS id ,
                        'c' AS NAME
               UNION ALL
               SELECT   1 AS id ,
                        'a' AS NAME
               UNION ALL
               SELECT   4 AS id ,
                        'd' AS NAME
             ),
        b
          AS ( SELECT   3 AS id ,
                        'c' AS NAME
               UNION ALL
               SELECT   1 AS id ,
                        'a' AS NAME
               UNION ALL
               SELECT   2 AS id ,
                        'b' AS NAME
               UNION ALL
               SELECT   3 AS id ,
                        'c' AS NAME
               UNION ALL
               SELECT   1 AS id ,
                        'a' AS NAME
             )

Here is the data in tables a and b:

Table a:
id    name
1     a
1     a
2     b
3     c
3     c
1     a
4     d

Table b:
id    name
3     c
1     a
2     b
3     c
1     a

Method 1: using NOT EXISTS

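SELECT  id ,
        NAME
FROM    ( -- Number the copies of each (id, NAME) row in a: 1, 2, 3, ...
          -- (ORDER BY id is arbitrary here: all rows in a partition are identical)
          SELECT    id ,
                    ROW_NUMBER() OVER ( PARTITION BY id, NAME ORDER BY id ) AS nid ,
                    NAME
          FROM      a
        ) a
WHERE   NOT EXISTS ( -- Drop the row if b has a copy with the same id, NAME and copy number
                     SELECT NULL
                     FROM   ( SELECT    id ,
                                        ROW_NUMBER() OVER ( PARTITION BY id,
                                                          NAME ORDER BY id ) AS nid ,
                                        NAME
                              FROM      b
                            ) b
                     WHERE  b.nid = a.nid
                            AND b.id = a.id
                            AND b.NAME = a.NAME )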


Method 2: using EXCEPT

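SELECT  id ,
        NAME
FROM    ( -- Number the copies of each (id, NAME) row on both sides, then let
          -- EXCEPT remove the rows of a whose (id, nid, NAME) also occurs in b
          SELECT    id ,
                    ROW_NUMBER() OVER ( PARTITION BY id, NAME ORDER BY id ) AS nid ,
                    NAME
          FROM      a
          EXCEPT
          SELECT    id ,
                    ROW_NUMBER() OVER ( PARTITION BY id, NAME ORDER BY id ) AS nid ,
                    NAME
          FROM      b
        ) a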

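Methods 1 and 2 follow the same idea and differ only in how the numbered rows are compared. This equivalence holds as long as id and NAME contain no NULLs. EXCEPT treats two NULLs as equal, but the b.NAME = a.NAME predicate in NOT EXISTS never matches NULL to NULL, so method 1 would keep such rows.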
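NULL handling is one place where the two forms can diverge. The sketch below uses hypothetical one-row tables whose `name` is NULL and counts how many rows each method leaves behind. The result depends on two behaviors: EXCEPT compares NULLs as equal, while the `=` predicate inside NOT EXISTS does not.

```sql
-- Hypothetical data: one row with a NULL name on each side.
WITH a  AS ( SELECT 5 AS id, CAST(NULL AS VARCHAR(10)) AS name ),
     b  AS ( SELECT 5 AS id, CAST(NULL AS VARCHAR(10)) AS name ),
     -- Copy numbers, exactly as in methods 1 and 2
     na AS ( SELECT id, name,
                    ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id) AS nid
             FROM a ),
     nb AS ( SELECT id, name,
                    ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id) AS nid
             FROM b )
SELECT 'NOT EXISTS' AS method, COUNT(*) AS rows_left
FROM na
WHERE NOT EXISTS ( SELECT NULL FROM nb
                   WHERE nb.nid = na.nid AND nb.id = na.id AND nb.name = na.name )
UNION ALL
SELECT 'EXCEPT', COUNT(*)
FROM ( SELECT id, name, nid FROM na
       EXCEPT
       SELECT id, name, nid FROM nb ) AS d;
-- NOT EXISTS: rows_left = 1 (NULL = NULL is not true, so nothing matches)
-- EXCEPT:     rows_left = 0 (NULLs compare as equal for set operators)
```

If the columns can hold NULLs and the EXCEPT behavior is wanted, one option is to compare them explicitly in method 1, e.g. `(b.NAME = a.NAME OR (b.NAME IS NULL AND a.NAME IS NULL))`.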

Both queries return the same result (row order is not guaranteed without ORDER BY):
id    name
1     a
4     d

(The end)


