首页 > 技术文章 > Sql批量插入时如果遇到相同的数据怎么处理

nxzblogs 2019-11-05 18:54 原文

测试数据

-- 创建测试表1 
CREATE TABLE `testtable1` ( 
`Id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT, 
`UserId` INT(11) DEFAULT NULL, 
`UserName` VARCHAR(10) DEFAULT NULL, 
`UserType` INT(11) DEFAULT NULL, 
PRIMARY KEY (`Id`), 
UNIQUE KEY `IX_UserId` (`UserId`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8; 

-- 创建测试表2 
CREATE TABLE `testtable2` ( 
`Id` INT(11) NULL AUTO_INCREMENT, 
`UserId` INT(11) DEFAULT NULL, 
`UserName` VARCHAR(10) DEFAULT NULL, 
`UserType` INT(11) DEFAULT NULL, 
PRIMARY KEY (`Id`), 
UNIQUE KEY `IX_UserId` (`UserId`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8; 

-- 插入测试数据1 
INSERT INTO testtable1(Id,UserId,UserName,UserType) 
VALUES(1,101,'aa',1),(2,102,'bbb',2),(3,103,'ccc',3); 

-- 插入测试数据2 
INSERT INTO testtable2(Id,UserId,UserName,UserType) 
VALUES(1,201,'aaa',1),(2,202,'bbb',2),(3,203,'ccc',3),(4,101,'xxxx',5);

可以看到上边的数据中会有userid为重复的数据 userid=101

mysql> show tables;
+---------------+
| Tables_in_dev |
+---------------+
| testtable1    |
| testtable2    |
+---------------+
mysql> select * from testtable1;
+----+--------+----------+----------+
| Id | UserId | UserName | UserType |
+----+--------+----------+----------+
|  1 |    101 | aa       |        1 |
|  2 |    102 | bbb      |        2 |
|  3 |    103 | ccc      |        3 |
+----+--------+----------+----------+
3 rows in set (0.04 sec)

mysql> select * from testtable2;
+----+--------+----------+----------+
| Id | UserId | UserName | UserType |
+----+--------+----------+----------+
|  1 |    201 | aaa      |        1 |
|  2 |    202 | bbb      |        2 |
|  3 |    203 | ccc      |        3 |
|  4 |    101 | xxxx     |        5 |
+----+--------+----------+----------+
4 rows in set (0.04 sec)

### 当执行以下sql时,会报1062错误,提示有重复的key
mysql> insert into testtable1 (userid,username,usertype)
    -> select userid,username,usertype from testtable2;
1062 - Duplicate entry '101' for key 'IX_UserId'

  • 如果想让上边的sql执行成功的话,可以使用IGNORE关键字
mysql> insert ignore into testtable1 (userid,username,usertype)
    -> select userid,username,usertype from testtable2;
Query OK, 3 rows affected (0.12 sec)
Records: 4  Duplicates: 1  Warnings:1
mysql> select * from testtable1;
+----+--------+----------+----------+
| Id | UserId | UserName | UserType |
+----+--------+----------+----------+
|  1 |    101 | aa       |        1 |
|  2 |    102 | bbb      |        2 |
|  3 |    103 | ccc      |        3 |
| 11 |    201 | aaa      |        1 |
| 12 |    202 | bbb      |        2 |
| 13 |    203 | ccc      |        3 |
+----+--------+----------+----------+
6 rows in set (0.05 sec)

查询sql,显示testtable2表中的数据插入到了表1中(除了重复key的那条信息)

另外注意到主键id为11,12,13开始的,这个是因为之前insert的sql失败导致的自增主键不连续


导入并覆盖重复数据,REPLACE INTO

上边那个是没有插入重复key的数据

  • 回滚之前testtable1表的数据

    mysql> truncate table testtable1;
    Query OK, 0 rows affected (0.62 sec)
    
    mysql> select * from testtable1;
    Empty set
    mysql> -- 插入测试数据1 
    INSERT INTO testtable1(Id,UserId,UserName,UserType) 
    VALUES(1,101,'aa',1),(2,102,'bbb',2),(3,103,'ccc',3); 
    Query OK, 3 rows affected (0.09 sec)
    
    mysql> replace into testtable1 (userid,username,usertype)
        -> select userid,username,usertype from testtable2;
    Query OK, 5 rows affected (0.10 sec)
    Records: 4  Duplicates: 1  Warnings: 0
    mysql> select * from testtable1;
    +----+--------+----------+----------+
    | Id | UserId | UserName | UserType |
    +----+--------+----------+----------+
    |  2 |    102 | bbb      |        2 |
    |  3 |    103 | ccc      |        3 |
    |  4 |    201 | aaa      |        1 |
    |  5 |    202 | bbb      |        2 |
    |  6 |    203 | ccc      |        3 |
    |  7 |    101 | xxxx     |        5 |
    +----+--------+----------+----------+
    

    可以看到表1中的101的username被覆盖为表2中的数据,这个是因为replace是现将原来表一中重复的数据删除掉,然后再执行插入新的数据

    导入重复数据,保留未指定的值

    mysql> insert into testtable1 (userid,username,usertype) 
        -> select userid,username,usertype from testtable2
        -> on duplicate key update
        -> testtable1.username = testtable2.username;
    

    以上sql对于重复的数据,只是将username进行了覆盖,其他的值还是表一中的数据

推荐阅读