首页 > 解决方案 > Vertica 数据库添加了具有相同主键的重复条目

问题描述

我正在运行Verticaon的 docker 映像windows。我已经vertica用这个模式创建了一个表(student_idis primary key

dbadmin@d1f942c8c1e0(*)=> \d testschema.student;
                                        List of Fields by Tables
   Schema   |  Table  |   Column   |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
------------+---------+------------+-------------+------+---------+----------+-------------+-------------
 testschema | student | student_id | int         |    8 |         | t        | t           |
 testschema | student | name       | varchar(20) |   20 |         | f        | f           |
 testschema | student | major      | varchar(20) |   20 |         | f        | f           |
(3 rows)

student_id是主键。我正在测试使用命令从csv文件中加载数据。copy

首先我用insert-insert into testschema.student values (1,'Jack','Biology');

然后我csv在目录中创建了一个文件/home/dbadmin/vertica_test-

vi student.csv 
2,Kate,Sociology
3,Claire,English
4,Jack,Biology
5,Mike,Comp. Sci

然后我运行了这个命令

copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;

我测试了结果

select * from testschema.student - shows 5 rows
select * from students_rejected; - no rows

然后我创建另一个csv包含错误数据的文件/home/dbadmin/vertica_test directory

 vi student_bad.csv
bad_data_type_for_student_id,UnaddedStudent, UnaddedSubject
6,Cassey,Physical Education

我从坏csv文件中添加了数据

copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;

然后我测试了输出

select * from testschema.student - shows 6 rows <-- only one row got added. all ok
select * from students_rejected; - shows 1 row  <-- bad row's entry is here. all ok

一切看起来都不错

然后我再次添加了错误数据,没有拒绝数据选项

copy testschema.students from '/home/dbadmin/vertica_test/student_bad.csv' delimiter ',' ;

但现在学生 id 为 6 的条目又被添加了!!

student_id |  name  |       major
------------+--------+--------------------
          1 | Jack   | Biology
          2 | Kate   | Sociology
          3 | Claire | English
          4 | Jack   | Biology
          5 | Mike   | Comp. Sci
          6 | Cassey | Physical Education <--
          6 | Cassey | Physical Education <--

这不应该被拒绝吗?

标签: vertica

解决方案


如果您students使用这种类型的命令创建了您的:

DROP TABLE IF EXISTS students;
CREATE TABLE students (
  student_id int 
, name       varchar(20)
, major      varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id)                                                                                                                                         
);

也就是说,如果没有显式关键字ENABLED,则禁用主键约束。students也就是说,您可以愉快地插入重复项,但如果您稍后想通过主键列连接到表,则会遇到错误。

启用主键约束...

[...]
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED                                                                                                                                         
[...]

我想你会得到想要的效果。

整个场景:

DROP TABLE IF EXISTS students;
CREATE TABLE students (
  student_id int
, name       varchar(20)
, major      varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
);

INSERT INTO students
          SELECT 1,'Jack'  ,'Biology'
UNION ALL SELECT 2,'Kate'  ,'Sociology'
UNION ALL SELECT 3,'Claire','English'
UNION ALL SELECT 4,'Jack'  ,'Biology'
UNION ALL SELECT 5,'Mike'  ,'Comp. Sci'
UNION ALL SELECT 6,'Cassey','Physical Education'
;
-- out  OUTPUT 
-- out --------
-- out       6

COMMIT;

COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out vsql:/home/gessnerm/._vfv.sql:4: ERROR 6745:  
-- out  Duplicate key values: 'student_id=6' 
-- out  -- violates constraint 'dbadmin.students.pk_students'
SELECT * FROM students;
-- out  student_id |  name  |       major        
-- out ------------+--------+--------------------
-- out           1 | Jack   | Biology
-- out           2 | Kate   | Sociology
-- out           3 | Claire | English
-- out           4 | Jack   | Biology
-- out           5 | Mike   | Comp. Sci
-- out           6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out  node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason 
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)

唯一可靠的检查似乎是ANALYZE_CONSTRAINTS()电话...

ALTER TABLE students ALTER CONSTRAINT pk_students DISABLED;
-- out Time: First fetch (0 rows): 7.618 ms. All rows formatted: 7.632 ms
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out Time: First fetch (0 rows): 31.790 ms. All rows formatted: 31.791 ms

SELECT * FROM students;
-- out  student_id |  name  |       major        
-- out ------------+--------+--------------------
-- out           1 | Jack   | Biology
-- out           2 | Kate   | Sociology
-- out           3 | Claire | English
-- out           4 | Jack   | Biology
-- out           5 | Mike   | Comp. Sci
-- out           6 | Cassey | Physical Education
-- out           6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out  node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason 
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
SELECT ANALYZE_CONSTRAINTS('students');
-- out  Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values 
-- out -------------+------------+--------------+-----------------+-----------------+---------------
-- out  dbadmin     | students   | student_id   | pk_students     | PRIMARY         | ('6')
-- out (1 row)

推荐阅读