vertica - Vertica 数据库添加了具有相同主键的重复条目
问题描述
我正在运行Vertica
on的 docker 映像windows
。我已经vertica
用这个模式创建了一个表(student_id
is primary key
)
dbadmin@d1f942c8c1e0(*)=> \d testschema.student;
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
------------+---------+------------+-------------+------+---------+----------+-------------+-------------
testschema | student | student_id | int | 8 | | t | t |
testschema | student | name | varchar(20) | 20 | | f | f |
testschema | student | major | varchar(20) | 20 | | f | f |
(3 rows)
student_id
是主键。我正在测试使用命令从csv
文件中加载数据。copy
首先我用insert
-insert into testschema.student values (1,'Jack','Biology');
然后我csv
在目录中创建了一个文件/home/dbadmin/vertica_test
-
vi student.csv
2,Kate,Sociology
3,Claire,English
4,Jack,Biology
5,Mike,Comp. Sci
然后我运行了这个命令
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
我测试了结果
select * from testschema.student - shows 5 rows
select * from students_rejected; - no rows
然后我创建另一个csv
包含错误数据的文件/home/dbadmin/vertica_test directory
vi student_bad.csv
bad_data_type_for_student_id,UnaddedStudent, UnaddedSubject
6,Cassey,Physical Education
我从坏csv
文件中添加了数据
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
然后我测试了输出
select * from testschema.student - shows 6 rows <-- only one row got added. all ok
select * from students_rejected; - shows 1 row <-- bad row's entry is here. all ok
一切看起来都不错
然后我再次添加了错误数据,没有拒绝数据选项
copy testschema.students from '/home/dbadmin/vertica_test/student_bad.csv' delimiter ',' ;
但现在学生 id 为 6 的条目又被添加了!!
student_id | name | major
------------+--------+--------------------
1 | Jack | Biology
2 | Kate | Sociology
3 | Claire | English
4 | Jack | Biology
5 | Mike | Comp. Sci
6 | Cassey | Physical Education <--
6 | Cassey | Physical Education <--
这不应该被拒绝吗?
解决方案
如果您students
使用这种类型的命令创建了您的:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id)
);
也就是说,如果没有显式关键字ENABLED
,则禁用主键约束。students
也就是说,您可以愉快地插入重复项,但如果您稍后想通过主键列连接到表,则会遇到错误。
启用主键约束...
[...]
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
[...]
我想你会得到想要的效果。
整个场景:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
);
INSERT INTO students
SELECT 1,'Jack' ,'Biology'
UNION ALL SELECT 2,'Kate' ,'Sociology'
UNION ALL SELECT 3,'Claire','English'
UNION ALL SELECT 4,'Jack' ,'Biology'
UNION ALL SELECT 5,'Mike' ,'Comp. Sci'
UNION ALL SELECT 6,'Cassey','Physical Education'
;
-- out OUTPUT
-- out --------
-- out 6
COMMIT;
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out vsql:/home/gessnerm/._vfv.sql:4: ERROR 6745:
-- out Duplicate key values: 'student_id=6'
-- out -- violates constraint 'dbadmin.students.pk_students'
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
唯一可靠的检查似乎是ANALYZE_CONSTRAINTS()
电话...
ALTER TABLE students ALTER CONSTRAINT pk_students DISABLED;
-- out Time: First fetch (0 rows): 7.618 ms. All rows formatted: 7.632 ms
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out Time: First fetch (0 rows): 31.790 ms. All rows formatted: 31.791 ms
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
SELECT ANALYZE_CONSTRAINTS('students');
-- out Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-- out -------------+------------+--------------+-----------------+-----------------+---------------
-- out dbadmin | students | student_id | pk_students | PRIMARY | ('6')
-- out (1 row)
推荐阅读
- cyclejs - Cycle.js:获取流,期待 Blob
- javascript - 使用 Reactjs 在画布上绘制图像后无法将画布转换为图像
- amazon-web-services - AWS EFS 上的客户端连接数?
- spring - 为什么 sortBy 的排序不同于 Kotlin 中的流排序?
- node.js - .env 为多个文件返回 undefined
- java - 以错误的顺序显示消息
- go - 文件不是 `goimports`-ed 与 -local somePath
- android - 如何仅以 10% 的增量调用 NotificationManager
- wordpress - 如何将动画 gif 作为特色图像添加到 woocommerce 产品中?
- r - R 数据框 - 全部转换
为空白(“”)字符列