PostgreSQL: how to update another table on conflict

Problem description

I have two tables:

CREATE TABLE people (
 name VARCHAR(100) NOT NULL,
 company_id int8 NOT NULL
);

CREATE TABLE company (
 id int8 NOT NULL
);

I want to copy data from CSV files into the DB. This is my script:

BEGIN;
  CREATE TEMP TABLE tmp_company
                ON COMMIT DROP AS SELECT * FROM company WITH NO DATA;
  -- a psql meta-command must fit on a single line
  \copy tmp_company FROM 'company.csv' WITH (FORMAT csv, HEADER, DELIMITER ',')
  INSERT INTO company
                SELECT * FROM tmp_company
                ON CONFLICT DO NOTHING;

  CREATE TEMP TABLE tmp_people
                ON COMMIT DROP AS SELECT * FROM people WITH NO DATA;
  \copy tmp_people FROM 'people.csv' WITH (FORMAT csv, HEADER, DELIMITER ',')
  INSERT INTO people
                SELECT * FROM tmp_people
                ON CONFLICT DO NOTHING;
COMMIT;

If a company id from the CSV already exists in the company table, I should do company.id += 1 and use the new id as the company_id of the related people records.

Example:

company.csv
id
1
5

people.csv
name,company_id
tom,1
paul,5

existing company table data
id
1
2

existing people table data
name,company_id
tom,1
paul,2

After copying data from csv to DB, the data should look like

company table data
id
1
2
3 <-- from csv data, as 1,2 are used, set id=3
5

people table data
name,company_id
tom,1
paul,2
tom,3 <-- from csv data
paul,5 <-- from csv data

How can I achieve this? I wonder whether I can add this logic after ON CONFLICT...

Edit 1: The two tables are close to 5 TB in size. The two CSV files contain 5M records.

Tags: sql, postgresql

Solution


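First, you should use the bigserial data type instead of int8 for the id column of the company table, so that the id is incremented automatically when a new row is inserted.

Then, you should create a foreign key between the people and company tables with the ON UPDATE CASCADE option, so that any change to the id column of company is propagated automatically to the company_id column of people: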

CREATE TABLE company (
 id bigserial PRIMARY KEY  -- a foreign key target must be unique
);

CREATE TABLE people (
 name VARCHAR(100) NOT NULL,
 company_id int8 NOT NULL,
 CONSTRAINT fkey FOREIGN KEY (company_id) REFERENCES company(id) ON UPDATE CASCADE
);
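
ON CONFLICT only affects the table being inserted into, so it cannot rewrite people.company_id by itself. One possible way to do the import with the schema above is to decide the final id of every staged company first and then insert both tables through that mapping. The following is a minimal sketch, not a tested solution for a 5 TB database. It assumes that company.id is a bigserial with its own sequence, that the ids in company.csv are unique, and that it is acceptable for a conflicting id to receive the next sequence value rather than the lowest free id. The table company_map and its columns old_id and new_id are hypothetical names introduced for illustration.

```sql
BEGIN;

CREATE TEMP TABLE tmp_company
    ON COMMIT DROP AS SELECT * FROM company WITH NO DATA;
\copy tmp_company FROM 'company.csv' WITH (FORMAT csv, HEADER, DELIMITER ',')

CREATE TEMP TABLE tmp_people
    ON COMMIT DROP AS SELECT * FROM people WITH NO DATA;
\copy tmp_people FROM 'people.csv' WITH (FORMAT csv, HEADER, DELIMITER ',')

-- Move the sequence past every id that exists or is about to be kept,
-- so that generated ids cannot collide with either set.
SELECT setval(pg_get_serial_sequence('company', 'id'),
              GREATEST((SELECT max(id) FROM company),
                       (SELECT max(id) FROM tmp_company)));

-- company_map (hypothetical helper): staged id -> id to store.
-- A staged id that is still free is kept; a taken one gets a new value.
CREATE TEMP TABLE company_map ON COMMIT DROP AS
SELECT t.id AS old_id,
       CASE WHEN c.id IS NULL THEN t.id
            ELSE nextval(pg_get_serial_sequence('company', 'id'))
       END AS new_id
FROM tmp_company t
LEFT JOIN company c ON c.id = t.id;

INSERT INTO company (id)
SELECT new_id FROM company_map;

-- Rewrite company_id through the mapping while loading people.
INSERT INTO people (name, company_id)
SELECT p.name, m.new_id
FROM tmp_people p
JOIN company_map m ON m.old_id = p.company_id;

COMMIT;
```

With the example data from the question, the sequence is first set to 5, so the conflicting CSV id 1 becomes 6 (not 3) and id 5 is kept; the imported people rows are then tom,6 and paul,5. Filling the lowest unused id instead would need a gap search over company, which this sketch deliberately avoids.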
