How do I update PostgreSQL with the CSV of a modified pandas DataFrame?

Problem description

The following code works:

import pandas as pd
import csv
import psycopg2

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
df=df.rename(columns = {'Criteria ID':'Criteria_ID','Canonical Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df.to_csv(r'C:\Users\Harshal\Desktop\tar.csv',index=False)

conn = psycopg2.connect(host='1.11.11.111',
                   dbname='postgres',
                   user='postgres',
                   password='myPassword',
                   port='1234')  
cur = conn.cursor()
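# Raw string: in a plain literal '\U' starts a Unicode escape and raises SyntaxError
with open(r'C:\Users\Harshal\Desktop\tar.csv', 'r') as f:
    # COPY ... FROM STDIN reads the CSV (header row skipped) from the open file
    cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f)
conn.commit()
conn.close()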

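However, instead of saving the modified DataFrame to disk, I want to upload it directly into the PostgreSQL table. I tried cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", df), but it raises an error. Note: I cannot drop cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f), because under certain conditions I do save the CSV. My table structure: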

create table public.geotargets_india(
Criteria_ID integer not null,
Name character varying(50) COLLATE pg_catalog."default" NOT NULL,
Canonical_Name character varying(100) COLLATE pg_catalog."default" NOT NULL,
Parent_ID NUMERIC(10,2),
Country_Code character varying(10) COLLATE pg_catalog."default" NOT NULL,
Target_Type character varying(50) COLLATE pg_catalog."default" NOT NULL,
Status character varying(50) COLLATE pg_catalog."default" NOT NULL
)
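psycopg2's copy_expert(sql, file) expects a readable file-like object, not a DataFrame, which is why passing df fails. A minimal sketch, assuming the same COPY options as above and an open psycopg2 connection, serializes the filtered DataFrame into an in-memory io.StringIO buffer instead of a file on disk. The helper names `dataframe_to_csv_buffer` and `copy_dataframe` are hypothetical.

```python
import io

import pandas as pd


def dataframe_to_csv_buffer(df: pd.DataFrame) -> io.StringIO:
    """Serialize df as CSV (header row, no index) into an in-memory buffer.

    NaN values are written as empty unquoted fields, which COPY ... CSV
    reads as NULL by default.
    """
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    buf.seek(0)  # rewind so COPY reads from the beginning
    return buf


def copy_dataframe(conn, df: pd.DataFrame, table: str) -> None:
    """Bulk-load df into table with COPY ... FROM STDIN, without a temp file.

    Assumes `conn` is an open psycopg2 connection and that df's columns are
    in the same order as the table's. `table` is interpolated into the SQL
    as-is, so it must be a trusted name.
    """
    sql = (
        f"COPY {table} FROM STDIN "
        "WITH (FORMAT csv, HEADER, DELIMITER ',', QUOTE '\"')"
    )
    with conn.cursor() as cur:
        cur.copy_expert(sql, dataframe_to_csv_buffer(df))
    conn.commit()
```

With the connection from the first snippet, the call would be copy_dataframe(conn, df, 'geotargets_india'); the df.to_csv(...) to disk can then stay inside whichever condition actually needs the file.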


EDIT: I tried

import pandas as pd
import csv
import psycopg2
from sqlalchemy import create_engine

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
df=df.rename(columns = {'Criteria ID':'Criteria_Id','Canonical         Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df['Canonical_Name']=df['Canonical_Name'].str.replace(',', " ")
engine = create_engine('postgresql+psycopg2://postgres:myPassword@1.11.11.111:1234/postgres')
df.to_sql(
    'geotargets_india',
    con=engine,
    schema=None,
    if_exists='append',
    index=False
)

but I get this error: UndefinedColumn: column "Criteria_Id" of relation "geotargets_india" does not exist LINE 1: INSERT INTO geotargets_india ("Criteria_Id", "Name", "Canoni...

EDIT2: The code above works if I drop the table first; the script of the newly created table then looks like this:

CREATE TABLE public.geotargets_india
(
"Criteria_Id" bigint,
"Name" text COLLATE pg_catalog."default",
"Canonical_Name" text COLLATE pg_catalog."default",
"Parent_ID" double precision,
"Country_Code" text COLLATE pg_catalog."default",
"Target_Type" text COLLATE pg_catalog."default",
"Status" text COLLATE pg_catalog."default"
)

Why doesn't it work with the predefined table schema?
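A likely cause, assuming the table was created with the first, unquoted DDL: PostgreSQL folds unquoted identifiers to lower case, so Criteria_ID there is stored as criteria_id. to_sql goes through SQLAlchemy, which double-quotes names containing upper-case letters (the error shows "Criteria_Id"), and a quoted identifier is case-sensitive, so it matches no column. The second DDL works because its names are quoted exactly as the DataFrame spells them. One sketch of a fix keeps the original table and lower-cases the DataFrame's column names before calling to_sql; the helper `lowercase_columns` is hypothetical.

```python
import pandas as pd


def lowercase_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose column names are all lower case.

    All-lowercase names match the names PostgreSQL stored for the unquoted
    CREATE TABLE columns, whether or not SQLAlchemy quotes them.
    """
    return df.rename(columns=str.lower)
```

Usage, with the engine from the edit above: lowercase_columns(df).to_sql('geotargets_india', con=engine, schema=None, if_exists='append', index=False). The column values must still fit the predefined types (for example, character varying(50) for Name).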

Tags: python, pandas, postgresql, psycopg2

Solution


I tried your code and corrected a few lines; this works for me:

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv', delimiter=',')
print(df)
df=df.rename(columns = {'Criteria ID':'Criteria_Id','Canonical Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df['Canonical_Name']=df['Canonical_Name'].str.replace(',', " ")
engine = create_engine('postgresql+psycopg2://collaborateur1:nG@e3P@tapp581lv:2345/base_project')
df.to_sql('geotargets_india', con=engine, schema=None, if_exists='append', index=False)

I added the delimiter ',' and corrected 'Canonical Name'.
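Since ',' is already the default separator for read_csv, the change that matters is the column name: the question's key 'Canonical         Name' (with extra spaces) matched no column, and rename ignores unknown keys by default, so the column silently kept its original name. A minimal sketch, assuming a pandas version whose DataFrame.rename accepts errors='raise', makes such a typo fail immediately with a KeyError. The `RENAMES` constant and `rename_strict` helper are illustrative, not part of the answer.

```python
import pandas as pd

# Mapping from the CSV's original headers to the names used in the answer.
RENAMES = {
    "Criteria ID": "Criteria_Id",
    "Canonical Name": "Canonical_Name",
    "Parent ID": "Parent_ID",
    "Country Code": "Country_Code",
    "Target Type": "Target_Type",
}


def rename_strict(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Rename columns, raising KeyError if any mapping key is not a column."""
    return df.rename(columns=mapping, errors="raise")
```

Calling rename_strict(df, RENAMES) right after read_csv surfaces a misspelled header in pandas rather than later as a database error.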

