Inserting rows of different lengths into Postgres with psycopg2

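Problem description

I am building several different pandas dataframes in a for loop. They have different numbers of columns, depending on the data available on the website I am scraping.

The problem is that when I loop over the rows of the dataframe at the end of the initial loop to insert them into postgres with psycopg2, the number of column names and the number of rows change on every loop, which means I need a dynamic query. A fixed set of columns is always present and is of character type; the columns that may or may not be present are all numeric.

Here is what I have tried so far: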

con = pypg.connect(user = pg_user, password = pg_pass,
                   host = "pg_host", database = "db",
                   port = "5432")

cursor = con.cursor()

# dt = pandas dataframe with n columns
cols = [i for i in dt.columns if i not in ["column1","column2","column3"]] 

# these columns are always in dt, want to convert others to numeric

for col in cols:
    dt[col]=pd.to_numeric(dt[col])

# Build the string insertion vectors for the correct number of columns
col_insert = "%s, %s, %s,"
data_insert = "%s, %s, %s,"

sql_colnames = tuple(dt.columns)

for i in range(1, (len(sql_colnames) - 2), 1):
  if i != (len(sql_colnames) - 3):
    data_insert = data_insert + " %d,"
    col_insert = col_insert + " %s,"
  elif i == (len(sql_colnames) - 3):
       data_insert = data_insert + " %d"
       col_insert = col_insert + " %s"

# Iterate through the rows of the dataframe and insert them into postgres
for index, row in all_odds_dt.iterrows():
    row_ = tuple(row)
    qry_data = sql_colnames + row_prices
    qry = "INSERT INTO odds_portal_prices (" + col_insert + ") VALUES(" + data_insert + ")" % qry_data

cursor.execute(qry)

The error I get when I try to run the query is

  File "<ipython-input-351-14d7e958b2a7>", line 4, in <module>
    qry = "INSERT INTO odds_portal_prices (" + col_insert + ") VALUES(" + data_insert + ")" % qry_data
TypeError: not all arguments converted during string formatting

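I have checked that the combined length of the col_insert and data_insert vectors matches the number of elements in qry_data.

Thanks in advance for your help.

Tags: python, python-3.x, pandas, postgresql, psycopg2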

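Solution

With parameterization you can simplify much of the processing, because you no longer need to worry about how string and numeric values are formatted into the query text. str.format is still the preferred way to build the prepared statement itself, but build it only once, outside of any loop.

Note: psycopg2's parameter placeholder %s should not be confused with Python's string-formatting symbols %s and %d.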
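The error in the question can be reproduced without a database, which may help show why parameterization avoids it. In Python, % binds more tightly than +, so in the original expression only the final ")" string is formatted against the tuple, and that string contains no format specifiers. The following is a minimal sketch; the table name t and the sample values are placeholders, not taken from the question.

```python
# Minimal reproduction of the TypeError from the question, no database needed.
# The table name "t" and the values below are illustrative placeholders.
col_insert = "%s, %s"
data_insert = "%s, %s"
qry_data = ("column1", "column2", "a", "b")

try:
    # % has higher precedence than +, so this evaluates as
    #   "INSERT ..." + col_insert + ... + data_insert + (")" % qry_data)
    # and ")" has no format specifiers to consume the four tuple items.
    qry = "INSERT INTO t (" + col_insert + ") VALUES(" + data_insert + ")" % qry_data
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing the values separately as the second argument of cursor.execute or cursor.executemany avoids Python-side formatting of the data entirely.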

### CONVERT NUMERIC COLUMNS WITH apply()
# Every column except the three always-present character columns is numeric
num_cols = dt.columns.difference(["column1", "column2", "column3"])
dt[num_cols] = dt[num_cols].apply(pd.to_numeric)

### BUILD PREPARED STATEMENT ONCE, OUTSIDE ANY LOOP (NO DATA)
# Table name taken from the question; one %s placeholder per dataframe column
sql = ("INSERT INTO odds_portal_prices ({sql_cols}) VALUES ({placeholders})"
         .format(sql_cols = ", ".join(dt.columns),
                 placeholders = ", ".join(["%s"] * len(dt.columns)))
      )

### EXECUTE PARAMETERIZED QUERY BINDING DF VALUES
cursor.executemany(sql, dt.values.tolist())
con.commit()
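To see how the statement adapts to dataframes of different widths, the string-building step can be pulled into a small helper and checked on its own. This is only a sketch: build_insert_statement is a hypothetical helper name, and price_a and price_b are made-up column names standing in for the optional numeric columns.

```python
def build_insert_statement(table, columns):
    """Return an INSERT statement with one %s placeholder per column.

    Hypothetical helper for illustration. Table and column names are
    interpolated as-is, so they must come from a trusted source.
    """
    columns = list(columns)
    if not columns:
        raise ValueError("at least one column is required")
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, col_list, placeholders)


# Only the three always-present columns
narrow = build_insert_statement(
    "odds_portal_prices", ["column1", "column2", "column3"])

# The same table with two assumed optional numeric columns
wide = build_insert_statement(
    "odds_portal_prices",
    ["column1", "column2", "column3", "price_a", "price_b"])

print(narrow)
print(wide)
```

The placeholders stay %s for every column regardless of type, because psycopg2 adapts the bound Python values itself. If the column names were not fully trusted, the psycopg2.sql module (sql.SQL, sql.Identifier) would likely be a safer way to compose them than plain string formatting.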
