首页 > 解决方案 > 使用 Python Pandas Dataframe 将数据插入 Sql 服务器的问题

问题描述

我正在尝试从 REST API 中提取数据并将其插入 SQL Server。如果我们让脚本一起执行 PhotoBinary,Filetype 它可以工作,但是只要我添加一个整数 ID,我们就会得到下面的错误。此外,如果我只是让它从 API 中提取 ID,它就可以工作。

我正在尝试提取 3 条信息

  1. EmployeeID,它是一个 int。
  2. 图像的二进制字符串表示
  3. 原始文件的文件类型 例如:.jpg

目标表设置为:

Create table Employee_Photo
( 
    EmployeeID  int,
    PhotoBinary varchar(max),
    FileType varchar(10)
)

我得到的错误是:

Traceback (most recent call last):
  File "apiphotopullwithid.py", line 64, in <module>
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType'])
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 5 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')
import json
import pandas as pd
import sqlalchemy
import pyodbc
import requests

url = "https://someurl.com/api/PersonPhoto"

headers = {
    'Accept': "application/json",
    'Authorization': "apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    'Content-Type': "application/json",
    'cache-control': "no-cache"
}

response = requests.request("GET", url, headers=headers)
data = json.loads(response.text)


ID,Photo,PhotoType = [],[],[]

for device in data['PersonPhoto']:
    ID.append(device[u'ID'])

    Photo.append(device[u'Photo'])

    PhotoType.append(device[u'PhotoType'])


df = pd.DataFrame([ID,Photo,PhotoType]).T
df.columns = ['EMPID','Photo','PhotoType']
df = df.astype({'EMPID':'Int64'})



connStr = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
    #"UID=ConnectME;"
    #"PWD={Password1}"
)
cursor = connStr.cursor()

for index,row in df.iterrows():
cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType']) 
    connStr.commit()
    cursor.close()
connStr.close()

标签: pythonsql-serverpandas

解决方案


在大多数 Python 数据库 API 中,包括遵循PEP 249规范的 pyodbc,其中的参数参数cursor.execute()通常是一个序列(即元组、列表)。因此,将所有值绑定到一个可迭代的而不是作为三个单独的参数值:

sql = "INSERT INTO dbo.Employee_Photo ([EmployeeID],[PhotoBinary],[FileType]) VALUES (?,?,?)"

# TUPLE
cursor.execute(sql, (row['EMPID'], row['Photo'], row['PhotoType']))

# LIST
cursor.execute(sql, [row['EMPID'], row['Photo'], row['PhotoType']])

顺便说一句,避免显式iterrows循环并executemany使用 Pandas 的DataFrame.values使用隐式循环:

# EXECUTE PARAMETERIZED QUERY
sql_cols = ['EMPID', 'Photo', 'PhotoType']
cursor.executemany(sql, df[sql_cols].values.tolist())   
conn.commit()

实际上,您甚至不需要 Pandas 作为中间层(仅将库用于数据科学)并与原始返回的 json 交互:

# NESTED LIST OF TUPLES
vals = [(int(device[u'ID']), device[u'Photo'], device[u'PhotoType']) \
           for device in data['PersonPhoto']]

cursor.executemany(sql, vals)   
conn.commit()

推荐阅读