首页 > 解决方案 > 为 .executemany 提供 DataFrame 值以进行 UPDATE

问题描述

我需要通过 python 更新 MSSQL 中的 excel 数据。因此,首先我阅读了 excel 文件(其中只有一行)并对数据类型进行了必要的更改:

import pyodbc 
import pandas as pd
df = pd.read_excel('test_new.xlsx')
first_col = df.pop('number')
df['number'] = first_col
df['sys_created_on'] = df['sys_created_on'].dt.strftime('%Y-%m-%d %H:%M:%S')
df['opened_at'] = df['opened_at'].dt.strftime('%Y-%m-%d %H:%M:%S')
df['resolved_at'] = df['resolved_at'].dt.strftime('%Y-%m-%d %H:%M:%S')
df['sys_updated_on'] = df['sys_updated_on'].dt.strftime('%Y-%m-%d %H:%M:%S')
df.fillna('', inplace=True)
records = df.to_records(index=False)
result = list(records)
result

上面的语句给出了结果。结果是一个列表

[('Sha', 'po1301', 'Closed', 'TLS - Admin', 15585, 'Store Network', 'Switch', 'CATS0005587', '', 'mic.vin.com ', 'pis - opo', 'LEC NIN', '2020-08-11 23:56:03', '1 Pro 1 LLC', '', '', 'Switch', '', '', '', 'Phone', 'oshi3001', 'Resolved', '"Approved Tech\r\n1. troubleshooting.\r\n"\r\n\r\n\r\nIMPORTANT - TECH SHOULD CALL NOC ON ARRIVAL', 'No', False, '1 - High', '0-6 hours', False, 'mada gill', '7-ELEVEN STORE - 38578', '', '2020-08-11 23:56:03', 'john win', '1 - Critical', 6, '', 'Not Solved (Not Reproducible)', 'duplicate case, see notes', 59022, '2020-08-12 16:19:45', '', 'UNKNOWN', '3 - Low', 'Store is hard down', 1, 'Power Verification', 'svd', '1 - High', '2020-08-12 16:19:45', False, 'INC1090099')] 

在这部分之后,我创建了一个 SQL 更新语句:

conn = pyodbc.connect()
cursor = conn.cursor()
cursor.executemany("Update proj.[mid].[datahub] SET assigned_to = ?, [caller_id#user_name] = ?, state = ?, assignment_group = ?, business_duration = ?, business_service = ?, category = ?, u_category_structure = ?, u_resolution_ci = ?, resolved_by = ?, u_resolver_group = ?, u_service_provider = ?, sys_created_on = ?, caller_id = ?, u_caller_first_name = ?, u_caller_last_name = ?, u_vendor_category = ?, rfc = ?, closed_at = ?, closed_by = ?, contact_type = ?, sys_created_by = ?, u_customer_portal_state = ?, description = ?, u_first_call_resolution = ?, u_first_time_fix = ?, impact = ?, u_incident_age = ?, knowledge = ?, reopened_by = ?, location = ?, u_on_behalf_of = ?, opened_at = ?, opened_by = ?, priority = ?, reassignment_count = ?, u_resolution_method = ?, close_code = ?, close_notes = ?, calendar_stc = ?, resolved_at = ?, u_sd_esclation = ?, sla_due = ?, severity = ?, short_description = ?, reopen_count = ?, subcategory = ?, sys_updated_by = ?, urgency = ?, sys_updated_on = ?, [assigned_to#active] = ? WHERE number = ?", result)
cursor.commit()

它给出错误: ProgrammingError:executemany 的第二个参数必须是序列、迭代器或生成器。

由于某些原因,当它确实是一个列表时,它不会将结果变量作为列表。如果我在 cursor.executemany() 中打印语句然后执行它,它会更新数据,但是使用包含列表的结果变量,它不起作用。

请帮忙。

标签: pythonsql-serverpandasdataframepyodbc

解决方案


result = print(list(tuple(records)))

将结果打印到控制台,但print函数本身返回None. 相反,您应该省略records调用并像这样分配result......

result = list(df.itertuples(index=False))

...然后打印出来,如果你愿意

print(result)

编辑:完整的工作示例

cnxn = pyodbc.connect("DSN=mssqlLocal64")
crsr = cnxn.cursor()

# set up test environment
crsr.execute("CREATE TABLE #tmp (txt varchar(10), id int primary key)")
crsr.execute(
    "INSERT INTO #tmp (txt, id) VALUES "
    "('old_foo', 1), ('old_bar', 2)"
)
print(crsr.execute("SELECT * FROM #tmp").fetchall())
"""console output:
[('old_foo', 1), ('old_bar', 2)]
"""

# test
df = pd.DataFrame([("new_foo", 1), ("new_bar", 2)], columns=["txt", "id"])
result = list(df.itertuples(index=False))
crsr.executemany("UPDATE #tmp SET txt = ? WHERE id = ?", result)
print(crsr.execute("SELECT * FROM #tmp").fetchall())
"""console output:
[('new_foo', 1), ('new_bar', 2)]
"""

推荐阅读