python - pandas: iterate over dataframe, do SQL query for each row
Question
I have a dataframe and a 5 million row local Postgres database. In each row of the dataframe, I want to add a column that is the result of a query against the Postgres database.
This is what I have right now:
for index, row in df_tf.iterrows():
    row = dict(row)
    id = row['National ID']
    q = 'select name from companies where company_number=%s'
    cursor.execute(q, [id])
    results = cursor.fetchall()
    if len(results):
        row['name'] = results[0][0]
        writer.writerow(row)
    else:
        row['name'] = ''
        writer.writerow(row)
So I'm iterating over the rows and writing the results to a local CSV.
Is there a way I could do this more neatly, and keep the results in a local dataframe?
I know I could load the Postgres data into pandas and join directly, but it's rather large and slow, so I would prefer to use a Postgres query.
Answer
I think my first pass would look something like this (untested):
import pandas
import psycopg2
import csv
import contextlib

def get_company_name(cursor, company_number):
    query = 'SELECT name FROM companies WHERE company_number=%s;'
    cursor.execute(query, [company_number])
    results = cursor.fetchone()
    return results[0] if results else ''

df_tf = pandas.DataFrame("...")

with contextlib.ExitStack() as ctx:
    connection = ctx.enter_context(psycopg2.connect("..."))
    cursor = ctx.enter_context(connection.cursor())
    file_out = ctx.enter_context(open("results.csv", "w"))
    writer = csv.DictWriter(file_out, fieldnames=["National ID", "Name"])
    writer.writeheader()
    for _, row in df_tf.iterrows():
        row = dict(row)
        row['Name'] = get_company_name(cursor, row['National ID'])
        writer.writerow(row)
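Since the question also asks how to keep the results in a local dataframe without loading all 5 million Postgres rows, one option is to fetch only the needed names in a single round trip and merge. This is a sketch under assumptions: the `companies(company_number, name)` table from the question, a psycopg2 cursor (psycopg2 adapts a Python list passed to `= ANY(%s)` into a Postgres array), and hypothetical helper names `fetch_names`/`merge_names`:

```python
import pandas

def fetch_names(cursor, ids):
    # One query for all distinct IDs instead of one query per row.
    query = 'SELECT company_number, name FROM companies WHERE company_number = ANY(%s);'
    cursor.execute(query, (ids,))
    return pandas.DataFrame(cursor.fetchall(),
                            columns=['company_number', 'name'])

def merge_names(df_tf, names):
    # Left join keeps every dataframe row; missing companies get ''.
    merged = df_tf.merge(names, left_on='National ID',
                         right_on='company_number', how='left')
    merged['name'] = merged['name'].fillna('')
    return merged.drop(columns=['company_number'])
```

The resulting dataframe can then be written out once with `merged.to_csv(...)` instead of row by row.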
Depending on the data in your dataframe, it may also be worth caching the results from get_company_name(). I imagine there are better answers, but this is what I would try.
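The caching idea above can be sketched with functools.lru_cache, which pays off when National IDs repeat in the dataframe. The `make_cached_lookup` wrapper is a hypothetical name, not part of the original answer; it exists only so the memoized function can close over the cursor:

```python
import functools

def make_cached_lookup(cursor):
    # Memoize per-ID results so a repeated National ID only hits
    # Postgres once; subsequent lookups come from the in-memory cache.
    @functools.lru_cache(maxsize=None)
    def get_company_name(company_number):
        cursor.execute('SELECT name FROM companies WHERE company_number=%s;',
                       [company_number])
        result = cursor.fetchone()
        return result[0] if result else ''
    return get_company_name
```

Inside the loop you would then call `lookup = make_cached_lookup(cursor)` once and use `lookup(row['National ID'])` per row.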