Writing a DataFrame to Postgres: multiple rows in a single cell

Problem description

(Beginner)

I am trying to use psycopg2 to write all the data from a DF into a table I have already created in a Postgres DB. I have typed every column as a string.

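My current approach writes data to the table without raising an error, but it aggregates all the rows into the same cell in the database. I have tried executemany(), but I read that looping with execute() is more efficient.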

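Table: as you can see, the same record is inserted into the table, and the entire data set is injected into a single cell. [Image: PostgreSQL table]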

Code that generates the DF:


import csv
import sys
import pandas as pd
import numpy as np
from pandas import DataFrame
import datetime as dt
from datetime import datetime
import praw
import itertools
import time


# This script extracts data from Reddit, using the PRAW wrapper,
# for a list of subreddits and writes it to a csv file

#Define Output Directory for csv Files

output_directory = "my/path/"

#Datetime value that will be appended to csv file name

today = dt.datetime.now()

#Create container for PRAW data and choose the fields to keep from each Submission object

list_of_items = []
fields = ('id','title', 'url','selftext','name', 'created_utc', 'num_comments','permalink')

#Define list of Subreddits to query using PRAW

subs = ['Toyota','ToyotaTundra','ToyotaTacoma','Prius','4Runner','ToyotaHighlander','ToyotaSupra','cars','ToyotaPickup']

#Authenticate PRAW with Client Secret, User Agent, and ID

r = praw.Reddit(client_id='id',
                client_secret='secret',
                user_agent='agent')

# Query each subreddit in the list, keep only the selected fields of each
# submission, then build the DataFrame once and write it to a csv file.
for i in subs:
    for submission in r.subreddit(i).new(limit=10):
        to_dict = vars(submission)
        sub_dict = {field:to_dict[field] for field in fields}
        list_of_items.append(sub_dict)

output = DataFrame(list_of_items)

output.to_csv(str(output_directory)+'reddit_data'+str(today)+'.csv',sep='|',index=False,quoting=csv.QUOTE_ALL)

Writing the DF to the table in my Postgres DB

#Write Data Frame to Pg table

import csv
import psycopg2
conn = psycopg2.connect("dbname=db user=postgres password=pw, port=port")
cur = conn.cursor()
for i in output:
    cur.execute("""INSERT INTO submissions (id,title, url,selftext,name, created_utc, num_comments,permalink) VALUES (%s, %s,%s, %s,%s, %s,%s, %s,%s);""" , ((str(output['id']), str(output['title']), str(output['url']),str(output['selftext']),str(output['name']),str(output['created_utc']),str(output['num_comments']),str(output['permalink']))))

conn.commit()
cur.close()
conn.close()

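The DF contains data I pulled from the API and is text-heavy, with lots of whitespace, irregular characters, etc., but I don't think that makes a difference.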

Tags: python, pandas, postgresql, dataframe, psycopg2

Solution
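No answer text survives on this page, so the following is only a sketch of one likely fix. Iterating over a DataFrame with `for i in output` yields the column labels, not the rows, so the loop runs once per column. Inside the loop, `str(output['id'])` is the string form of the entire column, which is why every inserted record holds a whole column in one cell. The statement also has nine `%s` placeholders for eight columns. Passing one tuple per row avoids all three problems. The helper names `build_insert`, `frame_to_rows`, and `insert_frame` are hypothetical, and the sample rows are made up to stand in for the real `output`.

```python
import pandas as pd

# Column order must match the column list of the target table.
COLUMNS = ["id", "title", "url", "selftext", "name",
           "created_utc", "num_comments", "permalink"]


def build_insert(table, columns):
    """Return an INSERT statement with exactly one %s per column.

    The table and column names are trusted constants here; they are
    interpolated into the SQL text and must never come from user input.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders});"


def frame_to_rows(df, columns):
    """Return one tuple of strings per DataFrame row, in column order."""
    return [tuple(str(value) for value in row)
            for row in df[columns].itertuples(index=False, name=None)]


def insert_frame(cur, df, table="submissions", columns=COLUMNS):
    """Run one INSERT per row through a DB-API cursor (e.g. psycopg2's)."""
    sql = build_insert(table, columns)
    for row in frame_to_rows(df, columns):
        # The driver binds the values, so quotes and newlines in the
        # text need no manual escaping.
        cur.execute(sql, row)


# Made-up stand-in for the question's `output` DataFrame.
output = pd.DataFrame([
    dict(zip(COLUMNS, ["a1", "First post", "http://example.com/1",
                       "line one\nline two", "t3_a1", 1600000000.0, 3,
                       "/r/cars/a1"])),
    dict(zip(COLUMNS, ["b2", "Second post", "http://example.com/2",
                       "it's 'quoted'", "t3_b2", 1600000100.0, 0,
                       "/r/cars/b2"])),
])

print(list(output))                        # column labels, not rows
print(len(frame_to_rows(output, COLUMNS)))  # one tuple per row
```

With a real connection, the call would look like `insert_frame(conn.cursor(), output)` followed by `conn.commit()`. On speed, the psycopg2 documentation notes that `executemany()` is not faster than calling `execute()` in a loop; if the row-by-row insert turns out to be too slow, `psycopg2.extras.execute_values` is the helper intended for batched inserts.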

