csv.DictReader and Session.bulk_save_objects?

Problem description

Python 3.8 with SQLAlchemy 1.3 (MySQL 8)

Table definition:

(screenshot of the table definition, not reproduced here)
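Since the screenshot is not available, here is a minimal sketch of an Outbound mapping that is consistent with the code further down; only filename and movie_studios actually appear in the question, so every other name, type, and length is an assumption:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Outbound(Base):
    # table name, primary key, types and lengths are assumptions
    __tablename__ = 'outbound'

    id = Column(Integer, primary_key=True)
    filename = Column(String(255))       # set from the csv filename in parse_files()
    movie_studios = Column(String(255))  # cleaned with .strip('"[]') before insert
    # the remaining csv columns are not listed in the question and are omitted here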

Before you ask: I can't share the csv because it contains sensitive data. It is an otherwise ordinary csv file with 0x0A (\n) as the line terminator. When the file is opened in text mode, DictReader works but bulk_save_objects fails.

First, I download the files from S3:

def download_files(bucket, keys):
    filenames = []
    for key in keys:
        filename = key.split('/')[-1]
        filenames.append(filename)
        with open(f'data/{filename}', mode='wb') as f:
            bucket.download_fileobj(key, f)
    return filenames

Without mode='wb':

TypeError: write() argument must be str, not bytes
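For context, the download step is driven by something like the following; the bucket name and key are placeholders, not taken from the question:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')  # placeholder bucket name
# download_fileobj() writes raw bytes into the file object, hence mode='wb' above
filenames = download_files(bucket, ['exports/movies.csv'])  # placeholder key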

Then I parse the files and save the rows:

def parse_files(filenames):
    objects = []
    for filename in filenames:
        with open(f'data/{filename}', mode='rb') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                row['movie_studios'] = row['movie_studios'].strip('"[]')
                obj = Outbound(filename=filename, **row)
                objects.append(obj)
    session.bulk_save_objects(objects)
    session.commit()

With mode='rb':

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Without mode='rb' and with [encoding='utf-8', ]newline='':

UnicodeEncodeError: 'charmap' codec can't encode characters in position 30-33: character maps to <undefined>

Without mode='rb' and with encoding='ascii', newline='':

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6658: ordinal not in range(128)
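Spelled out, the utf-8 attempt above corresponds roughly to the following (the exact keyword arguments are inferred from the shorthand in the question; newline='' is what the csv docs recommend for DictReader):

with open(f'data/{filename}', mode='r', encoding='utf-8', newline='') as csvfile:
    reader = csv.DictReader(csvfile)  # text mode, so DictReader gets strings
    for row in reader:
        ...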

I tried io.TextIOWrapper:

with open(f'data/{filename}', mode='rb') as f:
    with io.TextIOWrapper(f, encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        # ...rest of the loop unchanged

UnicodeEncodeError: 'charmap' codec can't encode characters in position 30-33: character maps to <undefined>

Not surprising...

This seems to be the character it is failing on:

(screenshot of the offending character, not reproduced here)

How can I fix this?

Tags: python-3.x, csv, sqlalchemy

Solution


Since the failure is a UnicodeEncodeError rather than a UnicodeDecodeError, the offending character is probably being encoded on its way out (for example to the database connection) rather than while the file is read. I would check SQLAlchemy's connection encoding to make sure it is using utf-8 and not ascii (or a platform default). For example, specify charset as shown in their documentation example:

e = create_engine(
    "mysql+pymysql://scott:tiger@localhost/test?charset=utf8mb4")
