首页 > 解决方案 > 如何解决UnicodeEncodeError:'ascii',同时处理pandas列,同时将列转换为str datattype?

问题描述

我正在研究 bitbucket,代码是通过 Airflow 编排的。关于将列类型转换为str类型。它抛出以下错误。

回溯(最后一次调用):文件“/usr/local/airflow/scripts/orc.py”,第 86 行,在 cur.executemany("INSERT INTO oracle_table_name VALUES (:1,:2,:3,:4, :5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15)",rows) UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10:序数不在范围内(128)命令退出并返回代码 1 Bash 命令失败

在本地 IDE 上运行相同的代码时,无需转换数据类型即可正常工作。

import pandas as pd
import psycopg2
import cx_Oracle
import sys
import datetime as dt
from sqlalchemy import types, create_engine
import datetime

start_time = datetime.datetime.now()

host = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
database = sys.argv[4]

try:
    print("Start of postgresql")
    conn = psycopg2.connect(host=host, database=database, user=username, password=password)
    sql = "SELECT * FROM postgresql_table_name "
    print('database connection established with RDS')
    data_df = pd.read_sql_query(sql, con=conn)
    print('data fetched into dataframe')
    print(data_df.head())
    conn.close()
    print('database connection closed')
    print("col conv starts")

    print("before")
    print(data_df.dtypes)
    all_columns = ['a','b','c','d']
    data_df[all_columns] = data_df[all_columns].str.decode(encoding='ASCII')
    all_columns = ['e','f','g','h']
    data_df[all_columns] = data_df[all_columns].astype('float64')
    data_df['i'] = data_df['i'].astype('int')
    print("date conv starts")
    data_df['j'] = pd.to_datetime(data_df['j'], errors='coerce', format='%d/%m/%Y')
    data_df['k'] = pd.to_datetime(data_df['k'], errors='coerce', format='%d/%m/%Y')
    data_df['l'] = pd.to_datetime(data_df['l'], errors='coerce', format='%d/%m/%Y')
    data_df['j'] = data_df['j'].dt.date
    data_df['k'] = data_df['k'].dt.date
    data_df['l'] = data_df['l'].dt.date
    print("after")
    print(data_df.dtypes)

except psycopg2.OperationalError as e:
    print('Unable to connect!\n{0}').format(e)
    sys.exit(1)

print("oracle starts")
u_name = sys.argv[5]
pwd = sys.argv[6]
host_name = sys.argv[7]
port_no = sys.argv[8]
service_name = sys.argv[9]

try:
    dsn = cx_Oracle.makedsn(host_name, port_no, service_name=service_name)
    con_ora = cx_Oracle.connect(user=u_name, password=pwd, dsn=dsn)
    cur = con_ora.cursor()
    rows = [tuple(x) for x in data_df.values]
    cur.executemany("INSERT INTO oracle_table_name VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15)",rows)
    con_ora.commit()
    cur.execute('commit')
    cur.close()
    con_ora.close()

except cx_Oracle.Error as e:
    print('oracle error occurred ' + str(e))

print("Time taken = ", datetime.datetime.now() - start_time)

我尝试了类似的方法 -

data_df['a'] = data_df['a'].apply(lambda x: unidecode(unicode(x, encoding="utf-8")))
data_df['a'] = data_df['a'].str.decode(encoding='ASCII')
data_df[all_columns] = data_df[all_columns].apply(lambda val: unicodedata.normalize('NFKD',val).encode('ascii', 'ignore').decode())

以上方法均无效!

如何解决问题?任何帮助将不胜感激!

标签: python-3.xutf-8asciiairflow

解决方案


上面的代码是正确的。它后来通过更改语法起作用。


推荐阅读