python-3.x - 如何解决UnicodeEncodeError:'ascii',同时处理pandas列,同时将列转换为str datattype?
问题描述
我正在研究 bitbucket,代码是通过 Airflow 编排的。关于将列类型转换为str类型。它抛出以下错误。
回溯(最后一次调用):文件“/usr/local/airflow/scripts/orc.py”,第 86 行,在 cur.executemany("INSERT INTO oracle_table_name VALUES (:1,:2,:3,:4, :5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15)",rows) UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10:序数不在范围内(128)命令退出并返回代码 1 Bash 命令失败
在本地 IDE 上运行相同的代码时,无需转换数据类型即可正常工作。
import pandas as pd
import psycopg2
import cx_Oracle
import sys
import datetime as dt
from sqlalchemy import types, create_engine
import datetime
start_time = datetime.datetime.now()
host = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
database = sys.argv[4]
try:
print("Start of postgresql")
conn = psycopg2.connect(host=host, database=database, user=username, password=password)
sql = "SELECT * FROM postgresql_table_name "
print('database connection established with RDS')
data_df = pd.read_sql_query(sql, con=conn)
print('data fetched into dataframe')
print(data_df.head())
conn.close()
print('database connection closed')
print("col conv starts")
print("before")
print(data_df.dtypes)
all_columns = ['a','b','c','d']
data_df[all_columns] = data_df[all_columns].str.decode(encoding='ASCII')
all_columns = ['e','f','g','h']
data_df[all_columns] = data_df[all_columns].astype('float64')
data_df['i'] = data_df['i'].astype('int')
print("date conv starts")
data_df['j'] = pd.to_datetime(data_df['j'], errors='coerce', format='%d/%m/%Y')
data_df['k'] = pd.to_datetime(data_df['k'], errors='coerce', format='%d/%m/%Y')
data_df['l'] = pd.to_datetime(data_df['l'], errors='coerce', format='%d/%m/%Y')
data_df['j'] = data_df['j'].dt.date
data_df['k'] = data_df['k'].dt.date
data_df['l'] = data_df['l'].dt.date
print("after")
print(data_df.dtypes)
except psycopg2.OperationalError as e:
print('Unable to connect!\n{0}').format(e)
sys.exit(1)
print("oracle starts")
u_name = sys.argv[5]
pwd = sys.argv[6]
host_name = sys.argv[7]
port_no = sys.argv[8]
service_name = sys.argv[9]
try:
dsn = cx_Oracle.makedsn(host_name, port_no, service_name=service_name)
con_ora = cx_Oracle.connect(user=u_name, password=pwd, dsn=dsn)
cur = con_ora.cursor()
rows = [tuple(x) for x in data_df.values]
cur.executemany("INSERT INTO oracle_table_name VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15)",rows)
con_ora.commit()
cur.execute('commit')
cur.close()
con_ora.close()
except cx_Oracle.Error as e:
print('oracle error occurred ' + str(e))
print("Time taken = ", datetime.datetime.now() - start_time)
我尝试了类似的方法 -
data_df['a'] = data_df['a'].apply(lambda x: unidecode(unicode(x, encoding="utf-8")))
data_df['a'] = data_df['a'].str.decode(encoding='ASCII')
data_df[all_columns] = data_df[all_columns].apply(lambda val: unicodedata.normalize('NFKD',val).encode('ascii', 'ignore').decode())
以上方法均无效!
如何解决问题?任何帮助将不胜感激!
解决方案
上面的代码是正确的。它后来通过更改语法起作用。
推荐阅读
- python - Hypercorn / Quart中的uWSGI“缓冲区大小”等效项
- tags - 按名称/值查找图像中的 exif 标签
- buffer - ipfs.cat 在 React-native 中返回“未定义”对象
- github - 无法将一些 ref 推送到我的远程存储库
- python - Discord.py 重写在 DM 中从用户消息中更改昵称
- apache - 在 Silverstripe 站点中重定向域
- python - 文本格式化python
- sql - 有什么方法可以使用 PL/SQL 输出每一行的列的结果以在 OracleDB 中显示不同的值?
- http - 为什么以及如何 HTTP 响应的状态码很重要?
- php - Reading Nested JSON in a PHP Response from form.submit and NOT a Store