Python code in Apache NiFi ExecuteStreamCommand: UnicodeEncodeError: surrogates not allowed

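Problem description

I have some Chinese words in a CSV file. I feed the CSV file through the NiFi ExecuteStreamCommand processor and get UnicodeEncodeError: surrogates not allowed. I have made sure the CSV file is UTF-8. The problem seems to be related to the Chinese words, because when I remove them I get no error.

Sample CSV: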

SAP_MATERIAL,GENERIC_ARTICLE,DIM1,DIM2,EAN_UPC,CURRENT_YEAR,CURRENT_SEASON,ARTICLE_DESC,CHINESE_DESC,STYLE,GENDER_CODE,LOCAL_GENDER,LOCAL_GENDER_DESC,SBU_CODE,SBU_DESC,COLLECTION,COLLECTION_DESC,BRAND,SBU_SUB_CODE,SBU_SUB_DESC,CN_RETAIL_PRODUCT_TYPE_DESC,SBU_DESC_CN,SBU_SUB_DESC_CN,TBL_CATEGORY,AGE_GROUP,SAP_GENDER_DESC,COLOR_GROUP_DESC,keep,df_cal
TB027657626105500M,TB0276576261,055,00M,885641855420,2012,SS,"[TB0276576261]KENNBNK GLADIATOR RED,FQ",,027657,W,,,906,TB Tree Footwear,X,,TI,9AR,TB_FT_Women,,TB Tree 鞋履,TB_鞋履_女款,Footwear,,W - WOMEN,RED,1,333
TB027657626105500M,TB0276576261,055,00M,885641855420,2012,SS,"[TB0276576261]KENNBNK GLADIATOR RED,FQ",,027657,W,,,906,TB Tree Footwear,X,,TI,9AR,TB_FT_Women,,TB Tree 鞋履,TB_鞋履_女款,Footwear,,W - WOMEN,RED,2,333

Here is my code:

#!/usr/bin/python3.6

import sys
import pandas as pd
import numpy as np
import io

df = pd.read_csv(sys.stdin)
df = df.drop_duplicates(
        subset=df.columns.difference(['keep']),keep = False)
df = df[(df.keep == '2')]
df.drop(['keep','df_cal'],axis = 1,inplace = True)


df.to_csv(sys.stdout,index = None)

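Here are some screenshots to help illustrate the situation.

The file before it enters the ExecuteStreamCommand processor: [image not included]
The error: [image not included]
ExecuteStreamCommand processor settings: [image not included]
MergeContent processor settings: [image not included]

I tried updating my code, but I still get the same error: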

#!/usr/bin/python3.6

import sys
import pandas as pd
import numpy as np
import io

df = pd.read_csv(sys.stdin)
df = df.drop_duplicates(
        subset=df.columns.difference(['keep']),keep = False)
df = df[(df.keep == '2')]
df.drop(['keep','df_cal'],axis = 1,inplace = True)
for column in df:
    df[column] = df[column].astype(str).str.encode('utf-8')

df.to_csv(sys.stdout,index = None)

How can I fix this error? Any help would be greatly appreciated.

Tags: python, pandas, unicode, apache-nifi, stdin

Solution
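
One likely cause, assuming NiFi launches the script without a UTF-8 locale (for example with LANG unset): Python 3.6 then decodes stdin as ASCII with the surrogateescape error handler, so every byte of the Chinese text becomes a lone surrogate, and encoding that text strictly as UTF-8 later fails with "surrogates not allowed". The sketch below is not a confirmed fix for this particular flow. Its first function reproduces the message in isolation. The second, process (a hypothetical name), works on the raw byte streams and handles UTF-8 explicitly, so the locale no longer matters. The comparison keep == 2 is also an assumption: pandas parses that column as integers, so comparing with the string '2' would match nothing.

```python
import pandas as pd


def surrogate_error_message(raw):
    """Mimic stdin decoding under a non-UTF-8 locale, then a strict UTF-8 encode."""
    # Bytes >= 0x80 cannot be decoded as ASCII; surrogateescape maps each one
    # to a lone surrogate code point instead of raising.
    text = raw.decode("ascii", errors="surrogateescape")
    try:
        text.encode("utf-8")  # strict encoding rejects lone surrogates
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""


def process(raw_in, raw_out):
    """Filter CSV bytes from raw_in and write CSV bytes to raw_out, both UTF-8."""
    # Let pandas decode the bytes itself, bypassing the locale-dependent
    # text layer of sys.stdin.
    df = pd.read_csv(raw_in, encoding="utf-8")

    # Same filtering steps as in the question.
    other_columns = [c for c in df.columns if c != "keep"]
    df = df.drop_duplicates(subset=other_columns, keep=False)
    df = df[df["keep"] == 2]  # assumption: integer 2, not the string '2'
    df = df.drop(columns=["keep", "df_cal"])

    # Build the CSV as a str, then encode it explicitly before writing bytes.
    raw_out.write(df.to_csv(index=False).encode("utf-8"))
```

Used as the ExecuteStreamCommand script, the last line would be process(sys.stdin.buffer, sys.stdout.buffer), after import sys. Under the assumption above, encoding each column with .str.encode('utf-8') as in the second attempt cannot help, because the surrogates are created when stdin is decoded, before the DataFrame exists. Another option sometimes used is to set the PYTHONIOENCODING environment variable to utf-8 for the spawned process.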

