How to fix UnicodeEncodeError when using np.savetxt()?

Question

I am trying to save an array to a text file using np.savetxt(), but I get an error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 15: ordinal not in range(256)

I looked up the character '\u1ec7': it is the Latin small letter E with circumflex and dot below.

I tried removing it from the text in the array with x = x.replace("[^a-zA-Z#]", " "), but I still get the error.

What exactly is this error, and what can be done to fix it? Here is my code:

duplicate = X_train[y_train == 1]
not_duplicate = X_train[y_train == 0]

p = np.dstack([duplicate['question1'], duplicate['question2']]).flatten()
n = np.dstack([not_duplicate['question1'], not_duplicate['question2']]).flatten()

print ("Number of data points in class 1 (duplicate pairs) :",len(p))
print ("Number of data points in class 0 (non duplicate pairs) :",len(n))

#Saving the np array into a text file
np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s', encoding = 'latin-1')
np.savetxt('train_n.txt', n, delimiter=' ', fmt='%s', encoding = 'latin-1')

The variable 'p':

array(['how can i solve an encrypted  text  ',
       'where should i start to solve this encrypted  text  ',
       'how do i skip a class ', ..., 'how do know that you are in love ',
       'which is most beautiful place to visit  in kerala ',
       'which place in kerala is most beautiful '], dtype=object)

Tags: python, arrays, numpy, replace

Answer


It looks like simply omitting the encoding parameter works:

In [171]: '\u1ec7'                                                              
Out[171]: 'ệ'
In [172]: txt = ' '.join(['abc',_,_,'def',_])                                   
In [173]: txt                                                                   
Out[173]: 'abc ệ ệ def ệ'

This works:

In [174]: np.savetxt('test.txt', [txt], fmt='%s')                               
In [175]: cat test.txt                                                          
abc ệ ệ def ệ

This does not:

In [176]: np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')           
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-176-8ba623098d70> in <module>
----> 1 np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')

<__array_function__ internals> in savetxt(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
   1450     file : str or file
   1451         Filename or file object to read.
-> 1452     regexp : str or regexp
   1453         Regular expression used to parse the file.
   1454         Groups in the regular expression correspond to fields in the dtype.

UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 4: ordinal not in range(256)
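The message itself explains the failure: latin-1 can only represent code points 0 to 255, and '\u1ec7' lies far outside that range. The following is a minimal sketch in plain Python, independent of NumPy, that reproduces the same check; the variable names are illustrative only.

```python
ch = '\u1ec7'  # the character from the traceback, 'ệ'

# latin-1 maps each character to one byte, so only code points 0..255 fit.
code_point = ord(ch)

# UTF-8 can represent any Unicode character; this one takes three bytes.
utf8_bytes = ch.encode('utf-8')

# Encoding to latin-1 raises the same error that np.savetxt surfaced.
try:
    ch.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('latin-1 failed:', exc.reason)

print(code_point, utf8_bytes, latin1_ok)
```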

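The default value of encoding is None, which is passed on to the io.open function. With None, open falls back to the platform's default encoding, which is UTF-8 on this system: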

In [185]: f = open('test','w', encoding=None)                                   
In [186]: f                                                                     
Out[186]: <_io.TextIOWrapper name='test' mode='w' encoding='UTF-8'>
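Because the default depends on the platform, it may be safer to name the encoding explicitly rather than rely on None. The sketch below assumes UTF-8 is acceptable for the output file; the sample array and the temporary file path are made up for illustration and are not the asker's data.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: an object array of strings, one of which
# contains the character that latin-1 cannot encode.
rows = np.array(['abc \u1ec7 def', 'plain ascii text'], dtype=object)

# Write to a temporary location so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), 'train_demo.txt')

# Name the encoding explicitly instead of depending on the platform default.
np.savetxt(path, rows, fmt='%s', encoding='utf-8')

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding='utf-8') as fh:
    lines = fh.read().splitlines()

print(lines)
```

Any program that later reads the file needs to open it as UTF-8 as well; reading it as latin-1 would not raise an error but would show the three UTF-8 bytes as three unrelated characters.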
