python - How do I fix a UnicodeEncodeError when using np.savetxt()?
Question
I am trying to save an array to a text file using np.savetxt(), but I get an error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 15: ordinal not in range(256)
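I looked up the character '\u1ec7': it is LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW (ệ).
I tried to remove it from the text in the array using x = x.replace("[^a-zA-Z#]", " "), but I still get the error.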
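One likely reason the replacement had no effect, assuming x is a plain Python string: str.replace treats its first argument as a literal substring, not as a regular expression, so the pattern never matches. A minimal sketch using re.sub (the sample string is made up for illustration):

```python
import re

# Made-up sample text containing the character from the error message.
x = 'which place in vi\u1ec7t nam is most beautiful #travel'

# str.replace searches for the literal text "[^a-zA-Z#]", which is not present,
# so the string comes back unchanged.
unchanged = x.replace("[^a-zA-Z#]", " ")

# re.sub interprets the pattern as a regular expression and replaces every
# character that is not an ASCII letter or '#' with a space.
cleaned = re.sub(r"[^a-zA-Z#]", " ", x)

print(unchanged == x)
print(cleaned)
```

If x is a pandas Series instead, the analogous call would be x.str.replace("[^a-zA-Z#]", " ", regex=True). Either way this discards the non-ASCII characters rather than saving them.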
What exactly does this error mean, and what can I do to fix it? Here is my code:
duplicate = X_train[y_train == 1]
not_duplicate = X_train[y_train == 0]
p = np.dstack([duplicate['question1'], duplicate['question2']]).flatten()
n = np.dstack([not_duplicate['question1'], not_duplicate['question2']]).flatten()
print ("Number of data points in class 1 (duplicate pairs) :",len(p))
print ("Number of data points in class 0 (non duplicate pairs) :",len(n))
#Saving the np array into a text file
np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s', encoding = 'latin-1')
np.savetxt('train_n.txt', n, delimiter=' ', fmt='%s', encoding = 'latin-1')
The variable 'p':
array(['how can i solve an encrypted text ',
'where should i start to solve this encrypted text ',
'how do i skip a class ', ..., 'how do know that you are in love ',
'which is most beautiful place to visit in kerala ',
'which place in kerala is most beautiful '], dtype=object)
Answer
It looks like simply omitting the encoding parameter works:
In [171]: '\u1ec7'
Out[171]: 'ệ'
In [172]: txt = ' '.join(['abc',_,_,'def',_])
In [173]: txt
Out[173]: 'abc ệ ệ def ệ'
This works:
In [174]: np.savetxt('test.txt', [txt], fmt='%s')
In [175]: cat test.txt
abc ệ ệ def ệ
This does not:
In [176]: np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-176-8ba623098d70> in <module>
----> 1 np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')
<__array_function__ internals> in savetxt(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
1450 file : str or file
1451 Filename or file object to read.
-> 1452 regexp : str or regexp
1453 Regular expression used to parse the file.
1454 Groups in the regular expression correspond to fields in the dtype.
UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 4: ordinal not in range(256)
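The same failure can be reproduced without NumPy. Latin-1 only covers code points 0 to 255, and '\u1ec7' lies far outside that range, so the codec has no byte to write for it. A minimal sketch:

```python
ch = '\u1ec7'

# Latin-1 maps code points 0..255 one-to-one onto single bytes;
# this character's code point is well above 255.
print(ord(ch))

try:
    ch.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print(exc)

# UTF-8 can represent every Unicode code point (three bytes for this one).
utf8_bytes = ch.encode('utf-8')
print(utf8_bytes)
```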
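The default value of encoding is None, which is passed on to the io.open function. The file is then opened with the platform's default encoding, which is UTF-8 on my system: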
In [185]: f = open('test','w', encoding=None)
In [186]: f
Out[186]: <_io.TextIOWrapper name='test' mode='w' encoding='UTF-8'>
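Because the default depends on the platform, omitting encoding may still fail on a system whose default encoding is not UTF-8. Naming the encoding explicitly avoids that dependency. A minimal sketch, assuming a NumPy version whose savetxt accepts the encoding parameter (as in the traceback above); the sample rows and the temporary output path are made up for illustration:

```python
import os
import tempfile

import numpy as np

# Made-up sample data containing the character from the error message.
rows = np.array(['abc \u1ec7 def', 'plain ascii text'], dtype=object)

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'train_p.txt')

# Name the encoding explicitly instead of relying on the platform default.
np.savetxt(out_path, rows, fmt='%s', encoding='utf-8')

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding='utf-8') as fh:
    lines = fh.read().splitlines()

print(lines == list(rows))
```

Whatever reads the file later has to open it as UTF-8 as well.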