UnicodeDecodeError when reading from CSV/list: unexpected end of data

Problem description

I'm trying to use DeepMoji to score a CSV file full of tweets. The tweets have to be Unicode. I got it working with a small dataset, but with my dataset of more than 200,000 data points I get this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 254: unexpected end of data.

The code I tried, including an attempted fix, is below, but it gives the same error. Does anyone have any ideas?

TEST_SENTENCES = []
with open('Cleaned_Data3.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        TEST_SENTENCES.append(row["Tweet"])
    try:
        [x.encode('utf-8') for x in TEST_SENTENCES]
    except:
        for rows in TEST_SENTENCES: #attempt to fix the problem 
            str=unicode(str, errors='replace')

Here is the full traceback:

Traceback (most recent call last):
  File "C:\Users\pjame\Desktop\DeepMoji-master\examples\score_texts_emojis.py", line 24, in <module>
    for row in reader:
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 217, in next
    row = csv.DictReader.next(self)
  File "C:\Python27\lib\csv.py", line 108, in next
    row = self.reader.next()
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 128, in next
    for value in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 254: unexpected end of data

Tags: python, list, csv, unicode, decode

Solution
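
No answer was recorded here, so the following is only a hedged sketch of one possible approach, not a confirmed fix. The byte 0xe2 is the lead byte of a three-byte UTF-8 sequence (curly quotes, for example, start with it), and "unexpected end of data" means the field ended before the remaining bytes arrived, which suggests a tweet was cut off mid-character. The traceback also shows that decoding fails inside the reader loop, before the `try` block in the question is ever reached. The sketch assumes Python 3 rather than the Python 2.7 and unicodecsv setup in the traceback, and `read_tweets` is a hypothetical helper name, not part of DeepMoji or the csv module.

```python
import csv
import io

# Reproduce the error: 0xe2 opens a three-byte character, but nothing follows.
try:
    b"broken \xe2".decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason


def read_tweets(binary_stream, column="Tweet", encoding="utf-8"):
    """Hypothetical helper: return the values of `column` as text.

    Undecodable or truncated byte sequences become U+FFFD instead of
    raising UnicodeDecodeError.
    """
    text = io.TextIOWrapper(
        binary_stream, encoding=encoding, errors="replace", newline=""
    )
    rows = [row[column] for row in csv.DictReader(text)]
    text.detach()  # leave the underlying stream open for the caller
    return rows


# In-memory stand-in for the CSV file, ending in a truncated character.
sample = io.BytesIO(b"Tweet\nhello\nbroken \xe2")
tweets = read_tweets(sample)
```

With a real file, the same helper could be called as `with open('Cleaned_Data3.csv', 'rb') as f: TEST_SENTENCES = read_tweets(f)`. Replacing bad bytes hides the damage rather than repairing it, so it may be worth checking how the file was produced if the exact tweet text matters. On Python 2, unicodecsv's readers appear to accept an `errors` argument as well, but that is an assumption that has not been verified here.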

