python: UnicodeDecodeError: 'utf-8' codec can't decode byte

Problem description

I am trying to read the latest jstack file and count the occurrences of "RUNNABLE", "BLOCKED", and "TIMED_WAITING". This worked at first, but after a few runs, during which I modified some of the words in the list, it stopped working and began producing the error shown in the console output. Passing encoding='utf-8' explicitly gives the same error. With encoding='ISO-8859-1' the script runs, but the counts are not correct.

import os

def wordcount(filename, listwords):
    try:
        # file = open(filename, encoding='ISO-8859-1')
        # file = open(filename, encoding='utf-8')
        file = open(filename, "r")
        read = file.readlines()
        file.close()

        for word in listwords:
            # lower = word.lower()
            count = 0
            for sentence in read:
                line = sentence.split()
                for each in line:
                    line2 = each.upper()
                    # line2 = line2.strip("java.lang.Thread.State: ")
                    if word == line2:
                        count += 1

            print(word, ":", count)
    # open() raises FileNotFoundError for a missing file, not FileExistsError
    except FileNotFoundError:
        print("Thread dump is not there")

path = '/Users/YEscobar/Desktop/jstack'
filePath = [os.path.join(path, fname) for fname in os.listdir(path)]
lastFile = sorted(filePath, key=os.path.getctime)[-1]

wordcount(lastFile, ["RUNNABLE", "BLOCKED", "TIMED_WAITING"])

Console output

/Users/YEscobar/.virtualenvs/python_workstation1/bin/python /Users/YEscobar/Library/Preferences/PyCharmCE2018.2/scratches/test6.py
Traceback (most recent call last):
 File "/Users/YEscobar/Library/Preferences/PyCharmCE2018.2/scratches/test6.py", line 32, in <module>
   wordcount (lastFile,["RUNNABLE","BLOCKED","TIMED_WAITING"])
 File "/Users/YEscobar/Library/Preferences/PyCharmCE2018.2/scratches/test6.py", line 9, in wordcount
   read = file.readlines()
 File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
   (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdb in position 20: invalid continuation byte
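
The traceback reports that byte 0xdb at offset 20 cannot be decoded as UTF-8, which suggests the file being read is not plain UTF-8 text. A minimal sketch for checking this, using a hypothetical helper named `bytes_around` that is not part of the original script: open the file in binary mode, so no codec is involved, and look at the raw bytes near the reported offset.

```python
def bytes_around(path, offset, width=8):
    """Return the raw bytes within `width` bytes of `offset` (hypothetical helper)."""
    start = max(0, offset - width)
    with open(path, "rb") as handle:  # binary mode: nothing is decoded
        handle.seek(start)
        # Read from `start` up to and including `offset + width`.
        return handle.read(offset - start + width + 1)
```

Calling `print(lastFile)` followed by `print(bytes_around(lastFile, 20))` would show both which file the script actually picked and what surrounds the offending byte.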

Console output with encoding='ISO-8859-1' uncommented

RUNNABLE : 2
BLOCKED : 0
TIMED_WAITING : 3

Grep on the console

grep -o RUNNABLE jstack.20180802-202002.log | wc -l
      14
grep -o BLOCKED jstack.20180802-202002.log | wc -l
      0
grep -o TIMED_WAITING jstack.20180802-202002.log | wc -l
      24
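
The script and grep are not measuring the same thing, which may account for part of the mismatch. The script compares whole whitespace-separated tokens for equality, and it reads whichever directory entry has the newest ctime (on Unix, `os.path.getctime` returns the last metadata change time, not the creation time). The grep commands count every substring occurrence in one explicitly named file. Whether the two actually read different files here cannot be confirmed from the post. As a rough sketch, the grep counts can be reproduced in Python as follows; `count_occurrences` is a hypothetical name, and the file is read as bytes so that no decoding error can occur.

```python
def count_occurrences(path, words):
    """Count non-overlapping occurrences of each word, like `grep -o WORD | wc -l`."""
    with open(path, "rb") as handle:  # bytes: undecodable input cannot raise
        data = handle.read()
    # The search words are plain ASCII, so they can be matched as bytes.
    return {word: data.count(word.encode("ascii")) for word in words}
```

Running `count_occurrences("jstack.20180802-202002.log", ["RUNNABLE", "BLOCKED", "TIMED_WAITING"])` on the same file that grep was given makes the two tools directly comparable.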

Tags: python-3.x, python-unicode

Solution
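
No answer is recorded on this page, so what follows is a sketch under stated assumptions rather than a confirmed fix. It assumes the dumps are named like `jstack.20180802-202002.log`, as in the grep commands, and that each thread's state appears on a line of the form `java.lang.Thread.State: STATE`, as the commented-out `strip` call in the question hints. The names `newest_jstack_log` and `count_states` are hypothetical. The sketch selects only files matching the log pattern, so unrelated directory entries are never opened, and it decodes with `errors="replace"` so that stray bytes become U+FFFD instead of raising `UnicodeDecodeError`.

```python
import glob
import os
from collections import Counter

STATE_PREFIX = "java.lang.Thread.State:"


def newest_jstack_log(directory):
    """Return the most recently modified jstack.*.log file in `directory`."""
    # Assumption: dumps are named like jstack.20180802-202002.log.
    # The pattern skips unrelated entries such as hidden files.
    candidates = glob.glob(os.path.join(directory, "jstack.*.log"))
    if not candidates:
        raise FileNotFoundError("no jstack.*.log file in " + directory)
    return max(candidates, key=os.path.getmtime)


def count_states(path, states):
    """Count threads per state, tolerating bytes that are not valid UTF-8."""
    counts = Counter()
    # errors="replace" substitutes U+FFFD for undecodable bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith(STATE_PREFIX):
                # The first token after the prefix is the state name.
                fields = stripped[len(STATE_PREFIX):].split()
                if fields:
                    counts[fields[0]] += 1
    return {state: counts[state] for state in states}
```

With the path from the question, usage would be `count_states(newest_jstack_log('/Users/YEscobar/Desktop/jstack'), ["RUNNABLE", "BLOCKED", "TIMED_WAITING"])`. This counts one hit per thread-state line, so the totals can be lower than `grep -o`, which also counts occurrences elsewhere in the dump.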

