Character encoding error when reading a CSV file

z-712 2020-04-14 19:11

"D:\Program Files\Python36-32\python.exe" D:/PyCharm_Project/bishe/process/read_csv.py
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1130, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1254, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1269, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1459, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PyCharm_Project/bishe/process/read_csv.py", line 12, in <module>
    df = pd.read_csv(csv_path)
  File "D:\Program Files\Python36-32\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Program Files\Python36-32\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "D:\Program Files\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "D:\Program Files\Python36-32\lib\site-packages\pandas\io\parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 952, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1084, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1137, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1254, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1269, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1459, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: invalid start byte
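
The message itself hints at the cause: 0xc0 is never a valid byte in UTF-8, so the file is almost certainly saved in some other encoding. The sketch below reproduces the same kind of failure with made-up sample text encoded as GBK. GBK is only an assumption here (it is common for CSV files exported on Chinese-language Windows); the real encoding of the original file is not stated.

```python
# Hypothetical sample data, encoded as GBK to imitate a non-UTF-8 CSV file.
text = "编号,姓名\n1,李雷\n"
raw = text.encode("gbk")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Same exception type as in the traceback above.
    utf8_error = exc

# Show the failure reason, then confirm the bytes decode fine as GBK.
print(type(utf8_error).__name__, "-", utf8_error.reason)
print(raw.decode("gbk") == text)
```

If the bytes decode cleanly under another codec, as they do here, the data is intact and only the assumed encoding was wrong.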

The fix is to pass engine="python" to pd.read_csv, as follows:

import pandas as pd

if __name__ == '__main__':
    csv_path = r'E:/data_backup/shuju/1540880931324.csv'
    # df = pd.read_csv(csv_path)  # raises UnicodeDecodeError
    df = pd.read_csv(csv_path, engine="python")  # no error
    print(df.head(10))
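
Another option that is often suggested for this error is to tell pandas the file's real encoding through the encoding parameter of read_csv, instead of switching engines. The file's encoding is not stated here, so the sketch below only tries a short list of candidates and reports the first one that decodes cleanly. The helper name guess_encoding, the candidate list, and the sample bytes are all hypothetical.

```python
def guess_encoding(raw, candidates=("utf-8", "gbk")):
    """Return the first candidate that decodes `raw` without error, else None.

    Strict encodings such as UTF-8 should come first; permissive ones would
    accept almost any bytes and hide the real answer.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Made-up sample bytes standing in for the contents of a CSV file.
gbk_bytes = "编号,姓名\n1,李雷\n".encode("gbk")
utf8_bytes = "编号,姓名\n1,李雷\n".encode("utf-8")

print(guess_encoding(gbk_bytes))
print(guess_encoding(utf8_bytes))
```

With the real file, the bytes could come from open(csv_path, "rb").read(), and the result could then be passed on as pd.read_csv(csv_path, encoding=...). A clean decode does not prove the guess is right, so the loaded text is still worth a quick look.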

Side note: "Jasmine Flower" (《茉莉花》) by the Black Duck group (黑鸭子组合) is lovely; give it a listen when you are tired!

References
https://www.jb51.net/article/142060.htm
https://www.cnblogs.com/zhanshan/archive/2018/07/26/9370032.html
