Converting multiple CSV files to UTF-8 encoding

Question

I need to convert multiple CSV files (with different encodings) to UTF-8.

Here is my code:

#find encoding and if not in UTF-8 convert it

import os
import sys
import glob
import chardet
import codecs

myFiles = glob.glob('/mypath/*.csv')

csv_encoding = []

for file in myFiles:
    with open(file, 'rb') as opened_file:
        bytes_file = opened_file.read()
        result = chardet.detect(bytes_file)
        my_encoding = result['encoding']
        csv_encoding.append(my_encoding)

print(csv_encoding)

for file in myFiles:
    if csv_encoding in ['utf-8', 'ascii']:
        print(file + ' in utf-8 encoding')
    else:
        with codecs.open(file, 'r') as file_for_conversion:
            read_file_for_conversion = file_for_conversion.read()
        with codecs.open(file, 'w', 'utf-8') as converted_file:
            converted_file.write(read_file_for_conversion)
        print(file + ' converted to utf-8')

When I run this code, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 5057: invalid continuation byte

Can anyone help me? Thanks!

Tags: python, utf-8

Answer


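In the second loop, csv_encoding is the whole list, so the test csv_encoding in ['utf-8', 'ascii'] is never true and no file is ever matched with its own detected encoding. You need to zip the lists myFiles and csv_encoding so that their values line up: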

for file, encoding in zip(myFiles, csv_encoding):
    ...

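You also need to pass that encoding to the open() call; otherwise the file is decoded with the default encoding, which is what raises the UnicodeDecodeError: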

    ...
    with codecs.open(file, 'r', encoding=encoding) as file_for_conversion:

Note: In Python 3 you do not need the codecs module to open files. Just use the built-in open function and specify the encoding with its encoding parameter.
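Putting the two fixes and the note together, the conversion loop might look like the sketch below. It is only an illustration: the function name convert_to_utf8 and the UTF8_COMPATIBLE set are hypothetical, it assumes the encodings list was built with chardet as in the question, and it assumes every detected name is one Python's codecs recognise. The None check and the lower() call are defensive additions, on the assumption that detection can fail or return mixed-case names such as 'ISO-8859-1'.

```python
# Hypothetical helper: names below are illustrative, not from the original post.
# Encodings that need no conversion (ASCII is a subset of UTF-8).
UTF8_COMPATIBLE = {'utf-8', 'ascii'}


def convert_to_utf8(paths, encodings):
    """Rewrite each file in place as UTF-8, given its detected encoding.

    paths and encodings are parallel lists, like myFiles and csv_encoding
    in the question. Returns the paths that were actually rewritten.
    """
    converted = []
    # zip pairs every file with the encoding detected for that same file.
    for path, encoding in zip(paths, encodings):
        if encoding is None:
            # Assumption: detection may fail and yield None; skip such files.
            print(path + ' skipped: encoding could not be detected')
            continue
        if encoding.lower() in UTF8_COMPATIBLE:
            print(path + ' in utf-8 encoding')
            continue
        # Decode with the detected encoding instead of the platform default.
        # newline='' leaves the original line endings untouched.
        with open(path, 'r', encoding=encoding, newline='') as src:
            text = src.read()
        with open(path, 'w', encoding='utf-8', newline='') as dst:
            dst.write(text)
        converted.append(path)
        print(path + ' converted to utf-8')
    return converted
```

With the question's variables this would be called as convert_to_utf8(myFiles, csv_encoding). Because the files are overwritten in place and the detected encoding is only a guess, it may be worth keeping backups until the output has been checked.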

