Python Openpyxl: converting CSV to XLSX and removing "$" from cells that contain numbers

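Problem description

I need to read files that are generated by a third party and contain a mix of strings, integers, and prices (sometimes with a $ sign) into XLSX files. Here is sample data stored in my csv file, a_test_f.csv: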

ColA,ColB
1,$11.00
2,22
3,"$1,000.56"
4,44

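Here is the code I wrote. My question: is this the most efficient way to perform the conversion? Is there an alternative that uses less processing power or memory? This matters because the real csv files will contain thousands of records and hundreds of columns, and tens of thousands of conversions must run every day.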

import csv
import openpyxl

#
# Convert the data in csv file format that contains a mix of
# strings, integers and dollar amounts into xlsx file format
#

csvfile  = 'a_test_f.csv'
xlsxfile = 'new_xlsx_f.xlsx'

wb = openpyxl.Workbook()
ws = wb.active

# remove $ and , from numbers
class Clean:
    def __init__(self, data=''):
        self.__obj = data
    def __repr__(self):
        return f"{self.__obj}"
    def getData(self):
        return self.__obj

    def dollar(self):
        try:
            return Clean(data=self.__obj.replace('$',''))
        except TypeError as err:
            print(err)

    def comma(self):
        try:
            return Clean(data=self.__obj.replace(',',''))
        except TypeError as err:
            print(err)

    def digit(self):
        try:
            float(self.__obj)
            return True
        except ValueError:
            return False            

with open(csvfile, newline='') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    row_count=1
    for row in reader:
        for i in range(len(row)):
            if Clean(data=row[i]).dollar().comma().digit():
                content = float(repr(Clean(data=row[i]).dollar().comma()))
            else:
                content = row[i]                
            ws.cell(row=row_count,column=i+1).value = content
        row_count +=1

wb.save(xlsxfile)

print('Finished!')

Tags: python, csv, openpyxl, xlsx, file-conversion

Solution


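Following Charlie's suggestion, I rewrote the conversion using functions instead of a class, then ran both the class-based and the function-based version against a csv file containing one million items. Results:

  • The function and class approaches use the same amount of CPU and memory
  • The class approach is 9.4% slower than the function approach

Functions win. Thanks, Charlie!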
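A comparison like this can be set up with the standard timeit module. The following is only a rough sketch, not the benchmark that produced the figures above: the Clean class here is a simplified stand-in for the one in the question, clean_func is a hypothetical helper, and the cell list is synthetic, so timings will differ from machine to machine and from the real data.

```python
import timeit


class Clean:
    """Chained cleaner, loosely modelled on the class in the question."""

    def __init__(self, data=''):
        self.data = data

    def dollar(self):
        return Clean(self.data.replace('$', ''))

    def comma(self):
        return Clean(self.data.replace(',', ''))

    def value(self):
        # float if the cleaned text parses as a number, else the cleaned text
        try:
            return float(self.data)
        except ValueError:
            return self.data


def clean_func(text):
    # Same logic as Clean(...).dollar().comma().value(), as a plain function
    stripped = text.replace('$', '').replace(',', '')
    try:
        return float(stripped)
    except ValueError:
        return stripped


# Synthetic cells (assumption); real exports will have a different mix.
cells = ['$11.00', '22', '$1,000.56', 'ColA'] * 25_000


def run_class():
    return [Clean(c).dollar().comma().value() for c in cells]


def run_func():
    return [clean_func(c) for c in cells]


if __name__ == '__main__':
    for name, fn in (('class', run_class), ('function', run_func)):
        # best of 5 runs, each processing the whole list once
        best = min(timeit.repeat(fn, number=1, repeat=5))
        print(f'{name}: {best:.3f} s')
```

Taking the minimum of several repeats reduces noise from other processes. Memory use would need a separate tool such as tracemalloc.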

The function-based approach is as follows:

import csv
import openpyxl

#
# Convert the data in csv file format that contains a mix of
# strings, integers and dollar amounts into xlsx file format
#

csvfile  = 'large_test_export.csv'
xlsxfile = 'new_xlsx_f.xlsx'

wb = openpyxl.Workbook()
ws = wb.active

# remove $ and , from numbers

def strip_stuff(a_string):
    # Drop thousands separators and dollar signs, then try to parse a float.
    # Returns a float on success, otherwise the stripped string.
    temp = a_string.replace(',', '').replace('$', '')
    try:
        return float(temp)
    except ValueError:
        return temp


def is_number(b_string):
    # True when the cell parses as a number once $ and , are removed.
    return isinstance(strip_stuff(b_string), float)

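# newline='' is what the csv module documentation recommends for file objects
with open(csvfile, newline='') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row_count, row in enumerate(reader, start=1):
        for col, cell in enumerate(row, start=1):
            # keep non-numeric cells exactly as they were read
            content = strip_stuff(cell) if is_number(cell) else cell
            ws.cell(row=row_count, column=col).value = content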

wb.save(xlsxfile)

print('Finished!')
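If memory is the main concern, one further option worth trying is openpyxl's write-only mode, which is documented as keeping memory use close to constant for large sheets. The sketch below is an untested-at-scale variation, not the code benchmarked above: to_number and csv_to_xlsx are hypothetical names, each cell is parsed once instead of twice, and rows are appended whole rather than written cell by cell.

```python
import csv


def to_number(text):
    """Return a float if text is numeric once '$' and ',' are removed,
    otherwise return text unchanged."""
    try:
        return float(text.replace('$', '').replace(',', ''))
    except ValueError:
        return text


def csv_to_xlsx(csv_path, xlsx_path):
    # Imported here so to_number() stays usable without openpyxl installed.
    import openpyxl

    # A write-only workbook has no default sheet, so one must be created.
    wb = openpyxl.Workbook(write_only=True)
    ws = wb.create_sheet()
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            # one pass per cell: parse and convert in a single call
            ws.append([to_number(cell) for cell in row])
    # A write-only workbook can only be saved once.
    wb.save(xlsx_path)


if __name__ == '__main__':
    csv_to_xlsx('large_test_export.csv', 'new_xlsx_f.xlsx')
    print('Finished!')
```

Write-only sheets cannot be read back or edited before saving, so this only fits a straight one-way conversion like the one described here. Whether it is actually faster on the real files would have to be measured.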
