How do I use Python to escape every lone double quote in a specific .csv column?

Problem description

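I have a .csv file with a particular (text) column whose cells occasionally contain a double quote ("). When the file is converted to a shapefile in ArcMap, these lone double quotes cause an incorrect conversion. They need to be "escaped".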

I need a script that edits the .csv so that it:

  1. Replaces every instance of " with "".
  2. Wraps every cell in double quotes.

My script:

import csv

with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)  

    # Write column headers without quotes
    headers = reader.next()
    writer = csv.writer(file2)
    writer.writerow(headers)

    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

This script successfully performs both of the tasks above on all columns.

For example, this original .csv:

Column 1, Column 2, Column 3, Column 4 
Fred, Flintstone, 5'10", black hair 
Wilma, Flintstone, five feet seven inches, red hair 
Barney, Rubble, 5 feet 2" inches, blond hair 
Betty, Rubble, 5 foot 7, black hair

becomes this:

Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"
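The script above is Python 2 (reader.next() no longer exists in Python 3). A rough Python 3 sketch of the same two-writer idea follows, assuming text-mode file objects; it reads the sample data from an in-memory io.StringIO so it runs without files, and the function name quote_all_but_header is hypothetical:

```python
import csv
import io


def quote_all_but_header(src, dst):
    """Hypothetical helper: header with minimal quoting, data rows fully quoted.

    src and dst are text-mode file objects. With real files on Python 3,
    open both with newline="" as the csv documentation recommends.
    """
    reader = csv.reader(src)

    # Header row: the default QUOTE_MINIMAL leaves plain labels unquoted.
    csv.writer(dst).writerow(next(reader))  # next(reader) replaces reader.next()

    # Data rows: QUOTE_ALL wraps every field and doubles any embedded '"'.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)


# Sample input from the question (trailing spaces removed).
sample = (
    "Column 1, Column 2, Column 3, Column 4\n"
    "Fred, Flintstone, 5'10\", black hair\n"
    "Wilma, Flintstone, five feet seven inches, red hair\n"
    "Barney, Rubble, 5 feet 2\" inches, blond hair\n"
    "Betty, Rubble, 5 foot 7, black hair\n"
)
out = io.StringIO()
quote_all_but_header(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Because the reader is not told to skip initial spaces, the blank after each comma stays part of the field, which is why the output above has " Flintstone" rather than "Flintstone".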

But what if I only want to do this in column 3 (the one that actually contains the occasional double quote)?

In other words, how can I get this...?

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair

Tags: python, python-2.7, csv

Solution


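Would it be enough to quote only the fields that contain double quotes? If so, the csv module's default behavior (csv.QUOTE_MINIMAL, which quotes only fields containing the delimiter, the quote character or a line break) will work, although I added skipinitialspace=True when parsing the input file so that the space after each comma is not treated as part of the field.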

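Also, as the csv module documentation recommends for Python 2, I opened the files in binary mode. (On Python 3, open them in text mode with newline='' instead.)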

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)  
    writer = csv.writer(file2)

    for row in reader:
        writer.writerow(row)

Input:

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair
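The code above targets Python 2, where binary mode is required. On Python 3 the files are opened in text mode with newline='' instead; a minimal sketch of the same minimal-quoting copy under that assumption follows. The helper name requote_minimal is hypothetical, and it takes file objects so it can be tried on the sample input in memory:

```python
import csv
import io


def requote_minimal(src, dst):
    """Hypothetical helper: copy CSV rows, quoting only fields that need it."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(src, skipinitialspace=True)
    # Default QUOTE_MINIMAL: only fields containing the delimiter, the quote
    # character or a line break are quoted; embedded '"' are doubled.
    writer = csv.writer(dst)
    writer.writerows(reader)


# With real files (Python 3), open both with newline="":
# with open("input.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
#     requote_minimal(f_in, f_out)

sample = (
    "Column 1, Column 2, Column 3, Column 4\n"
    "Fred, Flintstone, 5'10\", black hair\n"
    "Wilma, Flintstone, five feet seven inches, red hair\n"
    "Barney, Rubble, 5 feet 2\" inches, blond hair\n"
    "Betty, Rubble, 5 foot 7, black hair\n"
)
out = io.StringIO()
requote_minimal(io.StringIO(sample), out)
print(out.getvalue(), end="")
```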

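If you need every row of column 3 quoted, you can do it manually. I set the csv module to quote nothing (csv.QUOTE_NONE) and set the quote character to a non-printable control character that should not appear in the input. The default quote character (") cannot be kept here: with QUOTE_NONE and no escapechar, the writer raises csv.Error for any field that contains the quote character, and the hand-quoted column 3 is full of them: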

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')

    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)

    # Write 3rd column with quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'
        writer.writerow(row)

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair
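One caveat about the code above: because the writer runs with csv.QUOTE_NONE and no escapechar, writerow raises csv.Error whenever any field, including the hand-quoted column 3, contains a comma or a line break. If that can happen in your data, one alternative, sketched here for Python 3 and assuming the input parses cleanly with skipinitialspace=True, is to keep csv.reader for parsing but build each output line yourself. quote_column and quote_field are hypothetical names, and col is 0-based, so col=2 is the third column:

```python
import csv
import io

# Characters that force a field to be quoted under the usual CSV rules.
SPECIAL_CHARS = (",", '"', "\r", "\n")


def quote_field(value):
    """Wrap value in double quotes, doubling any embedded '"'."""
    return '"' + value.replace('"', '""') + '"'


def quote_column(src, dst, col=2):
    """Hypothetical helper: always quote column `col` (0-based) in data rows.

    Other fields, and the whole header row, are quoted only when they contain
    a comma, a double quote or a line break. Lines end in CRLF like the csv
    module's default, so open a real output file with newline="".
    """
    reader = csv.reader(src, skipinitialspace=True)
    for line_no, row in enumerate(reader):
        cells = []
        for i, value in enumerate(row):
            force = line_no > 0 and i == col  # never force-quote the header
            if force or any(ch in value for ch in SPECIAL_CHARS):
                value = quote_field(value)
            cells.append(value)
        dst.write(",".join(cells) + "\r\n")


sample = (
    "Column 1, Column 2, Column 3, Column 4\n"
    "Fred, Flintstone, 5'10\", black hair\n"
    "Wilma, Flintstone, five feet seven inches, red hair\n"
    "Barney, Rubble, 5 feet 2\" inches, blond hair\n"
    "Betty, Rubble, 5 foot 7, black hair\n"
)
out = io.StringIO()
quote_column(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

On the sample data this produces the same lines as the output above; the difference only shows up when a field contains a comma, in which case it is quoted instead of causing an error.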
