Python 2: replace a specific column in a CSV file

Problem description

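I have some CSV files with columns ID, timestamp, CustomerID, Email, and so on. I want to blank out the Email column and leave the other columns unchanged. I'm using Python 2.7 and am restricted to using Pandas. Can anyone help? Thanks, everyone, for your help.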

My code is below, but it is not very efficient or robust: if some of the original fields contain unusual characters, the logic breaks.

import csv

new_columns = [
    '\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
    'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
    'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
    'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
    'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
    'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
    'D2 Technical and quality(Q13)',
    'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
    'D3 Series and films content(Q17)',
    'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
    'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
    'R2 Service and support(Q36)',
    'Promoter open text(Q37)'
]

# file_path and writer (a csv.writer for the output file) are defined elsewhere
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)

File format before: [image]

File format after: [image]

Tags: python, python-2.7

Solution


So you are reordering the columns and clearing the Email column:

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)
        for row in reader:
            output_row = []
            for column_name in new_columns:
                ind = first_row.index(column_name)
                data = row[ind]
                if ind == first_row.index('Email'):
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

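I suggest moving the lookups first_row.index(column_name) and first_row.index('Email') out of the per-row processing, so they run once per file instead of once for every row: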

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)

        email = first_row.index('Email')
        indexes = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            indexes.append(ind)

        for row in reader:
            output_row = []
            for ind in indexes:
                data = row[ind]
                if ind == email:
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

email is the index of the Email column in the input. indexes is the list of input column positions, in the order given by new_columns.

Untested.
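The same idea can be combined with the question's concern about unusual characters. The '\xef\xbb\xbf' prefix on the first entry of new_columns is the UTF-8 byte-order mark (BOM), and lookups by header name fail whenever that mark is present on one side but not the other. The following is a minimal sketch, not the answerer's code and not checked against the real files. strip_bom, reorder_and_blank and rewrite_csv are hypothetical names. Writing a header row to the output is an assumption, because the question's code does not show where its header comes from. The sketch is written to run on both Python 2.7 and Python 3. As the answer suggests, it resolves column positions once and then builds each row with a list comprehension. Because the BOM is stripped from the header, new_columns should list plain 'ID' instead of '\xef\xbb\xbfID'.

```python
import codecs
import csv


def strip_bom(name):
    """Drop a leading UTF-8 byte-order mark from a header cell, if present."""
    # Python 2's csv module yields byte strings (str), so the mark appears as
    # '\xef\xbb\xbf'; Python 3 files decoded as 'utf-8' yield u'\ufeff' instead.
    bom = codecs.BOM_UTF8 if isinstance(name, bytes) else u'\ufeff'
    return name[len(bom):] if name.startswith(bom) else name


def reorder_and_blank(rows, new_columns, blank_column='Email'):
    """Yield data rows reordered to new_columns, with blank_column emptied.

    rows is any iterable of lists whose first item is the header row, such
    as a csv.reader. The header is consumed, not yielded.
    """
    rows = iter(rows)
    try:
        first_row = next(rows)
    except StopIteration:  # empty input: nothing to yield
        return
    header = [strip_bom(name) for name in first_row]
    # Resolve every column position once, before the per-row loop.
    # list.index raises ValueError if a requested column is missing.
    indexes = [header.index(name) for name in new_columns]
    blank_index = header.index(blank_column)
    for row in rows:
        yield ['' if i == blank_index else row[i] for i in indexes]


def rewrite_csv(infile, outfile, new_columns, blank_column='Email', delimiter=';'):
    """Copy infile to outfile: a header row, then the reordered, blanked rows."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter)
    writer.writerow(new_columns)  # assumption: the output keeps a header row
    writer.writerows(reorder_and_blank(reader, new_columns, blank_column))
```

The caller opens the files, which keeps the version-specific part out of the functions. On Python 2.7 the csv documentation recommends binary mode, for example open(file_path, 'rb') for input and 'wb' for output. On Python 3, open both files in text mode with newline='' and an explicit encoding. If the file is opened with the 'utf-8-sig' encoding, Python 3 removes the BOM during decoding, and strip_bom then has nothing to remove.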

