python - Python 2: replacing a specific column in a CSV
Question
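I have some CSV files with the columns ID, timestamp, customer ID, email, and so on. I want to blank out the email column and leave the other columns unchanged. I am using Python 2.7 and cannot use Pandas. Can anyone help? Thanks in advance.
My code is below, but it is neither efficient nor reliable: if the raw data contains unusual characters, the logic breaks.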
new_columns = [
'\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
'D2 Technical and quality(Q13)',
'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
'D3 Series and films content(Q17)',
'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
'R2 Service and support(Q36)',
'Promoter open text(Q37)'
]
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)
Answer
So you are reordering the columns and clearing the email column:
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)
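I suggest moving the lookups first_row.index(column_name) and first_row.index('Email') out of the per-row processing, since the header row does not change between rows: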
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    # Look up the column positions once, before the row loop.
    email = first_row.index('Email')
    indexes = []
    for column_name in new_columns:
        ind = first_row.index(column_name)
        indexes.append(ind)
    for row in reader:
        output_row = []
        for ind in indexes:
            data = row[ind]
            if ind == email:
                data = ''
            output_row.append(data)
        # writer is the csv writer from the question's code, created elsewhere.
        writer.writerow(output_row)
email is the index of the email column in the input. indexes is the list of input column indexes, in the order given by new_columns.
Untested.
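For anyone who wants to try the idea without the original files, here is a minimal self-contained sketch of the same approach: the column positions are looked up once, then every row is rebuilt from them. The function name reorder_and_blank, the blank_column parameter, and the sample data are all hypothetical and not part of the question. Unlike the code above, the sketch also emits the reordered header row, which is an assumption about the desired output. It takes rows from any iterator, so it should behave the same with a csv.reader on Python 2.7 or Python 3.

```python
import csv


def reorder_and_blank(rows, new_columns, blank_column='Email'):
    """Yield rows reordered to new_columns, with blank_column emptied.

    rows is any iterator of lists whose first item is the header row,
    for example a csv.reader object. (Hypothetical helper.)
    """
    rows = iter(rows)
    header = next(rows)
    # Look up each column position once, outside the per-row loop.
    # list.index raises ValueError if a name is missing from the header.
    indexes = [header.index(name) for name in new_columns]
    blank = header.index(blank_column)
    # Assumption: the output should start with the reordered header.
    yield [header[i] for i in indexes]
    for row in rows:
        yield ['' if i == blank else row[i] for i in indexes]


# Hypothetical sample lines standing in for one of the CSV files.
lines = [
    "ID;timestamp;CustomerID;Email",
    "1;2019-01-01;42;a@example.com",
]
reader = csv.reader(lines, delimiter=';')
result = list(reorder_and_blank(reader, ['ID', 'Email', 'CustomerID']))
```

With a real file, each yielded row can be passed to writer.writerow, exactly as output_row is in the loop above.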