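python - How do I fill in empty values in the columns of a csv file?
Question
I need to read a csv file, fill in the empty/null values in the Phone and Email columns based on the person's address, and write the result to a new csv file. For example: if a person, say "Jonas Kahnwald", has no phone number or email address but shares an address with the person above or below, say "Hannah Kahnwald", then the empty values should be filled in with that person's details.
I can't use pandas because the rest of the program is based entirely on Python 2.7 (unfortunately), so I just need a function or some logic that captures this information on its own.
The input looks like this, with empty cells (csv file):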
FirstName,LastName,Phone,Email,Address
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,,145han street
Jonas,Kahnwald,,,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,,,887neil ave
katharina,Nielsen,,,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,,,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,,,828alex street
The output should look like this:
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,hannkahn@gmail.com,145han street
Jonas,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,4509213887,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
katharina,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,5439001211,elsisop@amaz.com,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,6263172828,tiedman@skype.com,828alex street
import csv

def get_info(file_path):
    data = []
    with open(file_path, 'rb') as fin:  # Python 2.7: csv files are opened in binary mode
        csv_reader = csv.reader(fin)
        next(csv_reader)  # Skip the header row.
        for each in csv_reader:
            FirstName = each[0]
            LN = each[1]
            Phone = "some function or logic"  # Placeholder: should come from the same address.
            email = "some function or logic"  # Placeholder: should come from the same address.
            Address = each[4]
            login = ""
            logout = ""
            data.append([FirstName, LN, Phone, email, Address, login, logout])
    return data
Solution
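This is a significantly updated version that tries to fill in missing data from other entries in the file, but only if they have the same Address field. To make the lookup faster, it builds a dictionary for internal use called addr_dict, which maps each address to the records at that address that have a phone number or an email. It also uses namedtuples internally to make the code more readable.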
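The lookup structure described above can be sketched on its own. This is a minimal illustration, assuming the rows are available as an in-memory list of lines (a hypothetical subset of the sample data) rather than read from a file; it only shows how records are grouped by Address, not the filling step.

```python
import csv
from collections import namedtuple

# Hypothetical in-memory sample; the answer's code reads these lines from a file.
lines = [
    "FirstName,LastName,Phone,Email,Address",
    "Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street",
    "Jonas,Kahnwald,,,145han street",
    "Alexander,washington,,,321notsame street",
]

reader = csv.reader(lines)  # csv.reader accepts any iterable of strings.
Record = namedtuple('Record', next(reader))  # Field names come from the header row.

addr_dict = {}
for rec in (Record._make(row) for row in reader):
    if rec.Phone or rec.Email:  # Only records that can donate data are stored.
        addr_dict.setdefault(rec.Address, []).append(rec)

donors = addr_dict.get('145han street', [])
print(donors[0].FirstName)
print(addr_dict.get('321notsame street', []))
```

Because records without a phone number or email are never stored, an address such as "321notsame street" has no entry at all, and `addr_dict.get(address, [])` returns an empty list for it.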
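Note that when retrieving missing information, it uses the data from the first entry it finds stored in this internal dictionary for that Address. Also, I don't think the sample data you provided covers all the possible cases, so additional testing will be needed.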
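One case worth testing is the "first entry found" behavior when several people at the same address have data. The sketch below isolates that step in a hypothetical helper, `fill_from_donors`, using a reduced `Person` record that is not part of the original code; the donor order is deliberately different from the sample file.

```python
from collections import namedtuple

# Hypothetical, reduced record type used only for this illustration.
Person = namedtuple('Person', 'FirstName Phone Email Address')

def fill_from_donors(row, donors):
    """Copy a missing Phone/Email from the first donor that has each value."""
    for other in donors:
        if not row.Phone and other.Phone:
            row = row._replace(Phone=other.Phone)
        if not row.Email and other.Email:
            row = row._replace(Email=other.Email)
        if row.Phone and row.Email:  # Nothing left to fill in.
            break
    return row

donors = [
    Person('Micheal', '6231897383', '', '145han street'),
    Person('Hannah', '1457871452', 'hannkahn@gmail.com', '145han street'),
]
jonas = Person('Jonas', '', '', '145han street')

filled = fill_from_donors(jonas, donors)
print(filled.Phone)
print(filled.Email)
```

With this ordering the phone number comes from Micheal and the email from Hannah, so a filled row can mix details from different people. Whether that is acceptable depends on the data, which is one reason to test beyond the sample.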
import csv
from collections import namedtuple

def get_info(file_path):
    # Read data from file and convert to list of namedtuples, also create address
    # dictionary to use to fill in missing information from others at same address.
    with open(file_path, 'rb') as fin:
        csv_reader = csv.reader(fin, skipinitialspace=True)
        header = next(csv_reader)
        Record = namedtuple('Record', header)
        newheader = header + ['Login', 'Logout']  # Add names of new columns.
        NewRecord = namedtuple('NewRecord', newheader)
        addr_dict = {}
        data = [newheader]
        for rec in (Record._make(row) for row in csv_reader):
            if rec.Email or rec.Phone:  # Worth saving?
                addr_dict.setdefault(rec.Address, []).append(rec)  # Remember it.
            login, logout = "", ""  # Values for new columns.
            data.append(NewRecord._make(rec + (login, logout)))

    # Try to fill in missing data from any other records with same Address.
    for i, row in enumerate(data[1:], 1):
        if not (row.Phone and row.Email):  # Info missing?
            # Try to copy it from others at same address.
            updated = False
            for other in addr_dict.get(row.Address, []):
                if not row.Phone and other.Phone:
                    row = row._replace(Phone=other.Phone)
                    updated = True
                if not row.Email and other.Email:
                    row = row._replace(Email=other.Email)
                    updated = True
                if row.Phone and row.Email:  # Info now filled in?
                    break
            if updated:
                data[i] = row
    return data

INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'

data = get_info(INPUT_FILE)
with open(OUTPUT_FILE, 'wb') as fout:
    writer = csv.DictWriter(fout, data[0])  # First elem has column names.
    writer.writeheader()
    for row in data[1:]:
        writer.writerow(row._asdict())

print('Done')
[Screenshot of the results in Excel]
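The code above opens its files with 'rb' and 'wb', which is what the csv module expects on Python 2.7. If the same script ever has to run on Python 3 as well, the files need to be opened in text mode with newline='' instead. A minimal sketch of one way to handle both, assuming a hypothetical helper named `open_csv` that is not part of the original answer:

```python
import csv
import os
import sys
import tempfile

def open_csv(path, mode):
    """Open a CSV file the way the csv module expects on this interpreter.

    Hypothetical helper: mode is 'r' or 'w'.
    """
    if sys.version_info[0] < 3:
        return open(path, mode + 'b')    # Python 2: binary mode.
    return open(path, mode, newline='')  # Python 3: text mode, no newline translation.

# Round trip through a temporary file to show the helper in use.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open_csv(path, 'w') as fout:
    csv.writer(fout).writerows([['Phone', 'Email'], ['1457871452', '']])
with open_csv(path, 'r') as fin:
    rows = list(csv.reader(fin))
print(rows)
```

In get_info and in the writing code, `open(file_path, 'rb')` and `open(OUTPUT_FILE, 'wb')` could then be replaced by `open_csv(file_path, 'r')` and `open_csv(OUTPUT_FILE, 'w')`.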