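python - Extracting a .csv file into a 2D list without using any libraries
Problem description
As part of an assignment, I have to read a .csv file without using any libraries. The header and the first 3 rows look like this: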
"ID","Name","Sex","Age","Height","Weight","Team","NOC","Games","Year","Season","City","Sport","Event","Medal"
"1","A Dijiang","M",24,180,80,"China","CHN","1992 Summer",1992,"Summer","Barcelona","Basketball","Basketball Men's Basketball",NA
"2","A Lamusi","M",23,170,60,"China","CHN","2012 Summer",2012,"Summer","London","Judo","Judo Men's Extra-Lightweight",NA
"3","Gunnar Nielsen Aaby","M",24,NA,NA,"Denmark","DEN","1920 Summer",1920,"Summer","Antwerpen","Football","Football Men's Football",NA
I tried to implement it as follows:
csv_data = []
with open('olympic.csv') as csv_file:
    for line in csv_file:
        line = line.strip()
        line = line.split(',')
        temp = []
        for element in line:
            if element[0] == '"' or element[-1] == '"':
                temp.append(element[1 : -1])
            else:
                temp.append(element)
        csv_data.append(temp)
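This gives a roughly correct result, but it breaks when the Name or Event column contains a "," character, because split(',') also splits inside quoted fields. For example: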
"," in Name column
"5965","Dionisio Augustine, II","M",24,153,65,"Federated States of Micronesia","FSM","2016 Summer",2016,"Summer","Rio de Janeiro","Swimming","Swimming Men's 50 metres Freestyle",NA
"7208","Carlos Zenon Balderas, Jr.","M",19,175,60,"United States","USA","2016 Summer",2016,"Summer","Rio de Janeiro","Boxing","Boxing Men's Lightweight",NA
"," in Event column
"2304","Michael Albasini","M",31,172,67,"Switzerland","SUI","2012 Summer",2012,"Summer","London","Cycling","Cycling Men's Road Race, Individual",NA
"250","Saeid Morad Abdevali","M",22,170,80,"Iran","IRI","2012 Summer",2012,"Summer","London","Wrestling","Wrestling Men's Welterweight, Greco-Roman",NA
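Is there a proper way to solve this without using the standard library?
Solution
Yes... but then you may also have to handle escaped quote characters, and then (why not?) newlines inside columns...
This is why, in real life, the best strategy is to use a library rather than reinvent the wheel (which here is actually a rather complicated piece of clockwork).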
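You could try using regular expressions to capture the column values. For quoted columns, a naive pattern might look like '"([^"]+)"'; for unquoted columns (numbers?), perhaps one with lookarounds: '(?<=,)(\d+)(?=,)' ... and then try to put everything together.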
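A minimal sketch of the regex idea, assuming the standard-library `re` module is allowed (which the assignment may not permit). The names `FIELD` and `split_line` are made up for illustration. It merges the two naive patterns into one alternation. It does not handle escaped quotes (`""`) or empty unquoted fields (`,,`), which it silently skips.

```python
import re  # note: `re` is part of the standard library

# Either a quoted field (captured without its quotes)
# or an unquoted run of non-comma characters.
FIELD = re.compile(r'"([^"]*)"|([^,]+)')

def split_line(line):
    """Split one CSV line into fields; commas inside quotes are kept."""
    fields = []
    for match in FIELD.finditer(line.strip()):
        quoted, bare = match.group(1), match.group(2)
        # group(1) is None when the unquoted alternative matched
        fields.append(quoted if quoted is not None else bare)
    return fields
```

For the sample row with "Dionisio Augustine, II", the name stays in a single field instead of being split at the comma.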
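Or (since this is a class assignment, efficiency and speed are probably not required) you could write a state machine: read one character at a time and act accordingly. If it is a '"', keep reading until the next '"'; otherwise read until the next comma, and so on...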
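The state machine described above could be sketched as follows, using no imports at all. The function names `parse_csv_line` and `read_csv` are illustrative, not from the original answer. The sketch assumes one record per line, so it does not support newlines inside quoted fields, and it treats a doubled quote (`""`) inside a quoted field as an escaped quote.

```python
def parse_csv_line(line):
    """Split one CSV line into fields, one character at a time."""
    fields = []
    current = []        # characters of the field being built
    in_quotes = False   # state: are we inside a quoted field?
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                current.append(ch)       # commas are kept while quoted
        elif ch == '"':
            in_quotes = True             # opening quote
        elif ch == ',':
            fields.append(''.join(current))  # end of field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))      # last field on the line
    return fields


def read_csv(path):
    """Read a whole file into a 2D list (assumes one record per line)."""
    rows = []
    with open(path) as csv_file:
        for line in csv_file:
            rows.append(parse_csv_line(line.rstrip('\n')))
    return rows
```

With this sketch, `csv_data = read_csv('olympic.csv')` would replace the loop from the question. All values come back as strings, so unquoted numbers and NA still need converting if the assignment requires it.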