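python - How to count the total number of duplicate lines in a file with Python
Question
How can I find the total number of duplicate lines in a file, and how do I write the Python code for it?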
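import csv

# A raw string keeps the backslashes in the Windows path literal
with open(r'T:\DataDump\Book1.csv', newline='') as f:
    csv_data = csv.reader(f)
    next(csv_data)  # skip the header row
    already_seen = set()
    for row in csv_data:
        Address = row[6]  # the address is in the seventh column
        if Address in already_seen:
            print('{} is a duplicate Address'.format(Address))
        else:
            print('{} is a unique Address'.format(Address))
            already_seen.add(Address)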
Answer
Try using pandas instead of the csv module:
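import pandas as pd

csv_data = pd.read_csv('T:/DataDump/Book1.csv')
shape_original = csv_data.shape
print(f"Number of rows before dropping duplicates: {shape_original[0]}")

# Drop rows that are exact copies of an earlier row, keeping the first occurrence
csv_data_no_duplicates = csv_data.drop_duplicates(keep="first")
shape_new = csv_data_no_duplicates.shape
print(f"Number of rows after dropping duplicates: {shape_new[0]}")

# The difference is the total number of duplicate rows
number_duplicates = shape_original[0] - shape_new[0]
print(f"Number of duplicate rows: {number_duplicates}")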
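Note that drop_duplicates compares whole rows, while the code in the question checks only the address column. A minimal sketch of counting duplicates on a single column with DataFrame.duplicated follows. The helper name count_duplicates, the sample data, and the column name "Address" are assumptions for illustration; the real header in Book1.csv may differ.

```python
import pandas as pd

def count_duplicates(df, subset=None):
    """Count rows that repeat an earlier row, optionally judged on `subset` columns only."""
    # duplicated() marks every repeat after the first occurrence as True
    return int(df.duplicated(subset=subset, keep="first").sum())

# Hypothetical sample data standing in for Book1.csv
sample = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Address": ["1 Main St", "2 Oak Ave", "1 Main St", "1 Main St"],
})

whole_row_duplicates = count_duplicates(sample)                    # no row repeats in full
address_duplicates = count_duplicates(sample, subset=["Address"])  # "1 Main St" repeats twice
print(whole_row_duplicates, address_duplicates)
```

If the header name is unknown, passing subset=[df.columns[6]] would select the seventh column by position, matching row[6] in the question.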
I put together this example to check that it works:
thisdict = {
    "brand": ["Ford", "Renault", "Ford"],
    "model": ["Mustang", "Laguna", "Mustang"],
    "year": ["1964", "1978", "1964"]
}
data = pd.DataFrame.from_dict(thisdict)
data_no_duplicates = data.drop_duplicates(keep="first")
print(data_no_duplicates.head())
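If the goal is literally to count repeated lines in a plain text file, pandas may not be needed. Here is a rough sketch using only the standard library. The function name count_duplicate_lines and the in-memory sample are assumptions for illustration, and "duplicate" here means every occurrence after the first.

```python
import io
from collections import Counter

def count_duplicate_lines(lines):
    """Return (total_duplicates, counts); the total excludes each line's first occurrence."""
    # Strip the trailing newline so the last line compares equal to earlier ones
    counts = Counter(line.rstrip("\n") for line in lines)
    total = sum(n - 1 for n in counts.values() if n > 1)
    return total, counts

# In-memory stand-in for an open file; a real call would pass open(path) instead
fake_file = io.StringIO("a\nb\na\nc\na\nb\n")
total, counts = count_duplicate_lines(fake_file)
print(total)
```

Counter reads the file once and keeps one entry per distinct line, so memory grows with the number of unique lines rather than with the file size.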