Delete all rows in Python if the row is duplicated

Problem description

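I tried to delete duplicate rows, but I get this error: 'Series' object has no attribute 'remove'.

How can I replace the remove call or fix the attribute error?

If a row is duplicated in allMYemail.csv, that row must be deleted. Here is my code: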

import csv
import re
import json
import pandas as pd

df1 = pd.read_csv('allMYemail.csv')
df2 = pd.read_csv('MYallmatchagain.csv')

emailSet = set()
for i, row in df1.dropna().iterrows():
    emailSet.add(row['0'])
# print(emailSet)
output = []
for i,row in df2.iterrows():
    # print(row)
    Birthdate = row['Birthdate']
    Gender = row['Gender']
    Mobile2 = row['Mobile2']
    Salutation = row['Salutation']
    email = row['email']
    firstName = row['firstName']
    lastName = row['lastName']
    name = row['name']
    areaCode = row['areaCode']
    errorCode = row['errorCode']
    localNumber = row['localNumber']
    Status = row['Status']
    Domain = row['Domain']
    ReturnCode = row['ReturnCode']
    matched = False
    for emails in emailSet:
        if emails == email:
            matched = True
            break
    if matched:
        row.remove('Birthdate')
        row.remove('Gender')
        row.remove('Mobile2')
        row.remove('Salutation')
        row.remove('email')
        row.remove('firstName')
        row.remove('lastName')
        row.remove('name')
        row.remove('areaCode')
        row.remove('errorCode')
        row.remove('localNumber')
        row.remove('Status')
        row.remove('Domain')
        row.remove('ReturnCode')
    else:
        pass
    output_obj = {}
    output_obj['Birthdate'] = Birthdate 
    output_obj['Gender'] = Gender
    output_obj['Mobile2'] = Mobile2 
    output_obj['Salutation'] = Salutation 
    output_obj['email'] = email 
    output_obj['firstName'] = firstName 
    output_obj['lastName'] = lastName 
    output_obj['name'] = name
    output_obj['areaCode'] = areaCode
    output_obj['errorCode'] = errorCode 
    output_obj['localNumber'] = localNumber 
    output_obj['Status'] = Status 
    output_obj['Domain'] = Domain
    output_obj['ReturnCode'] = ReturnCode 
    output.append(output_obj)
df = pd.read_json(json.dumps(output))
# print(json.dumps(output))
df.to_csv(r'MYfinish.csv', index = None)

Any help would be greatly appreciated.

Tags: python, pandas, dataframe, csv

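Solution

Your question does not make clear what you are trying to do. If you only want to remove fully duplicated rows within a single df, @Renaud's solution will do the job. If you want to remove rows based on duplicates in the single column 'email', try the following: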

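def firstline(d):
    # Return the first row of the group as a Series.
    return d.reset_index(drop=True).loc[0]

# df is the DataFrame to deduplicate; keep one row per 'email' value.
result_df = df.groupby('email').apply(firstline)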
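
As for the error itself: iterrows() yields each row as a pandas Series, and a Series has no remove method. Modifying the yielded row would in general not change df2 anyway, so filtering the DataFrame as a whole is usually simpler. Since the intent is not fully clear, the sketch below shows three possible readings using boolean masks and drop_duplicates. The small DataFrames are hypothetical stand-ins for allMYemail.csv and MYallmatchagain.csv (only the column names '0' and 'email' come from the question), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for allMYemail.csv (one column named '0')
# and MYallmatchagain.csv; the real files are not shown in the question.
df1 = pd.DataFrame({'0': ['a@x.com', 'b@x.com', None]})
df2 = pd.DataFrame({
    'email': ['a@x.com', 'c@x.com', 'c@x.com', 'd@x.com'],
    'name': ['Ann', 'Cy', 'Cy again', 'Dee'],
})

# Reading 1: keep only the first row for each email within df2.
first_per_email = df2.drop_duplicates(subset='email', keep='first')

# Reading 2: drop every df2 row whose email also appears in df1.
known_emails = df1['0'].dropna()
not_in_df1 = df2[~df2['email'].isin(known_emails)]

# Reading 3: drop all rows whose email occurs more than once in df2,
# keeping no copy of a repeated address.
unique_only = df2[~df2['email'].duplicated(keep=False)]

# Writing a result out mirrors the last line of the question's script.
# not_in_df1.to_csv('MYfinish.csv', index=False)
```

Reading 2 is probably closest to what the loop in the question attempts, because it compares df2's 'email' column against the addresses collected from df1; that is an interpretation, not something the question states.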
