Why does my set still contain duplicate values?

Question

I am using a class to build a data frame for some data cleaning. Here is my code:

class Item():
    __name = ""
    __cost = 0
    __gender = ""
    __prime = ""

    def has_all_properties(self):
        return bool(self.__name and not math.isnan(self.__cost) and self.__gender and self.__prime)
    def clean(self):
        return bool(self.__name and self.__cost <=20 and self.__gender == "male" and self.__prime == "yes")
    
    def __init__(self, name, cost, gender, prime):
        self.__name = name
        self.__cost = cost
        self.__gender = gender
        self.__prime = prime
    def __eq__(self, other):
        self.__name == other.__name
        self.__cost == other.__cost
        self.__gender == other.__gender
        self.__prime == other.__prime
   
    def __hash__(self):
        return hash((self.__name, self.__cost, self.__gender, self.__prime))

    def __repr__(self):
        return f"Item({self.__name},{self.__cost},{self.__gender},{self.__prime})"

    def __tuple__(self): 
        return self.__name, self.__cost, self.__gender, self.__prime

mylist = {Item(*k) for k in array}
print(mylist)
filtered = filter(Item.has_all_properties, mylist)
clean = filter(Item.clean, filtered)
result = list(clean)
#print(result)

The array looks like this:

array = [['comic', 20.0, 'male', 'yes'], ['paint', 14.0, 'male', 'no'], ['pen', 5.0, 'female', 'yes'], ['phone case', 9.0, 'female', 'no'], ['headphone', 40.0, 'male', 'yes'], [None, 17.0, 'male', 'yes'], ['pencil ', 40.0, 'female', 'yes'], ['coat', nan, 'male', 'yes'], ['underwear', 15.0, 'male', 'yes'], ['shorts', 17.0, 'female', 'no'], ['goggles', 25.0, 'male', 'no'], ['comic', 20.0, 'male', 'yes'], ['watch', 55.0, 'male', 'yes'], ['notebook', 10.0, 'female', 'no'], ['mug', 58.0, 'male', 'no'], ['UNO', 15.0, None, None]... and so on]

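The array contains duplicate elements, such as ['comic', 20.0, 'male', 'yes'], so I want to use a set to drop the extras and keep only one copy. However, even though I build a set and have defined __eq__ and __hash__, mylist still contains duplicate values.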

How should I fix this Python code? Thanks in advance.

Tags: python, dataframe, hash, data-cleaning

Answer


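The __eq__ method needs to return the result of the comparisons. As written, each comparison is evaluated and then discarded, so the method implicitly returns None. None is falsy, so the set treats two items with the same hash as unequal and keeps both. Combine the four comparisons into one boolean expression and return it: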

    def __eq__(self, other):
        return self.__name == other.__name and \
            self.__cost == other.__cost and \
            self.__gender == other.__gender and \
            self.__prime == other.__prime
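
To see the difference end to end, here is a minimal sketch that puts a buggy class next to a corrected one. BrokenItem, the _key attribute, and the three-row rows list are illustrative names that do not appear in the question. Storing the four fields as one tuple is a simplification, not the asker's layout. The isinstance check that returns NotImplemented is a common convention rather than something the fix strictly requires.

```python
class BrokenItem:
    """Hypothetical class mirroring the bug: __eq__ compares but never returns."""

    def __init__(self, name, cost, gender, prime):
        # Assumption: keep all four fields in one tuple for brevity.
        self._key = (name, cost, gender, prime)

    def __eq__(self, other):
        self._key == other._key  # result is discarded, so None is returned

    def __hash__(self):
        return hash(self._key)


class Item:
    """Corrected version: __eq__ returns the comparison result."""

    def __init__(self, name, cost, gender, prime):
        self._key = (name, cost, gender, prime)

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented  # let Python try the other operand
        return self._key == other._key

    def __hash__(self):
        return hash(self._key)

    def __repr__(self):
        return f"Item{self._key}"


# Shortened sample data with one duplicate row.
rows = [
    ['comic', 20.0, 'male', 'yes'],
    ['paint', 14.0, 'male', 'no'],
    ['comic', 20.0, 'male', 'yes'],
]

broken = {BrokenItem(*row) for row in rows}
fixed = {Item(*row) for row in rows}

print(BrokenItem(*rows[0]) == BrokenItem(*rows[0]))  # None
print(len(broken))  # 3: the duplicate survives
print(len(fixed))   # 2: the duplicate is removed
```

Both classes produce identical hashes for the two comic rows. Only the return value of __eq__ decides whether the set merges them.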
