TypeError: unhashable type: 'list'

Problem description

I'm trying to use Python to read a CSV file containing thousands of email addresses and then build a list of all the duplicates. Here is what I have so far:

import csv

input_file='combined.csv'
original_list=[]
duplicate_list=[]

def readcsv(input_file):
    ifile = open(input_file, newline="")
    reader = csv.reader(ifile, delimiter=";")

    rownum = 0
    for row in reader:
        original_list.append (row)
        rownum += 1

    ifile.close()
    original_list.sort()
    return original_list

(readcsv(input_file))

seen_set = set()
duplicate_set = set(x for x in original_list if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

print (duplicate_set)
print (unique_set)

Tags: python, csv, arrays, list

Solution


Instead of this (which, for the reasons explained in the comments, would be poor Python even without the TypeError):

seen_set = set()
duplicate_set = set(x for x in original_list if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

all you really need is

# first just use set to grab all the possible elements (make lists hashable by
# passing through tuple) -- this is a set comprehension 
seen_set = {tuple(x) for x in original_list}

# the duplicates are just ones with counts > 1
duplicate_set = {t for t in seen_set if original_list.count(list(t)) > 1}

unique_set = seen_set - duplicate_set
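
The `original_list.count(...)` call above rescans the whole list once per distinct row, which can get slow with thousands of addresses. As one possible alternative (a sketch, not part of the original answer), `collections.Counter` tallies every row in a single pass. The sample rows and the `error_message` name below are made up for illustration; the rows only mimic the lists that `csv.reader` yields.

```python
from collections import Counter

# Hypothetical sample rows, shaped like what csv.reader yields (lists of strings).
original_list = [
    ["a@example.com"],
    ["b@example.com"],
    ["a@example.com"],
    ["c@example.com"],
    ["b@example.com"],
]

# Lists are mutable and therefore unhashable, so putting one in a set fails.
try:
    {original_list[0]}
except TypeError as exc:
    error_message = str(exc)  # the message mentions "unhashable type: 'list'"

# Tuples of strings are hashable, so they can serve as Counter keys.
counts = Counter(tuple(row) for row in original_list)

seen_set = set(counts)
duplicate_set = {row for row, n in counts.items() if n > 1}
unique_set = seen_set - duplicate_set

print(sorted(duplicate_set))
print(sorted(unique_set))
```

The three resulting sets have the same meaning as in the snippet above; only the counting step differs.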

Your function can also be written more simply as

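def readcsv(input_file):
    # open the path that was passed in; the csv docs recommend newline=""
    # (the old "rU" mode was removed in Python 3.11)
    with open(input_file, newline="") as ifile:
        reader = csv.reader(ifile, delimiter=";")
        return sorted(reader)  # don't mutate global variables!

original_list = readcsv(input_file)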
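
If the rows are only ever going to be put into sets, the reader could also hand back tuples straight away, so no later conversion is needed. The following is a rough sketch under that assumption: `read_rows` is a hypothetical variant of `readcsv`, and an in-memory `io.StringIO` with made-up addresses stands in for `combined.csv`.

```python
import csv
import io


def read_rows(stream, delimiter=";"):
    """Return every row of an open CSV stream as a tuple, in sorted order."""
    return sorted(tuple(row) for row in csv.reader(stream, delimiter=delimiter))


# Hypothetical stand-in for combined.csv: one address per line.
sample = io.StringIO("b@example.com\na@example.com\nb@example.com\n")

rows = read_rows(sample)
distinct = set(rows)  # works directly because every row is already a tuple

print(rows)
print(distinct)
```

Taking an open stream rather than a file name also makes the function easy to try without touching the disk; for a real file it could be called as `read_rows(open(input_file, newline=""))` inside a `with` block.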
