How do I filter data that has already been filtered?

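Problem description

For example, some customers have one invoice number, and some customers have several.

I have already reduced the data to the unique customers by doing the following: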

m = list(set(map(lambda x: x.Name, data)))
print("There are", len(m), "customers")

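How can I tell how many customers are female and how many are male? If a customer appears more than once, their gender should be counted only once.

Sample data from the CSV
The columns are: state, first name, last name, gender, age, invoiceNo.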

state   firstname   lastname    gender  age invoiceNo
TX  Jane    DOE Female  52  36524
TX  Jane    DOE Female  52  65142
NY  John    Williams    Male    68  24536

How do I find the average age?

m = customer(row[0], row[1] + " " + row[2], row[3], int(row[4]), int(row[5]))
data.append(m)

m = list(set(map(lambda x: x.Name, data)))

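Tags: python, csv, filter

Solution

Here is how I would do the various things you asked about. Note that I have added back the class definition that you removed from an earlier version of your program, and modified it by adding unique_id() and __repr__() methods.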

import csv
from pprint import pprint

class Customer:
    def __init__(self, state, name, gender, age, invoice):
        self.State = state
        self.Name = name
        self.Gender = gender
        self.Age = age
        self.Invoice = invoice

    def unique_id(self):
        """ Return identifier unique to this customer. """
        return (self.State, self.Name, self.Gender, self.Age)

    def __repr__(self):
        classname = type(self).__name__
        return (f'{classname}(State={self.State!r}, Name={self.Name!r}, '
                f'Gender={self.Gender!r}, Age={self.Age!r}, Invoice={self.Invoice!r})')


filename = 'salesinfo.csv'
data = []

with open(filename, 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    next(reader)  # Skip header.
    for row in reader:
        if not row:
            continue
        customer = Customer(row[0], row[1]+" "+row[2], row[3], int(row[4]), int(row[5]))
        data.append(customer)

#pprint(data); print()  # Show what was read.

# Determine number of unique customers (by calling class unique_id() method).
m = {c.unique_id() for c in data}
print("There are", len(m), "customers")

# Determine how many of each gender there are *and* the overall average age.
seen = set()  # To avoid counting a customer more than once.
genders = dict()
total_age = 0
for customer in data:
    unique_id = customer.unique_id()
    if unique_id not in seen:
        genders[customer.Gender] = genders.get(customer.Gender, 0) + 1
        total_age += customer.Age
        seen.add(unique_id)

# Divide by the number of unique customers, not the number of rows.
average_age = total_age / len(seen)

pprint(genders)  # Total number of each gender.
print(f"Average customer's age: {average_age:.1f}")
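For comparison, the same deduplicate-then-aggregate idea can be sketched with only the standard library's collections.Counter and statistics.mean. This is an alternative sketch, not the approach used above: the names rows and summarize are hypothetical, the rows are hard-coded from the sample in the question rather than read from the file, and a customer is assumed to be identified by everything except the invoice number.

```python
from collections import Counter
from statistics import mean

# Sample rows from the question: (state, name, gender, age, invoice_no).
# Hard-coded here for illustration; a real script would read them from the CSV.
rows = [
    ("TX", "Jane DOE", "Female", 52, 36524),
    ("TX", "Jane DOE", "Female", 52, 65142),
    ("NY", "John Williams", "Male", 68, 24536),
]


def summarize(rows):
    """Return (gender counts, average age) computed over unique customers."""
    # Dropping the invoice number and collecting into a set leaves
    # exactly one entry per customer, however many invoices they have.
    unique = {(state, name, gender, age) for state, name, gender, age, _ in rows}
    gender_counts = Counter(gender for _, _, gender, _ in unique)
    # Note: mean() raises StatisticsError if there are no customers at all.
    average_age = mean(age for _, _, _, age in unique)
    return gender_counts, average_age


gender_counts, average_age = summarize(rows)
print(dict(gender_counts))
print(f"Average customer's age: {average_age:.1f}")
```

With the three sample rows, Jane DOE is counted once, so there is one customer of each gender and the average is taken over the ages 52 and 68.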
