How can we simplify list comprehensions when there are many categories to compare?

Question

I am using a CSV file to compute students' average scores by race/ethnicity. Here is my code for reading the file:

# read file
with open("StudentsPerformance.csv") as file:
    data = file.read().split("\n")

header = data[0]
students = data[1:]

# remove last student (empty student)
students.pop()

# get total number of students
total_student = len(students)

# split header
header = header.split(",")
subjects = header[5:]

# split each student in list
for i in range(len(students)):
    students[i] = students[i].split(",")

After reading the file, my rows look like this (race is the second column, and the scores are the last three columns):

['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69']
['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60']
['male', 'group E', "associate's degree", 'free/reduced', 'none', '64', '56', '52']
['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71']
['male', 'group E', "associate's degree", 'free/reduced', 'completed', '46', '43', '44']
['female', 'group C', 'some high school', 'free/reduced', 'none', '48', '56', '51']
['male', 'group C', 'some college', 'free/reduced', 'completed', '67', '74', '70']
['male', 'group D', 'some college', 'free/reduced', 'none', '62', '57', '62']
['male', 'group D', "associate's degree", 'free/reduced', 'completed', '61', '71', '73']
['male', 'group C', "bachelor's degree", 'free/reduced', 'completed', '70', '75', '74']
['male', 'group C', "associate's degree", 'standard', 'completed', '98', '87', '90']
['male', 'group D', 'some college', 'free/reduced', 'none', '70', '63', '58']
['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']
...

Then I find the race types:

# collect the race/ethnicity (second column) of every student
race = [student[1] for student in students]
race_type = []
race_number = []

race_type = list(dict.fromkeys(race))
race_type.sort()

print(race_type)

for i in range(len(race_type)):
    race_number.append(race.count(race_type[i]))

print(race_number)

I get this result:

['group A', 'group B', 'group C', 'group D', 'group E']
[89, 190, 319, 262, 140]

So there are 5 race groups: group A: 89, group B: 190, group C: 319, group D: 262, group E: 140.

Then I use some code to compute the average score for each race group:

def compute_average_score(student_group):
    return sum([score for student in student_group for score in list(map(int, student[-3:]))]) / (len(student_group) * 3)

a_group = [x for x in students if x[1] == race_type[0]]
b_group = [x for x in students if x[1] == race_type[1]]
c_group = [x for x in students if x[1] == race_type[2]]
d_group = [x for x in students if x[1] == race_type[3]]
e_group = [x for x in students if x[1] == race_type[4]]


print(round(compute_average_score(a_group), 2))
print(round(compute_average_score(b_group), 2))
print(round(compute_average_score(c_group), 2))
print(round(compute_average_score(d_group), 2))
print(round(compute_average_score(e_group), 2))

Then I get the correct results:

62.99
65.47
67.13
69.18
72.75

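But when I compute the averages this way, I have to type the index of every group by hand: 0, 1, 2, 3, 4.

I find this inconvenient: with ten to twenty race types, it would take a lot of time and a lot of repeated code, because I would have to type an index for every group: 0, 1, 2, ..., 20.

So how can I reduce the repetition in this calculation? I considered using a for loop, but it does not seem feasible.

Thanks for your help.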

Tags: python, list, file, for-loop

Answer


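Use the dictionary grouping idiom: for each group, accumulate the cumulative score and the total number of scores. So given the data: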

>>> data = [['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69'],
... ['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60'],
... ['male', 'group E', "associate's degree", 'free/reduced', 'none', '64', '56', '52'],
... ['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71'],
... ['male', 'group E', "associate's degree", 'free/reduced', 'completed', '46', '43', '44'],
... ['female', 'group C', 'some high school', 'free/reduced', 'none', '48', '56', '51'],
... ['male', 'group C', 'some college', 'free/reduced', 'completed', '67', '74', '70'],
... ['male', 'group D', 'some college', 'free/reduced', 'none', '62', '57', '62'],
... ['male', 'group D', "associate's degree", 'free/reduced', 'completed', '61', '71', '73'],
... ['male', 'group C', "bachelor's degree", 'free/reduced', 'completed', '70', '75', '74'],
... ['male', 'group C', "associate's degree", 'standard', 'completed', '98', '87', '90'],
... ['male', 'group D', 'some college', 'free/reduced', 'none', '70', '63', '58'],
... ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']]

Then group the rows:

>>> grouper = {}
>>> for row in data:
...     group = row[1]
...     scores = map(int, row[-3:])
...     cum_score, total = grouper.get(group, (0,0))
...     grouper[group] = cum_score + sum(scores), total + 3
...
>>> grouper
{'group D': (987, 15), 'group E': (503, 9), 'group C': (860, 12), 'group A': (177, 3)}
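The same grouping idiom can also be written with `collections.defaultdict`, which removes the need for the `grouper.get(group, (0,0))` default. The following is only a minimal sketch under stated assumptions: the names `rows`, `scores_by_group`, and `averages` are illustrative and not from the original code, and `rows` holds just four of the sample rows above.

```python
from collections import defaultdict

# Assumed sample: four rows in the same shape as the data above.
rows = [
    ['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69'],
    ['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60'],
    ['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71'],
    ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53'],
]

# Map each group name (second column) to a flat list of its scores.
scores_by_group = defaultdict(list)
for row in rows:
    # The last three columns are the scores, stored as strings.
    scores_by_group[row[1]].extend(int(score) for score in row[-3:])

# One average per group, however many groups the data contains.
averages = {
    group: sum(scores) / len(scores)
    for group, scores in scores_by_group.items()
}

for group in sorted(averages):
    print(f"{group}: {averages[group]:.2f}")
```

Storing the individual scores uses more memory than keeping a running `(cum_score, total)` pair, but it makes other statistics (minimum, maximum, median) easy to add later.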

Then you can compute the averages like this:

>>> for group, (cum_score, total) in grouper.items():
...     print(f"For {group} average score is {cum_score/total}")
...
For group D average score is 65.8
For group E average score is 55.888888888888886
For group C average score is 71.66666666666667
For group A average score is 59.0
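If you would rather keep the question's `compute_average_score` function, a plain for loop over the distinct group names is also workable. This is a sketch and not the approach shown above: `students` here is an assumed three-row sample, and `race_types` and `averages` are illustrative names.

```python
def compute_average_score(student_group):
    # Average over every score (last three columns) of every student in the group.
    scores = [int(score) for student in student_group for score in student[-3:]]
    return sum(scores) / len(scores)

# Assumed sample: three rows in the same shape as the question's data.
students = [
    ['female', 'group C', 'some high school', 'free/reduced', 'none', '48', '56', '51'],
    ['male', 'group C', 'some college', 'free/reduced', 'completed', '67', '74', '70'],
    ['male', 'group E', "associate's degree", 'free/reduced', 'none', '64', '56', '52'],
]

# Distinct group names, sorted; no indices are typed by hand.
race_types = sorted({student[1] for student in students})

averages = {}
for race_type in race_types:
    group = [student for student in students if student[1] == race_type]
    averages[race_type] = round(compute_average_score(group), 2)
    print(race_type, averages[race_type])
```

This version scans the whole student list once per group, whereas the dictionary approach makes a single pass over the data, so the dictionary approach scales better as the number of groups grows.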
