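python - How can I simplify list comprehensions when there are many types to compare?
Problem description
I'm using a CSV file to compute students' average scores by race/ethnicity. Here is the code I use to read the file: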
# read file
with open("StudentsPerformance.csv") as file:
    data = file.read().split("\n")
header = data[0]
students = data[1:]
# remove last student (empty student)
students.pop()
# get total number of students
total_student = len(students)
# split header
header = header.split(",")
subjects = header[5:]
# split each student in list
for i in range(len(students)):
    students[i] = students[i].split(",")
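Splitting each line on "," by hand works for the rows shown, but it would break a field that contains a quoted comma. A minimal sketch using the standard-library csv module instead, assuming the same column layout; the function name read_students and the sample header names are hypothetical, not from the original post:

```python
import csv
import io


def read_students(file_obj):
    """Return (header, students) from an open CSV file object.

    csv.reader strips the quotes around quoted fields; blank rows come
    back as empty lists, so they are filtered out explicitly.
    """
    reader = csv.reader(file_obj)
    header = next(reader)
    students = [row for row in reader if row]  # drop empty rows
    return header, students


# In-memory stand-in for the CSV file (hypothetical header names).
sample = io.StringIO(
    "gender,race,education,lunch,prep,math,reading,writing\n"
    "male,group D,high school,free/reduced,none,74,70,69\n"
    '"male","group A","associate\'s degree",standard,none,67,57,53\n'
    "\n"
)
header, students = read_students(sample)
subjects = header[5:]
print(subjects)     # ['math', 'reading', 'writing']
print(students[1])  # ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']
```

With the real file, open it as open("StudentsPerformance.csv", newline="") and pass the file object to read_students; the csv documentation recommends newline="" for files read with csv.reader. This also makes the manual students.pop() for the trailing empty line unnecessary.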
After reading the file, my data looks like this (race is the second column, and the scores are the last three columns):
['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69']
['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60']
['male', 'group E', "associate's degree", 'free/reduced', 'none', '64', '56', '52']
['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71']
['male', 'group E', "associate's degree", 'free/reduced', 'completed', '46', '43', '44']
['female', 'group C', 'some high school', 'free/reduced', 'none', '48', '56', '51']
['male', 'group C', 'some college', 'free/reduced', 'completed', '67', '74', '70']
['male', 'group D', 'some college', 'free/reduced', 'none', '62', '57', '62']
['male', 'group D', "associate's degree", 'free/reduced', 'completed', '61', '71', '73']
['male', 'group C', "bachelor's degree", 'free/reduced', 'completed', '70', '75', '74']
['male', 'group C', "associate's degree", 'standard', 'completed', '98', '87', '90']
['male', 'group D', 'some college', 'free/reduced', 'none', '70', '63', '58']
['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']
...
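Then I found the race types:
# collect the race column (second column) of every student
race = [student[1] for student in students]
race_number = []
# unique race types, sorted alphabetically
race_type = list(dict.fromkeys(race))
race_type.sort()
print(race_type)
for i in range(len(race_type)):
    race_number.append(race.count(race_type[i]))
print(race_number)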
I got this result:
['group A', 'group B', 'group C', 'group D', 'group E']
[89, 190, 319, 262, 140]
So I have 5 race types: group A: 89, group B: 190, group C: 319, group D: 262, group E: 140.
Then I used this code to compute the average score for each race:
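race.count(...) rescans the whole race list once per group. collections.Counter from the standard library counts every distinct value in a single pass; a sketch on a hypothetical handful of race values:

```python
from collections import Counter

# Hypothetical sample: the race column (x[1]) of a few students.
race = ['group D', 'group E', 'group E', 'group D', 'group C', 'group A']

race_counts = Counter(race)       # maps each distinct group to its count
race_type = sorted(race_counts)   # the distinct groups, alphabetically
race_number = [race_counts[r] for r in race_type]

print(race_type)    # ['group A', 'group C', 'group D', 'group E']
print(race_number)  # [1, 1, 2, 2]
```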
def compute_average_score(student_group):
    # sum of all three scores of every student, divided by the number of scores
    return sum([score for student in student_group for score in list(map(int, student[-3:]))]) / (len(student_group) * 3)
a_group = [x for x in students if x[1] == race_type[0]]
b_group = [x for x in students if x[1] == race_type[1]]
c_group = [x for x in students if x[1] == race_type[2]]
d_group = [x for x in students if x[1] == race_type[3]]
e_group = [x for x in students if x[1] == race_type[4]]
print(round(compute_average_score(a_group), 2))
print(round(compute_average_score(b_group), 2))
print(round(compute_average_score(c_group), 2))
print(round(compute_average_score(d_group), 2))
print(round(compute_average_score(e_group), 2))
And I get the correct results:
62.99
65.47
67.13
69.18
72.75
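The compute_average_score one-liner builds an intermediate list and divides by len(student_group) * 3 by hand. A possibly more readable equivalent, sketched with statistics.mean from the standard library; like the original it fails on an empty group, raising StatisticsError instead of ZeroDivisionError:

```python
from statistics import mean


def compute_average_score(student_group):
    """Mean of the last three (score) columns over every student in the group.

    Averaging all 3 * len(student_group) scores at once gives the same value
    as the original sum(...) / (len(student_group) * 3).
    """
    # int() converts the score strings read from the CSV file
    return mean(int(score) for student in student_group for score in student[-3:])


a_group = [['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']]
print(compute_average_score(a_group))  # 59
```

statistics.mean does exact arithmetic internally, so it can be slower than sum()/len() on large groups; the original formula is perfectly fine if speed matters.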
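But when I compute the averages this way, I have to type each group's index by hand: 0, 1, 2, 3, 4.
This is inconvenient: if I had ten to twenty different race types, it would take a lot of time and repeat a lot of code, because I would have to type an index for every group: 0, 1, 2, ..., 20.
So how can I cut down this repetition? I considered using a for loop, but it didn't seem workable.
Thanks for your help.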
Solution
Use the dictionary grouping idiom to accumulate the total score and the number of scores for each group. So, given the data:
>>> data = [['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69'],
... ['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60'],
... ['male', 'group E', "associate's degree", 'free/reduced', 'none', '64', '56', '52'],
... ['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71'],
... ['male', 'group E', "associate's degree", 'free/reduced', 'completed', '46', '43', '44'],
... ['female', 'group C', 'some high school', 'free/reduced', 'none', '48', '56', '51'],
... ['male', 'group C', 'some college', 'free/reduced', 'completed', '67', '74', '70'],
... ['male', 'group D', 'some college', 'free/reduced', 'none', '62', '57', '62'],
... ['male', 'group D', "associate's degree", 'free/reduced', 'completed', '61', '71', '73'],
... ['male', 'group C', "bachelor's degree", 'free/reduced', 'completed', '70', '75', '74'],
... ['male', 'group C', "associate's degree", 'standard', 'completed', '98', '87', '90'],
... ['male', 'group D', 'some college', 'free/reduced', 'none', '70', '63', '58'],
... ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53']]
Then group the rows:
>>> grouper = {}
>>> for row in data:
...     group = row[1]
...     scores = map(int, row[-3:])
...     cum_score, total = grouper.get(group, (0,0))
...     grouper[group] = cum_score + sum(scores), total + 3
...
>>> grouper
{'group D': (987, 15), 'group E': (503, 9), 'group C': (860, 12), 'group A': (177, 3)}
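The same grouping idiom can also be written with collections.defaultdict, which removes the grouper.get(group, (0,0)) step; counting len(scores) instead of a hard-coded 3 keeps it correct if the number of score columns changes. A sketch on a shortened copy of the data (score_totals is a name chosen here for illustration):

```python
from collections import defaultdict

data = [
    ['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69'],
    ['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60'],
    ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53'],
    ['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71'],
]

# group -> [cumulative score, number of scores]; unseen groups start at [0, 0]
score_totals = defaultdict(lambda: [0, 0])
for row in data:
    scores = [int(s) for s in row[-3:]]
    totals = score_totals[row[1]]
    totals[0] += sum(scores)
    totals[1] += len(scores)

print(dict(score_totals))
# {'group D': [410, 6], 'group E': [198, 3], 'group A': [177, 3]}
```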
Then you can compute the averages like this:
>>> for group, (cum_score, total) in grouper.items():
...     print(f"For {group} average score is {cum_score/total}")
...
For group D average score is 65.8
For group E average score is 55.888888888888886
For group C average score is 71.66666666666667
For group A average score is 59.0
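Applied to the question's own variables, the a_group ... e_group repetition goes away if you loop over race_type and store the results in a dictionary keyed by group name. A sketch that reuses the question's compute_average_score (written without the intermediate list) on a few sample rows; it filters the full list once per group, which is fine for a handful of groups, while the single-pass grouper above scales better:

```python
def compute_average_score(student_group):
    # Same formula as in the question: sum of all scores / number of scores.
    return sum(int(score) for student in student_group for score in student[-3:]) / (len(student_group) * 3)


students = [
    ['male', 'group D', 'high school', 'free/reduced', 'none', '74', '70', '69'],
    ['male', 'group E', 'some high school', 'standard', 'completed', '74', '64', '60'],
    ['male', 'group A', "associate's degree", 'standard', 'none', '67', '57', '53'],
    ['female', 'group D', 'high school', 'free/reduced', 'completed', '65', '61', '71'],
]
race_type = sorted({student[1] for student in students})

# One entry per group replaces a_group ... e_group and the five print calls.
averages = {
    group: round(compute_average_score([x for x in students if x[1] == group]), 2)
    for group in race_type
}
for group, avg in averages.items():
    print(group, avg)
```

This prints one line per group (group A 59.0, group D 68.33, group E 66.0), and adding more race types requires no code changes.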