How to count elements with a specific attribute across different Series

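Problem description

From athlete_events.csv, I need to calculate the percentage of athletes who have competed in both the Summer and the Winter Olympics.

I have tried assigning a value to each athlete, but I keep getting stuck in what looks like an infinite loop. The data looks like this: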

Name    Sex Age Height  Weight  Team    NOC Games   Year    Season  City    Sport   Event   Medal
A Dijiang   M   24  180 80  China   CHN 1992 Summer 1992    Summer  Barcelona   Basketball  Basketball Men's Basketball NA

There is no actual error message, just a loop that never seems to finish.

   import pandas as pd

   df = pd.read_csv(r"C:\Users\Rorro\Desktop\desafio latam\athlete_events.csv")
   pjt = df.loc[:,"Name"]
   pjt = pjt.drop_duplicates()

   temp = df.loc[:,["Name","Season"]]
   total = 0
   for i in pjt:
      for l,r in temp.iterrows():
        if i == r["Name"] and r["Season"] == "Winter":
          for n,m in temp.iterrows():
            if i == m["Name"] and m["Season"] == "Summer":
                total+=1
            else:
                pass
        elif i == r["Name"] and r["Season"] == "Summer":
          for n,m in temp.iterrows():
            if i == m["Name"] and m["Season"] == "Winter":
                total+=1
            else:
                pass
        else:
           continue

   print(total)

Tags: python, pandas, dataframe

Solution

How about this?

import pandas as pd

df = pd.DataFrame({'Season': ['winter', 'summer', 'winter', 'summer'],
                   'Name'  : ['a', 'b', 'c', 'a'],
                   'Year'  : [1992, 1996, 2004, 2000]})
print(df)

# Defines the wanted seasons
selection = (df['Season'] == 'summer') | (df['Season'] == 'winter')

# Defines the wanted years    
selection = selection & (df['Year'].isin([1992, 1996, 2000]))

names = df[selection]['Name']
print(names)

unique_count = len(names.unique())
print("\nDistinct items: {}".format(unique_count))

Output:

   Season Name  Year
0  winter    a  1992
1  summer    b  1996
2  winter    c  2004
3  summer    a  2000

0    a
1    b
3    a
Name: Name, dtype: object

Distinct items: 2
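
To get from there to the percentage asked about in the question, one option is to count the distinct seasons per athlete. The following is only a minimal sketch, not the method of the answer above. It assumes that `Name` uniquely identifies an athlete (athletes who share a name would be merged) and that `Season` only takes the values "Summer" and "Winter". The small DataFrame is made-up stand-in data with the same column names as athlete_events.csv. It also avoids the nested `iterrows()` loops from the question, which rescan every row for each unique name and are likely why that code appears never to finish on a large file.

```python
import pandas as pd

# Made-up stand-in for athlete_events.csv (hypothetical rows, same column names).
df = pd.DataFrame({
    "Name":   ["a", "a", "b", "c", "c", "d"],
    "Season": ["Winter", "Summer", "Summer", "Winter", "Winter", "Summer"],
})

# Number of distinct seasons each athlete appears in.
seasons_per_athlete = df.groupby("Name")["Season"].nunique()

# Assuming only "Summer" and "Winter" occur, two distinct seasons means both.
both = int((seasons_per_athlete == 2).sum())
total_athletes = int(seasons_per_athlete.size)
percentage = 100 * both / total_athletes

print(f"{both} of {total_athletes} athletes ({percentage:.1f}%) competed in both seasons")
```

With the stand-in data, only athlete `a` appears in both seasons, so this prints `1 of 4 athletes (25.0%) competed in both seasons`. For the real file, the stand-in DataFrame would be replaced by the `pd.read_csv(...)` call from the question.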
