首页 > 解决方案 > 创建一个数据框,包括按总和和总和分组

问题描述

我有以下数据框:

    Race Course                 Horse  Year  Month  Day  Amount Won/Lost
0       Aintree               Red Rum  2017      5   12   11.58      won
1   Punchestown               Camelot  2016     12   22  122.52      won
2       Sandown        Beef of Salmon  2016     11   17   20.00     lost
3           Ayr              Corbiere  2016     11    3   25.00     lost
4    Fairyhouse               Red Rum  2016     12    2   65.75      won
5           Ayr               Camelot  2017      3   11   12.05      won
6       Aintree         Hurricane Fly  2017      5   12   11.58      won
7   Punchestown        Beef or Salmon  2016     12   22  112.52      won
8       Sandown              Aldaniti  2016     11   17   10.00     lost
9           Ayr   Henry the Navigator  2016     11    1   15.00     lost
10   Fairyhouse               Jumanji  2016     10    2   65.75      won
11          Ayr           Came Second  2017      3   11   12.05      won
12      Aintree                Murder  2017      5   12    5.00     lost
13  Punchestown           King Arthur  2016      6   22   52.52      won
14      Sandown         Filet of Fish  2016     11   17   20.00     lost
15          Ayr                Denial  2016     11    3   25.00     lost
16   Fairyhouse          Don't Gamble  2016     12   12  165.75      won
17          Ayr               Ireland  2017      1   11   22.05      won

我正在尝试创建另一个数据框,其中仅包含所有比赛(行)的总和和所有赢得比赛的总和。理想情况下,它将如下所示:

total races     18
total won       11

然而,我所能做的就是按计数分组,计算总赢和总输。这是我尝试过的:

df = df.groupby(['Won/Lost']).size().add_prefix('total')

这就是它返回的内容:

Won/Lost
total lost     7
total won     11
dtype: int64

我处于死胡同,无法找到一个简单的解决方案。

标签: pythonpandasdataframe

解决方案


假设内容races.csv为:

Race Course,Horse,Year,Month,Day,Amount,Won/Lost
Aintree,Red Rum,2017,5,12,11.58,won
Punchestown,Camelot,2016,12,22,122.52,won
Sandown,Beef of Salmon,2016,11,17,20.00,lost
Ayr,Corbiere,2016,11,3,25.00,lost
Fairyhouse,Red Rum,2016,12,2,65.75,won
Ayr,Camelot,2017,3,11,12.05,won
Aintree,Hurricane Fly,2017,5,12,11.58,won
Punchestown,Beef or Salmon,2016,12,22,112.52,won
Sandown,Aldaniti,2016,11,17,10.00,lost
Ayr,Henry the Navigator,2016,11,1,15.00,lost
Fairyhouse,Jumanji,2016,10,2,65.75,won
Ayr,Came Second,2017,3,11,12.05,won
Aintree,Murder,2017,5,12,5.00,lost
Punchestown,King Arthur,2016,6,22,52.52,won
Sandown,Filet of Fish,2016,11,17,20.00,lost
Ayr,Denial,2016,11,3,25.00,lost
Fairyhouse,Don't Gamble,2016,12,12,165.75,won
Ayr,Ireland,2017,1,11,22.05,won

获取新数据框的步骤:

>>> races_df = pd.read_csv('races.csv')
>>> races_df
    Race Course                Horse  Year  Month  Day  Amount Won/Lost
0       Aintree              Red Rum  2017      5   12   11.58      won
1   Punchestown              Camelot  2016     12   22  122.52      won
2       Sandown       Beef of Salmon  2016     11   17   20.00     lost
3           Ayr             Corbiere  2016     11    3   25.00     lost
4    Fairyhouse              Red Rum  2016     12    2   65.75      won
5           Ayr              Camelot  2017      3   11   12.05      won
6       Aintree        Hurricane Fly  2017      5   12   11.58      won
7   Punchestown       Beef or Salmon  2016     12   22  112.52      won
8       Sandown             Aldaniti  2016     11   17   10.00     lost
9           Ayr  Henry the Navigator  2016     11    1   15.00     lost
10   Fairyhouse              Jumanji  2016     10    2   65.75      won
11          Ayr          Came Second  2017      3   11   12.05      won
12      Aintree               Murder  2017      5   12    5.00     lost
13  Punchestown          King Arthur  2016      6   22   52.52      won
14      Sandown        Filet of Fish  2016     11   17   20.00     lost
15          Ayr               Denial  2016     11    3   25.00     lost
16   Fairyhouse         Don't Gamble  2016     12   12  165.75      won
17          Ayr              Ireland  2017      1   11   22.05      won
>>>
>>> total_races = len(races_df)
>>>
>>> total_win = races_df[races_df['Won/Lost'] == 'won']['Won/Lost'].count()
>>>
>>> new_df = pd.DataFrame({'total_races': total_races, 'total_win': total_win}, index=pd.RangeIndex(1))
>>>
>>> new_df
   total_races  total_win
0           18         11

推荐阅读