Creating summary table on groupby dataframe based on condition

Question

I have a pandas dataframe df that looks like

userid  trip_id segmentid   actual  prediction
  1       13       40          3       3
  1       6        2           1       1
  1       44       3           2       3
  2       70       19          1       1
  2       12       5           0       0

I need to create a summary dataframe dfsummary, grouped on the column userid, with three columns: userid, correct_classified, and incorrect_classified. A row is correctly classified if its actual and prediction values are the same; otherwise it is incorrectly classified.

I can select the correctly and incorrectly classified rows across the whole dataframe with

correct_classified = df[(df['actual'] == df['prediction'])]
incorrect_classified = df[(df['actual'] != df['prediction'])]

but I can't work out how to create a summary table grouped on userid, which should look like this:

userid  correct_classified  incorrect_classified
  1             2                    1
  2             2                    0

Tags: python, pandas, dataframe, counter

Solution


You can use pd.crosstab after creating a conditional array:

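import numpy as np
import pandas as pd

# Label each row by whether the prediction matches the actual value
flags = np.where(df['actual'].eq(df['prediction']), 'correct', 'incorrect')

# Count the labels per userid
res = pd.crosstab(df['userid'], flags)

print(res)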

col_0   correct  incorrect
userid                    
1             2          1
2             2          0
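
If the exact column names from the question are needed (userid, correct_classified, incorrect_classified), a groupby on boolean flags is one possible alternative to crosstab. The following is a minimal sketch, not part of the original answer; it rebuilds the sample dataframe from the question so that it runs on its own, and the name is_correct is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "trip_id": [13, 6, 44, 70, 12],
    "segmentid": [40, 2, 3, 19, 5],
    "actual": [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# Boolean Series: True where the prediction matches the actual value
# (is_correct is an illustrative name, not from the original post)
is_correct = df["actual"].eq(df["prediction"])

dfsummary = (
    df.assign(correct_classified=is_correct,
              incorrect_classified=~is_correct)
      .groupby("userid")[["correct_classified", "incorrect_classified"]]
      .sum()          # summing booleans counts the True values per user
      .astype(int)
      .reset_index()  # turn userid back into an ordinary column
)

print(dfsummary)
```

For the sample data this gives 2 correct and 1 incorrect for userid 1, and 2 correct and 0 incorrect for userid 2. One difference from crosstab is that both count columns are always present, even when no row in the whole dataframe is incorrect, whereas crosstab only creates columns for labels that actually occur.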
