Pandas: Getting multiple columns based on condition

Problem description

I have a DataFrame df like this:

Date           Student_id    Subject     Subject_Scores
11/30/2020     1000101       Math           70
11/25/2020     1000101       Physics        75
12/02/2020     1000101       Biology        60
11/25/2020     1000101       Chemistry      49
11/25/2020     1000101       English        80
12/02/2020     1000101       Biology        60
11/25/2020     1000101       Chemistry      49
11/25/2020     1000101       English        80
12/02/2020     1000101       Sociology      50
11/25/2020     1000102       Physics        80
11/25/2020     1000102       Math           90
12/15/2020     1000102       Chemistry      63
12/15/2020     1000103       English        71

Case 1:

If I use df[df['Student_id']=='1000102']['Date'], this gives the dates for that particular Student_id. How can I get the same for multiple columns with a single condition?

I want to select multiple columns based on a condition. How can I get an output df like this for Student_id = 1000102?

Date            Subject     
11/25/2020      Physics        
11/25/2020      Math           
12/15/2020      Chemistry      

I have tried the following, but both raise an error:

df[df['Student_id']=='1000102']['Date', 'Subject']

and

df[df['Student_id']=='1000102']['Date']['Subject']

Case 2:

How can I use unique() in the above scenario (for multiple columns)?

df[df['Student_id']=='1000102']['Date', 'Subject'].unique() #this gives error

How can this be achieved?

Tags: python, python-3.x, pandas, dataframe

Solution


You can pass a boolean mask for the rows and a list of column names to DataFrame.loc:

df1 = df.loc[df['Student_id']=='1000102', ['Date', 'Subject']]
print (df1)
          Date    Subject
9   11/25/2020    Physics
10  11/25/2020       Math
11  12/15/2020  Chemistry
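
The selection above can be reproduced with a small self-contained sketch. It assumes Student_id is stored as strings, as the comparison with '1000102' implies; if the column holds integers, compare with 1000102 instead. The frame below is a shortened, illustrative subset of the question's data, so the index labels differ from the output shown. It also shows why the question's first attempt fails: two names inside single brackets form a tuple, which pandas treats as one column label.

```python
import pandas as pd

# Shortened subset of the question's data (Student_id kept as strings)
df = pd.DataFrame({
    "Date": ["11/30/2020", "11/25/2020", "11/25/2020",
             "11/25/2020", "12/15/2020", "12/15/2020"],
    "Student_id": ["1000101", "1000101", "1000102",
                   "1000102", "1000102", "1000103"],
    "Subject": ["Math", "Physics", "Physics", "Math", "Chemistry", "English"],
    "Subject_Scores": [70, 75, 80, 90, 63, 71],
})

mask = df["Student_id"] == "1000102"

# Row filter and column list in a single .loc call
df1 = df.loc[mask, ["Date", "Subject"]]

# Equivalent two-step form: note the double brackets (a list of column names)
df1_alt = df[mask][["Date", "Subject"]]

# Single brackets with two names build the tuple ('Date', 'Subject'),
# which is looked up as one column label and is not found.
try:
    df[mask]["Date", "Subject"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

print(df1)
print("tuple key raised KeyError:", tuple_key_failed)
```

The single .loc call is usually preferable to chained indexing because it selects rows and columns in one step.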

If you need unique rows, add DataFrame.drop_duplicates:

df2 = df.loc[df['Student_id']=='1000102', ['Date', 'Subject']].drop_duplicates()
print (df2)
          Date    Subject
9   11/25/2020    Physics
10  11/25/2020       Math
11  12/15/2020  Chemistry
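
For Student_id 1000102 the result is unchanged because none of its (Date, Subject) pairs repeat. The effect is easier to see for 1000101, whose Biology and Chemistry rows appear twice in the question's data. A minimal sketch using only a few of those rows, again assuming string IDs:

```python
import pandas as pd

# A few of the 1000101 rows from the question, including the repeated ones
df = pd.DataFrame({
    "Date": ["12/02/2020", "11/25/2020", "12/02/2020",
             "11/25/2020", "12/02/2020"],
    "Student_id": ["1000101"] * 5,
    "Subject": ["Biology", "Chemistry", "Biology", "Chemistry", "Sociology"],
})

selected = df.loc[df["Student_id"] == "1000101", ["Date", "Subject"]]

# Keeps the first occurrence of each distinct (Date, Subject) pair
unique_rows = selected.drop_duplicates()
print(unique_rows)
```

drop_duplicates compares whole rows across the selected columns, so it returns unique combinations rather than unique values per column.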

If you need Series.unique applied to each column separately:

df3 = df.loc[df['Student_id']=='1000102', ['Date', 'Subject']].apply(lambda x: x.unique())
print (df3)
Date         [11/25/2020, 12/15/2020]
Subject    [Physics, Math, Chemistry]
dtype: object
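
An alternative sketch, not part of the original answer, collects the per-column unique values into a plain dict. This can be handy because the shape of the apply result may depend on whether every column happens to have the same number of unique values. The frame is a shortened subset of the question's data with string IDs assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["11/25/2020", "11/25/2020", "12/15/2020", "12/15/2020"],
    "Student_id": ["1000102", "1000102", "1000102", "1000103"],
    "Subject": ["Physics", "Math", "Chemistry", "English"],
})

selected = df.loc[df["Student_id"] == "1000102", ["Date", "Subject"]]

# One list of unique values per column, in order of first appearance
uniques = {col: selected[col].unique().tolist() for col in selected.columns}
print(uniques)
```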
