Understanding Python Pandas dataframes

Problem description

I am learning Pandas and am having difficulty understanding pivot tables. Below is the sample program that I am running.

import pandas as pd

df = pd.read_csv('/Users/xxx/Desktop/df.csv')

print(df)

df = df.pivot_table(index='__timestamp', columns=[], values=['passed_count', 'failed_count'])

print(df)

The program prints the following output:

   __timestamp failed_count  passed_count Unnamed: 3
0     27/05/18    0.019417       0.980583           
1     03/06/18    0.427136       0.839196           
2     10/06/18    0.839416       0.854015           
3     17/06/18    0.403846       0.913462           
4     24/06/18    1.429688       0.757812           
5     01/07/18    6.781457       0.701987           
6     08/07/18    0.324561       0.929825           
7     15/07/18    0.295082       0.970492           
8     22/07/18    0.849802       0.960474           
9     29/07/18    0.673333       0.923333           
10    05/08/18    0.276657       0.919308           
11    12/08/18    0.242105       0.821053           
12    19/08/18    0.176471       0.976471
       
             passed_count
__timestamp              
01/07/18         0.701987
03/06/18         0.839196
05/08/18         0.919308
08/07/18         0.929825
10/06/18         0.854015
12/08/18         0.821053
15/07/18         0.970492
17/06/18         0.913462
19/08/18         0.976471
22/07/18         0.960474
24/06/18         0.757812
27/05/18         0.980583
29/07/18         0.923333

I don't understand why the third column is missing after calling pivot_table(). Is it OK to pass multiple values like I did above? What is the significance of the values option?

Edit:

As asked in the comments:

The CSV file contents are:

__timestamp,failed_count,passed_count,
27/05/18,0.019417 ,0.980583, 
03/06/18,0.427136 ,0.839196, 
10/06/18,0.839416 ,0.854015, 
17/06/18,0.403846 ,0.913462, 
24/06/18,1.429688 ,0.757812, 
01/07/18,6.781457 ,0.701987, 
08/07/18,0.324561 ,0.929825, 
15/07/18,0.295082 ,0.970492, 
22/07/18,0.849802 ,0.960474, 
29/07/18,0.673333 ,0.923333, 
05/08/18,0.276657 ,0.919308, 
12/08/18,0.242105 ,0.821053, 
19/08/18,0.176471 ,0.976471,

The output of df.head() immediately after reading the CSV is:

      __timestamp failed_count  passed_count Unnamed: 3
0    27/05/18    0.019417       0.980583           
1    03/06/18    0.427136       0.839196           
2    10/06/18    0.839416       0.854015           
3    17/06/18    0.403846       0.913462           
4    24/06/18    1.429688       0.757812 

Tags: python, pandas, dataframe, pivot-table

Solution


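As we found out in the comments, pandas' pivot_table function silently ignores any non-numeric (in this case str) columns in the values list, and the failed_count column was being interpreted as such.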
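One way to check and work around this is sketched below. It is only an illustration: the DataFrame is built by hand from the first three rows of the question's data, with failed_count deliberately stored as strings to mimic what read_csv produced there. The strip-then-convert step is my assumption about a reasonable fix, not something confirmed in the comments. Note also that how pivot_table treats non-numeric value columns depends on the pandas version (older releases drop them silently, newer ones may warn or raise), so converting first is the safer route.

```python
import pandas as pd

# First three rows from the question; failed_count is stored as strings
# on purpose, to reproduce the object dtype described above.
df = pd.DataFrame({
    "__timestamp": ["27/05/18", "03/06/18", "10/06/18"],
    "failed_count": ["0.019417 ", "0.427136 ", "0.839416 "],
    "passed_count": [0.980583, 0.839196, 0.854015],
})

# Inspect the dtypes: failed_count is object (str), passed_count is float64.
print(df.dtypes)

# Strip stray whitespace, then convert to numbers. errors="coerce" turns
# anything that cannot be parsed into NaN instead of raising.
df["failed_count"] = pd.to_numeric(df["failed_count"].str.strip(), errors="coerce")

# With both value columns numeric, both appear in the pivoted result.
pivoted = df.pivot_table(index="__timestamp", values=["passed_count", "failed_count"])
print(pivoted)
```

Running df.dtypes right after read_csv on the real file is the quickest way to confirm whether a column was read as object rather than float64.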

