How to compute a correlation matrix from a pandas DataFrame containing tabular data

Problem description

Here is my input data:

import pandas as pd
inputfile_pd = pd.DataFrame([['2018-02-02', 10, 2], ['2018-02-02', 1, 3], ['2018-02-02', 3, 4], ['2018-02-03', 3, 2], ['2018-02-03', 2, 3], ['2018-02-03', 4, 4], ['2018-02-04', 4, 3], ['2018-02-04', 1, 4]], columns=['DateOfSale', 'Sales', 'Client_id'])

So it looks like this:

   DateOfSale  Sales  Client_id
0  2018-02-02     10          2
1  2018-02-02      1          3
2  2018-02-02      3          4
3  2018-02-03      3          2
4  2018-02-03      2          3
5  2018-02-03      4          4
6  2018-02-04      4          3
7  2018-02-04      1          4

What is the simplest way to compute the correlation matrix of sales between the clients (identified by Client_id) in this table?

The answer I am looking for might look something like this:

           Client2_sales Client3_sales Client4_sales
Client2_sales   some val     some val      some val  
Client3_sales   some val     some val      some val  
Client4_sales   some val     some val      some val  

Tags: python, pandas, matrix, correlation

Solution


Something like this?

inputfile_pd.pivot(index='DateOfSale', columns='Client_id').corr()

                Sales                    
Client_id           2         3         4
      Client_id                          
Sales 2           1.0 -1.000000 -1.000000
      3          -1.0  1.000000 -0.785714
      4          -1.0 -0.785714  1.000000
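
To get row and column labels in the shape the question asks for, the pivoted columns can be renamed before calling corr(). The following is a minimal sketch, not the original answer's code: the names wide and corr_matrix and the Client{id}_sales label format are assumptions chosen to match the question's example. Note that client 2 has no sale on 2018-02-04, so its correlations are computed from only two overlapping dates, which is why they come out as exactly -1.0.

```python
import pandas as pd

inputfile_pd = pd.DataFrame(
    [['2018-02-02', 10, 2], ['2018-02-02', 1, 3], ['2018-02-02', 3, 4],
     ['2018-02-03', 3, 2], ['2018-02-03', 2, 3], ['2018-02-03', 4, 4],
     ['2018-02-04', 4, 3], ['2018-02-04', 1, 4]],
    columns=['DateOfSale', 'Sales', 'Client_id'])

# One row per date, one column per client; a (date, client) pair with
# no sale becomes NaN. Selecting values='Sales' avoids MultiIndex columns.
wide = inputfile_pd.pivot(index='DateOfSale', columns='Client_id', values='Sales')

# Hypothetical label format, chosen to match the question's example.
wide.columns = [f'Client{cid}_sales' for cid in wide.columns]

# Pearson correlation between clients; NaN values are dropped pairwise.
corr_matrix = wide.corr()
print(corr_matrix)
```

If correlations based on very few overlapping dates are not meaningful for your data, corr() accepts a min_periods argument; pairs with fewer shared observations than that are reported as NaN.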
