Convert a DataFrame to a matrix

Question

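I want to convert a DataFrame into a matrix, using the Titanic dataset as an example. The DataFrame is in long format, with one row per (x, y) pair of columns: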

       x         y   ppscore
0  pclass    pclass  1.000000
1  pclass  survived  0.000000
2  pclass      name  0.000000
3  pclass       sex  0.000000
4  pclass       age  0.088131
5  pclass     sibsp  0.000000
6  pclass     parch  0.000000
7  pclass    ticket  0.000000
8  pclass      fare  0.188278
9  pclass     cabin  0.064250

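I want it in matrix form, with one row and one column per variable, like this (only the layout matters here; these example values are not ppscores, which are never negative):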

          pclass  survived       age     sibsp     parch      fare      body
pclass    1.000000 -0.312469 -0.408106  0.060832  0.018322 -0.558629 -0.034642
survived -0.312469  1.000000 -0.055513 -0.027825  0.082660  0.244265       NaN
age      -0.408106 -0.055513  1.000000 -0.243699 -0.150917  0.178739  0.058809
sibsp     0.060832 -0.027825 -0.243699  1.000000  0.373587  0.160238 -0.099961
parch     0.018322  0.082660 -0.150917  0.373587  1.000000  0.221539  0.051099
fare     -0.558629  0.244265  0.178739  0.160238  0.221539  1.000000 -0.043110
body     -0.034642       NaN  0.058809 -0.099961  0.051099 -0.043110  1.000000

Thanks for your help!

Tags: python-3.x, pandas

Solution


I'm sure there is a more efficient way, but this solved my problem:

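# ppscore: the method I wanted to compare to the MIC
import ppscore as pps

# titanic: a pandas DataFrame holding the Titanic data, loaded beforehand
# pps.matrix returns long-format scores: one row per ordered (x, y) column pair
df = pps.matrix(titanic)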
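The reshaping step further down relies on two properties of this long-format frame: for n columns it has exactly n*n rows (every ordered pair, including each column paired with itself, as in the `pclass`/`pclass` row), and its rows are grouped by `x`. The sketch below checks both on a hypothetical stand-in built with `itertools.product`; the column names are borrowed from the Titanic example, but the frame is not produced by ppscore and its scores are invented for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical stand-in for the x / y / ppscore columns of pps.matrix(...):
# one row per ordered (x, y) pair, grouped by x. The scores are made up.
cols = ["pclass", "age", "fare"]
pairs = list(itertools.product(cols, cols))  # x varies slowest, y fastest
long_df = pd.DataFrame(pairs, columns=["x", "y"])
long_df["ppscore"] = np.where(long_df["x"] == long_df["y"], 1.0, 0.1)

n = long_df["x"].nunique()
assert len(long_df) == n * n  # one row per ordered pair, including x == y

# Rows are grouped by x: row i of the reshaped x column holds only cols[i]
assert (long_df["x"].to_numpy().reshape(n, n) == np.array(cols)[:, None]).all()
print(long_df)
```

If either assertion fails for real data (for example after filtering or re-sorting the frame), a plain `reshape` would put scores in the wrong cells.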

This creates the same long-format DataFrame shown in the question (columns x, y and ppscore).

The following function then reshapes it into a matrix:

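import numpy as np
import pandas as pd

def to_matrix(df):
    # pps.matrix scores every ordered (x, y) pair, so n columns give n*n rows
    # (the scores themselves are not symmetric); the square root recovers n
    leng = int(round(np.sqrt(len(df['ppscore']))))

    # fill the matrix row by row; assumes the rows are grouped by x, as pps.matrix returns them
    val = df['ppscore'].values.reshape((leng, leng))

    # labels in order of first appearance (np.unique on its own sorts them)
    X, ind_x = np.unique(df['x'], return_index=True)
    X = X[np.argsort(ind_x)]
    Y, ind_y = np.unique(df['y'], return_index=True)
    Y = Y[np.argsort(ind_y)]

    # rows are the predictors (x), columns are the targets (y)
    matrix = pd.DataFrame(val, index=X, columns=Y)
    return matrix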
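A shorter route, not the one used above, is pandas' `DataFrame.pivot`, which places each `ppscore` value in its (`x`, `y`) cell by label, so it depends neither on the row order nor on the square-root trick. `pivot` sorts the labels, so this sketch reindexes both axes back to their order of first appearance. The input is a hypothetical hand-built stand-in for `pps.matrix` output with invented scores, and `to_matrix_pivot` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format scores (stand-in for pps.matrix output); values invented
long_df = pd.DataFrame({
    "x": ["pclass", "pclass", "age", "age"],
    "y": ["pclass", "age", "pclass", "age"],
    "ppscore": [1.0, 0.09, 0.2, 1.0],
})

def to_matrix_pivot(df):
    """Wide matrix: rows = predictor (x), columns = target (y)."""
    order = pd.unique(df["x"])  # labels in order of first appearance
    wide = df.pivot(index="x", columns="y", values="ppscore")
    # pivot sorts the labels; restore the original order on both axes
    return wide.reindex(index=order, columns=order)

matrix = to_matrix_pivot(long_df)
print(matrix)
```

Here `matrix.loc["pclass", "age"]` is the score for using `pclass` to predict `age`. Because the lookup is by label, a missing pair simply shows up as NaN instead of shifting every later value.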

The result (shown here for a housing dataset rather than the Titanic data) is:

                    longitude  latitude  housing_median_age  total_rooms  \
longitude                1.00      0.78                0.13         0.00   
latitude                 0.76      1.00                0.09         0.00   
housing_median_age       0.00      0.00                1.00         0.02   
total_rooms              0.00      0.00                0.00         1.00   
total_bedrooms           0.00      0.00                0.00         0.51   
population               0.00      0.00                0.00         0.33   
households               0.00      0.00                0.00         0.52   
median_income            0.00      0.00                0.00         0.00   
median_house_value       0.00      0.00                0.00         0.00   
ocean_proximity          0.24      0.29                0.05         0.00   

                    total_bedrooms  population  households  median_income  \
longitude                     0.00        0.00        0.00           0.01   
latitude                      0.00        0.00        0.00           0.02   
housing_median_age            0.02        0.00        0.00           0.00   
total_rooms                   0.48        0.31        0.46           0.00   
total_bedrooms                1.00        0.42        0.81           0.00   
population                    0.38        1.00        0.49           0.00   
households                    0.81        0.54        1.00           0.00   
median_income                 0.00        0.00        0.00           1.00   
median_house_value            0.00        0.00        0.00           0.13   
ocean_proximity               0.00        0.00        0.00           0.01   

                    median_house_value  ocean_proximity  
longitude                         0.14             0.63  
latitude                          0.12             0.56  
housing_median_age                0.00             0.15  
total_rooms                       0.00             0.01  
total_bedrooms                    0.00             0.04  
population                        0.00             0.01  
households                        0.00             0.03  
median_income                     0.04             0.05  
median_house_value                1.00             0.25  
ocean_proximity                   0.14             1.00  
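This matrix is not symmetric: the row is the predictor `x` and the column is the target `y`, so longitude predicts latitude with a score of 0.78 while latitude predicts longitude with 0.76. The sketch below copies those four values from the result above into a 2x2 frame to show how to read a single cell with `.loc` and how to measure the asymmetry by comparing the matrix with its transpose.

```python
import pandas as pd

# 2x2 excerpt of the housing result above: rows = predictor x, columns = target y
labels = ["longitude", "latitude"]
m = pd.DataFrame([[1.00, 0.78],
                  [0.76, 1.00]], index=labels, columns=labels)

# Score for "longitude predicts latitude"
print(m.loc["longitude", "latitude"])  # 0.78

# ppscore is not symmetric, so the matrix differs from its transpose
asym = (m - m.T).abs()
print(asym.loc["longitude", "latitude"])
```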
 
