How to loop over a sorted correlation list?

Problem description

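The simple code below computes a correlation matrix and sorts it. How can I loop over the sorted result and also get the names of the two columns in each pair?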

import pandas as pd
import numpy as np

d = {
    'x1': [1, 4, 4, 5, 6], 
    'x2': [0, 0, 8, 2, 4], 
    'x3': [2, 8, 8, 10, 12], 
    'x4': [-1, -4, -4, -4, -5]
}
df = pd.DataFrame(data=d)
print(df)
print('---')
print(df.corr())
print('---')

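corr_matrix = df.corr().abs()
# Keep only the upper triangle (k=1 also excludes the diagonal) so each pair appears once.
# The builtin bool replaces np.bool, which was removed in NumPy 1.24.
upper = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
sol = corr_matrix.where(upper).stack().dropna().sort_values(ascending=False)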
print(sol)
print('---')

for s in sol:
    print(s)
    # how do I also print the two column names that belong to this correlation "s"?

Output:

   x1  x2  x3  x4
0   1   0   2  -1
1   4   0   8  -4
2   4   8   8  -4
3   5   2  10  -4
4   6   4  12  -5
---
          x1        x2        x3        x4
x1  1.000000  0.399298  1.000000 -0.969248
x2  0.399298  1.000000  0.399298 -0.472866
x3  1.000000  0.399298  1.000000 -0.969248
x4 -0.969248 -0.472866 -0.969248  1.000000
---
x1  x3    1.000000
x3  x4    0.969248
x1  x4    0.969248
x2  x4    0.472866
    x3    0.399298
x1  x2    0.399298
dtype: float64
---
1.0
0.9692476431690819
0.9692476431690819
0.4728662437434603
0.39929785312496247
0.39929785312496247
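The masking step is the least obvious part of the code above. As a minimal sketch on the same toy data, the snippet below prints the upper-triangle mask and checks that where() plus stack() leaves exactly one entry per column pair. The explicit dropna() is not in the original code; it is included on the assumption that some pandas versions keep NaN values in the result of stack().

```python
import numpy as np
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})
corr_matrix = df.corr().abs()

# np.triu(..., k=1) keeps only the cells strictly above the main diagonal,
# which drops the self-correlations and the mirrored lower triangle.
mask = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
print(mask)

# where() turns every cell outside the mask into NaN, stack() reshapes the
# matrix into a Series indexed by (row, column), and dropna() removes the NaNs.
pairs = corr_matrix.where(mask).stack().dropna()
print(len(pairs))  # 4 columns -> 4 * 3 / 2 = 6 pairs
```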

What I would like is something like this (pseudocode, it does not run as written):

for (column1, column2, s) in sol:
    print(column1 + ',' + column2 + ',' + str(s))

Expected output:

x1, x3, 1.000000
x3, x4, 0.969248
x1, x4, 0.969248
x2, x4, 0.472866
x1, x2, 0.399298

Tags: python, pandas, numpy, correlation

Solution


You can call reset_index() to turn the two index levels into ordinary columns and then iterate over the rows with DataFrame.itertuples:

# reset_index() moves the (column1, column2) index into columns; name=None yields plain tuples
pairs = sol.reset_index().itertuples(index=False, name=None)
print('\n'.join(str(p).strip('()') for p in pairs))
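A variation that avoids converting each tuple to a string and stripping parentheses (a sketch, not part of the original answer) is to name the reset columns and let itertuples() return namedtuples. The column names column1, column2 and corr are hypothetical choices, the explicit dropna() is an added safeguard, and the ", " separator with six decimals follows the question's expected output.

```python
import numpy as np
import pandas as pd

# Rebuild `sol` as in the question (plus an explicit dropna()).
df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})
corr_matrix = df.corr().abs()
upper = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
sol = corr_matrix.where(upper).stack().dropna().sort_values(ascending=False)

# reset_index() moves the two MultiIndex levels into ordinary columns;
# the names assigned here are illustrative, pandas does not require them.
pairs_df = sol.reset_index()
pairs_df.columns = ['column1', 'column2', 'corr']

# Without name=None, itertuples() yields namedtuples, so each field is
# available as an attribute instead of being parsed out of a string.
lines = [
    f'{row.column1}, {row.column2}, {row.corr:f}'
    for row in pairs_df.itertuples(index=False)
]
print('\n'.join(lines))
```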

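Alternatively, use Series.items (older pandas versions also offered it as Series.iteritems, which was deprecated in 1.5 and removed in 2.0):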

# Series.items() yields ((column1, column2), value) pairs
for item in sol.items():
    print(str(item).replace('(', '').replace(')', ''))

Output:

'x1', 'x3', 1.0
'x3', 'x4', 0.9692476431690819
'x1', 'x4', 0.9692476431690819
'x2', 'x4', 0.4728662437434603
'x2', 'x3', 0.39929785312496247
'x1', 'x2', 0.39929785312496247
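The loop from the question can also be written almost verbatim. As a minimal sketch (the names triples, column1 and column2 are illustrative), Series.items() returns each MultiIndex label as a (column1, column2) tuple, which Python can unpack directly in the for statement; the explicit dropna() is again an added safeguard.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})
corr_matrix = df.corr().abs()
upper = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
sol = corr_matrix.where(upper).stack().dropna().sort_values(ascending=False)

# Series.items() yields (label, value); with a two-level MultiIndex the label
# is itself a (column1, column2) tuple, so it can be unpacked in place.
triples = [(column1, column2, s) for (column1, column2), s in sol.items()]

# This is the loop shape the question asked for.
for column1, column2, s in triples:
    print(f'{column1}, {column2}, {s:f}')
```

Pairs with equal correlations, such as x1/x4 and x3/x4, may come out in either order, because sort_values() uses a non-stable sort by default; passing kind='stable' keeps their original order.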
