How to loop over a sparse matrix, building a target variable from specific columns and using the rest of the matrix as features

Problem description

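I have a sparse matrix stored as a compressed .npz file. Below is what the data looks like once loaded from the file. I have a list of target variables, and I want to loop over that list, splitting the matrix into a target variable and feature variables on each iteration.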

import numpy as np
import scipy.sparse as sp
i = np.array([0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,2])
j = np.array([0,1,2,3,4,0,5,6,3,7,0,5,8,2,9,7])
data = np.array([10,2,1,1,1,1,2,1,1,1,2,1,3,1,1,1])
visits = np.array(['vis_num_1','vis_num_2','vis_num_3'])
features = np.array(['feature_1','feature_4','codes_a','codes_b','codes_c','feature_6','codes_d','codes_e','feature_2','codes_g'])
sp_mat = sp.coo_matrix((data, (i, j)), shape=(len(visits), len(features)))
sp_mat = sp_mat.tocsr()
target_list = ['feature_1','feature_4']

#TODO - loop through target_list to create target variable array and feature variable arrays

print(sp_mat.toarray())

Out:
array([[10,  2,  1,  1,  1,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  1,  0,  2,  1,  1,  0,  0],
       [ 2,  0,  1,  0,  0,  1,  0,  1,  3,  1]], dtype=int32)

For the first target column, the split I am after looks like this:

target_arr, feature_arr = sp_mat[:, :1].toarray(), sp_mat[:, 1:].toarray()
print(target_arr)
Out: 
array([[10],
       [ 1],
       [ 2]], dtype=int32)
print(feature_arr)
Out: 
array([[2, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 2, 1, 1, 0, 0],
       [0, 1, 0, 0, 1, 0, 1, 3, 1]], dtype=int32)

Tags: python, numpy, scipy, sparse-matrix

Solution
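
One possible approach, offered as a minimal sketch rather than a definitive answer: look up each target name in `features` to get its column index, slice that column out as the target, and select every other column as the feature matrix. The helper `split_target` and the `splits` dictionary are hypothetical names introduced here for illustration. The sketch assumes that each target name occurs exactly once in `features`, and that only the current target column is excluded on each iteration (the other targets stay among the features, as in the manual split shown in the question).

```python
import numpy as np
import scipy.sparse as sp

# Same toy data as in the question.
i = np.array([0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,2])
j = np.array([0,1,2,3,4,0,5,6,3,7,0,5,8,2,9,7])
data = np.array([10,2,1,1,1,1,2,1,1,1,2,1,3,1,1,1])
visits = np.array(['vis_num_1','vis_num_2','vis_num_3'])
features = np.array(['feature_1','feature_4','codes_a','codes_b','codes_c',
                     'feature_6','codes_d','codes_e','feature_2','codes_g'])
sp_mat = sp.coo_matrix((data, (i, j)),
                       shape=(len(visits), len(features))).tocsr()
target_list = ['feature_1', 'feature_4']


def split_target(mat, feature_names, target_name):
    """Hypothetical helper: split `mat` into one target column and the rest.

    Returns (target, feature_matrix, remaining_names), where `target` is a
    dense (n_rows, 1) array and `feature_matrix` stays sparse.
    """
    matches = np.flatnonzero(feature_names == target_name)
    if matches.size != 1:
        # Assumption: every target name appears exactly once in feature_names.
        raise ValueError(f"expected exactly one column named {target_name!r}")
    col = matches[0]

    # Indices of all columns except the target column.
    keep = np.flatnonzero(np.arange(mat.shape[1]) != col)

    target = mat[:, [col]].toarray()   # dense column vector, shape (n_rows, 1)
    feature_matrix = mat[:, keep]      # still sparse; call .toarray() if needed
    return target, feature_matrix, feature_names[keep]


# Loop over the targets and collect one split per target name.
splits = {}
for name in target_list:
    target_arr, feature_mat, feature_names = split_target(sp_mat, features, name)
    splits[name] = (target_arr, feature_mat, feature_names)
    print(name, target_arr.ravel(), feature_mat.shape)
```

Keeping the feature matrix sparse avoids densifying a large matrix on every iteration; call `.toarray()` on it only if the downstream code needs a dense array. If all targets should be excluded from the features instead, the `keep` mask could be built from the full `target_list` rather than the single column.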

