Scipy sparse array from list of integers

Problem description

I have a list (actually, a pandas Series) of integers which I want to turn into a sparse matrix in which row i has exactly one nonzero value, located at column j, where j is the ith element of the list. It is a little like the question "Numpy: set one specific element of each column based on indexing by array".

For now I first turn the data into a dense matrix, like this:

import numpy as np

def cat_to_mat(c, mat_length):
    # Dense version: row i gets a 1 in column c[i] % mat_length.
    res = np.zeros((len(c), mat_length))
    res[np.arange(len(c)), c.values % mat_length] = 1
    return res

and then I call csr_matrix on the resulting array. However, it seems wasteful to generate a dense matrix only to throw away most of its elements.

I am aware of tools like sklearn.preprocessing.OneHotEncoder, but they are not a good fit because I would like to avoid calling the fit() method: I want to be sure that the (sparse) representation of the data depends only on c and mat_length (and not on some previous data provided to the OneHotEncoder).

Tags: python, pandas, numpy, scipy

Solution


Seems rather straightforward. Did I misunderstand the question?

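import numpy as np
from scipy import sparse

def cat_to_mat(c, mat_length):
    # Row i gets a single 1 in column c[i] % mat_length; no dense array is built.
    cols = np.asarray(c) % mat_length
    rows = np.arange(len(cols))
    return sparse.csc_matrix((np.ones(len(cols)), (rows, cols)), shape=(len(cols), mat_length))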
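
Since the question ends up calling csr_matrix, and every row holds exactly one stored entry, the CSR arrays can also be written down directly. The following is only a minimal sketch under that assumption: the helper names cat_to_csr and cat_to_dense are hypothetical (not from the question or the answer), and the input is converted with np.asarray so that a list, a NumPy array, or a pandas Series should all be accepted. The dense function is included only to check that both constructions agree.

```python
import numpy as np
from scipy import sparse

def cat_to_dense(c, mat_length):
    # Dense reference, equivalent to the approach in the question.
    cols = np.asarray(c) % mat_length
    res = np.zeros((len(cols), mat_length))
    res[np.arange(len(cols)), cols] = 1
    return res

def cat_to_csr(c, mat_length):
    # Hypothetical helper: builds the CSR matrix from its three raw arrays.
    cols = np.asarray(c) % mat_length   # column index of the single 1 in each row
    n = len(cols)
    data = np.ones(n)                   # every stored value is 1
    indptr = np.arange(n + 1)           # row i owns exactly stored entry i
    return sparse.csr_matrix((data, cols, indptr), shape=(n, mat_length))

c = [3, 0, 7, 12, 5]                    # example categories (made up for illustration)
m = cat_to_csr(c, 5)

# The sparse result should match the dense construction entry for entry.
assert np.array_equal(m.toarray(), cat_to_dense(c, 5))
```

Only the n stored values, n column indices, and n + 1 row pointers are allocated, so memory grows with len(c) rather than with len(c) * mat_length.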
