Converting JSON to a COO sparse matrix in Python

Problem description

I am trying to convert a JSON file of the following shape:

{"1": 
{"2": 0, "3": 0, "4": 0, "5": 1, "6": 0, "7": 1, "8": 0, "9": 0, "10": 0, "11": 1, "12": 1, "13": 0, "14": 1, "15": 1, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 1, "32": 0, "33": 0, "34": 1, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0, "41": 0, "42": 0, "43": 0, "44": 0, "45": 0},
 "2": 
{"2": 0, "3": 0, "4": 0, "5": 0, "6": 1, "7": 0, "8": 1, "9": 1, "10": 1, "11": 0, "12": 0, "13": 1, "14": 1, "15": 1, "16": 0, "17": 0, "18": 1, "19": 0, "20": 1, "21": 1, "22": 0, "23": 0, "24": 0, "25": 1, "26": 0, "27": 0, "28": 0, "29": 1, "30": 0, "31": 1, "32": 1, "33": 0, "34": 0, "35": 0, "36": 0, "37": 1, "38": 0, "39": 0, "40": 1, "41": 1, "42": 0, "43": 0, "44": 1, "45": 1}, 
"3": 
{"2": 1, "3": 0, "4": 0, "5": 0, "6": 0, "7": 1, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 1, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0, "41": 0, "42": 0, "43": 0, "44": 0, "45": 0}, 
"4": 
{"2": 1, "3": 1, "4": 1, "5": 1, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 1, "12": 1, "13": 0, "14": 0, "15": 0, "16": 1, "17": 1, "18": 0, "19": 0, "20": 0, "21": 0, "22": 1, "23": 1, "24": 1, "25": 0, "26": 1, "27": 1, "28": 1, "29": 0, "30": 1, "31": 0, "32": 0, "33": 0, "34": 1, "35": 0, "36": 1, "37": 0, "38": 1, "39": 0, "40": 0, "41": 0, "42": 1, "43": 1, "44": 0, "45": 0}}

into a COO sparse matrix, with one entry per value that gives the first key, then the second key, then the value, like this:

(1,2) 0
(1,3) 0
(1,4) 0
(1,5) 1
...
(4,44) 0
(4,45) 0

I tried converting the JSON file into a pandas DataFrame that looks like this:

in  1   2   3   4
2   0   0   1   1
3   0   0   0   1
4   0   0   0   1
5   1   0   0   1
6   0   1   0   0
7   1   0   1   0
8   0   1   0   0
9   0   1   0   0
10  0   1   0   0
11  1   0   0   1
12  1   0   0   1
13  0   1   0   0
14  1   1   0   0
15  1   1   0   0
16  0   0   0   1
17  0   0   0   1
18  0   1   0   0
19  0   0   0   0
20  0   1   0   0
21  0   1   0   0
22  0   0   0   1
23  0   0   0   1
24  0   0   0   1
25  0   1   0   0
26  0   0   0   1
27  0   0   0   1
28  0   0   0   1
29  0   1   0   0
30  0   0   0   1
31  1   1   1   0
32  0   1   0   0
33  0   0   0   0
34  1   0   0   1
35  0   0   0   0
36  0   0   0   1
37  0   1   0   0
38  0   0   0   1
39  0   0   0   0
40  0   1   0   0
41  0   1   0   0
42  0   0   0   1
43  0   0   0   1
44  0   1   0   0
45  0   1   0   0

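But I have not been able to convert it into a sparse matrix, and without that the approach becomes unusable once the data is scaled up.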

Tags: python, numpy, scipy, sparse-matrix

Solution


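When I copy and paste your JSON into an IPython session, I get a dictionary with 4 keys, called adict below. The session assumes import numpy as np and from scipy import sparse.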

I can unpack it into a list of (row, column, value) tuples:

In [466]: alist = [] 
     ...: for k,v in adict.items(): 
     ...:     for k1,v1 in v.items(): 
     ...:         alist.append((int(k),int(k1),v1)) 
     ...:                    
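
The session above starts from a dictionary that already exists. As a minimal sketch of the whole step, assuming the JSON arrives as a string (the shortened json_text sample below is a hypothetical stand-in for the real file), json.loads produces the nested dictionary and a comprehension flattens it:

```python
import json

# Hypothetical, shortened stand-in for the JSON shown in the question.
json_text = '{"1": {"2": 0, "3": 0, "5": 1}, "2": {"2": 0, "6": 1}}'

# JSON object keys are always strings, so both key levels need int().
adict = json.loads(json_text)

# Flatten the nested dict into (row, col, value) triples.
alist = [
    (int(k), int(k1), v1)
    for k, v in adict.items()
    for k1, v1 in v.items()
]
print(alist)
```

For a file on disk, json.load on an open file handle would take the place of json.loads.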

and make an array from it:

In [467]: arr = np.array(alist)                                                                              
In [468]: arr.shape                                                                                          
Out[468]: (176, 3)

and use the 3 columns of the array as input to sparse.coo_matrix:

In [469]: M = sparse.coo_matrix((arr[:,2],(arr[:,0],arr[:,1])))                                              
In [470]: M                                                                                                  
Out[470]: 
<5x46 sparse matrix of type '<class 'numpy.int64'>'
    with 176 stored elements in COOrdinate format>
In [471]: M.toarray()   # dense copy of the matrix
Out[471]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1,
        0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0,
        1, 1],
       [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
        1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1,
        0, 0]])
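
Two details of this output are worth noting. The shape is 5x46 because the keys are used directly as indices, so row 0 and columns 0 and 1 stay empty. All 176 values are stored, zeros included, which gives up most of the benefit of a sparse format. The following is only a sketch on a small hypothetical sample (not the full data from the question) showing one way to keep just the nonzero entries:

```python
import numpy as np
from scipy import sparse

# Hypothetical small sample in the same (row, col, value) layout as `arr`.
arr = np.array([(1, 2, 0), (1, 3, 0), (1, 5, 1), (2, 2, 0), (2, 6, 1)])

# Without an explicit shape, it is inferred as (max row + 1, max col + 1).
M = sparse.coo_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])))
print(M.shape, M.nnz)  # nnz counts stored entries, explicit zeros included

# Drop the zero-valued triples first, and pass the shape explicitly so it
# no longer depends on which entries happen to survive the filter.
nz = arr[arr[:, 2] != 0]
M_nz = sparse.coo_matrix((nz[:, 2], (nz[:, 0], nz[:, 1])), shape=M.shape)
print(M_nz.shape, M_nz.nnz)
```

Both matrices convert to the same dense array; only the number of stored entries differs.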

A variant that builds the three lists directly:

In [472]: rows, cols, data = [],[],[] 
     ...: for k,v in adict.items(): 
     ...:     for k1,v1 in v.items(): 
     ...:         rows.append(int(k)) 
     ...:         cols.append(int(k1)) 
     ...:         data.append(v1) 
     ...:                                                                                                    
In [473]: len(rows)                                                                                          
Out[473]: 176
In [474]: M = sparse.coo_matrix((data,(rows,cols)))  
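
The question asks for output in the form (row,col) value. A COO matrix keeps its entries in three parallel arrays, row, col and data, so the triples can be read back by zipping them. A minimal sketch, using short hypothetical lists in place of the ones built in the loop above:

```python
from scipy import sparse

# Hypothetical small sample standing in for the rows/cols/data lists.
rows = [1, 1, 1, 2, 2]
cols = [2, 3, 5, 2, 6]
data = [0, 0, 1, 0, 1]

M = sparse.coo_matrix((data, (rows, cols)))

# Zip the three parallel arrays to recover (row, col, value) triples.
triples = [(int(i), int(j), int(v)) for i, j, v in zip(M.row, M.col, M.data)]
for i, j, v in triples:
    print(f"({i},{j}) {v}")
```

Calling print(M) gives a similar coordinate listing, though its exact layout depends on the SciPy version.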
