首页 > 解决方案 > A very large square matrix powered by 2

问题描述

I have a very large square matrix of order around 570,000 x 570,000 and I want to power it by 2.

The data is in json format casting to associative array in array (dict inside dict in python) form

Let's say I want to represent this matrix:

[ [0, 0, 0],
  [1, 0, 5],
  [2, 0, 0] ]

In json it's stored like:

{"3": {"1": 2}, "2": {"1": 1, "3": 5}}

Which for example "3": {"1": 2} means the number in 3rd row and 1st column is 2.

I want the output to be the same as json, but powered by 2 (matrix multiplication)

The programming language isn't important. I want to calculate it the fastest way (less than 2 days, if possible)

So I tried to use Numpy in python (numpy.linalg.matrix_power), but it seems that it doesn't work with my nested unsorted dict format.

I wrote a simple python code to do that but I estimated that it would take 18 days to accomplish:

jsonFileName = "file.json"
def matrix_power(arr):
    result = {}
    for x1,subarray in arr.items():
        print("doing item:",x1)
        for y1,value1 in subarray.items():
            for x2,subarray2 in arr.items():
                if(y1 != x2):
                    continue
                for y2,value2 in subarray2.items():
                    partSum = value1 * value2
                    result[x1][y2] = result.setdefault(x1,{}).setdefault(y2,0) + partSum

    return result

import json
with open(jsonFileName, 'r') as reader: 
    jsonFile = reader.read()
    print("reading is succesful")
    jsonArr = json.loads(jsonFile)
    print("matrix is in array form")
    matrix = matrix_power(jsonArr)
    print("Well Done! matrix is powered by 2 now")
    output = json.dumps(matrix)
    print("result is in json format")
    writer = open("output.json", 'w+')
    writer.write(output)
    writer.close()
print("Task is done! you can close this window now")

Here, X1,Y1 is the row and col of the first matrix which then is multiplied by the corresponding element of the second matrix (X2,Y2).

标签: pythonnumpymatrixoptimizationmemory-management

解决方案


Numpy 不是问题,您需要以 numpy 可以理解的格式输入它,但是由于您的矩阵非常大,它可能不适合内存,因此使用稀疏矩阵 ( scipy.sparse.csr_matrix) 可能是个好主意:

m = scipy.sparse.csr_matrix((
    [v for row in data.values() for v in row.values()], (
        [int(row_n) for row_n, row in data.items() for v in row],
        [int(column) for row in data.values() for column in row]
    )
))

那么这只是一个做的事情:

m**2

推荐阅读