python - A very large square matrix powered by 2
问题描述
I have a very large square matrix of order around 570,000 x 570,000 and I want to power it by 2.
The data is in json format casting to associative array in array (dict inside dict in python) form
Let's say I want to represent this matrix:
[ [0, 0, 0],
[1, 0, 5],
[2, 0, 0] ]
In json it's stored like:
{"3": {"1": 2}, "2": {"1": 1, "3": 5}}
Which for example "3": {"1": 2}
means the number in 3rd row and 1st column is 2.
I want the output to be the same as json, but powered by 2 (matrix multiplication)
The programming language isn't important. I want to calculate it the fastest way (less than 2 days, if possible)
So I tried to use Numpy in python (numpy.linalg.matrix_power
), but it seems that it doesn't work with my nested unsorted dict format.
I wrote a simple python code to do that but I estimated that it would take 18 days to accomplish:
jsonFileName = "file.json"
def matrix_power(arr):
result = {}
for x1,subarray in arr.items():
print("doing item:",x1)
for y1,value1 in subarray.items():
for x2,subarray2 in arr.items():
if(y1 != x2):
continue
for y2,value2 in subarray2.items():
partSum = value1 * value2
result[x1][y2] = result.setdefault(x1,{}).setdefault(y2,0) + partSum
return result
import json
with open(jsonFileName, 'r') as reader:
jsonFile = reader.read()
print("reading is succesful")
jsonArr = json.loads(jsonFile)
print("matrix is in array form")
matrix = matrix_power(jsonArr)
print("Well Done! matrix is powered by 2 now")
output = json.dumps(matrix)
print("result is in json format")
writer = open("output.json", 'w+')
writer.write(output)
writer.close()
print("Task is done! you can close this window now")
Here, X1,Y1 is the row and col of the first matrix which then is multiplied by the corresponding element of the second matrix (X2,Y2).
解决方案
Numpy 不是问题,您需要以 numpy 可以理解的格式输入它,但是由于您的矩阵非常大,它可能不适合内存,因此使用稀疏矩阵 ( scipy.sparse.csr_matrix
) 可能是个好主意:
m = scipy.sparse.csr_matrix((
[v for row in data.values() for v in row.values()], (
[int(row_n) for row_n, row in data.items() for v in row],
[int(column) for row in data.values() for column in row]
)
))
那么这只是一个做的事情:
m**2
推荐阅读
- jupyter-notebook - 启动 Jupyter Lab 时出现语言服务器规范查找器错误
- c# - 记录哪个组件在哪个路径下搜索哪个 dll
- http - 如何使用从批处理脚本调用的 curl 获取 http post 请求的响应代码?
- javascript - Firebase 查询(多值)
- java - Protocol Buffer 首次使用高延迟
- ionic-framework - ERROR TypeError: Object(...) is not a function at WebIntent.get [as ACTION_VIEW]
- laravel - 如果不为空,则与关系
- javascript - Angular over https:websocket请求不起作用
- python - 为什么检查任何元素是否存在匹配条件时 any() 比 yield 慢
- flutter - 我的颤振项目没有连接到火力基地