out-of-memory - Using memory mapping in Julia

Problem description

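I have some Julia code, version 1.2, that operates on a 10000 x 10000 Array. Because I get an OutOfMemory() error when I run the code, I am exploring other ways to run it, such as memory mapping. Regarding Mmap.mmap, the explanation at https://docs.julialang.org/en/v1/stdlib/Mmap/index.html is brief, so I am a bit confused about how to use the Array that I map to disk. Here is the beginning of my code: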

using Distances
using LinearAlgebra
using Distributions
using Mmap
data=Float32.(rand(10000,15))
Eucldist=pairwise(Euclidean(),data,dims=1)
D=maximum(Eucldist.^2)
sigma2hat=mean(((Eucldist.^2)./D)[tril!(trues(size((Eucldist.^2)./D)),-1)])
L=exp.(-(Eucldist.^2/D)/(2*sigma2hat))

L is the 10000 x 10000 Array I want to work with, so I map it to my disk:

s = open("mmap.bin", "w+")
write(s, size(L,1))
write(s, size(L,2))
write(s, L)
close(s)

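What should I do after that? The next step is to run K=eigen(L) and then apply other commands to K. How should I do that? With K=eigen(L) or K=eigen(s)? What is the role of the object s, and when does it come into play? Also, I do not understand why I would have to use Mmap.sync!, or when. After each line that follows eigen(L)? At the end of the code? How can I be sure that I am using disk space rather than RAM? I would appreciate some pointers on memory mapping. Thanks!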

Tags: out-of-memory, julia, memory-mapping

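Solution

If memory usage is a concern, it is usually best to reassign your very large arrays to 0, or to a similar type-safe small matrix, so that their memory can be garbage collected, assuming you are done with those intermediate matrices. After that, you just call Mmap.mmap() on the stored data file, passing the type and the dimensions of the data as the second and third arguments, and assign the return value to your variable, in this case L. The result is that L is bound to the contents of the file: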

using Distances
using LinearAlgebra
using Distributions
using Mmap

function testmmap()
    data = Float32.(rand(10000, 15))
    Eucldist = pairwise(Euclidean(), data, dims=1)
    D = maximum(Eucldist.^2)
    sigma2hat = mean(((Eucldist.^2) ./ D)[tril!(trues(size((Eucldist.^2) ./ D)), -1)])
    L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))
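    mkpath("./tmp")  # make sure the target directory exists before opening the file
    s = open("./tmp/mmap.bin", "w+")
    # header: the two dimensions as Int, followed by the raw matrix data
    write(s, size(L,1))
    write(s, size(L,2))
    write(s, L)
    close(s)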

    # deref and gc collect
    Eucldist = data = L = zeros(Float32, 2, 2)
    GC.gc()

    s = open("./tmp/mmap.bin", "r+") # allow read and write
    m = read(s, Int)
    n = read(s, Int)
    L = Mmap.mmap(s, Matrix{Float32}, (m, n))  # now L references the file contents
    K = eigen(L)
    K
end

testmmap()
@time testmmap()  # 109.657995 seconds (17.48 k allocations: 4.673 GiB, 0.73% gc time)
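
On the role of s and Mmap.sync!, here is a minimal sketch on a small matrix, using the same file layout as above (two Int dimensions, then the data). The helper names write_matrix and read_matrix and the tempname() path are my own, made up for illustration. As far as I understand it, the stream is only needed to create the mapping, and Mmap.sync! only matters when you write into the mapped array and want the changes flushed to the file.

```julia
using Mmap

# Hypothetical helper: write the header (rows, cols as Int) and then the raw data.
function write_matrix(path::AbstractString, A::Matrix{Float32})
    open(path, "w+") do io
        write(io, size(A, 1))
        write(io, size(A, 2))
        write(io, A)          # column-major Float32 values
    end
end

# Hypothetical helper: read the file back with ordinary I/O (no mapping),
# used here only to check what is actually stored on disk.
function read_matrix(path::AbstractString)
    open(path, "r") do io
        m = read(io, Int)
        n = read(io, Int)
        A = Matrix{Float32}(undef, m, n)
        read!(io, A)          # returns the filled matrix
    end
end

path = tempname()             # assumption: a throwaway file for the demo
A = rand(Float32, 4, 3)
write_matrix(path, A)

io = open(path, "r+")         # read and write access
m = read(io, Int)
n = read(io, Int)
# The mapping starts at the current stream position, i.e. right after the header.
B = Mmap.mmap(io, Matrix{Float32}, (m, n))
same_before = (B == A)        # the mapped array shows the file contents

B[1, 1] = 42.0f0              # write through the mapping
Mmap.sync!(B)                 # flush the change to the file
close(io)                     # the stream is no longer needed

C = read_matrix(path)
persisted = (C[1, 1] == 42.0f0)
```

So you keep working with the array returned by Mmap.mmap, as in K = eigen(L) in the function above, and never with s itself. If you only read from L, there is nothing to sync.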
