Swift FileHandle seek/readData performance

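Question

Context:

I have a project where I store a lot of data in binary files and data files. I retrieve offsets from a binary file, stored as UInt64, and each of these offsets gives me the position of a UTF-8 encoded string in another file.

Given all the offsets, I am trying to reconstruct all the strings from the UTF-8 file. The file that contains all the strings has a size of exactly 20437 bytes / around 177000 strings.

Assume I have already retrieved all the offsets and now need to rebuild the strings one at a time. I also have the length in bytes of every string.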

Method 1:

I open a FileHandle on the UTF-8 encoded file, and for each offset I seek to that offset and perform one readData(ofLength:). The whole operation is very slow: more than 35 seconds.
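For reference, Method 1 might look roughly like the sketch below. The function name readStringsBySeeking and its parameters are hypothetical (they do not come from the question), and the offsets and lengths are assumed to be parallel arrays. Each string costs one seek plus one read, which is most likely where the time goes.

```swift
import Foundation

// Hypothetical helper illustrating Method 1: one seek + one read per string.
// `offsets` and `lengths` are assumed to be parallel arrays.
func readStringsBySeeking(from url: URL, offsets: [UInt64], lengths: [Int]) throws -> [String] {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        handle.seek(toFileOffset: offset)              // move to the string's first byte
        let bytes = handle.readData(ofLength: length)  // read exactly `length` bytes
        strings.append(String(decoding: bytes, as: UTF8.self))
    }
    return strings
}
```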

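Method 2:

I initialize a Data object with Data(contentsOf: URL). Then I call Data.subdata(in: Range) for each string I want to build. The range starts at offset and ends at offset + size. This loads the entire file into RAM and lets me retrieve the bytes needed for each string. It is much faster than the first option, but probably just as bad in terms of performance.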
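A minimal sketch of Method 2, under the same assumptions (readStringsBySubdata is a hypothetical name, and offsets and lengths are parallel arrays). The file is read once, and every subdata(in:) call returns a new Data holding a copy of the requested bytes.

```swift
import Foundation

// Hypothetical helper illustrating Method 2: load the file once, then copy
// out each string's bytes with subdata(in:).
func readStringsBySubdata(from url: URL, offsets: [UInt64], lengths: [Int]) throws -> [String] {
    let data = try Data(contentsOf: url)   // whole file in memory

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        let start = Int(offset)
        // Range runs from offset to offset + size, as described in the question.
        let chunk = data.subdata(in: start..<(start + length))
        strings.append(String(decoding: chunk, as: UTF8.self))
    }
    return strings
}
```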

How can I get the best performance for this particular task?

Tags: swift, binaryfiles, filehandle

Answer


I recently went through a similar experience when caching/loading binary data to/from disk.

I'm not sure what the best-performing approach is overall, but you can improve the performance of method 2 further by taking a "slice" of the Data object instead of calling data.subdata(). This is similar to using array slices.

This is probably because, instead of creating more Data objects that each hold a COPY of the original bytes, the Data returned from the slice refers back to the source Data object. This made a significant difference for me, as my source data was actually pretty large. You should profile both methods and see whether it makes a noticeable difference for you.

https://developer.apple.com/documentation/foundation/data/1779919-subscript
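As a rough sketch of the slicing approach, assuming the same parallel offsets and lengths arrays as in the question (readStringsBySlicing is a hypothetical name): the range subscript on Data returns another Data that keeps the indices of the original, so positions are computed relative to startIndex. Building the String still copies the bytes once, so only the intermediate copy is avoided.

```swift
import Foundation

// Hypothetical helper illustrating the slice approach: subscript the Data
// with a range instead of calling subdata(in:).
func readStringsBySlicing(from url: URL, offsets: [UInt64], lengths: [Int]) throws -> [String] {
    let data = try Data(contentsOf: url)   // whole file in memory

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        // Slices keep the parent's indices, so compute from startIndex.
        let start = data.startIndex + Int(offset)
        let slice = data[start..<(start + length)]
        strings.append(String(decoding: slice, as: UTF8.self))
    }
    return strings
}
```

Timing this against the subdata version on the real file is the only reliable way to tell how much the slice saves.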

