swift - Core Data 中什么样的查询可以从 R-Tree 属性索引中获益?
问题描述
在阅读了这篇关于 SQLite 中 R*Tree 的文章https://www.sqlite.org/rtree.html后,我目前正在核心数据模型中试验 2-Dim R-Tree。特别是我期望(可能有点天真)索引表上的某种语句,但是在对具有索引属性的实体select
执行 fetch 语句时,我没有在 SQLite 调试跟踪中看到任何语句(参见下面的代码)。Region
predicateBoundaryIdx
我的问题是:核心数据模型(实体、属性)和 NSPredicate 必须如何才能从R-Tree索引中受益?
[XCode v11.4,iOS v13.1,斯威夫特。打开 com.apple.CoreData.SQLDebug 4]
模型
指数
对应的数据库方案
CREATE TABLE ZPERSON ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLOCATION INTEGER, Z1CONTACTS INTEGER, ZNAME VARCHAR );
CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZMAXLATITUDE FLOAT, ZMAXLATITUDEIDX FLOAT, ZMAXLONGITUDE FLOAT, ZMAXLONGITUDEIDX FLOAT, ZMINLATITUDE FLOAT, ZMINLATITUDEIDX FLOAT, ZMINLONGITUDE FLOAT, ZMINLONGITUDEIDX FLOAT, ZNAME VARCHAR );
CREATE INDEX ZPERSON_ZLOCATION_INDEX ON ZPERSON (ZLOCATION);
CREATE INDEX ZPERSON_Z1CONTACTS_INDEX ON ZPERSON (Z1CONTACTS);
CREATE VIRTUAL TABLE Z_Region_RegionIndex USING RTREE (Z_PK INTEGER PRIMARY KEY, ZMINLATITUDEIDX_MIN, ZMINLATITUDEIDX_MAX, ZMAXLATITUDEIDX_MIN, ZMAXLATITUDEIDX_MAX, ZMINLONGITUDEIDX_MIN, ZMINLONGITUDEIDX_MAX, ZMAXLONGITUDEIDX_MIN, ZMAXLONGITUDEIDX_MAX)
/* Z_Region_RegionIndex(Z_PK,ZMINLATITUDEIDX_MIN,ZMINLATITUDEIDX_MAX,ZMAXLATITUDEIDX_MIN,ZMAXLATITUDEIDX_MAX,ZMINLONGITUDEIDX_MIN,ZMINLONGITUDEIDX_MAX,ZMAXLONGITUDEIDX_MIN,ZMAXLONGITUDEIDX_MAX) */;
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_rowid"(rowid INTEGER PRIMARY KEY,nodeno);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_node"(nodeno INTEGER PRIMARY KEY,data);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_parent"(nodeno INTEGER PRIMARY KEY,parentnode);
测试代码
func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
let mainContext: NSManagedObjectContext
mainContext = persistentContainer.viewContext
mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
mainContext.undoManager = nil
mainContext.shouldDeleteInaccessibleFaults = true
mainContext.automaticallyMergesChangesFromParent = true
var personObj: Person
var locationObj: Region
let n = 1000000
let personNr = stride(from: 1, through: n+1, by: 1).map(String.init).shuffled()
for i in 1...n
{
personObj = Person(context: mainContext)
locationObj = Region(context: mainContext)
locationObj.name = "Region \(i)"
locationObj.minlatitude = 40.000000 - Float.random(in: 0 ..< 5)
locationObj.minlongitude = 9.000000 - Float.random(in: 0 ..< 5)
locationObj.maxlatitude = 40.000000 + Float.random(in: 0 ..< 5)
locationObj.maxlongitude = 9.000000 + Float.random(in: 0 ..< 5)
locationObj.minlatitudeidx = locationObj.minlatitude
locationObj.minlongitudeidx = locationObj.minlongitude
locationObj.maxlatitudeidx = locationObj.maxlatitude
locationObj.maxlongitudeidx = locationObj.maxlongitude
personObj.name = "Person \(personNr[i])"
personObj.location = locationObj
if i % 1000 == 0 {
saveContext()
}
}
saveContext()
let request: NSFetchRequest<Region> = Region.fetchRequest()
let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()
let eps : Float = 1.0
let predicateBoundaryIdx = NSPredicate(format: "(minlatitudeidx >= %lf and maxlatitudeidx =< %lf) and (minlongitudeidx >= %lf and maxlongitudeidx =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
let predicateBoundary = NSPredicate(format: "(minlatitude >= %lf and maxlatitude =< %lf) and (minlongitude >= %lf and maxlongitude =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
requestIdx.predicate = predicateBoundaryIdx;
request.predicate = predicateBoundary;
print("fetch index:")
do {
let result = try mainContext.count(for:requestIdx)
print("Count = \(result)")
} catch {
print("Error: \(error)")
}
print("fetch no index:")
do {
let result = try mainContext.count(for:request)
print("Count = \(result)")
} catch {
print("Error: \(error)")
}
for store in (persistentContainer.persistentStoreCoordinator.persistentStores) {
os_log("Store URL: %@", log: Debug.coredata_log, type: .info, store.url?.absoluteString ?? "No Store")
}
return true
}
核心数据 SQL 跟踪
CoreData: sql: SELECT COUNT( DISTINCT t0.Z_PK) FROM ZREGION t0 WHERE ( t0.ZMINLATITUDEIDX >= ? AND t0.ZMAXLATITUDEIDX <= ? AND t0.ZMINLONGITUDEIDX >= ? AND t0.ZMAXLONGITUDEIDX <= ?)
解决方案
2017 年引入了对 R-Tree 索引的 CoreData 支持。WWDC 2017 会议 210涵盖了它并提供了一个示例。正如您将看到的,关键是您需要在谓词格式字符串中使用一个函数来指示应该使用索引。WWDC 2018 session 224还有另一个例子。
对您的示例进行稍微简单的变体:具有位置(latitude
和longitude
)属性和name
属性的实体:
添加一个名为“bylocation”的获取索引,将其类型指定为“R-Tree”并为 and 添加获取索引latitude
元素longitude
:
稍微修改您的代码,以反映不同的属性等。准备两个单独的谓词,一个使用索引,另一个不使用,并运行它们进行比较:
let mainContext: NSManagedObjectContext
mainContext = persistentContainer.viewContext
mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
mainContext.undoManager = nil
mainContext.shouldDeleteInaccessibleFaults = true
mainContext.automaticallyMergesChangesFromParent = true
var locationObj: Region
let n = 10 // Just for demo purposes
for i in 1...n
{
locationObj = Region(context: mainContext)
locationObj.name = "Region \(i)"
locationObj.latitude = 40.000000 + 5.0 - Float.random(in: 0 ..< 10)
locationObj.longitude = 9.000000 + 5.0 - Float.random(in: 0 ..< 10)
if i % 1000 == 0 {
saveContext()
}
}
saveContext()
mainContext.reset()
let request: NSFetchRequest<Region> = Region.fetchRequest()
let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()
let eps : Float = 1.0
let predicateBoundaryIdx = NSPredicate(format: "indexed:by:(latitude, 'bylocation') between { %lf, %lf } AND indexed:by:(longitude, 'bylocation') between { %lf, %lf }", 40.0-eps, 40.0+eps, 9.0-eps, 9.0+eps)
let predicateBoundary = NSPredicate(format: "latitude between { %lf, %lf } AND longitude between { %lf, %lf} ",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
requestIdx.predicate = predicateBoundaryIdx;
request.predicate = predicateBoundary;
print("fetch index:")
do {
let result = try mainContext.fetch(requestIdx)
print("Count = \(result.count)")
} catch {
print("Error: \(error)")
}
mainContext.reset()
print("fetch no index:")
do {
let result = try mainContext.fetch(request)
print("Count = \(result.count)")
} catch {
print("Error: \(error)")
}
使用 SQLDebug = 4 运行它,然后您可以在日志中看到一些正在发生的事情。首先创建数据库并添加Region表,然后是RTree索引。每当修改 Region 表时,都会创建触发器以将相关数据添加到索引中:
CoreData: sql: CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLATITUDE FLOAT, ZLONGITUDE FLOAT, ZNAME VARCHAR )
CoreData: sql: CREATE VIRTUAL TABLE IF NOT EXISTS Z_Region_bylocation USING RTREE (Z_PK INTEGER PRIMARY KEY, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX)
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_INSERT AFTER INSERT ON ZREGION FOR EACH ROW BEGIN INSERT OR REPLACE INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_UPDATE AFTER UPDATE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = NEW.Z_PK ; INSERT INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_DELETE AFTER DELETE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = OLD.Z_PK ; END
然后,当涉及到提取时,您可以看到两个不同的查询被发送到 SQLite:
使用索引:
CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE ( t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLATITUDE_MIN AND n1_t0.ZLATITUDE_MAX <= ?)) AND t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLONGITUDE_MIN AND n1_t0.ZLONGITUDE_MAX <= ?)))
日志甚至包括 SQLite 使用的查询计划:
2 0 0 SEARCH TABLE ZREGION AS t0 USING INTEGER PRIMARY KEY (rowid=?)
6 0 0 LIST SUBQUERY 1
8 6 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D0B1
26 0 0 LIST SUBQUERY 2
28 26 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D2B3
没有索引:
CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE (( t0.ZLATITUDE BETWEEN ? AND ?) AND ( t0.ZLONGITUDE BETWEEN ? AND ?))
2 0 0 SCAN TABLE ZREGION AS t0
从中可以看出,使用索引涉及一些非常混乱的子选择。我发现结果是,对于小型数据集,索引实际上会减慢速度。同样,如果结果集很大。但是如果数据集很大,结果集很小,那就有优势了。我把它留给你玩,看看这款游戏是否物有所值。我不太明白的一件事是,使用索引需要两个单独的子选择,一个用于经度,一个用于纬度。在我看来(尽管也许我遗漏了一些东西)破坏了 R-Trees 的全部意义,即它们的多维性。
推荐阅读
- bash - 从 find 命令中修剪目录并附加 `/`
- python - 如何将我在 tkinter 树中单击的目录的路径传递给“打开文件”函数以显示它
- graphql - “无法在“查询”类型上查询“可用”字段
- python - 悬停在应用程序图标上时如何设置系统托盘标题
- php - 如何在 PHP 中自动将表单数据发送到电子邮件地址
- python - 将通用 pandas 数据框传递给 django REST API
- reactjs - @types/react-transition-group:通用类型 'ReactElement
' 需要 1 到 2 个类型的 arguments.ts(2707)
- reactjs - 如何使用 Expo 修复未定义 React-Native 的读取属性“showimagepicker”
- css - 一种在移动设备上隐藏标签但显示包含的 div 的方法
- java - 为什么返回语句没有执行?