ruby - Using long string data as an index key in MongoDB
Problem description
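I am using MongoDB with the mongo ruby gem.
The collection I am working with holds millions of documents that contain very long strings, and I need to look documents up by that long string. The lookup is very slow, taking about 5 seconds, which suggests I should index the key. When I try to index the corresponding key in MongoDB, I get the error "key too large to index".
According to the MongoDB documentation this is expected, because index entries have a size limit: "The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes."
One way to handle this is to split the key holding the long string into smaller, manageable chunks of less than 1024 bytes each.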
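# `client` is assumed to be an already-connected Mongo::Client
client[:longStrColl].find.each do |doc|
  str = doc[:longStr]
  # split into parts of at most 1024 and at least 1 characters (characters, not bytes)
  parts = str.scan(/.{1,1024}/)
  # collect the parts in a separate hash; assigning string keys on the array itself raises TypeError
  str_parts = {}
  parts.each_with_index do |val, index|
    str_parts["str#{index}"] = val
  end
  # use $set so the new keys are added without replacing the rest of the document
  client[:longStrColl].update_one({ "_id" => doc["_id"] }, { "$set" => str_parts })
end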
This splits longStr into 2 keys of at most 1024 characters each, as shown below, so that they can be covered by a compound index to speed up lookups.
{
"_id" : ObjectId("5b6c634dd0ae362168c8fd58"),
"longStr" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0",
//other key: values
"str0" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a",
"str1" : "6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0"
}
The problem I am facing is that even with this split, index creation still fails with the error "key too large to index":
db.longStrColl.createIndex( { str0: 1, str1: 1});
How do I split the string correctly and index it?
Solution
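A likely reason the compound index still fails, going by the limit quoted in the question: the 1024 bytes apply to the total size of one index entry, and a compound entry contains the values of every indexed key. Two chunks of up to 1024 characters each add up to well over the limit, however the string is split. One common workaround, sketched below under stated assumptions, is to index a short fixed-length digest of the string instead of the string itself. The helper names (long_str_digest, digest_fields, lookup_filter) and the field name longStrHash are hypothetical and not from the original post, and the MongoDB calls in the trailing comment assume an already-connected Mongo::Client named client.

```ruby
require 'digest'

# Hypothetical helper: derive a short, fixed-length lookup key
# from an arbitrarily long string.
def long_str_digest(str)
  Digest::SHA256.hexdigest(str) # always 64 hex characters
end

# Fields to store next to the original value.
# "longStrHash" is an assumed field name.
def digest_fields(long_str)
  { "longStrHash" => long_str_digest(long_str) }
end

# Equality filter: the digest lets the query use the index, and the
# full string guards against a (very unlikely) hash collision.
def lookup_filter(long_str)
  { "longStrHash" => long_str_digest(long_str), "longStr" => long_str }
end

# Sketch of the MongoDB side (not executed here; assumes the mongo gem
# and a connected Mongo::Client in `client`):
#
#   coll = client[:longStrColl]
#   coll.find.each do |doc|
#     coll.update_one({ "_id" => doc["_id"] },
#                     { "$set" => digest_fields(doc["longStr"]) })
#   end
#   coll.indexes.create_one({ "longStrHash" => 1 })
#   coll.find(lookup_filter(some_long_string)).first
```

This only helps exact-match lookups; prefix or substring searches cannot use a digest. MongoDB also offers hashed indexes (an index specification such as { longStr: "hashed" }), and MongoDB 4.2 and later no longer enforce the index key limit, so it may be worth checking the documentation for your server version before adding an extra field.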