Indexing keys with long string data in MongoDB

Problem description

I am using MongoDB with the mongo Ruby gem.

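The collection I am working with has millions of documents, each containing a very long string, and I need to look documents up by that string. The lookup is very slow, about 5 seconds, which suggests I should index the key. When I try to index that key in MongoDB, I get the error key too large to index. According to the MongoDB documentation this is expected, because index entries have a size limit: "The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes."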
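As a rough illustration of the quoted limit, the sketch below checks a string's size in bytes rather than characters. The helper name fits_index_limit? is hypothetical, and the check ignores BSON structural overhead, so it is only an approximation of what the server enforces.

```ruby
# Limit quoted above from the MongoDB documentation.
INDEX_ENTRY_LIMIT = 1024

# Hypothetical helper: rough check only. It counts the string's bytes and
# ignores BSON structural overhead, so the real entry is slightly larger.
def fits_index_limit?(str)
  str.bytesize < INDEX_ENTRY_LIMIT
end

ascii = "a" * 1023           # 1023 characters, 1023 bytes
accented = "\u00e9" * 600    # 600 characters, 1200 bytes in UTF-8

puts fits_index_limit?(ascii)
puts fits_index_limit?(accented)
```

The limit is expressed in bytes, so a string with multi-byte characters can exceed it at a character count well below 1024.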

One way to deal with this is to split the key holding the long string into smaller chunks of less than 1024 bytes each.

client[:longStrColl].find.each do |doc|
    str = doc[:longStr]
    str_parts = {}
    # Split into chunks of at most 1024 characters (minimum 1); note these are characters, not bytes
    str.scan(/.{1,1024}/).each_with_index do |val, index|
        str_parts["str#{index}"] = val
    end
    # $set adds the chunk fields without replacing the rest of the document
    client[:longStrColl].update_one({ "_id" => doc["_id"] }, { "$set" => str_parts })
end

This splits longStr into 2 keys with a maximum length of 1024 each, as shown below, so that they can be covered by a compound index to speed up lookups.

{
    "_id" : ObjectId("5b6c634dd0ae362168c8fd58"),
    "longStr" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0",
    //other key: values 
    "str0" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a",
    "str1" : "6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0"
}

The problem I am facing is that even then, index creation fails with the error key too large to index:

db.longStrColl.createIndex( { str0: 1, str1: 1});
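A likely cause, going by the limit quoted from the documentation: it applies to the total size of one index entry, and an entry in a compound index holds the values of every indexed field. The sketch below estimates a lower bound for such an entry. The helper compound_entry_bytes and the stand-in document are hypothetical, and BSON overhead is ignored.

```ruby
INDEX_ENTRY_LIMIT = 1024

# Hypothetical helper: lower bound on the size of one compound index entry,
# summing the byte size of each indexed string value (BSON overhead ignored).
def compound_entry_bytes(doc, fields)
  fields.sum { |field| doc.fetch(field).bytesize }
end

# Stand-in document with the same shape as the one above.
doc = { "str0" => "a" * 1024, "str1" => "b" * 700 }

total = compound_entry_bytes(doc, %w[str0 str1])
puts "#{total} bytes, within limit: #{total < INDEX_ENTRY_LIMIT}"
```

Under this reading, splitting the string into more fields does not help while all of them are part of the same index, since their sizes add up. A single 1024-character ASCII chunk on its own is also not under the limit, because the entry must be less than 1024 bytes.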

How do I split the string correctly and index it?

Tags: ruby, mongodb

Solution
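One workaround that may fit here, sketched under assumptions rather than taken from the question: instead of indexing the long string or its chunks, store a fixed-length digest of it in a separate field and index that field. The field name longStrHash and the helper long_str_hash are hypothetical. The driver calls are shown only as comments, since they need a running MongoDB instance.

```ruby
require "digest"

# Hypothetical helper: fixed-length fingerprint of the long string.
# A SHA-256 hex digest is always 64 ASCII characters, well under the limit.
def long_str_hash(str)
  Digest::SHA256.hexdigest(str)
end

long_str = "abcdefghdefg1001955aa22920c112" * 100
puts long_str_hash(long_str).bytesize

# With the mongo gem, the field could then be stored, indexed and queried
# roughly like this (longStrHash is an assumed field name):
#   client[:longStrColl].update_one({ "_id" => doc["_id"] },
#     { "$set" => { "longStrHash" => long_str_hash(doc[:longStr]) } })
#   client[:longStrColl].indexes.create_one({ longStrHash: 1 })
#   client[:longStrColl].find(longStrHash: long_str_hash(needle), longStr: needle)
```

Keeping longStr in the query alongside the digest guards against the unlikely case of two different strings sharing a digest. This approach only supports exact-match lookups; prefix or range queries on the long string would not benefit from it.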

