Finding an index in a MongoDB array

Problem description

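Our data provider supplies data in a strange format. The arrays date and value correspond to each other element by element and are guaranteed to have the same length. For whatever reason, they even decided to mix numbers and strings in date.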

[
  {
    "_id": "A000005933",
    "date": [905270400000, 918748800000, 937843200000, 965923200000, 983289600000, 984931200000, 1152806400000, "1171987200000", "1225382400000", "1229616000000", "1286208000000", "1455552000000"],
    "value": ["0.25", "0.15", "0", "0.25", "0.15", "0", "0.25", "0.5", "0.3", "0.1", "0.1", "-0.1"],
    "version": 1614837436798
  },
  {
    "_id": "A000005934",
    "date": [915120000000, 923587200000, 941731200000, 949593600000, 953222400000, 956851200000, 962121600000, 967737600000, 970761600000, 989510400000, 999187200000, 1000742400000, 1005235200000, 1039104000000, 1046966400000, 1054828800000, 1133798400000, 1141747200000, 1150300800000, 1155052800000, 1160496000000, 1165939200000, 1173801600000, 1181664000000, 1215532800000, 1224000000000, 1226419200000, 1228838400000, 1232467200000, 1236700800000, 1239120000000, 1242144000000, 1302624000000, 1310486400000, 1320768000000, 1323792000000, 1341936000000, 1367942400000, 1384272000000, 1402416000000, 1410278400000, 1458057600000],
    "value": ["3", "2.5", "3", "3.25", "3.5", "3.78", "4.25", "4.5", "4.78", "4.5", "4.25", "3.75", "3.25", "2.78", "2.5", "2", "2.25", "2.5", "2.75", "3", "3.25", "3.5", "3.75", "4", "4.25", "3.75", "3.25", "2.5", "2", "1.5", "1.25", "1", "1.25", "1.5", "1.25", "1", "0.75", "0.5", "0.25", "0.15", "0.05", "0"],
    "version": 1614837436548
  },
  ......
]

Our typical use case is to look up a value based on _id and date, so I have had to do something like this.

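import datetime

# db is assumed to be an existing pymongo Database handle.
def get_value_from_mongo(id_: str, date: datetime.date) -> float:
    # Fetch only the two parallel arrays for this document.
    result = db.indicators.find_one({"_id": id_}, {"value": 1, "date": 1})
    # Dates mix numbers and strings, so compare everything as strings.
    date_list = list(map(str, result["date"]))
    price_list = list(map(str, result["value"]))

    # "%s" (epoch seconds) is a platform-specific strftime extension; appending "000" gives milliseconds.
    dt = date.strftime("%s000")
    price = float(price_list[date_list.index(dt)])

    return price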

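This is very inefficient, because the whole array is scanned every time I want to retrieve a single value. I could perhaps do a binary search, but date is not guaranteed to be sorted, and I do not want to rely on that behavior.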
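For illustration only, the repeated scan described above could be avoided on the client by building a dictionary once per document. This is a minimal sketch, not part of the original question: the helper name build_value_lookup is hypothetical, and it assumes every date parses as an integer number of milliseconds and every value parses as a float.

```python
from typing import Any, Dict, Sequence


def build_value_lookup(dates: Sequence[Any], values: Sequence[Any]) -> Dict[int, float]:
    """Map each timestamp (int, milliseconds) to its value (float)."""
    if len(dates) != len(values):
        raise ValueError("date and value arrays must have the same length")
    # int() accepts both 905270400000 and "1171987200000",
    # so the mixed number/string dates collapse to one key type.
    return {int(d): float(v) for d, v in zip(dates, values)}


# Shortened sample taken from the first document shown above.
lookup = build_value_lookup(
    [905270400000, 918748800000, "1171987200000"],
    ["0.25", "0.15", "0.5"],
)
print(lookup[1171987200000])  # 0.5
```

Building the dictionary still touches every element once, so it only pays off when several dates are looked up in the same document. It does not depend on date being sorted.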

Is there any MongoDB operator I can use to speed up the query?

Tags: database, mongodb, performance, query-optimization, pymongo

Solution


  1. The first possibility is to focus on the lookup: create an index on the date array

This comes at the cost of slower writes.
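The answer does not show how the index is created. A minimal pymongo sketch might look like the following. The helper name ensure_date_index is hypothetical, and the default field name "date" follows the question's documents, whereas the execution plan in this answer uses a field called dates.

```python
def ensure_date_index(collection, field: str = "date") -> str:
    """Create an ascending index on the array field and return the index name."""
    # MongoDB makes the index multikey automatically when the field holds an array,
    # so each array element gets its own index entry.
    return collection.create_index([(field, 1)])


# Usage (assumes a reachable server and the "dummy" database from the plan):
# from pymongo import MongoClient
# indicators = MongoClient()["dummy"]["indicators"]
# ensure_date_index(indicators)
```

Every insert or update then has to maintain one index entry per array element, which is where the slower writes come from.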

In the execution plan below you can see that the index is used (you should benchmark whether it actually brings an improvement).

> db.indicators.explain().find({dates: '1.1'})
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "dummy.indicators",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "dates" : {
        "$eq" : "1.1"
      }
    },
    "queryHash" : "4204704C",
    "planCacheKey" : "1DBFE945",
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",// <------
        "keyPattern" : {
          "dates" : 1
        },
        "indexName" : "dates_1",
        "isMultiKey" : true,
        "multiKeyPaths" : {
          "dates" : [
            "dates"
          ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "dates" : [
            "[\"1.1\", \"1.1\"]"

  2. The second possibility is to focus on retrieving as little data as possible

This assumes the bottleneck is not the date lookup but the data transfer.

Therefore it does not improve the lookup itself (assuming the array is "iterated" on the database side rather than in application code).

You can use

  • the positional operator
  • a projection as the second argument of find (with mongo >= 4.4)
db.indicators.remove({})
db.indicators.insert([{_id: '0', dates: [1, '1.1', 2], prices: [1,2,3]}])
fetch = date => {
  print(date)
  res = db.indicators.find(
    {
      dates: {
        $elemMatch: {
          $in: [Number(date), String(date)]
        }
      }
    },
    {
      'prices.$': 1 // <<--------
    }
  ).toArray()
  printjson(res)
}
fetch(2) // [ { "_id" : "0", "prices" : [ 3 ] } ]
fetch('1.1') // [ { "_id" : "0", "prices" : [ 2 ] } ]
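For readers using pymongo as in the question, the shell snippet above might translate roughly as follows. This is a sketch under assumptions: the helper names build_positional_query and get_value are hypothetical, the field names date and value follow the question's documents rather than the dates and prices of the toy data, and the positional projection across two parallel arrays should be verified against your server version before relying on it.

```python
from typing import Any, Dict, Optional, Tuple


def build_positional_query(id_: str, millis: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (filter, projection) selecting only the value aligned with the matching date."""
    query = {
        "_id": id_,
        # The provider mixes numbers and strings, so match either representation.
        "date": {"$elemMatch": {"$in": [millis, str(millis)]}},
    }
    # "value.$" asks the server to return only the element at the matched position.
    projection = {"value.$": 1}
    return query, projection


def get_value(collection, id_: str, millis: int) -> Optional[float]:
    """Fetch a single value; return None when no document or date matches."""
    query, projection = build_positional_query(id_, millis)
    doc = collection.find_one(query, projection)
    if doc is None or not doc.get("value"):
        return None
    return float(doc["value"][0])
```

With a real collection this would be called as get_value(db.indicators, "A000005933", 1171987200000), so that only one array element travels over the wire instead of both full arrays.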


Obviously you can combine 1 and 2, but I would first try 2 on its own, to avoid creating an index.

