首页 > 解决方案 > couchbase not returning some documents when using specific index

问题描述

By the tests I made, it seems that these are specifically large documents (~2mb) and when the query uses specific indexes (in my case an array index).
It seems to be working ok when the document is smaller.
This happens either in Couchbase dashboard, cbq or the scala SDK i'm using.
I'm using Couchbase 4.6.0 with Memory optimized indexes.


I have these indexes relevant to this query:

CREATE INDEX `cache_partial_specific`
ON `content`(`docType`,`entityType`,`entityId`) 
WHERE (`docType` = "feedCachePartial") WITH { "defer_build"=true }  

CREATE INDEX `feed_cache_partial_meta`
ON `content`(`meta().id`)
WHERE (`docType` = `feedCachePartial`)  

CREATE INDEX `cache_partial_index`
ON `content`((distinct (array (`url`.`id`) for `url` in `urls` end)))
WHERE (`docType` = "feedCachePartial") WITH { "defer_build"=true }

The last is the one causing troubles


The problem:

For example When running
SELECT * FROM content WHERE meta().id = 'cached:topic:297:grp:all'

or

SELECT * FROM content WHERE docType='feedCachePartial' AND entityId=297 and entityType='topic'

it returns the documents and I see the url 13319 in the list or urls.

But when running

SELECT * FROM content
WHERE docType='feedCachePartial'
AND ANY url IN urls SATISFIES url.id = 13119 END

or any variation with the condition ANY url IN urls SATISFIES url.id = 13119

the document cached:topic:297:grp:all is not returned.


The max_indexer_doc_size is set to 20 MB so I believe this is not the issue (and either way it is being returned when using the other indexes).

When viewing the query log I see that this specific index i'm using has 1 replica (I have total of 3 index nodes on this cluster).


I would have investigate this index and see which documents resize on the index but I don't know how to do that.

标签: couchbasen1ql

解决方案


检查您的 indexer.log 并查看特定索引是否由于索引键大小限制而跳过您的文档键。如果索引未编入索引,则查询将找不到该文档。如果您已经知道文档键并且未涵盖查询,那么最好的选择是指定 USE KEYS 并删除 META().id 谓词,这样可以节省时间。

由于您的文档很大并且尝试进行 ARRAY 索引,它可能已经跳过。如果您知道文档键,则无需 Array Index 直接使用 USE KEYS 获取文档并应用谓词。如果由于大小限制而跳过文档,请查看此帖子https://forums.couchbase.com/t/how-to-read-max-array-seckey-size-setting-version-4-5-1-2844-community -edition-build-2844/16374

SELECT * FROM content USE KEYS "cached:topic:297:grp:all" WHERE .... 

除非您在 META().id 上进行搜索(例如:META().id LIKE "xyz%"),feed_cache_partial_meta 索引可能没有用。您可以使用使用键。

如果文档很小,您可以像这样组合其他索引,看看它是否有效并避免 Intersectscans。

CREATE INDEX `cache_partial_index`
ON `content`(`docType`,`entityType`,`entityId`, DISTINCT ARRAY url.id FOR url IN urls END)
WHERE (`docType` = "feedCachePartial") WITH { "defer_build"=true };

以下博客有有用的信息

https://blog.couchbase.com/create-right-index-get-right-performance/ https://blog.couchbase.com/n1ql-practical-guide-second-edition/


推荐阅读