postgresql - Indexing a jsonb array in Postgres
问题描述
I try to set up a GIN index but I do not think my index is used when I run the request, whether I use an operator or a function.
Environment
In our table we have a JSONB field (json_aip
) containing a Json that looks like that:
{
"properties": {
"pdi": {
"contextInformation": {
"tags": ["SOME_TAG"]
},
},
}
Table creation :
create table t_aip (
json_aip jsonb,
[...]
);
CREATE INDEX idx_aip_tags
ON t_aip
USING gin ((json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags'));
Operator query
We can't use the operator ?|
as we use JDBC. But rumors indicate I should see my index when I run that type of query.
EXPLAIN ANALYZE SELECT count(*)
FROM storage.t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']
Result:
Aggregate
(cost=27052.16..27052.17 rows=1 width=8) (actual time=488.085..488.087 rows=1 loops=1)
-> Seq Scan on t_aip (cost=0.00..27052.06 rows=42 width=0) (actual time=0.134..456.978 rows=16502 loops=1)
Filter: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Rows Removed by Filter: 17511
Planning time: 23.202 ms
Execution
time: 488.449 ms
Functional query
EXPLAIN ANALYZE SELECT count(*)
FROM storage.t_aip
WHERE jsonb_exists_any(
json_aip#>'{properties,pdi,contextInformation,tags}',
array['SOME_TAG']
)
Result:
QUERY PLAN
Aggregate (cost=27087.00..27087.01 rows=1 width=8) (actual time=369.931..369.933 rows=1 loops=1)
-> Seq Scan on t_aip (cost=0.00..27052.06 rows=13979 width=0) (actual time=0.173..350.437 rows=16502 loops=1)
Filter: jsonb_exists_any((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]), '{SOME_TAG}'::text[])
Rows Removed by Filter: 17511
Planning time: 56.021 ms
Execution time: 370.252 ms
There is nothing about the index at all. Any help would be much appreciated !
I think my index is wrong because it considers that at the end of the path json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags'
it index a String whether that's an array. That's my opinion.
解决方案
有一个通用规则,您必须在索引和查询中使用完全相同的表达式才能使用索引。有了这个索引:
CREATE INDEX idx_aip_tags
ON t_aip
USING gin ((json_aip#>'{properties,pdi,contextInformation,tags}'));
查询将使用索引
EXPLAIN ANALYZE
SELECT count(*)
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=149.97..149.98 rows=1 width=0) (actual time=27.783..27.783 rows=1 loops=1)
-> Bitmap Heap Scan on t_aip (cost=20.31..149.87 rows=40 width=0) (actual time=1.504..25.726 rows=20000 loops=1)
Recheck Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Heap Blocks: exact=345
-> Bitmap Index Scan on idx_aip_tags (cost=0.00..20.30 rows=40 width=0) (actual time=1.455..1.455 rows=20000 loops=1)
Index Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
请注意,GIN 索引还支持@>
运算符:
SELECT count(*)
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG"]'
但在搜索多个标签时要小心:
SELECT count(*)
FROM t_aip
-- this gives objects containing both tags:
-- WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG", "ANOTHER_TAG"]'
-- and this gives objects with any of two tags:
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> ANY(ARRAY['["SOME_TAG"]', '["ANOTHER_TAG"]']::jsonb[])
推荐阅读
- javascript - 如果代码进入 if 时钟而不是 else 块,则内部错误条件后的事件
- sql - 每秒将列从 excel 发送到 MSSQL 5 次
- excel - 将各种excel中的同名表合并为一张
- tableau-api - 如何在画面中比较当前和去年期间的 3 个度量
- c++ - 以二进制格式写入 vtk 文件
- python - 避免 for 循环创建数组
- javascript - TinyMCE 5:大写的自定义工具栏按钮
- chart.js - 更新通过 ajax 通过 PHP、MySQL 调用的图表数据
- xml - 如何将 XML 文件转换为 WSDL 文件?
- javascript - 有没有办法在 Google Docs 的文本编辑器中以编程方式插入文本框或绘图?