How to retrieve all records from Elasticsearch in Rails

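Problem description

There is an upper limit (10,000) on the number of documents you can fetch from Elasticsearch with a search. We can use "scroll" to retrieve all records. Does anyone know how to work it into code?

There is this scroll method: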

https://github.com/elastic/elasticsearch-ruby/blob/4608fd144277941003de71a0cdc24bd39f17a012/elasticsearch-api/lib/elasticsearch/api/actions/scroll.rb

But I don't know how to use it. Could you explain how?

I tried "scan", but Elasticsearch no longer supports it.

# Open the "view" of the index
response = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10

# Call `scroll` until results are empty
while response = client.scroll(scroll_id: response['_scroll_id'], scroll: '5m') and not 
   response['hits']['hits'].empty? do
      puts response['hits']['hits'].map { |r| r['_source']['title'] }
end

Tags: ruby-on-rails, ruby, api, elasticsearch, rubygems

Solution


Your code should work, but as you mentioned, the scan value for search_type is not needed. I just ran the following locally with some test data and it worked:

# scroll.rb
require 'elasticsearch'

client = Elasticsearch::Client.new

# The initial search opens the scroll context and already returns the first page
response = client.search(index: 'articles', scroll: '10m')

until response['hits']['hits'].empty?
  # Print the current page before asking for the next one
  puts(response['hits']['hits'].map { |r| r['_source']['title'] })
  # Pass the most recent scroll ID; it can change between calls
  response = client.scroll(scroll: '5m', body: { scroll_id: response['_scroll_id'] })
end

Output:

$ ruby scroll.rb
Title 297
Title 298
Title 299
Title 300
...

You can play around with the value of the scroll parameter, but something like this should work for you too.
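To use this from a Rails app, the loop can be wrapped in a small method. The following is only a sketch under some assumptions: `each_scrolled_hit` is a hypothetical helper name (it is not part of elasticsearch-ruby), the `size` default of 1000 is an arbitrary page size, and the client is assumed to accept `clear_scroll(body: { scroll_id: ... })`, which may differ between gem versions.

```ruby
# Hypothetical helper (not part of elasticsearch-ruby): yields every hit of an
# index by following the scroll cursor until a page comes back empty.
# `client` is assumed to respond to search, scroll and clear_scroll the way
# Elasticsearch::Client does. A block is required.
def each_scrolled_hit(client, index:, scroll: '5m', size: 1000)
  scroll_id = nil

  # The initial search opens the scroll context and already returns page one.
  response = client.search(index: index, scroll: scroll, size: size)
  scroll_id = response['_scroll_id']

  until response['hits']['hits'].empty?
    response['hits']['hits'].each { |hit| yield hit }

    # Each scroll call keeps the context alive for another `scroll` interval.
    response = client.scroll(scroll: scroll, body: { scroll_id: scroll_id })

    # Use the most recent ID; it may change between calls.
    scroll_id = response['_scroll_id'] || scroll_id
  end
ensure
  # Release the server-side context instead of waiting for it to time out.
  client.clear_scroll(body: { scroll_id: scroll_id }) if scroll_id
end

# Usage (assumes the elasticsearch gem and an 'articles' index):
#   require 'elasticsearch'
#   client = Elasticsearch::Client.new
#   each_scrolled_hit(client, index: 'articles') do |hit|
#     puts hit['_source']['title']
#   end
```

Yielding one hit at a time avoids holding the whole result set in memory, and the `ensure` clause frees the scroll context even if the block raises.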

