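elasticsearch - ElasticSearch Scroll API not getting past the 10,000 limit
Problem description
I'm using the Scroll API to fetch more than 10,000 documents from our Elasticsearch cluster. However, whenever I try to query more than 10k documents with my code, I get the following error: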
Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
Here is my code:
try {
    // 1. Build the search request
    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
    SearchRequest searchRequest = new SearchRequest(eventId);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(queryBuilder);
    searchSourceBuilder.size(limit); // number of hits returned per scroll batch
    searchSourceBuilder.profile(true); // used to profile the execution of queries and aggregations for a specific search
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
    if (CollectionUtils.isNotEmpty(sortBy)) {
        for (int i = 0; i < sortBy.size(); i++) {
            String sortByField = sortBy.get(i);
            // Reuse the last orderBy entry when there are fewer orders than sort fields
            String orderByField = orderBy.get(i < orderBy.size() ? i : orderBy.size() - 1);
            SortOrder sortOrder = (orderByField != null && orderByField.trim().equalsIgnoreCase("asc")) ? SortOrder.ASC : SortOrder.DESC;
            if (keywordFields.contains(sortByField)) {
                sortByField = sortByField + ".keyword";
            } else if (rawFields.contains(sortByField)) {
                sortByField = sortByField + ".raw";
            }
            searchSourceBuilder.sort(new FieldSortBuilder(sortByField).order(sortOrder));
        }
    }
    searchSourceBuilder.sort(new FieldSortBuilder("_id").order(SortOrder.ASC));
    if (includes != null) {
        String[] excludes = new String[0]; // no excluded fields
        searchSourceBuilder.fetchSource(includes, excludes);
    }
    if (CollectionUtils.isNotEmpty(aggregations)) {
        aggregations.forEach(searchSourceBuilder::aggregation);
    }
    searchRequest.scroll(scroll);
    searchRequest.source(searchSourceBuilder);
    SearchResponse resp = null;
    try {
        resp = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = resp.getScrollId();
        SearchHit[] searchHits = resp.getHits().getHits();
        // Pagination: keep calling ES until there are no more pages
        while (searchHits != null && searchHits.length > 0) {
            // (process the current page of searchHits here)
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            resp = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = resp.getScrollId();
            searchHits = resp.getHits().getHits();
        }
        // Clear the scroll to release the search context
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    } catch (Exception e) {
        String msg = "Could not get search result. Exception=" + ExceptionUtilsEx.getExceptionInformation(e);
        throw new Exception(msg, e); // keep the original exception as the cause
    }
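I'm implementing the solution from this link: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search-scroll.html
Can anyone tell me what I'm doing wrong and what I need to do to get past 10,000 documents with the Scroll API?
Solution
If processing a batch can take longer than the scroll keep-alive (1 minute in your code), you need to increase the scroll time, because the scroll context has to survive from one scroll call to the next. Change this line so that the scroll context doesn't expire after 1 minute: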
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
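Each scroll request sets a new expiry time for the search context, so the keep-alive only has to cover the gap between two consecutive scroll calls (roughly, the time spent processing one batch), not the whole iteration. The toy model below is a sketch of that behavior under stated assumptions: `FakeScrollContext`, `makeDocs` and `drain` are hypothetical names, nothing here talks to Elasticsearch, and time is simulated in milliseconds rather than measured.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollKeepAliveDemo {

    // Hypothetical in-memory stand-in for a server-side scroll context (not an Elasticsearch class).
    static final class FakeScrollContext {
        private final List<Integer> docs;
        private final int batchSize;
        private final long keepAliveMs;
        private int cursor = 0;
        private long lastAccessMs;

        FakeScrollContext(List<Integer> docs, int batchSize, long keepAliveMs, long createdAtMs) {
            this.docs = docs;
            this.batchSize = batchSize;
            this.keepAliveMs = keepAliveMs;
            this.lastAccessMs = createdAtMs;
        }

        // Returns the next batch; fails if the previous call was more than keepAliveMs ago.
        List<Integer> next(long nowMs) {
            if (nowMs - lastAccessMs > keepAliveMs) {
                throw new IllegalStateException("scroll context expired");
            }
            lastAccessMs = nowMs; // each call renews the keep-alive
            int end = Math.min(cursor + batchSize, docs.size());
            List<Integer> batch = new ArrayList<>(docs.subList(cursor, end));
            cursor = end;
            return batch;
        }
    }

    static List<Integer> makeDocs(int n) {
        List<Integer> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            docs.add(i);
        }
        return docs;
    }

    // Reads every batch, spending processingMsPerBatch of simulated time on each one.
    static int drain(FakeScrollContext ctx, long startMs, long processingMsPerBatch) {
        long now = startMs;
        int total = 0;
        List<Integer> batch = ctx.next(now);
        while (!batch.isEmpty()) {
            total += batch.size();
            now += processingMsPerBatch;
            batch = ctx.next(now);
        }
        return total;
    }

    public static void main(String[] args) {
        long oneMinute = 60_000L;
        long tenMinutes = 600_000L;
        long ninetySeconds = 90_000L; // simulated processing time per batch

        // 10-minute keep-alive: all 25,000 docs are read, although the whole run takes 4.5 minutes.
        FakeScrollContext ok = new FakeScrollContext(makeDocs(25_000), 10_000, tenMinutes, 0L);
        System.out.println("read " + drain(ok, 0L, ninetySeconds));

        // 1-minute keep-alive: the context expires before the second batch is requested.
        FakeScrollContext tooShort = new FakeScrollContext(makeDocs(25_000), 10_000, oneMinute, 0L);
        try {
            drain(tooShort, 0L, ninetySeconds);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `read 25000` followed by `failed: scroll context expired`: what matters is the time per batch compared to the keep-alive, not the total run time.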
And remove this line:
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
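Another thing worth ruling out: the Scroll API removes the 10,000 limit on the total number of hits you can page through, but the size of each individual batch (the `size` set on the initial search request, `limit` in the question) is still bounded by the `index.max_result_window` index setting, which defaults to 10,000, and Elasticsearch rejects a scroll whose batch size exceeds it. The helper below is a hypothetical sketch (`ScrollBatchSize`, `clampBatchSize` and `pagesNeeded` are illustrative names, not Elasticsearch client API) that caps the batch size at the window and computes how many non-empty pages a scroll needs.

```java
public class ScrollBatchSize {

    // Default value of the index.max_result_window index setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Hypothetical helper: cap the per-batch size at the result window.
    static int clampBatchSize(int requestedLimit, int maxResultWindow) {
        if (requestedLimit <= 0 || maxResultWindow <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return Math.min(requestedLimit, maxResultWindow);
    }

    // Number of non-empty scroll pages needed to read totalHits documents (ceiling division).
    static long pagesNeeded(long totalHits, int batchSize) {
        return (totalHits + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int batch = clampBatchSize(50_000, DEFAULT_MAX_RESULT_WINDOW);
        System.out.println(batch);                        // 10000
        System.out.println(pagesNeeded(125_000L, batch)); // 13
    }
}
```

In the question's code this would amount to something like `searchSourceBuilder.size(Math.min(limit, 10_000));`, letting the scroll loop, rather than a single large batch, collect everything beyond the first 10,000 hits.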