Elasticsearch Scroll API does not get past the 10,000 limit

Problem description

I am using the Scroll API to fetch more than 10,000 documents from our Elasticsearch. However, whenever my code queries for more than 10k documents, I get the following error:

Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]

Here is my code:

        // 1. Build Search Request
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        SearchRequest searchRequest = new SearchRequest(eventId);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(queryBuilder);
        searchSourceBuilder.size(limit);

        searchSourceBuilder.profile(true); // used to profile the execution of queries and aggregations for a specific search

        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take

        if(CollectionUtils.isNotEmpty(sortBy)){
            for (int i = 0; i < sortBy.size(); i++) {
                String sortByField = sortBy.get(i);
                String orderByField = orderBy.get(i < orderBy.size() ? i : orderBy.size() - 1);
                SortOrder sortOrder = (orderByField != null && orderByField.trim().equalsIgnoreCase("asc")) ? SortOrder.ASC : SortOrder.DESC;
                if(keywordFields.contains(sortByField)) {
                    sortByField = sortByField + ".keyword";
                } else if(rawFields.contains(sortByField)) {
                    sortByField = sortByField + ".raw";
                }
                searchSourceBuilder.sort(new FieldSortBuilder(sortByField).order(sortOrder));
            }
        }
        searchSourceBuilder.sort(new FieldSortBuilder("_id").order(SortOrder.ASC));

        if (includes != null) {
            String[] excludes = {""};
            searchSourceBuilder.fetchSource(includes, excludes);
        }

        if (CollectionUtils.isNotEmpty(aggregations)) {
            aggregations.forEach(searchSourceBuilder::aggregation);
        }

        searchRequest.scroll(scroll);
        searchRequest.source(searchSourceBuilder);

        SearchResponse resp = null;
        try {
            resp = client.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = resp.getScrollId();
            SearchHit[] searchHits = resp.getHits().getHits();

            // Pagination - will continue to call ES until there are no more pages
            while(searchHits != null && searchHits.length > 0){
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(scroll);
                resp = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = resp.getScrollId();
                searchHits = resp.getHits().getHits();
            }

            // Clear scroll request to release the search context
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

        } catch (Exception e) {
            String msg = "Could not get search result. Exception=" + ExceptionUtilsEx.getExceptionInformation(e);
            // Keep the original exception as the cause so the shard-level failure is not lost
            throw new Exception(msg, e);
        }

I am implementing the solution from this link: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search-scroll.html

Can anyone tell me what I am doing wrong, and what I need to do to get past 10,000 documents with the Scroll API?

Tags: elasticsearch

Solution


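If your iteration takes longer than 5 minutes, you need to adjust the scroll time. Change this line so that the scroll context does not expire after 1 minute: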

final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));

And remove this one:

searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
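
For reference, here is a minimal sketch of the overall loop shape the scroll approach relies on. It does not call Elasticsearch: `ScrollSource`, `Page`, `FakeSource`, `drain` and `MAX_BATCH` are hypothetical names made up for this illustration, with an in-memory fake standing in for `client.search`, `client.scroll` and `client.clearScroll` so the example runs without a cluster. It assumes the default `index.max_result_window` of 10,000, which also bounds the size of a single scroll batch. If `limit` in the question is above that value, the initial search may be rejected regardless of the keep-alive, so the sketch caps the per-batch size and lets the loop collect the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: the scroll loop shape, run against a hypothetical in-memory source. */
class ScrollLoopSketch {

    /** Hypothetical stand-in for the three client calls used in the question. */
    interface ScrollSource {
        Page search(int batchSize);        // roughly client.search(searchRequest, ...)
        Page scroll(String scrollId);      // roughly client.scroll(scrollRequest, ...)
        void clearScroll(String scrollId); // roughly client.clearScroll(...)
    }

    /** One batch of hits plus the scroll id to use for the next call. */
    static final class Page {
        final String scrollId;
        final List<Integer> hits;
        Page(String scrollId, List<Integer> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Assumed default of index.max_result_window; a scroll batch cannot exceed it. */
    static final int MAX_BATCH = 10_000;

    /** Reads every hit, batch by batch, and always releases the scroll context. */
    static int drain(ScrollSource source, int requestedBatchSize, Consumer<Integer> handler) {
        int batchSize = Math.min(requestedBatchSize, MAX_BATCH);
        String scrollId = null;
        int total = 0;
        try {
            Page page = source.search(batchSize);
            scrollId = page.scrollId;
            while (!page.hits.isEmpty()) {
                // Handle the current batch (including the first one) before asking for the next.
                for (Integer hit : page.hits) {
                    handler.accept(hit);
                    total++;
                }
                page = source.scroll(scrollId);
                scrollId = page.scrollId;
            }
        } finally {
            // Release the search context even if handling a batch throws.
            if (scrollId != null) {
                source.clearScroll(scrollId);
            }
        }
        return total;
    }

    /** In-memory fake holding docCount documents numbered 0..docCount-1. */
    static final class FakeSource implements ScrollSource {
        private final int docCount;
        private int batchSize;
        private int cursor;
        boolean cleared;

        FakeSource(int docCount) {
            this.docCount = docCount;
        }

        public Page search(int batchSize) {
            if (batchSize > MAX_BATCH) {
                throw new IllegalArgumentException("Batch size is too large: " + batchSize);
            }
            this.batchSize = batchSize;
            this.cursor = 0;
            return next();
        }

        public Page scroll(String scrollId) {
            return next();
        }

        public void clearScroll(String scrollId) {
            cleared = true;
        }

        private Page next() {
            List<Integer> hits = new ArrayList<>();
            int end = Math.min(cursor + batchSize, docCount);
            while (cursor < end) {
                hits.add(cursor++);
            }
            return new Page("fake-scroll-id", hits);
        }
    }

    public static void main(String[] args) {
        FakeSource source = new FakeSource(25_000);
        List<Integer> seen = new ArrayList<>();
        int total = drain(source, 50_000, seen::add);
        System.out.println(total + " hits, cleared=" + source.cleared);
    }
}
```

With the real client, the same shape would mean keeping `searchSourceBuilder.size(...)` at or below 10,000 and handling `searchHits` inside the `while` loop before each `client.scroll` call. Each scroll request renews the keep-alive, so the scroll time only has to cover the handling of one batch rather than the whole export.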
