Issue
I am planning to fetch all the documents in an Elasticsearch index and then store them as a CSV file. However, most of the methods I have tried end up failing with size limit errors.
curl -k -u username:password -XGET "https://xx.xx.xx.xx:xxxx/foo-index/_search?scroll=10m" \
  -H 'Content-Type: application/json' \
  -d '{ "from": 0, "size": 933963, "query": { "match_all": {} }, "track_total_hits": true, "_source": ["foo_id"] }'
The error displayed is:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Batch size is too large, size must be less than or equal to: [10000] but was [933963]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"foo-index","node":"k0OUtLDFRye4gIXGKCKLmQ","reason":{"type":"illegal_argument_exception","reason":"Batch size is too large, size must be less than or equal to: [10000] but was [933963]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."}}]
The thing is, there is no way for me to reduce the size, because I need to retrieve the entire content of the index.
Solution
You are getting this exception because Elasticsearch limits a single request to 10,000 documents (the index.max_result_window index setting).
You can use the search_after API together with a point in time (PIT) to get all the documents: set size to 10000
and make multiple calls (each call returns up to 10k documents) until you have fetched all the data from your index.
To use a PIT, you first need to generate a PIT ID using the command below:
POST /my-index-000001/_pit?keep_alive=1m
The API returns a PIT ID.
{
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
You can then use the PIT ID as shown in the query below (the match query on user.id is the example from the Elasticsearch docs; in your case you would use your match_all query instead):
GET /_search
{
  "size": 10000,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  },
  "sort": [
    { "@timestamp": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } }
  ]
}
The above query returns the first 10k documents. To get the next set of 10k documents, take the sort values of the last hit in the response and pass them as the search_after parameter in the same query, together with the pit_id returned in the response (the PIT ID can change between requests); see the follow-up query sketch after the sample response below.
{
  "pit_id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
  "took" : 17,
  "timed_out" : false,
  "_shards" : ...,
  "hits" : {
    "total" : ...,
    "max_score" : null,
    "hits" : [
      ...
      {
        "_index" : "my-index-000001",
        "_id" : "FaslK3QBySSL_rrj9zM5",
        "_score" : null,
        "_source" : ...,
        "sort" : [
          "2021-05-20T05:30:04.832Z",
          4294967298
        ]
      }
    ]
  }
}
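Here is a minimal sketch of the follow-up request, assuming the sort values shown in the sample response above and a pit_id placeholder (anything in angle brackets is not from the original post and must be filled in from your previous response):
GET /_search
{
  "size": 10000,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  },
  "pit": {
    "id": "<pit_id returned by the previous response>",
    "keep_alive": "1m"
  },
  "sort": [
    { "@timestamp": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } }
  ],
  "search_after": [
    "2021-05-20T05:30:04.832Z",
    4294967298
  ]
}
Repeat this call, updating search_after (and pit_id if it changes) each time, until hits comes back empty. When you are finished, close the PIT to free its resources:
DELETE /_pit
{
  "id": "<pit_id from the last response>"
}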
PS: No solution will give you more than 10k documents in a single API call; you need to use either the search_after or the scroll API.
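Since the end goal is a CSV of foo_id values, below is a minimal shell sketch of the whole loop, assuming curl and jq are available. The host, credentials, index and field names are the placeholders from the question; the sketch sorts on the built-in _shard_doc tiebreaker so it does not depend on a timestamp field. Treat it as a starting point, not a drop-in script.
#!/bin/bash
# Sketch: export all foo_id values from foo-index to a CSV file
# using a point in time (PIT) plus search_after pagination.
# Placeholders (host, credentials, index, field) must be adjusted to your cluster.
ES="https://xx.xx.xx.xx:xxxx"
AUTH="username:password"

# 1. Open a point in time on the index.
PIT_ID=$(curl -sk -u "$AUTH" -XPOST "$ES/foo-index/_pit?keep_alive=5m" | jq -r '.id')

echo "foo_id" > foo_ids.csv
SEARCH_AFTER=""

while true; do
  # Build the request body; search_after is added only after the first page.
  BODY='{ "size": 10000, "query": { "match_all": {} },
          "pit": { "id": "'"$PIT_ID"'", "keep_alive": "5m" },
          "_source": ["foo_id"],
          "sort": [ { "_shard_doc": "asc" } ]'
  if [ -n "$SEARCH_AFTER" ]; then
    BODY="$BODY, \"search_after\": $SEARCH_AFTER"
  fi
  BODY="$BODY }"

  RESP=$(curl -sk -u "$AUTH" -XPOST "$ES/_search" -H 'Content-Type: application/json' -d "$BODY")

  # Stop when a page comes back empty.
  COUNT=$(echo "$RESP" | jq '.hits.hits | length')
  [ "$COUNT" -eq 0 ] && break

  # Append the foo_id of every hit to the CSV.
  echo "$RESP" | jq -r '.hits.hits[]._source.foo_id' >> foo_ids.csv

  # The PIT ID can change between requests; keep the latest one.
  PIT_ID=$(echo "$RESP" | jq -r '.pit_id')
  # Use the sort values of the last hit as the next search_after.
  SEARCH_AFTER=$(echo "$RESP" | jq -c '.hits.hits[-1].sort')
done

# 2. Close the PIT when finished.
curl -sk -u "$AUTH" -XDELETE "$ES/_pit" -H 'Content-Type: application/json' -d '{ "id": "'"$PIT_ID"'" }'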