[Development] ElasticSearch - UpdateByQuery

Update vs Reindex

  • An update reindexes the original document: the new version is written, the original is marked as deleted, and the deleted copy later has to be merged out of its segment. A reindex adds the original document to a new segment in (usually) a new index and leaves the original untouched.

https://discuss.elastic.co/t/update-vs-reindex/269929
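
To make the difference concrete, here is a minimal sketch of the two REST request bodies, built as plain Python dicts. The `product` index comes from the queries later in this post; the `status` field, the `product_v2` index and the helper names are hypothetical and only serve the illustration.

```python
import json


def update_by_query_request(index, query, script_source, params=None):
    """Build (method, path, body) for an update-by-query call on one index."""
    body = {
        "query": query,
        "script": {
            "lang": "painless",
            "source": script_source,
            "params": params or {},
        },
    }
    return "POST", f"/{index}/_update_by_query", body


def reindex_request(source_index, dest_index, query=None):
    """Build (method, path, body) for a reindex into another index."""
    source = {"index": source_index}
    if query is not None:
        source["query"] = query
    return "POST", "/_reindex", {"source": source, "dest": {"index": dest_index}}


# Hypothetical field and index names, for illustration only.
update = update_by_query_request(
    "product",
    {"term": {"status": "draft"}},
    "ctx._source.status = params.status",
    {"status": "published"},
)
reindex = reindex_request("product", "product_v2")

for method, path, body in (update, reindex):
    print(method, path)
    print(json.dumps(body, indent=2))
```

The update rewrites documents inside the same index, while the reindex copies them into `dest` and leaves the source index as it was.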

UpdateByQuery

  • Sync
  • bulk

UpdateByQueryAsync

  • Asynchronous execution
    • Executing an UpdateByQueryRequest can also be done in an asynchronous fashion so that the client can return directly. Users need to specify how the response or potential failures will be handled by passing the request and a listener to the asynchronous update-by-query method.
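
The request-plus-listener pattern described above can be sketched without any Elasticsearch dependency. This is not the Java client's API: `update_by_query_async`, `fake_execute` and the two callbacks are hypothetical names, and `fake_execute` merely stands in for the real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def update_by_query_async(pool, execute, request, on_response, on_failure):
    """Submit the request and return at once; the outcome goes to a callback."""
    future = pool.submit(execute, request)

    def _notify(done):
        error = done.exception()
        if error is not None:
            on_failure(error)
        else:
            on_response(done.result())

    future.add_done_callback(_notify)
    return future


def fake_execute(request):
    """Hypothetical stand-in for the real client call."""
    if "query" not in request:
        raise ValueError("request has no query")
    return {"updated": 3, "failures": []}


responses, failures = [], []
with ThreadPoolExecutor(max_workers=2) as pool:
    update_by_query_async(pool, fake_execute, {"query": {"match_all": {}}},
                          responses.append, failures.append)
    update_by_query_async(pool, fake_execute, {},
                          responses.append, failures.append)

# Leaving the with-block waits for both tasks, so both callbacks have run.
print(responses)
print(failures)
```

The caller is not blocked while the update runs; success and failure are handled in separate callbacks, which mirrors the role of the listener.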

UpdateByQuery

boolQuery().filter() vs termsQuery()

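  • The main practical difference is “caching”.
  • A termsQuery() used on its own runs as a scoring query and is not cached; the same clause wrapped in boolQuery().filter() runs in filter context and is eligible for caching.
  • Once the filter result has been cached, succeeding calls can be faster because they reuse that result.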
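
As a rough illustration, the two variants differ only in how the same `terms` clause is wrapped in the request body. The sketch below builds both bodies as Python dicts; the `productId` field and the ids are hypothetical.

```python
import json


def terms_query(field, values):
    """A bare terms clause, the JSON form of termsQuery(field, values)."""
    return {"terms": {field: list(values)}}


def as_filter(clause):
    """Wrap a clause in bool.filter so it runs in non-scoring filter context."""
    return {"bool": {"filter": [clause]}}


ids = ["A100", "A200", "A300"]  # hypothetical ids

scoring_body = {"query": terms_query("productId", ids)}
filter_body = {"query": as_filter(terms_query("productId", ids))}

print(json.dumps(scoring_body))
print(json.dumps(filter_body))
```

Both bodies match the same documents; only the second one asks Elasticsearch to skip scoring.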

  • A filter query sent in chunks works much faster than a plain terms query, but a really big filter can slow the response down a lot. In my case, a filter query with chunks of 10,000 ids was 10 times faster than a filter query with all 100,000 ids at once (Elasticsearch 6 already restricts the number of terms allowed in one query).
  • Generally, filters are executed in a “non-scoring” mode which gives them two main performance advantages. Firstly, they can omit the actual scoring of the document. Scoring a doc is relatively quick (the summation of a bunch of multiplications), but even 1ns for a billion documents is 1 second of computation.
  • Secondly, non-scoring filters can be cached, meaning subsequent executions of the filter clause can leverage the cache instead of hitting the various data-structures.
  • Where possible, we encourage people to convert queries to filters for better performance (assuming you don’t need the scoring aspect).
  • For clarity and simplicity, we will use the term “filter” to mean a query which is used in a non-scoring, filtering context. You can think of the terms “filter”, “filtering query” and “non-scoring query” as being identical.
  • Similarly, if the term “query” is used in isolation without a qualifier, we are referring to a “scoring query”.
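
The chunking approach mentioned above could look roughly like this: split the id list into chunks of 10,000 and build one filter-context request body per chunk. The `productId` field and the generated ids are hypothetical, and sending the requests is left out.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def chunked_filter_bodies(field, ids, chunk_size=10_000):
    """Build one bool.filter/terms request body per chunk of ids."""
    return [
        {"query": {"bool": {"filter": [{"terms": {field: chunk}}]}}}
        for chunk in chunked(ids, chunk_size)
    ]


ids = [f"id-{n}" for n in range(100_000)]  # hypothetical ids
bodies = chunked_filter_bodies("productId", ids)

print(len(bodies))  # 10 request bodies of 10,000 ids each
```

Each body stays well below the per-query term limit, and the results of the individual requests have to be combined by the caller.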

filter vs match

_search API

if array

GET /product/_search
{
  "query": {
    "nested": {
      "path": "flags",
      "query": {
        "term": {
          "flags.flagCode": {
            "value": "dd"
          }
        }
      }
    }
  }
}
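
The same request body can be generated programmatically. The sketch below assumes that `flags` is mapped with type `nested`, which the `nested` query needs; the mapping shown (including `keyword` for `flagCode`) is an assumption, as the post does not show the index mapping, and the helper name is hypothetical.

```python
import json


def nested_term_query(path, field, value):
    """Build a search body with a term query on a field inside a nested path."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.{field}": {"value": value}}},
            }
        }
    }


# Assumed mapping: the nested query only works if `flags` is of type nested.
flags_mapping = {
    "mappings": {
        "properties": {
            "flags": {
                "type": "nested",
                "properties": {"flagCode": {"type": "keyword"}},
            }
        }
    }
}

body = nested_term_query("flags", "flagCode", "dd")
print(json.dumps(body, indent=2))
```

The printed body is equivalent to the console request above, so it can be sent to `/product/_search` unchanged.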

This post is licensed under CC BY 4.0 by the author.
