Elasticsearch - multi_match query returns results in an unexpected order

I need to find a phrase in a document, searching in both the title and the content. The title is more important than the content, so I expect the following result order:

  • first, documents that match in both title and content
  • then documents that match only in the title
  • then documents that match only in the content

It seems like quite basic stuff.

So I've created an index and some data like this:

    PUT /test_index

    PUT /test_index/article/3263
    {
      "id": 3263,
      "pagetitle": "lösungen",
      "searchable_content": "abc"
    }

    PUT /test_index/article/1005
    {
      "id": 1005,
      "pagetitle": "lösungen",
      "searchable_content": "test! lösungen test?"
    }

    PUT /test_index/article/677
    {
      "id": 677,
      "pagetitle": "lösungen",
      "searchable_content": "test lösungen test!"
    }

    PUT /test_index/article/666
    {
      "id": 666,
      "pagetitle": "abc",
      "searchable_content": "test lösungen test abc"
    }

And I run a query like this:

    GET /test_index/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "lösungen",
                "fields": ["pagetitle^2", "searchable_content"]
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

But the result is not what I expect. The document that matches only in the title comes before the documents that match in both title and content, like this:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 4,
        "max_score": 0.5753642,
        "hits": [
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "3263",
            "_score": 0.5753642,
            "_source": {
              "id": 3263,
              "pagetitle": "lösungen",
              "searchable_content": "abc"
            },
            "highlight": {
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1005",
            "_score": 0.36464313,
            "_source": {
              "id": 1005,
              "pagetitle": "lösungen",
              "searchable_content": "test! lösungen test?"
            },
            "highlight": {
              "searchable_content": ["test! <em>lösungen</em> test?"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "677",
            "_score": 0.36464313,
            "_source": {
              "id": 677,
              "pagetitle": "lösungen",
              "searchable_content": "test lösungen test!"
            },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test!"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "666",
            "_score": 0.2876821,
            "_source": {
              "id": 666,
              "pagetitle": "abc",
              "searchable_content": "test lösungen test abc"
            },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc"]
            }
          }
        ]
      }
    }

What I tried next was adjusting the field boosts further. In the case above, it seemed to work when I set a boost on both fields and used the most_fields type, like this:

    GET /test_index/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "lösungen",
                "fields": ["pagetitle^3", "searchable_content^2"],
                "type": "most_fields"
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

And it gave the expected result for this set of data.

However, if I add 2 more records:

    PUT /test_index/article/999
    {
      "id": 999,
      "pagetitle": "abc",
      "searchable_content": "test lösungen test abc double match lösungen"
    }

    PUT /test_index/article/1006
    {
      "id": 1006,
      "pagetitle": "lösungen , lösungen",
      "searchable_content": "test sample"
    }

it no longer works, because the results are now:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 6,
        "max_score": 2.2315955,
        "hits": [
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1006",
            "_score": 2.2315955,
            "_source": {
              "id": 1006,
              "pagetitle": "lösungen , lösungen",
              "searchable_content": "test sample"
            },
            "highlight": {
              "pagetitle": ["<em>lösungen</em> , <em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "666",
            "_score": 1.219939,
            "_source": {
              "id": 666,
              "pagetitle": "abc",
              "searchable_content": "test lösungen test abc"
            },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1005",
            "_score": 0.86785066,
            "_source": {
              "id": 1005,
              "pagetitle": "lösungen",
              "searchable_content": "test! lösungen test?"
            },
            "highlight": {
              "searchable_content": ["test! <em>lösungen</em> test?"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "677",
            "_score": 0.86785066,
            "_source": {
              "id": 677,
              "pagetitle": "lösungen",
              "searchable_content": "test lösungen test!"
            },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test!"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "3263",
            "_score": 0.8630463,
            "_source": {
              "id": 3263,
              "pagetitle": "lösungen",
              "searchable_content": "abc"
            },
            "highlight": {
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "999",
            "_score": 0.7876096,
            "_source": {
              "id": 999,
              "pagetitle": "abc",
              "searchable_content": "test lösungen test abc double match lösungen"
            },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc double match <em>lösungen</em>"]
            }
          }
        ]
      }
    }

So, as you can see, a document that matches only in the content (666) got a higher score than the documents that match in both title and content (1005 and 677).

Could you please explain what I'm doing wrong here and how it can be fixed?

Try constant score, like so:

    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "query": {
                  "match": {
                    "pagetitle": {
                      "query": "lösungen"
                    }
                  }
                },
                "boost": 2
              }
            },
            {
              "constant_score": {
                "query": {
                  "match": {
                    "searchable_content": "lösungen"
                  }
                }
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

Constant score, according to the docs: "...wraps a query and returns a constant score equal to the query boost for every document in the filter." (ref)
@davide's link should help you understand why a match on searchable_content can turn out a higher score for a document. Since you want to ignore term frequencies and IDFs across the fields, you can use a constant score on each field's match.
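To make the effect of the constant scores concrete, here is a small Python sketch that reproduces the arithmetic on the six documents from the question without touching Elasticsearch. It assumes the bool query simply sums the scores of its matching should clauses, and it uses a crude regex tokenizer as a stand-in for the real analyzer; the names `BOOSTS` and `constant_score` are made up for this illustration.

```python
import re

TERM = "lösungen"

# The six documents from the question.
docs = {
    3263: {"pagetitle": "lösungen", "searchable_content": "abc"},
    1005: {"pagetitle": "lösungen", "searchable_content": "test! lösungen test?"},
    677: {"pagetitle": "lösungen", "searchable_content": "test lösungen test!"},
    666: {"pagetitle": "abc", "searchable_content": "test lösungen test abc"},
    999: {"pagetitle": "abc",
          "searchable_content": "test lösungen test abc double match lösungen"},
    1006: {"pagetitle": "lösungen , lösungen", "searchable_content": "test sample"},
}

# Boosts from the constant_score clauses: 2 for the title, default 1 for content.
BOOSTS = {"pagetitle": 2.0, "searchable_content": 1.0}


def constant_score(doc):
    """Sum the fixed boost of every field that contains the term at least once."""
    total = 0.0
    for field, boost in BOOSTS.items():
        tokens = re.findall(r"\w+", doc[field].lower())  # stand-in for the analyzer
        if TERM in tokens:
            total += boost  # term frequency and IDF play no role here
    return total


scores = {doc_id: constant_score(doc) for doc_id, doc in docs.items()}
# Highest score first; ties broken by id only to keep the output deterministic.
ranking = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

for doc_id in ranking:
    print(doc_id, scores[doc_id])
```

Under these assumptions every document falls into one of three tiers: 3 for a match in both fields, 2 for a title-only match, 1 for a content-only match. Repeated occurrences (as in 1006 and 999) do not move a document between tiers.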


According to the rules listed in the original question, the above query works perfectly. But based on the comments from the OP, there is also a need to rank results by how often the searched term occurs. Apparently, term frequency and inverse document frequency are important, but perhaps you don't care about field length here (if you want to rank results by the number of occurrences). In that case, I'd advise you to set up the index like so:

    POST test_index_v1
    {
      "mappings": {
        "article": {
          "properties": {
            "id": {
              "type": "long"
            },
            "pagetitle": {
              "type": "string",
              "norms": {
                "enabled": false
              }
            },
            "searchable_content": {
              "type": "string",
              "norms": {
                "enabled": false
              }
            }
          }
        }
      }
    }

Note: type: string is replaced by type: text in version 5 and above.

The link mentioned by @davide describes what disabling norms does.

Secondly, since you are running the query on a small number of documents, and assuming you have more than 1 shard assigned to the index, it is better to run the query with search_type=dfs_query_then_fetch, because the local IDFs per shard can vary a lot. (read this)
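The first result set in the question shows how much shard-local statistics can matter. The sketch below uses the BM25 formulas that were the default similarity in Elasticsearch 5.x (with the default k1 = 1.2 and b = 0.75). The shard placement is an assumption inferred from the scores, not something stated in the question: document 3263 alone on its shard, and documents 1005 and 677 together on another. The function names are only for illustration.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency, computed from one shard's statistics."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq, field_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency factor with field-length normalisation."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * field_len / avg_len))


TITLE_BOOST = 2  # from "pagetitle^2" in the original multi_match query

# Every matching title is the single term "lösungen": freq 1, length 1, average 1.
tf = bm25_tf(freq=1, field_len=1, avg_len=1)

# Assumed placement: 3263 is the only document on its shard.
alone = TITLE_BOOST * bm25_idf(doc_count=1, doc_freq=1) * tf   # about 0.5754

# Assumed placement: 1005 and 677 share a shard, and both titles match.
shared = TITLE_BOOST * bm25_idf(doc_count=2, doc_freq=2) * tf  # about 0.3646

# With index-wide statistics (what dfs_query_then_fetch collects first):
# 4 documents in total, 3 of them with the term in the title.
index_wide = TITLE_BOOST * bm25_idf(doc_count=4, doc_freq=3) * tf

print(alone, shared, index_wide)
```

Under these assumptions the two local values line up with the 0.5753642 and 0.36464313 reported in the first response, so the same title match gets a different score depending only on which shard the document landed on. With index-wide statistics, all three title matches receive the same title score.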

Thirdly, building on the last query, you want to factor in the weight of TF-IDF. The last query ranks documents the same whether the search term occurs once, twice or three times in the same field. You can add a bool-should block whose score is added to that of the constant-score blocks, like so:

    GET test_index_v1/_search?search_type=dfs_query_then_fetch
    {
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "query": {
                  "match": {
                    "pagetitle": {
                      "query": "lösungen"
                    }
                  }
                },
                "boost": 2
              }
            },
            {
              "constant_score": {
                "query": {
                  "match": {
                    "searchable_content": "lösungen"
                  }
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "match": {
                      "pagetitle": {
                        "query": "lösungen",
                        "boost": 2
                      }
                    }
                  },
                  {
                    "match": {
                      "searchable_content": "lösungen"
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }
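If the search term comes from application code, the same request body can be assembled programmatically. This is only a sketch: `build_tiered_query` and its parameters are hypothetical names, not part of any Elasticsearch client. The `wrap_key` parameter exists because the query above puts the wrapped match under `query`, while recent Elasticsearch documentation shows that key as `filter`; check which one your version expects.

```python
import json


def build_tiered_query(term, title="pagetitle", content="searchable_content",
                       title_boost=2, wrap_key="query"):
    """Build the body of the combined constant-score + scored query for `term`."""

    def tier(field, boost=None):
        # Fixed score for "this field matched at all" (short match form).
        clause = {"constant_score": {wrap_key: {"match": {field: term}}}}
        if boost is not None:
            clause["constant_score"]["boost"] = boost
        return clause

    # Regular scored matches, so repeated occurrences still influence the ranking.
    scored = {
        "bool": {
            "should": [
                {"match": {title: {"query": term, "boost": title_boost}}},
                {"match": {content: term}},
            ]
        }
    }

    return {
        "query": {"bool": {"should": [tier(title, title_boost), tier(content), scored]}},
        "highlight": {"fields": {title: {}, content: {}}},
    }


body = build_tiered_query("lösungen")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the body of the `_search` request shown above, with `search_type=dfs_query_then_fetch` kept in the URL.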

