elasticsearch - multi_match query returns results in an unexpected order


I need to find a phrase in a document, in both the title and the content. The title is more important than the content, so I expect results in the following order:

  • first, documents that match in both title and content
  • then, documents that match only in the title
  • finally, documents that match only in the content

It seems like quite basic stuff.

So I've created an index and some data like this:

    PUT /test_index

    PUT /test_index/article/3263
    {
      "id": 3263,
      "pagetitle": "lösungen",
      "searchable_content": "abc"
    }

    PUT /test_index/article/1005
    {
      "id": 1005,
      "pagetitle": "lösungen",
      "searchable_content": "test! lösungen test?"
    }

    PUT /test_index/article/677
    {
      "id": 677,
      "pagetitle": "lösungen",
      "searchable_content": "test lösungen test!"
    }

    PUT /test_index/article/666
    {
      "id": 666,
      "pagetitle": "abc",
      "searchable_content": "test lösungen test abc"
    }

and ran a query like this:

    GET /test_index/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "lösungen",
                "fields": ["pagetitle^2", "searchable_content"]
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

But the result is not what I expect. A document that matches only in the title comes before the documents that match in both title and content, like this:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 4,
        "max_score": 0.5753642,
        "hits": [
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "3263",
            "_score": 0.5753642,
            "_source": { "id": 3263, "pagetitle": "lösungen", "searchable_content": "abc" },
            "highlight": {
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1005",
            "_score": 0.36464313,
            "_source": { "id": 1005, "pagetitle": "lösungen", "searchable_content": "test! lösungen test?" },
            "highlight": {
              "searchable_content": ["test! <em>lösungen</em> test?"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "677",
            "_score": 0.36464313,
            "_source": { "id": 677, "pagetitle": "lösungen", "searchable_content": "test lösungen test!" },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test!"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "666",
            "_score": 0.2876821,
            "_source": { "id": 666, "pagetitle": "abc", "searchable_content": "test lösungen test abc" },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc"]
            }
          }
        ]
      }
    }

What I tried next was adjusting the field boosts. For the data above, it seemed to work when I set a boost on both fields and used the most_fields type, like this:

    GET /test_index/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "lösungen",
                "fields": ["pagetitle^3", "searchable_content^2"],
                "type": "most_fields"
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

and it gave the expected result for this set of data.

However, if I add 2 more records:

    PUT /test_index/article/999
    {
      "id": 999,
      "pagetitle": "abc",
      "searchable_content": "test lösungen test abc double match lösungen"
    }

    PUT /test_index/article/1006
    {
      "id": 1006,
      "pagetitle": "lösungen , lösungen",
      "searchable_content": "test sample"
    }

it doesn't work any more, because the results are now:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 6,
        "max_score": 2.2315955,
        "hits": [
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1006",
            "_score": 2.2315955,
            "_source": { "id": 1006, "pagetitle": "lösungen , lösungen", "searchable_content": "test sample" },
            "highlight": {
              "pagetitle": ["<em>lösungen</em> , <em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "666",
            "_score": 1.219939,
            "_source": { "id": 666, "pagetitle": "abc", "searchable_content": "test lösungen test abc" },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "1005",
            "_score": 0.86785066,
            "_source": { "id": 1005, "pagetitle": "lösungen", "searchable_content": "test! lösungen test?" },
            "highlight": {
              "searchable_content": ["test! <em>lösungen</em> test?"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "677",
            "_score": 0.86785066,
            "_source": { "id": 677, "pagetitle": "lösungen", "searchable_content": "test lösungen test!" },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test!"],
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "3263",
            "_score": 0.8630463,
            "_source": { "id": 3263, "pagetitle": "lösungen", "searchable_content": "abc" },
            "highlight": {
              "pagetitle": ["<em>lösungen</em>"]
            }
          },
          {
            "_index": "test_index",
            "_type": "article",
            "_id": "999",
            "_score": 0.7876096,
            "_source": { "id": 999, "pagetitle": "abc", "searchable_content": "test lösungen test abc double match lösungen" },
            "highlight": {
              "searchable_content": ["test <em>lösungen</em> test abc double match <em>lösungen</em>"]
            }
          }
        ]
      }
    }

So, as you can see, a document that matches only in the content (666) is ranked higher than documents that match in both title and content (1005 and 677).

Could you please explain what I'm doing wrong here and how it can be fixed?

Try constant_score, like so:

    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "query": {
                  "match": {
                    "pagetitle": {
                      "query": "lösungen"
                    }
                  }
                },
                "boost": 2
              }
            },
            {
              "constant_score": {
                "query": {
                  "match": {
                    "searchable_content": "lösungen"
                  }
                }
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }

According to the docs, constant_score "...wraps a query and returns a constant score equal to the query boost for every document in the filter."
Follow @davide's link to understand why a match on searchable_content alone can turn into a higher score for a document. Since you want to ignore term frequencies and IDFs across fields, you can use a constant score on each field's match.
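To make the effect of the constant scores concrete, here is a minimal Python sketch that mimics the query on the six sample documents from the question. It is not how Elasticsearch computes scores internally: `DOCS`, `tokens` and `constant_score` are hypothetical names, the tokenizer is a crude stand-in for an analyzer, and real scores may be scaled differently. Only the resulting tiers matter.

```python
import re

# The six sample documents from the question: id -> (pagetitle, searchable_content)
DOCS = {
    3263: ("lösungen", "abc"),
    1005: ("lösungen", "test! lösungen test?"),
    677: ("lösungen", "test lösungen test!"),
    666: ("abc", "test lösungen test abc"),
    999: ("abc", "test lösungen test abc double match lösungen"),
    1006: ("lösungen , lösungen", "test sample"),
}


def tokens(text):
    # Crude stand-in for an analyzer: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def constant_score(doc, term, title_boost=2.0, content_boost=1.0):
    # Each matching field adds its fixed boost, however often the term occurs.
    title, content = doc
    score = 0.0
    if term in tokens(title):
        score += title_boost
    if term in tokens(content):
        score += content_boost
    return score


scores = {doc_id: constant_score(doc, "lösungen") for doc_id, doc in DOCS.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
for doc_id in ranked:
    print(doc_id, scores[doc_id])
```

Documents matching both fields land in the top tier, title-only matches in the middle, and content-only matches at the bottom. Within a tier all documents tie, which is what the edit below addresses.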

Edit:

According to the rules listed in the original question, the above query works perfectly. However, based on the OP's comments, the results also need to be ranked by how often the searched term occurs. So term frequency and inverse document frequency are apparently important, but perhaps you don't care about field length here (if you want to rank results by the number of occurrences). In that case, I'd advise you to set up the index like so:

    POST test_index_v1
    {
      "mappings": {
        "article": {
          "properties": {
            "id": {
              "type": "long"
            },
            "pagetitle": {
              "type": "string",
              "norms": {
                "enabled": false
              }
            },
            "searchable_content": {
              "type": "string",
              "norms": {
                "enabled": false
              }
            }
          }
        }
      }
    }

Note: type: string was replaced by type: text in version 5 and above.

The link mentioned by @davide describes what disabling norms does.
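As a rough illustration of why field length matters, the sketch below uses the term-frequency part of the BM25 formula with Lucene's default parameters. This assumes BM25 is the similarity in use, which the thread does not state, and the field lengths are invented for the example. The function name `tf_component` is hypothetical.

```python
K1 = 1.2   # BM25 defaults used by Lucene
B = 0.75


def tf_component(freq, field_len, avg_field_len, norms_enabled=True):
    # With norms enabled, fields longer than average are damped.
    # With norms disabled, the length term drops out and only freq matters.
    length_ratio = field_len / avg_field_len if norms_enabled else 1.0
    return freq * (K1 + 1) / (freq + K1 * (1 - B + B * length_ratio))


# One occurrence in a short field vs. one occurrence in a long field.
print(tf_component(1, 4, 6), tf_component(1, 8, 6))
# Same comparison with norms disabled: the two values are identical.
print(tf_component(1, 4, 6, False), tf_component(1, 8, 6, False))
# With norms disabled, two occurrences always beat one.
print(tf_component(2, 8, 6, False), tf_component(1, 4, 6, False))
```

With norms disabled, the ranking within a field depends on the number of occurrences (and the IDF), not on how long the field is.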

Secondly, since you are running the query on a small number of documents, and assuming you have more than one shard assigned to the index, it is better to run the query with search_type=dfs_query_then_fetch, because the local IDFs per shard can vary a lot.
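The shard-local IDF effect can be sketched with a few lines of Python. The formula is the BM25 IDF used by Lucene. The shard layouts are assumptions, since the question does not say which document ended up on which of the 5 shards, and `bm25_idf` is a hypothetical helper name.

```python
import math


def bm25_idf(docs_in_shard, docs_with_term):
    # Lucene BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5)).
    # N and n are counted per shard unless dfs_query_then_fetch
    # collects index-wide statistics first.
    return math.log(1 + (docs_in_shard - docs_with_term + 0.5) / (docs_with_term + 0.5))


TITLE_BOOST = 2  # the pagetitle^2 boost from the first query

# Assumption: one matching title sits alone on its shard,
# two other matching titles share a shard.
alone = TITLE_BOOST * bm25_idf(docs_in_shard=1, docs_with_term=1)
shared = TITLE_BOOST * bm25_idf(docs_in_shard=2, docs_with_term=2)

print(round(alone, 7))
print(round(shared, 7))
```

Under these assumptions the two values come out at roughly 0.575 and 0.365, close to the 0.5753642 and 0.36464313 scores in the first result listing. That would explain why identical titles got different scores: each shard computed its own IDF.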

Thirdly, building on the last query, you want to factor in the weight of TF-IDF. The last query ranks documents the same whether they have 2 or 3 occurrences of the search term in the same field. You can add a bool-should block whose score is added to that of the constant-score blocks, like so:

    GET test_index_v1/_search?search_type=dfs_query_then_fetch
    {
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "query": {
                  "match": {
                    "pagetitle": {
                      "query": "lösungen"
                    }
                  }
                },
                "boost": 2
              }
            },
            {
              "constant_score": {
                "query": {
                  "match": {
                    "searchable_content": "lösungen"
                  }
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "match": {
                      "pagetitle": {
                        "query": "lösungen",
                        "boost": 2
                      }
                    }
                  },
                  {
                    "match": {
                      "searchable_content": "lösungen"
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "highlight": {
        "fields": {
          "pagetitle": {},
          "searchable_content": {}
        }
      }
    }
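If the search term comes from user input, the request body above can be generated rather than written by hand. The following is a minimal sketch: `build_query` is a hypothetical helper, not part of any Elasticsearch client, and it always uses the long `{"query": ...}` form of match, which is equivalent to the shorthand used above for searchable_content.

```python
import json


def build_query(term, title_boost=2):
    # Hypothetical helper: rebuilds the request body above for any search term.
    def match(field, boost=None):
        body = {"query": term}
        if boost is not None:
            body["boost"] = boost
        return {"match": {field: body}}

    # Fixed tiers: which fields matched, regardless of how often.
    tiers = [
        {"constant_score": {"query": match("pagetitle"), "boost": title_boost}},
        {"constant_score": {"query": match("searchable_content")}},
    ]
    # Regular relevance scoring, added on top of the tiers.
    relevance = {
        "bool": {
            "should": [
                match("pagetitle", boost=title_boost),
                match("searchable_content"),
            ]
        }
    }
    return {
        "query": {"bool": {"should": tiers + [relevance]}},
        "highlight": {"fields": {"pagetitle": {}, "searchable_content": {}}},
    }


body = build_query("lösungen")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the body of the same `_search?search_type=dfs_query_then_fetch` request.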
