TokenCountPayloadFilter

From JDONREF Wiki
 
Current version as of 3 May 2015, 01:30

This token filter adds to each token's integer payload the number of tokens that carry the same payload within the same field.

Sample

For example, the document:

 { "fullName": "BOULEVARD|1 DE|1 PARIS|1 L|2 HOPITAL|2" }

indexed with a mapping like:

 "fullName" : {"type": "string", "term_vector" : "with_positions_offsets_payloads", "index_analyzer":"myAnalyzer"}

and settings like:

 {
   "index" : {
       "analysis" : {
           "analyzer" : {
               "myAnalyzer" : {
                   "type" : "custom",
                   "tokenizer" : "whitespace",
                   "filter" : ["delimited_payload_filter", "lowercase", "tokencount_payload_filter"]
               }
           },
           "filter" : {
               "delimited_payload_filter" : {
                   "type" : "delimited_payload_filter",
                   "delimiter" : "|",
                   "encoding" : "int"
               },
               "tokencount_payload_filter" : {
                   "type" : "tokencountpayloads",
                   "factor" : 1000
               }
           }
       }
   }
 }

will index the tokens BOULEVARD, DE, PARIS, L, HOPITAL (lowercased by the lowercase filter) with the respective payloads: 3001, 3001, 3001, 2002, 2002.

  • 3001 means there are 3 tokens with payload 1 (3 * factor + 1).
  • 2002 means there are 2 tokens with payload 2 (2 * factor + 2).
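
The arithmetic above can be sketched in Python. This is only an illustration of the encoding, not the filter's actual implementation: the helper names `parse_delimited`, `add_token_counts` and `split_payload` are hypothetical, and the sketch assumes that every original payload is smaller than `factor`, since otherwise the count and the payload could not be separated again.

```python
from collections import Counter


def parse_delimited(text, delimiter="|"):
    """Split whitespace-separated 'TOKEN|payload' items into (token, int payload) pairs.

    Mimics the whitespace tokenizer, the delimited payload filter with "int"
    encoding, and the lowercase filter from the sample settings.
    """
    pairs = []
    for item in text.split():
        token, _, payload = item.partition(delimiter)
        pairs.append((token.lower(), int(payload)))
    return pairs


def add_token_counts(pairs, factor):
    """Rewrite each payload as count * factor + payload.

    'count' is the number of tokens in the field that share the same payload.
    """
    counts = Counter(payload for _, payload in pairs)
    return [(token, counts[payload] * factor + payload) for token, payload in pairs]


def split_payload(value, factor):
    """Recover (count, original payload); assumes the original payload < factor."""
    return divmod(value, factor)


pairs = parse_delimited("BOULEVARD|1 DE|1 PARIS|1 L|2 HOPITAL|2")
encoded = add_token_counts(pairs, factor=1000)
print(encoded)
# [('boulevard', 3001), ('de', 3001), ('paris', 3001), ('l', 2002), ('hopital', 2002)]
print(split_payload(3001, 1000))
# (3, 1)
```

Choosing a factor larger than any payload value keeps the two parts independent: integer division by the factor gives the count, and the remainder gives the original payload.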

These factored payloads can be used with the All checker of PayloadCheckerSpanQuery.

Features
 Setting        Description
 factor         (Mandatory) The factor by which the count of tokens with a given payload is multiplied.
 ignored_types  (Default: none) Payloads of tokens with these types are left unmodified; all other payloads are rewritten.
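
A possible reading of `ignored_types` can be sketched in Python as follows. This is an assumption-laden illustration, not the plugin's code: the function name, the `(term, type, payload)` tuple layout and the token types `word` and `<NUM>` are hypothetical. The page does not say whether ignored tokens still contribute to the counts of other tokens, so the sketch exposes that choice as a hypothetical `count_ignored` flag rather than guessing.

```python
from collections import Counter


def add_token_counts_with_types(tokens, factor, ignored_types=(), count_ignored=False):
    """Rewrite payloads as count * factor + payload, skipping ignored token types.

    tokens: list of (term, token_type, payload) tuples (hypothetical layout).
    count_ignored: whether ignored tokens are included in the counts; the
    real filter's behaviour on this point is not documented on this page.
    """
    ignored = set(ignored_types)
    counts = Counter(
        payload
        for _, token_type, payload in tokens
        if count_ignored or token_type not in ignored
    )
    result = []
    for term, token_type, payload in tokens:
        if token_type in ignored:
            # Payloads of ignored types are left unmodified.
            result.append((term, token_type, payload))
        else:
            result.append((term, token_type, counts[payload] * factor + payload))
    return result


tokens = [("boulevard", "word", 1), ("de", "word", 1), ("75013", "<NUM>", 1)]
print(add_token_counts_with_types(tokens, factor=1000, ignored_types=["<NUM>"]))
# [('boulevard', 'word', 2001), ('de', 'word', 2001), ('75013', '<NUM>', 1)]
```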