To have all documents mentioning an exact phrase, you can use double quotes.
Example: “Alicia Martinez’s bank account in Portugal”
To have all documents mentioning all or one of the queried terms, you can use a simple space between your queries or 'OR'. You need to write 'OR' with all letters uppercase.
Example: Alicia Martinez
Same search: Alicia OR Martinez
To have all documents mentioning all the queried terms, you can use 'AND' between your queried words. You need to write 'AND' with all letters uppercase.
Example: Alicia AND Martinez
To have all documents NOT mentioning some queried terms, you can use 'NOT' before each word you don't want. You need to write 'NOT' with all letters uppercase.
Example: NOT Martinez
Same search: !Martinez
Parentheses should be used whenever multiple operators are used together and you want to give priority to some.
Example: ((Alicia AND Martinez) OR (Delaware AND Pekin) OR Grey) AND NOT parking lot
If you search faithf?l, the search engine will look for all words with all possible single character between the second f and the l in this word. It also works with * to replace multiple characters.
Example: Alicia Martin?z
Example: Alicia Mar*z
You can set fuzziness to 1 or 2. It corresponds to the maximum number of operations (insertions, deletions, substitutions and transpositions) on characters needed to make one term match the other.
kitten -> sitten (1 substitution (k turned into s) = fuzziness is 1)
kitten -> sittin (2 substitutions (k turned into s and e turned into i) = fuzziness is 2)
If you search for similar terms (to catch typos for example), you can use fuzziness. Use the tilde symbol at the end of the word to set the fuzziness to 1 or 2.
"The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. It can be specified as: quikc~1" (source: Elastic).
Example: quikc~ brwn~ foks~ (as the default edit distance is 2, this query will catch all quick, quack, quock, uqikc, etc. as well as brown, folks, etc.)
Example: Datashare~1 (this query will catch Datasahre, Dqtashare, etc.)
When you type an exact phrase (in double quotes) and use fuzziness, then the meaning of the fuzziness changes. Now, the fuzziness means the maximum number of operations (insertions, deletions, substitutions and transpositions) on terms needed to make one phrase match the other.
“the cat is blue” -> “the small cat is blue” (1 insertion = fuzziness is 1)
“the cat is blue” -> “the small is cat blue” (1 insertion + 2 transpositions = fuzziness is 3)
"While a phrase query (eg "john smith") expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order. A proximity search allows us to specify a maximum edit distance of words in a phrase." (source: Elastic).
Example: "fox quick"~5 (this query will catch "quick brown fox", "quick brown car thin fox" or even "quick brown car thin blue tree fox"
The closer the text in a field is to the original order specified in the query string, the more relevant that document is considered to be. When compared to the above example query, the phrase
"quick fox" would be considered more relevant than
"quick brown fox"(source: Elastic).
Use the boost operator
^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:
Example: quick^2 fox
The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance. Boosts can also be applied to phrases or to groups:
Example: "john smith"^2 (foo bar)^4
If you are looking for documents that:
contains term1, term2 and term3
and were created after 2010
you can use the 'Date' filter or type in the search bar:
term1 AND term2 AND term3 AND metadata.tika_metadata_creation_date:>=2010-01-01
'metadata.tika_metadata_creation_date:' means that we filter with creation date
'>="'means 'since January 1st included'
'2010-01-01' stands for January 2010 and the search will include January 2010
For other searches:
'<' will mean 'strictly after (with January 1st excluded)'
nothing will mean 'at this exact date'
You can use other metadata fields using the following model: metadata.tika_metadata_[and copying the field's name here].
To find the list of existing metadata fields, go to a document's 'Tags and details' tab, click 'Show more details' and refer to this list: