3.84

PostgreSQL Full-Text Search Guide

Image
How to use the Full Text Search with PostgreSQL to deliver accurate search results

PostgreSQL Database supports Full Text Search (FTS)

Full Text Search (or just text search, or just FTS) in databases (as in PostgreSQL) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query. PostgreSQL have a good FTS support since version 9 with many improvements in the last versions: 11 and 12. This article will show you how to configure and use the FTS features on PostgreSQL database.


An introduction to Full-Text Search

(...) full-text search refers to techniques for searching a single computer-stored document or a collection in a full text database; (...) distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references)

Thus, the Full-Text Search offers accurate search results. A Full-Text Search query allows you to search for words inside text data.


Full-Text Search using PostgreSQL

PostgreSQL offers a full featured Full-Text Search FTS engine since version 9.6. In later versions some improvements has been added to the FTS engine.

Full-Text Search Initialization in PostgreSQL
To start using the Full-Text Search within a PostgreSQL database it is required to initialize a FTS dictionary for each language that will be used to search. In this example english for which the english_stem_nostop (english, no stop words) dictionary available in PostgreSQL will be used. The following queries need to be run over the target database that will be used for searches.

-- initialize the FTS english_stem_nostop within the target database
CREATE TEXT SEARCH DICTIONARY public.english_stem_nostop(Template = snowball, Language = english);

-- create the FTS search configuration within the target database
CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english );

-- add some basic parsers to the FTS search configuration within the target database
ALTER TEXT SEARCH CONFIGURATION public.english_nostop ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop;


Full-Text Search Setup example, for table my_table
Once the above initialization has been completed it is the time to create a FTS dedicated field for searches. It can be done without, by only using an index, but in this example we will create a separate special FTS field in order to be able to visualize the FTS data.

-- 1. the original table (before adding FTS) will be modified as in example below
CREATE TABLE my_table (
    id character varying(10) NOT NULL,
    description text NOT NULL,
    CONSTRAINT my_table__chk__id CHECK ((char_length((id)::text) = 10))
);


-- 2. we will modify the table from above by creating an extra FTS index for the description field to be used on FTS searches
CREATE INDEX my_table__idx__fts_field ON my_table USING gin (to_tsvector('public.english_nostop'::regconfig, description));


In the variant 1. we can only do approximative searches using LIKE or ILIKE over the text which cannot be sure it covers full words or even native english words only.

-- approximative search (case sensitive)
SELECT * FROM my_table WHERE description LIKE '%example%';

-- approximative search (case insensitive)
SELECT * FROM my_table WHERE description ILIKE '%example%';

In the variant 2. we can use the FTS (Full-Text Search) way of searching results to deliver realistic matches and accurate search results ; because the above setup was using the No-Stop dictionary the stop words will be ignored.

-- basic: searching the word 'example'
-- will deliver better results than LIKE or ILIKE
SELECT * FROM my_table WHERE to_tsvector('english_nostop', "description") @@ plainto_tsquery('english_nostop', 'example');

-- advanced: searching the word 'example' and return first the results that have higher matching scores
-- will deliver even better results than LIKE or ILIKE or FTS above
SELECT * FROM my_table WHERE to_tsvector('english_nostop', "description") @@ plainto_tsquery('english_nostop', 'example')
ORDER BY ts_rank_cd(to_tsvector('english_nostop', "description"), plainto_tsquery('english_nostop', 'example')) DESC;


Conclusions

  • FTS Full-Text Search delivers faster and accurate results comparing with LIKE or ILIKE because each valid english word available in the target field is already indexed.
  • The to_tsvector method is used togehter with plainto_tsquery as in the above examples and sometimes with ts_rank_cd to deliver the optimized results. Instead plainto_tsquery which takes the text as is, plain, the to_tsquery or phraseto_tsquery can be used which are not explained here (see the PostgreSQL documentation).
  • The stop words such as and, the (and similar) will be ignored when searching phrases using FTS
  • Using FTS is the recommended way to do searches whenever is possible
  • There are also limitations for using FTS engine:
    • not all languages have FTS dictionaries available with PostgreSQL ...
    • number of lexemes must be less than 264 ; no more than 256 positions per lexeme ; position values in tsvector must be greater than 0 and no more than 16383 and some others