PostgreSQL Database supports Full Text Search (FTS)
Full Text Search (or just text search, or just FTS) in databases (as in PostgreSQL) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query. PostgreSQL have a good FTS support since version 9 with many improvements in the last versions: 11 and 12. This article will show you how to configure and use the FTS features on PostgreSQL database.
An introduction to Full-Text Search
(...) full-text search refers to techniques for searching a single computer-stored document or a collection in a full text database; (...) distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references)
Thus, the Full-Text Search offers accurate search results. A Full-Text Search query allows you to search for words inside text data.
Full-Text Search using PostgreSQL
PostgreSQL offers a full featured Full-Text Search
FTS
engine since version 9.6. In later versions some improvements has been added to the FTS engine.Full-Text Search Initialization in PostgreSQL
To start using the Full-Text Search within a PostgreSQL database it is required to initialize a
FTS
dictionary for each language that will be used to search. In this example english
for which the english_stem_nostop(english, no stop words) dictionary available in PostgreSQL will be used. The following queries need to be run over the target database that will be used for searches.
-- initialize the FTS english_stem_nostop within the target database
CREATE TEXT SEARCH DICTIONARY public.english_stem_nostop(Template = snowball, Language = english);
-- create the FTS search configuration within the target database
CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english );
-- add some basic parsers to the FTS search configuration within the target database
ALTER TEXT SEARCH CONFIGURATION public.english_nostop ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop;
Full-Text Search Setup example, for table my_table
Once the above initialization has been completed it is the time to create a FTS dedicated field for searches. It can be done without, by only using an index, but in this example we will create a separate special FTS field in order to be able to visualize the FTS data.
-- 1. the original table (before adding FTS) will be modified as in example below
CREATE TABLE my_table (
id character varying(10) NOT NULL,
description text NOT NULL,
CONSTRAINT my_table__chk__id CHECK ((char_length((id)::text) = 10))
);
-- 2. we will modify the table from above by creating an extra FTS index for the description field to be used on FTS searches
CREATE INDEX my_table__idx__fts_field ON my_table USING gin (to_tsvector('public.english_nostop'::regconfig, description));
In the variant 1. we can only do approximative searches using
LIKE
or ILIKE
over the text which cannot be sure it covers full words or even native english words only.
-- approximative search (case sensitive)
SELECT * FROM my_table WHERE description LIKE '%example%';
-- approximative search (case insensitive)
SELECT * FROM my_table WHERE description ILIKE '%example%';
In the variant 2. we can use the
FTS
(Full-Text Search) way of searching results to deliver realistic matches and accurate search results ; because the above setup was using the No-Stop dictionary the stop words will be ignored.
-- basic: searching the word 'example'
-- will deliver better results than LIKE or ILIKE
SELECT * FROM my_table WHERE to_tsvector('english_nostop', "description") @@ plainto_tsquery('english_nostop', 'example');
-- advanced: searching the word 'example' and return first the results that have higher matching scores
-- will deliver even better results than LIKE or ILIKE or FTS above
SELECT * FROM my_table WHERE to_tsvector('english_nostop', "description") @@ plainto_tsquery('english_nostop', 'example')
ORDER BY ts_rank_cd(to_tsvector('english_nostop', "description"), plainto_tsquery('english_nostop', 'example')) DESC;
Conclusions
FTS
Full-Text Search delivers faster and accurate results comparing withLIKE
orILIKE
because each valid english word available in the target field is already indexed.- The
to_tsvector
method is used togehter withplainto_tsquery
as in the above examples and sometimes withts_rank_cd
to deliver the optimized results. Insteadplainto_tsquery
which takes the text as is, plain, theto_tsquery
orphraseto_tsquery
can be used which are not explained here (see the PostgreSQL documentation). - The stop words such as and, the (and similar) will be ignored when searching phrases using
FTS
- Using
FTS
is the recommended way to do searches whenever is possible - There are also limitations for using
FTS
engine:- not all languages have
FTS
dictionaries available with PostgreSQL ... - number of lexemes must be less than 264 ; no more than 256 positions per lexeme ; position values in tsvector must be greater than 0 and no more than 16383 and some others
- not all languages have