Query in Disorder PosgreSQL

0

How could I do a SQL query by inserting a string with words in any order? For example find a row that has in some column the following String: "my dog is called spike" inserting in a like something like "dog my llama spike"

Let's see the following example:

create table demo(
  texto varchar(255)
);

insert into demo(texto) values ('mi perro se llama spike');
insert into demo(texto) values ('mi perro se llamarada spike');
insert into demo(texto) values ('mi perro dinamita');
insert into demo(texto) values ('mi perro se llama spike y mi gato felix');
insert into demo(texto) values ('perro mi llama se spike');

The idea would be: to be able to search for a series of words (all should appear), within the text column, so that if we search for perro mi llama se spike , the result should be:

texto
--------------------------------------
mi perro se llama spike
perro mi llama se spike
mi perro se llama spike y mi gato felix
    
asked by Irvin Lara 27.02.2018 в 23:41
source

1 answer

1

I do not know PostgreSql in depth, so I had to do some research. Conceptually the problem is that we have a list of words and we want to find all these in a certain column of a table, all regardless of the order. First we put together a small example:

create table demo(
  texto varchar(255)
);

insert into demo(texto) values ('mi perro se llama spike');
insert into demo(texto) values ('mi perro se llamarada spike');
insert into demo(texto) values ('mi perro dinamita');
insert into demo(texto) values ('mi perro se llama spike y mi gato felix');
insert into demo(texto) values ('perro mi llama se spike');

Expanding the words to search in a table

select texto       
       from demo d
       left join unnest(regexp_split_to_array('perro mi llama se spike', '\s')) p
            on position(' ' || p.p || ' ' in ' ' || d.texto || ' ') > 0
       group by d.texto
       having count(1) >= array_length(regexp_split_to_array('perro mi llama se spike', '\s'),1)
;

With regexp_split_to_array() and the pattern \s (any space in white) we make a split of the phrase to look for and generate an array, which by unnest() we transform into a table, we use it in a left join with the table where we will find the occurrences, the idea is to return the cases where each word matches and finally we make a group by bringing those cases where the number of matches is equal to the number of words to search.

Note 1 : to avoid false positives, for example that llama matches llamarada , we add spaces to the right and left of the column to search and always verify the words with spaces: position(' ' || p.p || ' ' in ' ' || d.texto || ' ') > 0

Note 2 : if we do not care for uppercase and lowercase, we can normalize both chains, for example: position(' ' || lower(p.p) || ' ' in ' ' || lower(d.texto) || ' ') > 0

    
answered by 28.02.2018 / 03:30
source