Search string within a text field, PostgreSQL

Question

Search string within a text field, PostgreSQL

Navigation

#1 by (2 votes)

2

I have the following problem: I need to find an exact substring that begins and ends with numbers in a text field.

The query at the moment is:

SELECT * (...)    

FROM dominio AS Dom, ampliatoria AS Amp, (...) 

WHERE (...)
AND Amp.descripcion ILIKE '%' || Dom.dom_descripcion || '%';

The Dom.dom_descripcion field has strings that have the form {números + "-" + números} or {números + "-" + números + "/" + números} . For example, some cases would be: "1-1234" or "5-678 / 9", etc.

The Amp.descripcion field has strings with text that include any of the previous forms. It may contain text before or after the numbers.

The problem is that when I run the query there are erroneous matches of type "1-123" with "11-1234", "4-789 / 1" with "14-789 / 11", etc ...

I would need to know some way to find if the field Amp.descripcion contains exactly the string that brings the field Dom.dom_descripcion .

That is, if for example the field description ="blah blah 1-1234 and 2-345 / 6 blah blah", it should match some field "dom_descripcion"="1-1234" or "2-345 / 6".

Use PostgreSQL 9.0 with pgAdmin III. I can not change the version or manager used.

Thanks for your time!

postgresql busquedas

asked by Bhelic 18.12.2018 в 13:54

source

1 answer

Multiplication hours x number Determine form to do other things first and then resume it and send it to CRM

score 2 · Accepted Answer

I'm not clear about the chains that qualify as accurate, so I only see two situations:

The chains are completely accurate. That is, the fields amp.descripcion and dom.dom_descripcion have exactly the same value.

The amp. description strings can contain any character before and after the desired text, except numbers. For example AA1-23B matches 1-23 but 11-231 does not match 1-23.

If you can add more validations, it would be perfect. Meanwhile, assuming only the two previous validations:

create table ampliatoria(
descripcion varchar(100) not null
);

create table dominio(
dom_descripcion varchar(100) not null
);


insert into ampliatoria values ('11-1234');
insert into ampliatoria values ('111-12344');
insert into ampliatoria values ('1-1234');
insert into ampliatoria values ('AA5-55B');
insert into ampliatoria values ('4-789/1');

insert into dominio values ('5-55');
insert into dominio values ('1-123');
insert into dominio values ('1-1234');
insert into dominio values ('4-789/1');

Using like between both fields:

select *
from ampliatoria amp, dominio dom
where  amp.descripcion like '%' || dom.dom_descripcion || '%'

We get the following correct and erroneous results:

descripcion     dom_descripcion
11-1234         1-123
11-1234         1-1234
111-12344       1-123
111-12344       1-1234
1-1234          1-123
1-1234          1-1234
AA5-55B         5-55
4-789/1         4-789/1

We will have to avoid the numbers before and after the searched string. In postgresql we can not use character lists like [a-z] with LIKE but we can use them with SIMILAR . We define a list of the numbers as [0-9] and use the character ^ to indicate that it does not include any character from the list, like this [^ 0-9]. We indicate that it is not found before or after the search string:

select *
from ampliatoria amp, dominio dom
where  amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%'

This gives us the following result:

descripcion     dom_descripcion
AA5-55B         5-55

The result is correct, since 5-55 is contained within AA5-55B. However, data like 4-789 / 1 does not meet the condition, since the first block% [^ 0-9] avoids the 4 which is the first character of the string. Then, taking into account the first condition, we look for the chains that are exact with a = and we use both conditions:

select *
from ampliatoria amp, dominio dom
where ( amp.descripcion = dom.dom_descripcion
  or amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%')

This gives us the following results:

descripcion     dom_descripcion
1-1234          1-1234
AA5-55B         5-55
4-789/1         4-789/1

With this we obtain only the chains that qualify. The code is in fiddle for PostgreSQL 9.6