I'm not clear about the chains that qualify as accurate, so I only see two situations:
The chains are completely accurate. That is, the fields amp.descripcion
and dom.dom_descripcion
have exactly the same value.
The amp. description strings can contain any character before and after the desired text, except numbers. For example AA1-23B matches 1-23 but 11-231 does not match 1-23.
If you can add more validations, it would be perfect. Meanwhile, assuming only the two previous validations:
create table ampliatoria(
descripcion varchar(100) not null
);
create table dominio(
dom_descripcion varchar(100) not null
);
insert into ampliatoria values ('11-1234');
insert into ampliatoria values ('111-12344');
insert into ampliatoria values ('1-1234');
insert into ampliatoria values ('AA5-55B');
insert into ampliatoria values ('4-789/1');
insert into dominio values ('5-55');
insert into dominio values ('1-123');
insert into dominio values ('1-1234');
insert into dominio values ('4-789/1');
Using like between both fields:
select *
from ampliatoria amp, dominio dom
where amp.descripcion like '%' || dom.dom_descripcion || '%'
We get the following correct and erroneous results:
descripcion dom_descripcion
11-1234 1-123
11-1234 1-1234
111-12344 1-123
111-12344 1-1234
1-1234 1-123
1-1234 1-1234
AA5-55B 5-55
4-789/1 4-789/1
We will have to avoid the numbers before and after the searched string. In postgresql we can not use character lists like [a-z] with LIKE
but we can use them with SIMILAR
. We define a list of the numbers as [0-9] and use the character ^ to indicate that it does not include any character from the list, like this [^ 0-9]. We indicate that it is not found before or after the search string:
select *
from ampliatoria amp, dominio dom
where amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%'
This gives us the following result:
descripcion dom_descripcion
AA5-55B 5-55
The result is correct, since 5-55 is contained within AA5-55B. However, data like 4-789 / 1 does not meet the condition, since the first block% [^ 0-9] avoids the 4 which is the first character of the string. Then, taking into account the first condition, we look for the chains that are exact with a = and we use both conditions:
select *
from ampliatoria amp, dominio dom
where ( amp.descripcion = dom.dom_descripcion
or amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%')
This gives us the following results:
descripcion dom_descripcion
1-1234 1-1234
AA5-55B 5-55
4-789/1 4-789/1
With this we obtain only the chains that qualify.
The code is in fiddle for PostgreSQL 9.6