Search string within a text field, PostgreSQL

2

I have the following problem: I need to find an exact substring that begins and ends with numbers in a text field.

The query at the moment is:

SELECT * (...)    

FROM dominio AS Dom, ampliatoria AS Amp, (...) 

WHERE (...)
AND Amp.descripcion ILIKE '%' || Dom.dom_descripcion || '%';

The Dom.dom_descripcion field has strings that have the form {números + "-" + números} or {números + "-" + números + "/" + números} . For example, some cases would be: "1-1234" or "5-678 / 9", etc.

The Amp.descripcion field has strings with text that include any of the previous forms. It may contain text before or after the numbers.

The problem is that when I run the query there are erroneous matches of type "1-123" with "11-1234", "4-789 / 1" with "14-789 / 11", etc ...

I would need to know some way to find if the field Amp.descripcion contains exactly the string that brings the field Dom.dom_descripcion .

That is, if for example the field description ="blah blah 1-1234 and 2-345 / 6 blah blah", it should match some field "dom_descripcion"="1-1234" or "2-345 / 6".

Use PostgreSQL 9.0 with pgAdmin III. I can not change the version or manager used.

Thanks for your time!

    
asked by Bhelic 18.12.2018 в 14:54
source

1 answer

2

I'm not clear about the chains that qualify as accurate, so I only see two situations:

  • The chains are completely accurate. That is, the fields amp.descripcion and dom.dom_descripcion have exactly the same value.

  • The amp. description strings can contain any character before and after the desired text, except numbers. For example AA1-23B matches 1-23 but 11-231 does not match 1-23.

  • If you can add more validations, it would be perfect. Meanwhile, assuming only the two previous validations:

    create table ampliatoria(
    descripcion varchar(100) not null
    );
    
    create table dominio(
    dom_descripcion varchar(100) not null
    );
    
    
    insert into ampliatoria values ('11-1234');
    insert into ampliatoria values ('111-12344');
    insert into ampliatoria values ('1-1234');
    insert into ampliatoria values ('AA5-55B');
    insert into ampliatoria values ('4-789/1');
    
    insert into dominio values ('5-55');
    insert into dominio values ('1-123');
    insert into dominio values ('1-1234');
    insert into dominio values ('4-789/1');
    

    Using like between both fields:

    select *
    from ampliatoria amp, dominio dom
    where  amp.descripcion like '%' || dom.dom_descripcion || '%'
    

    We get the following correct and erroneous results:

    descripcion     dom_descripcion
    11-1234         1-123
    11-1234         1-1234
    111-12344       1-123
    111-12344       1-1234
    1-1234          1-123
    1-1234          1-1234
    AA5-55B         5-55
    4-789/1         4-789/1
    

    We will have to avoid the numbers before and after the searched string. In postgresql we can not use character lists like [a-z] with LIKE but we can use them with SIMILAR . We define a list of the numbers as [0-9] and use the character ^ to indicate that it does not include any character from the list, like this [^ 0-9]. We indicate that it is not found before or after the search string:

    select *
    from ampliatoria amp, dominio dom
    where  amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%'
    

    This gives us the following result:

    descripcion     dom_descripcion
    AA5-55B         5-55
    

    The result is correct, since 5-55 is contained within AA5-55B. However, data like 4-789 / 1 does not meet the condition, since the first block% [^ 0-9] avoids the 4 which is the first character of the string. Then, taking into account the first condition, we look for the chains that are exact with a = and we use both conditions:

    select *
    from ampliatoria amp, dominio dom
    where ( amp.descripcion = dom.dom_descripcion
      or amp.descripcion similar to '%[^0-9]' || dom.dom_descripcion || '[^0-9]%')
    

    This gives us the following results:

    descripcion     dom_descripcion
    1-1234          1-1234
    AA5-55B         5-55
    4-789/1         4-789/1
    

    With this we obtain only the chains that qualify. The code is in fiddle for PostgreSQL 9.6

        
    answered by 19.12.2018 / 16:31
    source