Consult with Like to search for characters that are not within A to Z

Question

Consult with Like to search for characters that are not within A to Z

Navigation

#1 by (1 votes)

3

I find myself doing a search in my database. The search consists of finding rows with characters that do not belong to the alphabet (from the 'A' to the 'Z' ) for example 'Þ' , '%' or an Ampersand ' & '.

In the attached image, I show the query and the result it throws at me; in the result I do not see the symbol I'm looking for in the LIKE. I do not understand why this happens, does anyone have any ideas?

sql sql-server

asked by Cesar Sanchez 30.05.2018 в 22:35

source

1 answer

Python collation excel report with Pandas Can you use value in a string-array to assign a different identifier to each item of an Android spinner?

score 1 · Answer 1

This is due to the table of equivalences of the comparison collection ( collates ) Latin1_General* of the SQL Server (except the binary). Probably other checklists have the same behavior. I do not know what you are using in the server / table / session.

I do not dare to define this as a bug , since I do not know if that character will have any pronunciation anywhere in the world where this character set is used, although certainly not in the world of Spanish / English speaking.

I have done several tests with SQL Server 2014 (12.0.5000) and SQL Server 2017 (14.0.1000), obtaining the same results.

What I have been able to observe, is that SQL detects that the character of marras as equivalent to the sequence of characters 'TH' . Therefore, it returns all the rows that have said sequence.

You can check it easily with the following queries:

with
no_partemp as (
          select 'HT-DS100TH' collate latin1_General_100_ci_ai no_parte
union all select 'MAXVS940TH'
union all select 'MAXVS950TH'
union all select 'MAXVS990TH'
union all select 'MAXZS530TH'
union all select 'MAXZS720TH'
union all select 'MM39TH/XAP'
union all select 'RCDS30TH'
union all select 'RCDS50TH'
union all select 'THD-750'
union all select 'T000000'  --Este registro se excluye
union all select 'TH00000'
)
select   ascii('Þ') ascii_de_busqueda
       , char(ascii('Þ')) caracter_de_busqueda
       , charindex('Þ', no_parte) posicion_encontrada
       , no_parte
       , substring(no_parte, charindex('Þ', no_parte), 1) caracter_coincidencia
       , ascii(substring(no_parte, charindex('Þ', no_parte), 1)) ascii_coincidencia
  from no_partemp
 where no_parte like '%Þ%';

Which returns the following result:

ascii_de_busqueda caracter_de_busqueda posicion_encontrada no_parte   caracter_coincidencia ascii_coincidencia
----------------- -------------------- ------------------- ---------- --------------------- ------------------
222               Þ                    9                   HT-DS100TH T                     84
222               Þ                    9                   MAXVS940TH T                     84
222               Þ                    9                   MAXVS950TH T                     84
222               Þ                    9                   MAXVS990TH T                     84
222               Þ                    9                   MAXZS530TH T                     84
222               Þ                    9                   MAXZS720TH T                     84
222               Þ                    5                   MM39TH/XAP T                     84
222               Þ                    7                   RCDS30TH   T                     84
222               Þ                    7                   RCDS50TH   T                     84
222               Þ                    1                   THD-750    T                     84
222               Þ                    1                   TH00000    T                     84

(11 row(s) affected)

The query that follows, would be identical to the previous one, except that in this I use the collate latin1_General_100_bin , with which it makes the exact comparison and does not return any result (neither returns uppercase as match of lowercase letters, etc.)

with
no_partemp as (
          select 'HT-DS100TH' collate latin1_General_100_bin no_parte
union all select 'MAXVS940TH'
union all select 'MAXVS950TH'
union all select 'MAXVS990TH'
union all select 'MAXZS530TH'
union all select 'MAXZS720TH'
union all select 'MM39TH/XAP'
union all select 'RCDS30TH'
union all select 'RCDS50TH'
union all select 'THD-750'
union all select 'T000000'
union all select 'TH00000'
)
select   ascii('Þ') ascii_de_busqueda
       , char(ascii('Þ')) caracter_de_busqueda
       , charindex('Þ', no_parte) posicion_encontrada
       , no_parte
       , substring(no_parte, charindex('Þ', no_parte), 1) caracter_coincidencia
       , ascii(substring(no_parte, charindex('Þ', no_parte), 1)) ascii_coincidencia
  from no_partemp
 where no_parte like '%Þ%';

Result:

ascii_de_busqueda caracter_de_busqueda posicion_encontrada no_parte   caracter_coincidencia ascii_coincidencia
----------------- -------------------- ------------------- ---------- --------------------- ------------------

(0 row(s) affected)

It is interesting to note that even with the NVarchar data type the same behavior is observed:

with
no_partemp as (
          select N'HT-DS100TH' no_parte
union all select 'MAXVS940TH'
union all select 'MAXVS950TH'
union all select 'MAXVS990TH'
union all select 'MAXZS530TH'
union all select 'MAXZS720TH'
union all select 'MM39TH/XAP'
union all select 'RCDS30TH'
union all select 'RCDS50TH'
union all select 'THD-750'
union all select 'T000000'
union all select 'TH00000'
)
select   ascii('Þ') ascii_de_busqueda
       , char(ascii('Þ')) caracter_de_busqueda
       , charindex('Þ', no_parte) posicion_encontrada
       , no_parte
       , substring(no_parte, charindex('Þ', no_parte), 1) caracter_coincidencia
       , ascii(substring(no_parte, charindex('Þ', no_parte), 1)) ascii_coincidencia
  from no_partemp
 where no_parte like N'%Þ%';

This returns the same results as the first query, which I do not repeat anymore so as not to abuse the space.