Is there any incompatibility between utf8/16/32_spanish_ci?


I am creating a database, and I have a doubt about the differences between these options and whether they affect the database.

I always use utf8_spanish_ci as the collation when I create a database. But I noticed that there are also utf16_general_ci and utf32_general_ci.

From what I have researched and know:

  • utf8 uses one byte
  • utf16 uses two to four bytes
  • utf32 uses a fixed four bytes

Source

But my question is not about size. Would the choice affect the data entered into the database? I mean, would it cause problems with the text, etc.?

I suppose not, but I know there are slight differences that might affect it.

    
asked by CodeNoob 10.11.2016 в 13:04

1 answer


When you choose a collation such as utf8_spanish_ci, you are actually specifying two things:

  • The charset: utf8
  • The collation: spanish_ci

The charset determines how the data is represented internally (the bytes), while the collation determines the rules used to compare and sort the text.
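
As a small illustration of that split: MySQL collation names begin with the name of their charset, so the two parts can be separated at the first underscore. The sketch below is in Python rather than SQL, and the helper name `split_collation` is hypothetical, not part of MySQL or any library.

```python
def split_collation(name):
    """Split a MySQL-style collation name into (charset, collation suffix).

    Hypothetical helper for illustration only: it simply cuts the name
    at the first underscore.
    """
    charset, _, suffix = name.partition("_")
    return charset, suffix


for collation in ("utf8_spanish_ci", "utf16_general_ci", "utf32_general_ci"):
    print(collation, "->", split_collation(collation))
```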

Reference: Character Sets and Collations in MySQL.

Charset

So for the same string (for example 'abc'), the byte values and the number of bytes used to represent it internally will not be the same under utf8_general_ci and utf16_general_ci, because they use two different charsets.

There are two main reasons to choose one charset over another:
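
To make the difference concrete, here is a minimal sketch that prints the bytes of 'abc' under each encoding. It uses Python's codecs as stand-ins for MySQL's charsets, which is an assumption: the big-endian variants are chosen so that no byte-order mark is prepended, and MySQL's actual on-disk storage is not inspected here.

```python
text = "abc"

# Python codec names used as stand-ins for the MySQL charsets (assumption).
encodings = {"utf8": "utf-8", "utf16": "utf-16-be", "utf32": "utf-32-be"}

# Hex dump of the encoded bytes for each charset.
encoded = {name: text.encode(codec).hex() for name, codec in encodings.items()}

for name, hex_bytes in encoded.items():
    print(f"{name}: {hex_bytes} ({len(hex_bytes) // 2} bytes)")
```

The same three characters take 3, 6 and 12 bytes respectively.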

  • Some charsets cannot represent certain characters, so it is important to choose one that can handle all the characters you need. The charsets whose names start with utf (utf8, utf16, utf32, etc.) are Unicode encodings, but note that MySQL's utf8 is an alias for utf8mb3, which stores at most three bytes per character and so cannot hold characters outside the Basic Multilingual Plane (emoji, for example); utf8mb4, utf16 and utf32 cover all of Unicode.
  • The number of bytes used per character varies between charsets. So if you want to control the size of the database, it is something to think about.

Generally, utf-8 is favored because it strikes a good balance: it handles Unicode characters while keeping the number of bytes needed low.

For languages such as Spanish, utf-8 represents most characters with a single byte (accented letters and ñ take two). But if you were handling Chinese, for example, most characters need three bytes in utf-8, and then utf16 may be more advantageous. (Note: contrary to what you put in the question, utf8 does not represent every Unicode character with a single byte; some yes, others no.)
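
The byte counts behind this comparison can be checked with a short sketch. As before, Python's codecs stand in for the MySQL charsets (an assumption), and the function name `byte_counts` and the sample strings are made up for illustration.

```python
def byte_counts(text):
    """Return the encoded size of text, in bytes, for each encoding."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-be")),  # big-endian, no BOM
        "utf32": len(text.encode("utf-32-be")),
    }


spanish = "a\u00f1o"        # "año": three characters, one of them ñ
chinese = "\u4e2d\u6587"    # "中文": two characters

print("Spanish:", byte_counts(spanish))
print("Chinese:", byte_counts(chinese))
```

For the Spanish sample utf-8 is the smallest of the three; for the Chinese sample utf-16 is.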

Collate

Now, within a charset, you have the option to choose different collations. For example, you can choose between utf8_general_ci and utf8_spanish_ci, and even utf8_spanish2_ci. In all these cases, the internal representation of the text is identical (the bytes are the same).

Instead, choosing a different collation changes how the text is compared and sorted in your queries.

For example, if the text contains letters such as ñ or the double L (ll), you will notice that the text is sorted differently under different collations when you do an ORDER BY.
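
The idea can be imitated outside MySQL with a rough Python sketch. It does not reproduce MySQL's real collation tables: `general_key` and `spanish_key` are hypothetical sort keys that only mimic two behaviours, namely treating ñ like n (as a general, accent-insensitive collation does) versus treating ñ as its own letter between n and o (as a Spanish collation does).

```python
import unicodedata

ENYE = "\u00f1"  # the letter ñ
SPANISH_ALPHABET = "abcdefghijklmn" + ENYE + "opqrstuvwxyz"


def strip_accents(text):
    """Remove combining marks, so 'ú' becomes 'u' and 'ñ' becomes 'n'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def general_key(word):
    """Sort key imitating a general collation: case and accents ignored."""
    return strip_accents(word.lower())


def spanish_key(word):
    """Sort key imitating a Spanish collation: ñ is a letter after n."""
    letters = [ch if ch == ENYE else strip_accents(ch) for ch in word.lower()]
    return [SPANISH_ALPHABET.index(ch) for ch in "".join(letters)
            if ch in SPANISH_ALPHABET]


words = ["nube", ENYE + "and\u00fa", "oso", "nota"]  # nube, ñandú, oso, nota

print(sorted(words, key=general_key))
print(sorted(words, key=spanish_key))
```

The bytes of each word are the same in both cases; only the position of ñandú in the result changes.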

Here is a demonstration of how the collation affects the order of the text in MySQL.

As for the effect it has on the comparison of the data (where col = 'abc'), I am not aware of any difference between utf8_general_ci and utf8_spanish_ci. But there may be. I know there are differences in the case of other languages like German, for example.

Reference: Examples of the Effect of Collation.

    
answered by 10.11.2016 / 15:40