Validate / Convert text to UTF-8 in VB.NET

I have to generate a series of XML files from information retrieved from a database. The company that will process the XML requires that only UTF-8 characters be used, but the data I retrieve from the database contains many characters that seem not to be allowed, such as accented letters, ñ, ...

Dim b As String = "AaáÁäÄñÑ"
Dim bytes As Byte() = System.Text.Encoding.Default.GetBytes(b)
Dim a As String = System.Text.Encoding.UTF8.GetString(bytes)

After this, the only characters that come back intact are "A" and "a"; all the others turn into strange characters.

I have searched for information about the characters allowed in UTF-8, but I have found different tables and I am not sure which one to trust.

My idea was to replace the accented vowels with unaccented ones, ñ with n, and so on, but I do not know whether there is a more effective solution.

Is there any way to optimize this task? How can I get the list of characters allowed in UTF-8?

    
asked by Jaime Capilla 26.04.2017 at 08:54

1 answer

UTF-8 allows accented characters, ñ, and so on; it can encode every Unicode character. The page Complete Character List for UTF-8 lists the accepted characters.

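The problem with your test code is that it obtains the bytes with the default encoding, through System.Text.Encoding.Default.GetBytes, and then decodes them as UTF-8. On .NET Framework the default encoding is the system's ANSI code page, where the accented letters and ñ are typically single bytes that do not form valid UTF-8 sequences, so only the ASCII characters survive. You have to encode and decode with the same encoding, UTF-8. This would be your test code: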

Dim b As String = "AaáÁäÄñÑ"
Dim bytes As Byte() = System.Text.Encoding.UTF8.GetBytes(b)
Dim a As String = System.Text.Encoding.UTF8.GetString(bytes)
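If you also need to validate bytes that come from somewhere else, one possible approach is a strict decoder that throws instead of silently substituting replacement characters. The following is only a minimal sketch: IsValidUtf8 is a hypothetical helper name, not a framework method, and the sample byte arrays are hard-coded for illustration.

```vbnet
Imports System
Imports System.Text

Module Utf8Validation
    ' Hypothetical helper: returns True when the byte array is well-formed UTF-8.
    Function IsValidUtf8(data As Byte()) As Boolean
        ' throwOnInvalidBytes:=True makes the decoder raise an exception
        ' instead of replacing malformed sequences with U+FFFD.
        Dim strictUtf8 As New UTF8Encoding(encoderShouldEmitUTF8Identifier:=False, throwOnInvalidBytes:=True)
        Try
            strictUtf8.GetString(data)
            Return True
        Catch ex As ArgumentException
            ' DecoderFallbackException derives from ArgumentException.
            Return False
        End Try
    End Function

    Sub Main()
        ' "Aaáñ" as single-byte ISO-8859-1 values (assumed sample data).
        Dim latin1Bytes As Byte() = New Byte() {&H41, &H61, &HE1, &HF1}
        ' The same text encoded as UTF-8: á is C3 A1 and ñ is C3 B1.
        Dim utf8Bytes As Byte() = New Byte() {&H41, &H61, &HC3, &HA1, &HC3, &HB1}

        Console.WriteLine(IsValidUtf8(latin1Bytes)) ' False
        Console.WriteLine(IsValidUtf8(utf8Bytes))   ' True
    End Sub
End Module
```

The first array reproduces what happens in the question: single-byte values above 127 are not valid UTF-8 on their own, so decoding them as UTF-8 cannot recover the original letters.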

I take this opportunity to recommend a page (in English) that explains the whole subject of Unicode and character encodings very well: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
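For the XML files mentioned in the question, it may be simpler to let the XML writer handle the encoding than to convert strings by hand. This is a rough sketch under assumptions: the element names and the sample value are made up, and it writes to a MemoryStream only so the result can be printed.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module Utf8XmlExample
    Sub Main()
        Dim settings As New XmlWriterSettings()
        ' UTF-8 without a byte order mark; the XML declaration will name this encoding.
        settings.Encoding = New UTF8Encoding(False)
        settings.Indent = True

        Using stream As New MemoryStream()
            Using writer As XmlWriter = XmlWriter.Create(stream, settings)
                writer.WriteStartElement("customer")             ' hypothetical element name
                writer.WriteElementString("name", "Iñaki Peña")  ' hypothetical sample data
                writer.WriteEndElement()
            End Using

            ' Decode the written bytes as UTF-8 to inspect the document.
            Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()))
        End Using
    End Sub
End Module
```

.NET strings are Unicode in memory, so accented letters and ñ read from the database need no substitution; the encoding only matters at the point where the text is turned into bytes.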

    
answered by 26.04.2017 / 09:45