Remove accents from a string in C#

5

I have searched, but the only solutions I can find replace characters one by one, and none of them does what I need.

I have an input string with accents and I need to remove the accents. My code:

string palabra = "pálábrá cón tíldés";

string palabaSinTilde = Regex.Replace(palabra, @"[^0-9A-Za-z]", "",
RegexOptions.None);

The output I have is: "plbr cn tlds"

What I need: "palabra con tildes"

Thank you, have a good afternoon.

    
asked by andres garcia 31.08.2018 at 17:24

3 answers

1

I solved my problem. I first found out the encoding of my file, which in this case was UTF-8, and with that information I arrived at this solution:

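string accentedStr = "pálábrá cón tíldés"; // sample input taken from the question
// Encode with ISO-8859-8, then read the resulting bytes back as UTF-8.
byte[] tempBytes = System.Text.Encoding.GetEncoding("ISO-8859-8").GetBytes(accentedStr);
string asciiStr = System.Text.Encoding.UTF8.GetString(tempBytes);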
    
answered by 31.08.2018 / 20:15
10

Try the following extension method:

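using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    // Returns a copy of the text with diacritical marks (accents) removed.
    public static string SinTildes(this string texto) =>
        new String(
            texto.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray()
        )
        .Normalize(NormalizationForm.FormC);
}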

Explanation:

Characters such as á, ö, etc. can be expressed in Unicode in two ways: as a single precomposed character (á, for example), or as two consecutive characters, where the first is the base letter (a) and the second is the combining accent (´) that is applied to it. Text editors display both forms identically, as á.
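As a small illustration of the two representations, the sketch below builds both forms of á by hand and compares them. The class name UnicodeFormsDemo and its helper methods are made up for this example only.

```csharp
using System;
using System.Text;

public static class UnicodeFormsDemo
{
    // Hypothetical helpers, named only for this illustration.
    public static string Decompose(string s) => s.Normalize(NormalizationForm.FormD);
    public static string Compose(string s) => s.Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        string precomposed = "\u00E1";  // á as a single code point
        string decomposed = "a\u0301";  // a followed by COMBINING ACUTE ACCENT

        Console.WriteLine(precomposed.Length);                    // 1
        Console.WriteLine(decomposed.Length);                     // 2
        Console.WriteLine(precomposed == decomposed);             // False (ordinal comparison)
        Console.WriteLine(Decompose(precomposed) == decomposed);  // True
        Console.WriteLine(Compose(decomposed) == precomposed);    // True
    }
}
```

Both strings look the same on screen, but only normalization makes them compare equal, which is why the method normalizes before filtering.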

This line:

texto.Normalize(NormalizationForm.FormD)

ensures that the string is expanded so that accented characters are separated into their constituent characters: the base letter plus its accents and other modifiers.

Then

.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)

keeps only the characters that are not diacritical marks.

Then a new string is created from the remaining characters:

 new String(...)

Finally, the string is recomposed into its normal form with this line:

.Normalize(NormalizationForm.FormC)
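A minimal usage sketch, assuming the extension method above is compiled together with the calling code. The SinTildesDemo class is only an illustration, and the inputs are the sample strings that appear elsewhere in this thread.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    // Same method as in the answer: decompose, drop combining marks, recompose.
    public static string SinTildes(this string texto) =>
        new String(
            texto.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray()
        )
        .Normalize(NormalizationForm.FormC);
}

public static class SinTildesDemo
{
    public static void Main()
    {
        Console.WriteLine("pálábrá cón tíldés".SinTildes());   // palabra con tildes
        Console.WriteLine("Mañana será otro día".SinTildes()); // Manana sera otro dia
    }
}
```

Note that ñ also becomes n, because it decomposes into n plus a combining tilde. Letters with no canonical decomposition, such as ø or ß, pass through unchanged.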
    
answered by 31.08.2018 at 17:38
0

I think this is what you are looking for:

 string textoOriginal = "Mañana será otro día";
 // Unicode transformation: decompose accented characters
 string textoNormalizado = textoOriginal.Normalize(NormalizationForm.FormD);
 // Match everything that is not an ASCII letter, digit, or space
 // and replace it with an empty string.
 Regex reg = new Regex("[^a-zA-Z0-9 ]");
 string textoSinAcentos = reg.Replace(textoNormalizado, "");
 Debug.WriteLine(textoSinAcentos); // prints 'Manana sera otro dia'
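One side effect of the whitelist pattern is that it also removes punctuation and any other non-ASCII character. The sketch below compares it with a variant that removes only non-spacing marks via \p{Mn}. That variant is not part of this answer; it is shown here as one possible alternative, and the class and method names are made up for the example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexAccentDemo
{
    // Approach from the answer: keep only ASCII letters, digits, and spaces.
    public static string AsciiOnly(string text) =>
        Regex.Replace(text.Normalize(NormalizationForm.FormD), "[^a-zA-Z0-9 ]", "");

    // Alternative (an assumption, not from the answer): drop only non-spacing marks.
    public static string MarksOnly(string text) =>
        Regex.Replace(text.Normalize(NormalizationForm.FormD), @"\p{Mn}", "")
             .Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        string input = "¿Mañana será otro día?";
        Console.WriteLine(AsciiOnly(input)); // Manana sera otro dia
        Console.WriteLine(MarksOnly(input)); // ¿Manana sera otro dia?
    }
}
```

Which one fits depends on whether punctuation and symbols should survive in the output.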
    
answered by 31.08.2018 at 17:43