Remove accents from a string in C#

Question



5

I have searched, but I cannot find what I need: I find solutions for replacing characters, but none that solves my problem.

I have an input string with accents and I need to remove the accents. My code:

string palabra = "pálábrá cón tíldés";

string palabaSinTilde = Regex.Replace(palabra, @"[^0-9A-Za-z]", "",
RegexOptions.None);

The output I get is: "plbr cn tlds"

What I need: "palabra con tildes", i.e. the same words with the accents removed

Thank you, have a good afternoon.

c#

asked by andres garcia 31.08.2018 at 15:24


3 answers

10

Try the following extension method:

using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    // Removes diacritics (accents, tildes, etc.) from a string
    public static string SinTildes(this string texto) =>
        new String(
            texto.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray()
        )
        .Normalize(NormalizationForm.FormC);
}

Explanation:

Characters such as á, ö, etc. can be represented in Unicode in two ways: as a single precomposed character that already carries the accent (á, for example), or as two consecutive characters, the base letter followed by a combining accent mark (a + ´). Text editors display both forms the same way: á
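A minimal sketch of the two representations, written as a .NET 5+ top-level program. The variable names are illustrative only; `\u00E1` is the precomposed á and `a\u0301` is the letter a followed by the combining acute accent.

```csharp
using System;
using System.Text;

// Precomposed form: one code point, U+00E1 LATIN SMALL LETTER A WITH ACUTE
string precomposed = "\u00E1";
// Decomposed form: base letter 'a' followed by U+0301 COMBINING ACUTE ACCENT
string decomposed = "a\u0301";

Console.WriteLine(precomposed.Length); // 1
Console.WriteLine(decomposed.Length);  // 2

// An ordinal comparison sees two different strings...
Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

// ...but they are canonically equivalent: normalizing to the same form
// produces identical sequences.
Console.WriteLine(precomposed.Normalize(NormalizationForm.FormD) == decomposed); // True
Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == precomposed); // True
```

This equivalence is what lets the extension method decompose first and then simply drop the accent characters.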

This line:

texto.Normalize(NormalizationForm.FormD)

decomposes the string (Unicode Normalization Form D), so that each accented character is split into its base letter followed by its accents and other combining modifiers.

Then

.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)

keeps only the characters that are not diacritics, i.e. it drops the combining marks (Unicode category NonSpacingMark).
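To see what the filter actually looks at, here is a small sketch (a .NET 5+ top-level program; the sample word is only an illustration) that prints the Unicode category of every character of a decomposed string:

```csharp
using System;
using System.Globalization;
using System.Text;

// "día" (í = U+00ED) decomposes into 'd', 'i', U+0301, 'a'
string decomposed = "d\u00EDa".Normalize(NormalizationForm.FormD);

foreach (char c in decomposed)
{
    // Print each UTF-16 code unit with its Unicode general category
    Console.WriteLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
}
// U+0064 LowercaseLetter
// U+0069 LowercaseLetter
// U+0301 NonSpacingMark
// U+0061 LowercaseLetter
```

Only the U+0301 entry is a NonSpacingMark, so it is the only character the `Where` clause removes.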

Then a new string is created from the remaining characters:

 new String(...)

Finally, the string is recomposed into its normal (composed) form with this line:

.Normalize(NormalizationForm.FormC)
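Put together, the steps can be sketched as a single local function in a .NET 5+ top-level program. The name `RemoveDiacritics` is hypothetical; it repeats the body of `SinTildes` so the file runs on its own. The Hangul syllable is an illustrative case chosen to show that the final FormC step is not a no-op: FormD splits it into three jamo letters (none of them marks), and FormC joins them back.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Same steps as the SinTildes extension method, written as a local function
static string RemoveDiacritics(string text) =>
    new string(
        text.Normalize(NormalizationForm.FormD)                 // decompose
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c)
                        != UnicodeCategory.NonSpacingMark)      // drop combining marks
            .ToArray())
        .Normalize(NormalizationForm.FormC);                    // recompose the rest

// "pálábrá cón tíldés"
Console.WriteLine(RemoveDiacritics("p\u00E1l\u00E1br\u00E1 c\u00F3n t\u00EDld\u00E9s")); // palabra con tildes

// U+D55C decomposes into three jamo (category OtherLetter), so nothing is
// removed, and FormC rebuilds the single original character.
string hangul = "\uD55C";
Console.WriteLine(hangul.Normalize(NormalizationForm.FormD).Length); // 3
Console.WriteLine(RemoveDiacritics(hangul) == hangul);               // True
```

Without the final `Normalize(NormalizationForm.FormC)`, the Hangul result would stay three characters long and would not compare equal to the input.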

answered 31.08.2018 at 15:38

0

I think this is what you are looking for:

using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

string textoOriginal = "Mañana será otro día";
// Unicode transformation: split accented letters into base letter + accent
string textoNormalizado = textoOriginal.Normalize(NormalizationForm.FormD);
// Match everything that is not an ASCII letter, digit or space
// and replace it with an empty string.
Regex reg = new Regex("[^a-zA-Z0-9 ]");
string textoSinAcentos = reg.Replace(textoNormalizado, "");
Debug.WriteLine(textoSinAcentos); // prints 'Manana sera otro dia'
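The regex approach and the category filter from the first answer give the same result for plain Spanish text, but they differ on other input. A minimal comparison sketch (a .NET 5+ top-level program; `RegexStrip`, `MarkStrip` and the sample string are hypothetical illustrations): the regex also deletes punctuation and letters that have no decomposition, such as ø, while the category filter removes only the accent marks.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Second answer's approach: decompose, then delete everything that is not
// an ASCII letter, digit or space.
static string RegexStrip(string text) =>
    Regex.Replace(text.Normalize(NormalizationForm.FormD), "[^a-zA-Z0-9 ]", "");

// First answer's approach: decompose, delete only non-spacing marks, recompose.
static string MarkStrip(string text) =>
    new string(text.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray())
    .Normalize(NormalizationForm.FormC);

// "¿Søren, mañana?": ø (U+00F8) has no canonical decomposition,
// and ¿ , ? are punctuation, not marks.
string sample = "\u00BFS\u00F8ren, ma\u00F1ana?";
Console.WriteLine(RegexStrip(sample)); // Sren manana
Console.WriteLine(MarkStrip(sample));  // ¿Søren, manana?
```

The regex version is the better fit when the output must be strictly ASCII; the category filter is safer when the only goal is to drop accents.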

answered 31.08.2018 at 15:43


score 1 · Accepted Answer

I solved my problem. First I found out the encoding of my file, in this case UTF-8, and with that information I found the solution:

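// Sample input (the original snippet left accentedStr unassigned)
string accentedStr = "pálábrá cón tíldés";
// ISO-8859-8 cannot represent accented Latin letters, so the encoder's fallback
// is expected to replace them (best-fit to the base letter on .NET Framework).
// On .NET Core / .NET 5+ this code page may first require
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
byte[] tempBytes = System.Text.Encoding.GetEncoding("ISO-8859-8").GetBytes(accentedStr);
string asciiStr = System.Text.Encoding.UTF8.GetString(tempBytes);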
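Knowing the file's encoding matters because the same bytes decode to different text under different encodings. A minimal sketch, assuming .NET 5+ (for `Encoding.Latin1` and top-level statements), of UTF-8 bytes being decoded correctly and then incorrectly; the sample word is only an illustration:

```csharp
using System;
using System.Text;

// "Mañana" encoded as UTF-8: ñ (U+00F1) takes two bytes, C3 B1
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Ma\u00F1ana");
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 4D-61-C3-B1-61-6E-61

// Decoding with the encoding the bytes were written in restores the text...
Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == "Ma\u00F1ana"); // True

// ...while decoding them as Latin-1 turns the single ñ into two characters,
// 'Ã' (U+00C3) and '±' (U+00B1): the familiar "MaÃ±ana" garbling.
string wrong = Encoding.Latin1.GetString(utf8Bytes);
Console.WriteLine(wrong.Length); // 7
```

When reading a file whose encoding is known, passing it explicitly, for example `File.ReadAllText(path, Encoding.UTF8)`, avoids this kind of mismatch before any accent removal is attempted.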