wchar_t does not print letters with accents

I have the following code:

int
wordsAnalyzer(char* dir){
    FILE* doc = fopen(dir,"r");
    wchar_t myChar;
    int count = 0;

    while((myChar = fgetwc(doc)) != EOF){
        putwchar(myChar);
    }
    getchar();

    return count;
}

I am trying to print the following text, which is stored in a file:

Hólà

But this is what gets printed:

H? l?

As far as I understand, wchar.h is the header that lets me handle Unicode characters, but so far I have not been able to make it work. What is the problem?

PS: I have now tried adding setlocale(LC_ALL, ""); to my code, and the output changes to:

H

    
asked by Cristofer Fuentes 14.05.2017 at 11:04

1 answer

Several things have to be in place for your program to print Hólà:

1.- The file that contains the string Hólà must be saved in an encoding compatible with the locale under which you will run the program. I am going to use the locale es_ES.UTF-8,
so the file must be encoded in UTF-8.
If you hexdump a file that contains the string Hólà and nothing else (no trailing newline or carriage return), you should get:

jose@cpu:~/t$ hexdump -C fich.txt
00000000  48 c3 b3 6c c3 a0                                 |H..l..|
00000006

Where:

48     Unicode U+0048 LATIN CAPITAL LETTER H
c3 b3  Unicode U+00F3 LATIN SMALL LETTER O WITH ACUTE
6c     Unicode U+006C LATIN SMALL LETTER L
c3 a0  Unicode U+00E0 LATIN SMALL LETTER A WITH GRAVE

You can check the unicode codes in link
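
To see where the bytes in the hexdump come from, here is a minimal sketch of the UTF-8 encoding rule in C. The function name utf8_encode is made up for this illustration and is not part of any standard library; it only shows how a code point such as U+00F3 becomes the two bytes c3 b3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns how many were written,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates cannot be encoded */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

With this, utf8_encode(0x00F3, buf) stores c3 b3 and utf8_encode(0x00E0, buf) stores c3 a0, which matches the dump of the file.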

2.- Your program must load the locale set in the environment variables. You are already doing this correctly:

setlocale( LC_ALL, "");
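
As a side note on the reading loop itself: fgetwc returns a wint_t and signals end of file or a conversion error with WEOF, so storing the result in a wchar_t and comparing it with EOF is not portable. Below is a hedged sketch of the loop with those two points changed. The name print_wide_file and its out parameter are hypothetical, chosen for this example, and the sketch assumes setlocale(LC_ALL, "") has already been called (and did not return NULL) before any input or output takes place.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: copy a text file to out one wide character at a time.
 * Returns the number of wide characters copied, or -1 on any failure. */
long print_wide_file(const char *path, FILE *out)
{
    FILE *doc = fopen(path, "r");
    if (doc == NULL)
        return -1;                 /* file could not be opened */

    long count = 0;
    wint_t ch;                     /* wint_t can hold every wchar_t plus WEOF */
    while ((ch = fgetwc(doc)) != WEOF) {
        if (fputwc((wchar_t)ch, out) == WEOF)
            break;                 /* write or conversion error on out */
        count++;
    }

    /* WEOF means either end of file or an error; tell them apart here. */
    int failed = ferror(doc) || ferror(out);
    fclose(doc);
    return failed ? -1 : count;
}
```

Calling print_wide_file(dir, stdout) would replace the loop in the question. If it returns -1 partway through a file, one likely cause is that the file's bytes are not valid in the locale's encoding (fgetwc then fails with errno set to EILSEQ), which may be what cut the output short after the H.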

3.- The locale must be available on your system.
For example, on an Ubuntu system, edit the file /etc/locale.gen
and make sure it contains this line:

es_ES.UTF-8 UTF-8

If the line was missing or commented out (starting with #), add or uncomment it and then run locale-gen with root privileges.
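
If you would rather check from C whether a locale is installed, one option is to ask setlocale for it by name, since it returns NULL when the locale cannot be loaded. The helper below is only a sketch: locale_is_available is a name invented for this example, and some C libraries accept names they do not fully support, so treat a positive result as a hint rather than a guarantee.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the named locale can be loaded,
 * 0 otherwise. The locale in effect before the call is restored. */
int locale_is_available(const char *name)
{
    /* Passing NULL only queries the current locale. Copy the result,
     * because a later setlocale call may overwrite the returned string. */
    const char *current = setlocale(LC_ALL, NULL);
    if (current == NULL)
        return 0;
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    /* Try to switch; NULL means the locale is not available. */
    int ok = setlocale(LC_ALL, name) != NULL;

    /* Put the previous locale back so the caller is unaffected. */
    setlocale(LC_ALL, saved);
    free(saved);
    return ok;
}
```

For instance, locale_is_available("es_ES.UTF-8") should return 1 once that locale has been generated on the system.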

4.- The user must set the locale in the environment. This is the step you were missing. In bash it looks like this:

jose@cpu:~/t$ export LC_ALL=es_ES.utf8
jose@cpu:~/t$ ./programa
Hólà
    
answered by 14.05.2017 at 12:45