Issue
I'm having trouble with the GCC compiler and the Windows command prompt (CMD): UTF-8 characters are not displayed correctly. I have the following code:
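#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    char caractere;
    int inteiro;
    float Float;
    double Double;

    /* sizeof yields size_t, so it is cast to int to match %d. */
    printf("Tipo de Dados\tNúmero de Bytes\tEndereço\n");
    /* Addresses as decimal integers (truncated if a pointer is wider than int). */
    printf("Caractere\t%d bytes \t em %d\n", (int)sizeof(caractere), (int)(intptr_t)&caractere);
    printf("Inteiro\t%d bytes \t em %d\n", (int)sizeof(inteiro), (int)(intptr_t)&inteiro);
    printf("Float\t%d bytes \t\t em %d\n", (int)sizeof(Float), (int)(intptr_t)&Float);
    printf("Double\t%d bytes \t em %d\n", (int)sizeof(Double), (int)(intptr_t)&Double);
    /* Addresses in pointer format; %p expects a void pointer. */
    printf("Caractere: %d bytes \t em %p\n", (int)sizeof(caractere), (void *)&caractere);
    printf("Inteiro: %d bytes \t em %p\n", (int)sizeof(inteiro), (void *)&inteiro);
    printf("Float: %d bytes \t\t em %p\n", (int)sizeof(Float), (void *)&Float);
    printf("Double: %d bytes \t em %p\n", (int)sizeof(Double), (void *)&Double);
    return 0;
}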
And then I run the following command:
gcc pointers01.c -o pointers
I don't get any compile errors, but when I run the resulting executable (.exe), the UTF-8 characters are not displayed correctly:
Tipo de Dados Número de Bytes Endereço
Caractere 1 bytes em 2686751
Inteiro 4 bytes em 2686744
Float 4 bytes em 2686740
Double 8 bytes em 2686728
Caractere: 1 bytes em 0028FF1F
Inteiro: 4 bytes em 0028FF18
Float: 4 bytes em 0028FF14
Double: 8 bytes em 0028FF08
How can I resolve this problem? Thank you.
Solution
Sadly, the Windows console has very limited and buggy support for UTF-8.
What can be done: set the code page to 65001 and use one of the fonts that support it, e.g. "Lucida Console". The code page can be set with the chcp command or, in C/C++, with the function SetConsoleOutputCP; the font is set with SetCurrentConsoleFontEx.
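As a rough illustration of those two calls, here is a minimal sketch. The helper names console_use_utf8, console_set_font and restore_console_cp are made up for this example; the Windows calls themselves (GetConsoleOutputCP, SetConsoleOutputCP, GetCurrentConsoleFontEx, SetCurrentConsoleFontEx) are the real API. It assumes Windows Vista or later for the font functions, and it remembers the previous code page so that it can be put back when the program exits. On other systems the helper is a stub that does nothing.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600   /* the console font functions need Vista or later */
#endif
#include <windows.h>
#include <wchar.h>

static UINT saved_output_cp;  /* code page that was active before the change */

/* Registered with atexit: give the console its previous code page back. */
static void restore_console_cp(void)
{
    if (saved_output_cp != 0)
        SetConsoleOutputCP(saved_output_cp);
}

/* Ask the console to use the named font; returns 1 on success, 0 on failure. */
static int console_set_font(const wchar_t *face_name)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_FONT_INFOEX info;

    info.cbSize = sizeof(info);
    if (!GetCurrentConsoleFontEx(out, FALSE, &info))
        return 0;
    /* Keep the current size and weight, change only the face name. */
    wcsncpy(info.FaceName, face_name, LF_FACESIZE - 1);
    info.FaceName[LF_FACESIZE - 1] = L'\0';
    return SetCurrentConsoleFontEx(out, FALSE, &info) ? 1 : 0;
}

/* Switch console output to UTF-8 (code page 65001); returns 1 on success. */
int console_use_utf8(void)
{
    saved_output_cp = GetConsoleOutputCP();
    if (!SetConsoleOutputCP(CP_UTF8))
        return 0;
    atexit(restore_console_cp);
    console_set_font(L"Lucida Console");  /* best effort, result ignored */
    return 1;
}
#else
/* Outside Windows there is no console code page to switch. */
int console_use_utf8(void)
{
    return 1;
}
#endif
```

The idea would be to call console_use_utf8() once at the top of main, before the first printf. This only helps if the source file itself is saved as UTF-8, so that the string literals contain UTF-8 bytes.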
However, there are some major (and minor) problems. Minor first:
a) These settings are valid for one session only, i.e. if you run the program again later, you have to set them again. Making this the default is possible in theory, but not recommended, because it will affect all console programs and introduce the problems below to them, even if they don't do anything with code pages and are not written to mitigate the problems.
b) If the console isn't opened by the program itself, but you start the program from an existing console, the change will affect whatever runs in that console afterwards, until the console is closed. So you have to change it back to the previous value before your own program exits.
c) Some functions usable for console input/output won't work properly with CP65001 (that's the most severe problem).
Unlike the whole UTF-16 part of Windows, it partially treats UTF-8 like any single-byte charset, and relies on behavior that only happens to satisfy the standard when one character is one byte.
As an example, fread should return the number of bytes read (if called with size 1), but in Microsoft's implementation it returns the number of characters (UTF-16 is an exception, but not UTF-8). With any ordinary code page this works because 1 char = 1 byte, but not with UTF-8: wrong return value, hence wrong data processed.
Another example: fflush can hang (at least it is reported to; I didn't check), and so on.
And it doesn't only affect the standard C functions, but direct WinAPI calls too.
d) As a result of c), batch files containing UTF-8 characters outside the ASCII range won't work properly, at least in some Windows versions (I didn't check each one, but it's very likely that Windows 10 still has this bug; Microsoft shows no intention of fixing it anytime soon).
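To make the byte/character mismatch from c) concrete, here is a small portable sketch. The helper utf8_code_points is a made-up name for this illustration, not part of any library. It counts characters by skipping UTF-8 continuation bytes, so the result can be compared with the byte count from strlen.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points: every byte that is not a continuation
   byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the word "Número" from the question, written as "N\xC3\xBAmero" so that the bytes are explicit, strlen reports 7 bytes while utf8_code_points reports 6 characters. A function that hands back the character count where the caller expects bytes therefore comes up one short for every two-byte character, and the last bytes of the buffer are silently ignored.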
Some more reading for c and d: https://social.msdn.microsoft.com/Forums/vstudio/en-US/e4b91f49-6f60-4ffe-887a-e18e39250905/possible-bugs-in-writefile-and-crt-unicode-issues?forum=vcgeneral
Answered By - deviantfan