Character corruption on FontDemo


vroad
Just popping in
Posts: 6
Joined: Sun Sep 12, 2010 12:27

Character corruption on FontDemo

Postby vroad » Mon Sep 13, 2010 13:06

The compiler is Visual Studio 2008 (Japanese edition).
The CEGUI version is 0.7.2.

By default, the compiler cannot compile Sample_FontDemo.cpp.
The VS 2008 compiler seems to fail to detect the encoding of this file.
After adding a UTF-8 BOM to the file, it compiles successfully, but with 61 warnings.
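
The BOM workaround can be sketched as a small helper. This is only an illustration: `hasUtf8Bom` and `withUtf8Bom` are hypothetical names, not part of CEGUI or Visual Studio. The only fact relied on is that the UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```cpp
#include <string>

// The UTF-8 byte order mark is the three bytes EF BB BF.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Returns true if the buffer already starts with a UTF-8 BOM.
// (Hypothetical helper, for illustration only.)
bool hasUtf8Bom(const std::string& bytes)
{
    return bytes.compare(0, 3, kUtf8Bom) == 0;
}

// Returns the buffer with a BOM prepended, unless one is already present.
std::string withUtf8Bom(const std::string& bytes)
{
    return hasUtf8Bom(bytes) ? bytes : std::string(kUtf8Bom) + bytes;
}
```

Reading the source file into a `std::string` in binary mode, passing it through `withUtf8Bom`, and writing it back would have the same effect as saving the file as "UTF-8 with signature" in the editor.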

1>Sample_FontDemo.cpp
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(59) : warning C4566: character represented by universal-character-name '\u00E2' cannot be represented in the current code page (932)
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(59) : warning C4566: character represented by universal-character-name '\u0103' cannot be represented in the current code page (932)
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(63) : warning C4566: character represented by universal-character-name '\u015F' cannot be represented in the current code page (932)
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(67) : warning C4566: character represented by universal-character-name '\u00E5' cannot be represented in the current code page (932)

... and 57 more warnings of the same kind.


However, some characters (including Japanese and Korean) are not shown in this demo.
They are replaced with '?'.

I added a TrueType font, 梅明朝 (Ume Mincho), which includes Japanese characters, and changed the font name in the source file.
But the characters are still not shown correctly.

I have noticed strings in this demo are defined as char*, and CEGUI::String only supports UTF-32.
I think adding conversion between encodings, such as UTF-8, UTF-16, UTF-32 is not so difficult.
Strings should be defined as wchar_t*.
Why does CEGUI::String not have this function?

Code:

static struct
{
    utf8 *Language;
    utf8* Font;
    utf8 *Text;
} LangList [] =
{
    // A list of strings in different languages
    // Feel free to add your own language here (UTF-8 ONLY!)...
    { (utf8 *)"English",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"THIS IS SOME TEXT IN UPPERCASE\n"
              "and this is lowercase...\n"
              "Try Catching The Brown Fox While It's Jumping Over The Lazy Dog" },
    { (utf8 *)"Русский",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"Всё ускоряющаяся эволюция компьютерных технологий предъявила жёсткие требования к производителям как собственно вычислительной техники, так и периферийных устройств.\n"
              "\nЗавершён ежегодный съезд эрудированных школьников, мечтающих глубоко проникнуть в тайны физических явлений и химических реакций.\n"
              "\nавтор панграмм -- Андрей Николаев\n" },
    { (utf8 *)"Română",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"CEI PATRU APOSTOLI\n"
              "au fost trei:\n"
              "Luca şi Matfei\n" },
    { (utf8 *)"Dansk",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"FARLIGE STORE BOGSTAVER\n"
              "og flere men små...\n"
              "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen Walther spillede på xylofon\n" },
    { (utf8 *)"Japanese",
      (utf8*)"ume-tmo3-30",
      (utf8 *)"日本語を選択\n"
              "トリガー検知\n"
              "鉱石備蓄不足\n" },
    { (utf8 *)"Korean",
      (utf8*)"Batang-26",
      (utf8 *)"한국어를 선택\n"
              "트리거 검지\n"
              "광석 비축부족\n" },
    { (utf8 *)"Việt",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"Chào CrazyEddie !\n"
              "Mình rất hạnh phúc khi nghe bạn nói điều đó\n"
              "Hy vọng sớm được thấy CEGUI hỗ trợ đầy đủ tiếng Việt\n"
              "Cám ơn bạn rất nhiều\n"
              "Chúc bạn sức khoẻ\n"
              "Tạm biệt !\n" }
};

CrazyEddie
CEGUI Project Lead
Posts: 6760
Joined: Wed Jan 12, 2005 12:06
Location: England

Re: Character corruption on FontDemo

Postby CrazyEddie » Tue Sep 14, 2010 09:14

I'll start by saying that the compilation of that demo works fine on all systems I ever tried it on, with no compilation warnings such as what you mention. It seems to me the file has been edited and saved with the wrong encoding and, combined with attempts to salvage the situation, this is what's causing issues. I'd start with a fresh copy of the file and ensure that it's always saved with utf-8 encoding.

vroad wrote: I have noticed strings in this demo are defined as char*, and CEGUI::String only supports UTF-32.

They're cast to CEGUI::utf8* which is the correct way to deal with this situation - although the caveat is that the file must be saved with utf-8 encoding. And CEGUI::String supports utf-8 and utf-32 encoded data as input.
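
To make the UTF-8 input path concrete, here is a minimal sketch of a UTF-8 to UTF-32 decoder. It is not CEGUI's code: `decodeUtf8` is a hypothetical helper, it assumes well-formed input and does no validation, and it uses C++11 `std::u32string` for brevity. The literals in the usage note are written with byte escapes so that the example does not depend on how the source file itself is saved.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points.
// Assumption: the input is well-formed UTF-8; malformed sequences are not detected.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < in.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For example, `decodeUtf8("\xE6\x97\xA5")` yields the single code point U+65E5 (日). If the bytes inside the literal are not UTF-8 in the first place, because the file was saved or compiled under another encoding, no cast can repair them.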

vroad wrote: I think adding conversion between encodings, such as UTF-8, UTF-16, UTF-32 is not so difficult.

As stated, CEGUI already supports UTF-8 and UTF-32. There are no plans to add direct support for UTF-16. For Windows users needing to convert to and from UTF-16 there are Windows API functions that can assist in this area.
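
On Windows, the API functions usually used for this are `WideCharToMultiByte` and `MultiByteToWideChar` with the `CP_UTF8` code page. As a portable illustration of what such a conversion does, here is a sketch of UTF-16 to UTF-8. `utf16ToUtf8` is a hypothetical helper, not a CEGUI or Windows function, and it assumes well-formed UTF-16 (an unpaired surrogate is passed through without an error).

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 code units to UTF-8 bytes.
// Assumption: the input is well-formed; unpaired surrogates are not rejected.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting bytes can then be handed to CEGUI through a `CEGUI::utf8*` cast, as the demo does.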

vroad wrote: Strings should be defined as wchar_t*.

Absolutely NOT! What wchar_t actually represents is compiler specific and is useless for the needs of CEGUI without contaminating the code with lots of conditionally compiled variations based on the particular compiler and compiler version that's building the code (and even then you run the risk of interoperability issues). The Unicode standard mentions a whole load of good reasons why wchar_t basically sucks, I recommend you read it - see the section: Unicode Data Types for C in Chapter 5: Implementation guidelines.
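
A small sketch of why this matters in practice: the same wide literal produces a different number of code units depending on the compiler. `wideUnitsForNonBmpChar` is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <cwchar>

// Counts the wchar_t units the compiler emitted for one character outside
// the Basic Multilingual Plane (U+1F600).
// Where wchar_t is 32 bits wide this is typically 1; where it is 16 bits
// wide (for example Visual C++) it is 2, because a surrogate pair is needed.
std::size_t wideUnitsForNonBmpChar()
{
    return std::wcslen(L"\U0001F600");
}
```

Code that indexes or measures `wchar_t` strings therefore behaves differently across compilers unless it is conditionally compiled for each case.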

CE.

vroad
Just popping in
Posts: 6
Joined: Sun Sep 12, 2010 12:27

Re: Character corruption on FontDemo

Postby vroad » Tue Sep 14, 2010 14:03

I copied the original code from the CEGUI-0.7.2 archive, but this issue still happens.

...
1>Sample_FontDemo.cpp
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp : warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(58) : error C2001: newline in constant
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(59) : error C2059: syntax error : ')'
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(62) : error C2143: syntax error : missing ';' before '}'
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(62) : error C2059: syntax error : '}'
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(63) : error C2143: syntax error : missing ';' before '{'
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(63) : error C2447: '{' : missing function header (old-style formal list?)
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(67) : error C2059: syntax error : ','
1>c:\pathToCEGUI\cegui_mk2\samples\fontdemo\sample_fontdemo.cpp(68) : error C2143: syntax error : missing ';' before '{'
...


You mean that using char for string literals and saving the source code as UTF-8 is better than using wchar_t, don't you?
I mistakenly thought that CEGUI::String does not support UTF-8, sorry.
I think that if the source code is saved as UTF-8, the conversion from UTF-8 to UTF-32 will work correctly.

I have seen an implementation that assumes wchar_t is Unicode.
If sizeof(wchar_t) is 1, it assumes the encoding of the given string is UTF-8.
If it is 2, it assumes UTF-16.
If it is 4, it assumes UTF-32.
It converts the encoding and swaps bytes if needed.
However, there is no guarantee that the compiler uses Unicode, so I realized it is not compatible with every platform.
This implementation may not work correctly on every platform, but compilation always succeeds if the source code is Shift-JIS.
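
The width-based guess described above could look roughly like the following sketch. The names `AssumedEncoding`, `guessEncodingFromWidth` and `guessWideEncoding` are hypothetical and are not taken from that implementation or from CEGUI. The sketch only shows the dispatch on `sizeof(wchar_t)`; it is a guess, since nothing guarantees that the compiler stores Unicode in `wchar_t` at all.

```cpp
#include <cstddef>

// Encodings the heuristic can arrive at (hypothetical enum for illustration).
enum AssumedEncoding { AssumeUtf8, AssumeUtf16, AssumeUtf32, AssumeUnknown };

// Guesses an encoding purely from the width of a character type in bytes.
AssumedEncoding guessEncodingFromWidth(std::size_t charSize)
{
    switch (charSize)
    {
    case 1:  return AssumeUtf8;
    case 2:  return AssumeUtf16;
    case 4:  return AssumeUtf32;
    default: return AssumeUnknown;
    }
}

// Applies the guess to this compiler's wchar_t.
AssumedEncoding guessWideEncoding()
{
    return guessEncodingFromWidth(sizeof(wchar_t));
}
```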

By removing these lines, the compilation succeeds.

Code:

{ (utf8 *)"Română",
      (utf8*)"DejaVuSans-10",
      (utf8 *)"CEI PATRU APOSTOLI\n"
              "au fost trei:\n"
              "Luca şi Matfei\n" },
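
One plausible explanation, not a confirmed diagnosis, for why exactly these lines break the build: when the UTF-8 bytes are read as code page 932, a byte in the ranges 0x81-0x9F or 0xE0-0xFC is treated as the first half of a double-byte character and takes the following byte with it. In "Română" the last byte of "ă" is 0x83, so the closing quote may be consumed, which would match error C2001. The sketch below models this. `isCp932LeadByte` and `closingQuoteIsSwallowed` are hypothetical helpers, and the assumption that a lead byte always consumes the next byte, whether or not it is a valid trail byte, is mine.

```cpp
#include <cstddef>
#include <string>

// Lead-byte ranges of code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
bool isCp932LeadByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Walks the bytes of a string literal body plus its closing quote the way a
// code page 932 reader might, and reports whether the closing quote ends up
// as the second byte of a double-byte character.
// Assumption: a lead byte always consumes the next byte, valid trail or not.
bool closingQuoteIsSwallowed(const std::string& literalBody)
{
    const std::string bytes = literalBody + '"';
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const bool lead = isCp932LeadByte(static_cast<unsigned char>(bytes[i]));
        if (lead && i + 1 == bytes.size() - 1)
            return true; // the quote was consumed as a trail byte
        i += lead ? 2 : 1;
    }
    return false;
}
```

With the UTF-8 bytes of "Română" written as escapes, `closingQuoteIsSwallowed("Rom\xC3\xA2n\xC4\x83")` returns true, while a pure ASCII literal such as "English" returns false.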


But "\n" is shown literally in the edit box, and Hangul in the Korean text is not shown.
I also found another issue.
If the font size is large, a partially clipped line is drawn incorrectly.
When the edit box is scrolled, the string of the next line suddenly appears!

Code:

EEE:\nEE
PEE
EE (and Hangeul)

日本語を選択\nトリガー検知
鉱石備蓄不足

