
Problem with CEGUI::String

Posted: Mon Sep 21, 2009 19:13
by Impz0r
Hey, I've just run into a problem concerning CEGUI::String.

I'm trying to use German umlauts like "äöü" within an edit box. This works just fine until the point where I actually try to access them via getText().
The thing is that I'm using std::string within my entire application, except for the GUI of course. To get the content of a CEGUI::String into a std::string I do something like:

Code:

std::string blah = control->getText().c_str();

This works most of the time, but not with these "äöü". Internally the string is converted to utf8 and thereby the "äüö" get fecked up somehow.

So my question is: how do I get it right, if it is even possible?

PS: And besides, why is CEGUI not using the std::string class? Doesn't it also support Unicode?


Thanks in advance!

Mfg Imp

Re: Problem with CEGUI::String

Posted: Tue Sep 22, 2009 08:40
by CrazyEddie
Impz0r wrote:To get a CEGUI::String content into a std::string I do something like:

Code:

std::string blah = control->getText().c_str();

This works most of the time, but not with these "äöü". Internally the string is converted to utf8 and thereby the "äüö" get fecked up somehow.

Can you clarify this bit: "Internally the string is converted to utf8". Do you mean by std::string, or the fact that CEGUI::String does this? I'm not aware of std::string having such a function. CEGUI does it because it's impossible to represent the entire set of Unicode code points in single 8-bit chars, so we use UTF-8 in that case.

Impz0r wrote:PS: And besides, why is CEGUI not using the std::string class? Doesn't it also support Unicode?

I don't believe std::string supports Unicode. Even the wide character type that's in the standard is not too helpful for us, because its actual representation is not specified and so varies by implementation. It's for this reason we wrote a string class that we can rely on to do what we expect in all cases ;)
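To illustrate the point about std::string, here is a minimal sketch showing that it only sees bytes. The helper names utf8_a_umlaut and byte_count are made up for this example, and the "Ä" is written as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <string>

// "Ä" (U+00C4) spelled out as its two UTF-8 bytes, 0xC3 0x84.
// (Hypothetical helper, only here for illustration.)
inline std::string utf8_a_umlaut()
{
    return std::string("\xC3\x84");
}

// std::string::size() reports the number of char units (bytes),
// not the number of characters a reader would count.
inline std::size_t byte_count(const std::string& s)
{
    return s.size();
}
```

One visible character comes back as a length of two, which is why std::string on its own cannot be relied on for character-level operations on UTF-8 data.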

So, to clear up a couple of points: what representation are you yourself using for characters? Some form of actual Unicode, ISO/IEC 8859-1, or something else? :) Knowing this will aid in coming up with a suitable conversion, though largely it will involve accessing the UTF32 codes in the CEGUI::String and stuffing them into your std::string (after applying any required conversion).

CE.

Re: Problem with CEGUI::String

Posted: Tue Sep 22, 2009 10:25
by Impz0r
Hey CE, thanks for your answer.

I'm sorry, I did not express my concern very well.

What I need is ISO/IEC 8859-1, because it supports "ÄäÖöÜü", as long as the Wikipedia page does not lie ;)

CrazyEddie wrote:Can you clarify this bit: Internally the string is converted to utf8 - do you mean by std::string, or the fact that CEGUI::String does this?


Sorry, I was also quite unclear here. What I meant was that CEGUI::String internally converts the string to utf8, which is correct, but the outcome does not seem to fit. Meaning, I put "ÄäÖöÜü" into it and I get chars like "Á" back. Dunno if I'm doing something wrong here?


Thanks in advance!

Mfg Imp

Re: Problem with CEGUI::String

Posted: Tue Sep 22, 2009 13:16
by CrazyEddie
Thanks for the clarification. It should be a simple 'stuffing' exercise. Try something like this:

Code:

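std::string& CEGUIStringToStdString(const CEGUI::String& in_str, std::string& out_str)
{
    // One output char per UTF32 code point held in the CEGUI::String.
    out_str.resize(in_str.length());

    // Keep only the low byte of each code point. This is only correct for
    // code points up to 0xFF, i.e. the ISO/IEC 8859-1 range.
    for (size_t i = 0; i < in_str.length(); ++i)
        out_str[i] = static_cast<char>(in_str[i]);

    return out_str;
}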


Btw, the reason your Ä turns into Á is because utf8 is a multibyte representation where each code point is represented by a variable number of chars. For normal ASCII it's all fine, because code points 0 to 127 translate directly. For values above this, code points are represented by two or more bytes: Ä, which is 0xC4, is encoded into utf8 as the sequence 0xC3 0x84 - all fun stuff :)
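The encoding rule described above can be sketched in a few lines. This is not CEGUI's own code, just a standalone illustration; encode_utf8 is a name invented for this example, and it returns an empty string for values that are not valid code points.

```cpp
#include <cstdint>
#include <string>

// Encode a single Unicode code point as a UTF-8 byte sequence.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, identical to the code point.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogates are not encodable
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0xC4) gives the two bytes 0xC3 0x84, so copying those bytes into a std::string and displaying them as ISO/IEC 8859-1 shows two wrong characters instead of one Ä.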

CE.

Re: Problem with CEGUI::String

Posted: Wed Sep 23, 2009 08:23
by Impz0r
Hey CE, thanks for the quick snippet you've posted. As far as I understand, you just typecast the 2-byte Unicode string into a 1-byte one. So the upper part of it will just be cut off, right?


Thanks again for your great support!

Mfg Imp

Re: Problem with CEGUI::String

Posted: Wed Sep 23, 2009 08:55
by CrazyEddie
Well, we use UTF32, which is four bytes, but other than that, yes: that function is just using the low byte as the final char. This does not work in all cases, but should be fine for ISO/IEC 8859-1, since the first 256 Unicode code points match that character set one to one.
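For the cases where it does not work, a slightly more careful variant could substitute a placeholder instead of silently cutting off the upper bytes. The sketch below is only an illustration under some assumptions: it needs C++11, it uses std::u32string as a stand-in for CEGUI::String so that it compiles without CEGUI, and both the function name utf32_to_latin1 and the replacement parameter are made up here.

```cpp
#include <string>

// Narrow UTF-32 code points to ISO/IEC 8859-1 bytes.
// std::u32string stands in for CEGUI::String (assumption for this sketch).
// Code points above 0xFF have no ISO/IEC 8859-1 equivalent and are
// replaced by 'replacement' rather than truncated.
std::string utf32_to_latin1(const std::u32string& in, char replacement = '?')
{
    std::string out;
    out.reserve(in.size());

    for (char32_t cp : in)
        out += (cp <= 0xFF) ? static_cast<char>(cp) : replacement;

    return out;
}
```

With a real CEGUI::String the same loop would index the string as in the function posted earlier in this thread. The only change is the range check before the cast.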

CE.