How can i extract unicode final codepages?

Posted: Sun Jun 28, 2009 14:08
by Ahmadi
Hi,
I'm working on a project in Arabic.
As a sample, see the following text:
"بب ب"
The text above uses only one letter code point (0x628), but after I press "space", Windows shows a different character shape.
How can I extract the code points of the displayed/rendered text after Windows has shown it?
Please note that if I save the sample text, the binary content of the file is "<HEADER> 0x628 0x628 0x20 0x628".
But why? Why does the second character have the same code point as the first (0x628)? They look different after rendering.
How can I extract the final UTF-8 code points after text rendering, or, if I don't need them, how can I tell CEGUI to render RTL UTF-8 text the same way Windows does?

Thank you for any help.
H.Ahmadi
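
A minimal sketch of what the saved file holds, assuming Python 3 and its standard unicodedata module (neither is mentioned in the thread). It lists the code points of the sample string together with their Unicode names; the variable names are illustrative only.

```python
import unicodedata

# The sample text from the post: BEH, BEH, SPACE, BEH (logical order).
sample = "\u0628\u0628 \u0628"

# Stored text holds abstract characters, not the shapes drawn on screen.
stored = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in sample]

for code, name in stored:
    print(code, name)
```

All three letters come out as U+0628 ARABIC LETTER BEH. The differing shapes are chosen later by the text renderer, so they never reach the file.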

Re: How can i extract unicode final codepages?

Posted: Tue Jun 30, 2009 08:48
by CrazyEddie
Huh? I'll answer the other post instead, since I understand that one :)

Re: How can i extract unicode final codepages?

Posted: Tue Jun 30, 2009 14:06
by Ahmadi
This question is not entirely related to CEGUI, but I think you can help me. That is why I posted in "advanced help".
As far as I know, each font has a set of code points (for example 0000-FFFF), and each code point maps to one glyph, the shape of a character.
When Microsoft Windows renders text in a text box, the renderer shows a sequence of the font's code points, and that sequence is the desired text. (The result depends on the Regional and Language Options.)
I want to know whether there is any way to find the final code points of the text displayed in the text box.
You may wonder why I need these code points: I found that if I find the final code points and pass them to CEGUI as an Ogre::UTFString, CEGUI renders them much like Windows does.
At the moment I am finding the final code points by hand from Windows\Tools\CharacterMap, after opening my font! But how can I capture them after Windows renders the text?
Notes:
1. I used UTFString because the code points are Unicode, and it works fully.
2. I have no problem with CEGUI or Ogre (my project's renderer). The only problem I have is finding the final code points.

Thank you for any help.
H.Ahmadi
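
The "final code points" found by hand in Character Map are most likely the Arabic Presentation Forms blocks of Unicode. As a rough sketch, assuming Python 3's unicodedata module (not something used in the thread), the compatibility decomposition data can list the positional forms of a letter. presentation_forms is a hypothetical helper name.

```python
import unicodedata

POSITIONS = ("isolated", "final", "initial", "medial")

def presentation_forms(letter):
    """Map position name -> code point for one Arabic letter.

    Scans Arabic Presentation Forms-B (U+FE70..U+FEFF) and keeps entries
    whose compatibility decomposition is a single base letter.
    """
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Expected shape: ['<initial>', '0628']; ligatures have more parts.
        if len(parts) != 2:
            continue
        tag = parts[0].strip("<>")
        if tag in POSITIONS and int(parts[1], 16) == ord(letter):
            forms[tag] = cp
    return forms

beh_forms = presentation_forms("\u0628")
for position, cp in beh_forms.items():
    print(position, f"U+{cp:04X}")
```

For U+0628 this lists the isolated, final, initial and medial forms U+FE8F to U+FE92. Going the other way, unicodedata.normalize("NFKC", ...) folds such a form back to the base letter, which is why saved text normally contains only U+0628.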

Re: How can i extract unicode final codepages?

Posted: Wed Jul 01, 2009 08:36
by CrazyEddie
OK. I think you can do it by saving the text to a file and examining the resulting file in a binary file editor / viewer. The key is that the file must be saved with the correct encoding so that the results will match what the class you're using is expecting (I'm unfamiliar with Ogre::UTFString - it must be a fairly recent addition) - also remember to set the viewer to use the appropriate data width so the results are interpreted correctly.

CE.
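
To illustrate the point about encoding and data width, here is a small sketch, assuming Python 3 (not part of the thread), that dumps the sample string "بب ب" in three encodings. The byte-order mark an editor may write at the start of the file is left out, and hex_dump is a hypothetical helper.

```python
# The sample text: BEH, BEH, SPACE, BEH.
sample = "\u0628\u0628 \u0628"

def hex_dump(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{byte:02X}" for byte in data)

# Encode the same text three ways and keep the dumps for comparison.
dumps = {
    encoding: hex_dump(sample.encode(encoding))
    for encoding in ("utf-8", "utf-16-le", "utf-16-be")
}

for encoding, dump in dumps.items():
    print(f"{encoding:10} {dump}")
```

Read as 16-bit little-endian units, the pair 28 06 is 0x0628; read one byte at a time it looks like two unrelated values, which is why the viewer's data width matters. In UTF-8 the same letter is the two bytes D8 A8.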

Re: How can i extract unicode final codepages?

Posted: Wed Jul 01, 2009 17:48
by Ahmadi
CrazyEddie wrote:OK. I think you can do it by saving the text to a file and examining the resulting file in a binary file editor / viewer. The key is that the file must be saved with the correct encoding so that the results will match what the class you're using is expecting (I'm unfamiliar with Ogre::UTFString - it must be a fairly recent addition) - also remember to set the viewer to use the appropriate data width so the results are interpreted correctly.
CE.

I'm happy that we are getting to the problem; the problem is right here! Windows shows a different glyph when it renders a character (for example, because there is a "space" after the character!).
But when I save the text to a file, all of the characters have the same code (for example 0x628)! (When rendered, they have different glyphs depending on the position of the character!)
My first post in this thread was meant to say that saving the text and loading it in a hex editor returns the wrong codes!
I tested all of the text formats (ANSI, Unicode, Unicode big endian, UTF-8), and all of them store the same codes!
I think we need a tool that captures the final code points after Windows has rendered the text.

Thank you for your help.
H.Ahmadi

Re: How can i extract unicode final codepages?

Posted: Thu Jul 02, 2009 09:02
by CrazyEddie
Ok, I know you probably already know this, but just to be sure that I have this clear myself: I think this is related to something I vaguely remember from when I read the Unicode specifications some years ago, whereby for certain sequences of codepoints a single glyph gets rendered (yes, you already said as much, but my being able to relate it to something remembered from the Unicode spec is important to me ;)). We are now on the same page, and I fully understand the issue - because the sequence translation is happening only at render time - so copy / paste / file examinations just yield the original codepoint sequence and not the codepoint of the rendered glyph. Again, sorry for repeating what you already know and already said - it's done for clarification purposes only :)

To answer the actual question, in short, I'm not sure how to get the codepoint of the final rendered glyph. There may be a tool out there that can do this, but I'm not largely aware of such things. It may be necessary to make use of the Unicode tables to perform the mapping (I don't think this is done as part of the BiDi support in 0.7.0).

CE.
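
The idea of using the Unicode tables to perform the mapping can be sketched as follows. This is a simplified illustration, assuming Python 3 and its unicodedata module, and it is not how CEGUI or Windows do it. The names build_forms_table and shape are hypothetical, and joining behaviour is inferred from which presentation forms a letter has rather than taken from the official joining-type data. Diacritics, tatweel and lam-alef ligatures are not handled.

```python
import unicodedata

POSITIONS = ("isolated", "final", "initial", "medial")

def build_forms_table():
    """Base letter -> {position: presentation-form character}."""
    table = {}
    # Arabic Presentation Forms-A and Forms-B.
    for start, end in ((0xFB50, 0xFE00), (0xFE70, 0xFF00)):
        for cp in range(start, end):
            parts = unicodedata.decomposition(chr(cp)).split()
            if len(parts) != 2:
                continue  # unassigned, or a ligature of several letters
            tag = parts[0].strip("<>")
            if tag in POSITIONS:
                base = chr(int(parts[1], 16))
                table.setdefault(base, {})[tag] = chr(cp)
    return table

FORMS = build_forms_table()

def shape(text):
    """Replace each Arabic letter by its positional presentation form.

    Assumption: a letter connects to the following one if it has an
    initial form, and to the preceding one if it has a final form.
    """
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # space, digits, Latin text, etc.
            continue
        prev_forms = FORMS.get(text[i - 1], {}) if i > 0 else {}
        next_forms = FORMS.get(text[i + 1], {}) if i + 1 < len(text) else {}
        joins_prev = "initial" in prev_forms and "final" in forms
        joins_next = "final" in next_forms and "initial" in forms
        if joins_prev and joins_next:
            position = "medial"
        elif joins_prev:
            position = "final"
        elif joins_next:
            position = "initial"
        else:
            position = "isolated"
        out.append(forms.get(position, ch))
    return "".join(out)

shaped = shape("\u0628\u0628 \u0628")
print(" ".join(f"U+{ord(ch):04X}" for ch in shaped))
```

For the sample text this yields the initial, final and isolated forms of BEH around the space. The output is still in logical order, so a renderer without BiDi support would also need the right-to-left reordering done separately. Note that a full text renderer typically selects glyphs inside the font rather than code points, so the presentation forms are only an approximation of what ends up on screen.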

Re: How can i extract unicode final codepages?

Posted: Thu Jul 02, 2009 19:48
by Ahmadi
CrazyEddie wrote:Ok, I know you probably already know this, but just to be sure that I have this clear myself: I think this is related to something I vaguely remember from when I read the Unicode specifications some years ago, whereby for certain sequences of codepoints a single glyph gets rendered (yes, you already said as much, but my being able to relate it to something remembered from the Unicode spec is important to me ;)). We are now on the same page, and I fully understand the issue - because the sequence translation is happening only at render time - so copy / paste / file examinations just yield the original codepoint sequence and not the codepoint of the rendered glyph. Again, sorry for repeating what you already know and already said - it's done for clarification purposes only :)

To answer the actual question, in short, I'm not sure how to get the codepoint of the final rendered glyph. There may be a tool out there that can do this, but I'm not largely aware of such things. It may be necessary to make use of the Unicode tables to perform the mapping (I don't think this is done as part of the BiDi support in 0.7.0).

CE.

Thank you for your attention.
Do you know anyone who can help me with this problem? I have been struggling with it for some weeks.
Thank you for your help.

Re: How can i extract unicode final codepages?

Posted: Sat Jul 04, 2009 20:23
by CrazyEddie
Ahmadi wrote:Do you know anyone who can help me with this problem? I have been struggling with it for some weeks.

Not especially. The only source of information regarding this I have available is the docs / specifications on the Unicode site - which, while very informative, is somewhat dry and boring to wade through. As far as a source of general advice as regards to such matters, I'm not sure what to suggest.

CE.