Internationalization Cookbook

CharMapEx - Some kind of character map :-)

Story

This is a small tool that started as a private investigation into the functionality of some Windows API.

At some point somebody complained that GetGlyphIndices is not surrogate-aware and does not work for characters outside the BMP (Basic Multilingual Plane).

So I started a small application to test the claim.
And it was true!
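
To make the problem concrete, here is a minimal Python sketch (not the tool's code, and not a call into any Windows API) of why an API that works on one 16-bit unit at a time cannot see characters outside the BMP: such a character is stored in UTF-16 as a pair of surrogate code units, and only the combined pair names a real code point. The function names are made up for this illustration.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of a string as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points(units):
    """Combine surrogate pairs in a list of UTF-16 code units into code points."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # High surrogate carries the top 10 bits, low surrogate the bottom 10.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through unchanged).
            result.append(unit)
            i += 1
    return result
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the two units 0xD834 and 0xDD1E. A per-unit glyph lookup would see two surrogates, neither of which maps to a glyph on its own.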

So the next step was: “let’s find a solution!” Shortly after, I had my own routine doing almost the same thing, but with its own parsing of the cmap OpenType table.
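
The actual routine is not published here, so the following is only a hedged Python sketch of what cmap-based lookup can look like. It assumes the font carries a format 12 (segmented coverage) subtable, which is the usual way to map characters outside the BMP, and it ignores every other subtable format. The function names are hypothetical; the byte layout follows the OpenType cmap specification (big-endian fields).

```python
import struct


def find_format12_subtable(cmap):
    """Return the bytes starting at the first format 12 subtable, or None."""
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    for i in range(num_tables):
        # Encoding record: platformID, encodingID, offset from start of cmap.
        _platform_id, _encoding_id, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        if fmt == 12:
            return cmap[offset:]
    return None


def parse_format12(subtable):
    """Return the (startCharCode, endCharCode, startGlyphID) groups."""
    fmt, _reserved, _length, _language, num_groups = struct.unpack_from(">HHIII", subtable, 0)
    if fmt != 12:
        raise ValueError("not a format 12 subtable")
    return [struct.unpack_from(">III", subtable, 16 + 12 * i) for i in range(num_groups)]


def glyph_index(groups, code_point):
    """Map a full code point (not a UTF-16 unit) to a glyph index; 0 means missing."""
    for start, end, start_glyph in groups:
        if start <= code_point <= end:
            return start_glyph + (code_point - start)
    return 0


def build_demo_cmap():
    """Build a tiny synthetic cmap table for experimenting (not from a real font)."""
    groups = [(0x41, 0x5A, 36), (0x1D100, 0x1D126, 500)]
    sub = struct.pack(">HHIII", 12, 0, 16 + 12 * len(groups), 0, len(groups))
    for group in groups:
        sub += struct.pack(">III", *group)
    # Header (version 0, one table) plus one record: platform 3, encoding 10.
    header = struct.pack(">HH", 0, 1) + struct.pack(">HHI", 3, 10, 12)
    return header + sub
```

With the synthetic table, `glyph_index(parse_format12(find_format12_subtable(build_demo_cmap())), 0x1D11E)` resolves a supplementary character directly from its code point, which is the part a per-unit lookup cannot do. A real implementation would read the cmap bytes from the font and would also need to handle the other subtable formats.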

To test that, I needed an easy way to change the font and visualize the results. Then I wanted to know which fonts contain a certain character (make sure to right-click :-)

And little by little, it grew into something that might be useful to others. And some friends also asked: “Why don’t you give it away?”

So, here it is, for your benefit and/or enjoyment :-)

CharMapEx screenshot

Future plans:

  • Fix printing
  • Determine glyph presence using Uniscribe (ScriptGetCMap)
  • Tile vertical and horizontal
  • Allow users to assign a font for each block
  • Take block names from the Unicode file “Blocks.txt”
  • Show Unicode information for each character using the Unicode files (“UnicodeData.txt,” “Unihan.txt,” and maybe others)
  • Maybe publish some of the code

So make sure to select “Help” -> “Check for updates…” once in a while :-)
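
As a rough idea of what the “Blocks.txt” item in the plans above might involve, here is a small Python sketch that parses lines in the Blocks.txt format (`start..end; name`, with `#` comments) and finds the block for a code point. It is not the tool's code. The embedded sample holds just three entries copied from the real file, and the function names are invented for this example.

```python
SAMPLE_BLOCKS = """\
# Sample in Blocks.txt format (three real entries, not the full file)
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
1D100..1D1FF; Musical Symbols
"""


def parse_blocks(text):
    """Parse Blocks.txt-style text into a list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks


def block_name(blocks, code_point):
    """Return the block containing code_point, or "No_Block" if none matches."""
    for start, end, name in blocks:
        if start <= code_point <= end:
            return name
    return "No_Block"
```

Reading the real file instead of the sample would let the block list follow new Unicode releases without touching the code.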

Disclaimer

In general, I am not responsible (irresponsible?) for any problem with this tool. It is provided “as is,” take it or leave it :-)

Download

Ok now, there you go: CharMapEx.zip (contains the executable)

Good luck!

6 Comments to “CharMapEx - Some kind of character map :-)”

  1. Lubo says:
    Hi, I found your char map tool. I was wondering how you found the names for all the characters. Did you create a list, or are you getting the names from the font file? How do you know which character set is selected? Thanks for the information. Lubo
    • Mihai says:
      Those are really Unicode blocks, with the names in the Blocks.txt (part of regular Unicode releases, you can find it at ftp://ftp.unicode.org/Public//ucd/ with lots of other good info). For now the info is hard-coded, but I want to take it out and make it easier to update for new Unicode releases (and add more info about each character, the stuff in UnicodeData.txt, Unihan.zip, Scripts.txt, etc.) But somehow I don't really find the time :-)
