Language Support - Char Ranges

Kezarus

Member
Hello everyone! I am trying to add language support to my framework for others to use, and I ran into a character range issue.

I plan to add Cyrillic, Japanese and Chinese from the Noto Sans fonts.

For Cyrillic I added the range 1025 to 1105.
For Japanese I added 12352 to 12543 for hiragana and katakana (I followed @FrostyCat's answer from this post).
For Chinese... I am clueless.

The questions are:
1 - Is this the best way to add language support? Add a font for each language family supported?
2 - Should I add them dynamically using font_add (I read that this is kinda slow)?
3 - Are the ranges correct? Which should I use?
4 - And... err... kanji...? There are thousands of them and I don't know if it's better to leave them out and let the user add them on demand.

I will still add a dictionary structure and simple functions so the developer can just add another language file and BAM, magic happens. The framework will read a file, pick the correct font, and set the correct texts from key/value pairs stored in memory. But I stumbled on this font range issue. If anyone could help, it would be awesome!


Cheers!
Kezarus
 

Yal

šŸ§ *penguin noises*
GMC Elder
4 - And... err... kanji...? There are thousands of them and I don't know if it's better to leave them out and let the user add them on demand.
Supposedly you need to know 10,000 kanji to be considered literate and 50,000 to be able to read a book or a newspaper. This goes for both Japanese and Chinese, too - writing stuff with only kana makes text look incredibly childish (books for children usually have the kana spelling written above any uncommon kanji, in fact). Trying to add them all will just waste a lot of VRAM for any user not using them, and chances are you missed a bunch of specialized kanji the end-user will want to use anyway (e.g. a dev making a sci-fi game might want terms for chemical compounds, someone making a fantasy game might need archaic versions of words for bastion types and old trades, etc).

It might be worth noting that usually, not every possible kanji is pre-rendered and stored as a separate glyph: they're combined from individual radicals to form all potential combinations. (If you open Google Docs or Microsoft Bloatffice and use the "add symbol" interface you'll find a bunch of groups like "Han 4-stroke radicals". These are the symbols used to make kanji.) The idea is that the kanji for a more complicated concept is formed by combining simpler terms: "wife" is formed from "woman", "hand" and "broom", for instance. (If you needed another reminder of just how old the Chinese language is.)
 

Kezarus

Member
Yeah, I didn't know I was headed into something this big, but I will do what I need to do! =]

So, I am planning to do the following, and please do share your opinions about it.

I will have files of game texts. Each will contain the font family to use (Latin, Cyrillic, Japanese, or Chinese) on the first line and key/value pairs in the rest of the file. A program will audit each language file, collect every unique symbol used in it, and load the font family with every character needed using the font_add function. This will run only when the language changes. The variables needed to load this at game start will be placed in the game INI file, and the font families will be in an Included Files folder.
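The audit step described above could be sketched roughly like this. It is written in Python as an offline tool rather than in GML, and it assumes a layout the thread does not spell out: the first non-empty line names the font family and every later line is `key=value`. The function names and the sample keys are hypothetical.

```python
def parse_language_file(text):
    """Split a language file into (font_family, {key: value}).

    Assumed layout (not confirmed by the thread): the first non-empty
    line is the font family, every later non-empty line is key=value.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("language file is empty")
    font_family = lines[0]
    texts = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        texts[key.strip()] = value.strip()
    return font_family, texts


def collect_unique_chars(texts):
    """Return every distinct character used by the values, sorted by code point."""
    return sorted(set("".join(texts.values())))


# Hypothetical sample file content.
sample = "cyrillic\nmenu_start=Начать игру\nmenu_quit=Выход\n"
font_family, texts = parse_language_file(sample)
chars = collect_unique_chars(texts)
code_points = [ord(c) for c in chars]
print(font_family, len(chars), min(code_points), max(code_points))
```

The resulting list of code points is what such a tool would hand over to whatever loads the font, for instance by writing it next to the other settings in the INI file.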

Question is: how do I add multiple character ranges with font_add?
 

FrostyCat

Member
1 - Is this the best way to add language support? Add a font for each language family supported?
In my opinion, this is a domain that you should stay out of at the engine level. Let your user configure their fonts as needed by their internationalization requirements, and leave slots in your engine where these fonts can be specified. It's not your fault that they're stupid enough to not know the basic details of text rendering in their home language.

Unless you have a coverage-centric font like Arial Unicode, you are asking for trouble down the line if you call draw_set_font() directly from Draw events without an abstraction (e.g. a script that sets a font according to the language and context). You use an artistic font that works in English, then you add a new language that uses glyphs not in your original font, but its Latin character set looks ugly or doesn't fit, etc. All sorts of crap happens with hard-coded draw_set_font() architecture when localizing beyond the European sprachbund, and there is little awareness of the issue. Numbers and dingbats are the only exceptions to this rule.
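One way to picture the abstraction mentioned above is a lookup keyed by language and context, with a fallback to that language's body font. This is a language-neutral sketch in Python, not GML; the table contents, file names, and the `font_for` helper are all made up for illustration. In GameMaker the result would be resolved to a font and passed to draw_set_font() in one place instead of in every Draw event.

```python
# Hypothetical table: (language, context) -> font file. All names are placeholders.
FONT_TABLE = {
    ("en", "title"): "title_latin.ttf",
    ("en", "body"): "body_latin.ttf",
    ("ru", "body"): "body_cyrillic.ttf",
    ("ja", "body"): "body_japanese.ttf",
}
DEFAULT_CONTEXT = "body"


def font_for(language, context):
    """Pick a font for a language and context.

    Falls back to the language's body font when no context-specific
    entry exists, so an artistic title font chosen for English is never
    reused for a language it does not cover.
    """
    if (language, context) in FONT_TABLE:
        return FONT_TABLE[(language, context)]
    fallback = (language, DEFAULT_CONTEXT)
    if fallback in FONT_TABLE:
        return FONT_TABLE[fallback]
    raise KeyError(f"no font configured for language {language!r}")


print(font_for("en", "title"))
print(font_for("ja", "title"))
```

Adding a language then means adding rows to the table, and the drawing code stays untouched.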

2 - Should I add them dynamically using font_add (I read that this is kinda slow)?
My experience is to always use font_add() with CJK languages, and not even bother with character ranges or surface-based fonts (i.e. the font resource in the IDE).

If you will only be outputting CJK characters, then surface-based fonts might be viable if you collect all the characters last-minute. But once you start taking input, the whole scheme for surface-based fonts comes apart. There's no telling which characters could crop up, and the entire set is too large to entirely include on a surface. That's where font_add() fonts come in.

As for the performance argument, it's overblown to start with, and more or less moot because there is no other practical way. If I'm localizing a game for CJK, I'd rather take a minor FPS hit over drawing blanks or illegible crap where there should be normal text. And people who keep saying "add the whole range for your language" are just provincial monolinguals who have not seen how big the range is for CJK, and how much of that would go to waste if the entire set is pre-loaded and pre-rendered.

3 - Are the ranges correct? Which should I use?
The ranges you have for Cyrillic, hiragana and katakana are correct, though you did forget the punctuation for CJK languages (which is slightly different and twice as wide). But the ranges for Chinese characters are much more open-ended (see CJK Unified Ideographs), with 7 so far (1 base block and 6 supplementals) and new blocks being constantly allocated even now.
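To see where the decimal ranges from this thread sit next to the blocks mentioned above, here is a small Python sketch. The block boundaries are the standard Unicode ones; the `find_block` helper and the sample string are made up for illustration, and only the base CJK Unified Ideographs block is listed, not the extensions.

```python
# (first, last) code points. The first two entries are the ranges from the thread.
BLOCKS = {
    "Cyrillic range from the thread": (1025, 1105),          # U+0401..U+0451
    "hiragana and katakana": (12352, 12543),                 # U+3040..U+30FF
    "CJK Symbols and Punctuation": (0x3000, 0x303F),
    "Halfwidth and Fullwidth Forms": (0xFF00, 0xFFEF),
    "CJK Unified Ideographs (base block)": (0x4E00, 0x9FFF),
}


def find_block(ch):
    """Return the name of the first listed block containing ch, or None."""
    cp = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return None


# Hypothetical sample: kana, CJK punctuation, and two kanji in one sentence.
for ch in "こんにちは、世界。":
    print(ch, hex(ord(ch)), find_block(ch))
```

Even this short sentence needs characters from three different blocks, which is the practical reason for letting font_add() cache glyphs on demand instead of enumerating ranges.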

The correct action is to not care about character ranges, and just use font_add() where glyphs can be cached on the fly. There is no other viable way for general CJK text. Performance freaks can cry me a river as to how "inefficient" that is, but I'll gladly sacrifice a little performance for the text to come out right.

4 - And... err... kanji...? There are thousands of them and I don't know if it's better to leave them out and let the user add them on demand.
The FreeType renderer in the runner will add them on-demand if you use font_add().

Chinese characters (AKA Kanji, Hanja) and Hangul both have the same issue with character set size, and in both cases the solution is to leave it to the runner. Pre-rendering a fixed range of glyphs onto a surface is good for up to a few hundred characters at most.

Question is: how do I add multiple character ranges with font_add?
You don't. When you use font_add(), that "range" you specified there is just the starting seed. If you draw characters outside that range, they will be added and cached on the fly if the TTF file has them. This is the opposite of what surface-based fonts do, which is drawing a placeholder character or blank when it sees an out-of-range character, even if the original font has a glyph for it.
 

Kezarus

Member
Wow, thanks a lot @FrostyCat! Gonna have to read what you wrote a couple more times to grasp all that you said.

Right now I am just doing the basics for Latin and Cyrillic (ini files, Map, funcs), then I plan to implement this font_add thingy.

If I understand correctly what you said, I could use font_add on the fly to switch to a desired font (shipped with the game's Included Files), add just the basic characters, and let it cache more symbols as they are needed. Is that right?

I can't thank you enough, @FrostyCat. =]


Cheers!
Kezarus
 

FrostyCat

Member
If I understand correctly what you said, I could use font_add on the fly to switch to a desired font (shipped with the game's Included Files), add just the basic characters, and let it cache more symbols as they are needed. Is that right?
Download my example from this post. You'll see how it's done there.
 

Kezarus

Member
Nice! I understood you right then. =]

You add the font from an included file, and the range doesn't matter (I even tested with a range of 1) because it will add what it needs on the fly. I will aim for that.

Thanks a lot for your help, mate!
 