Localizing for Chinese

Phil Strahl

Member
Hey everybody. I have a game that supports multiple languages by reading a JSON file as a ds_map. Adding all the necessary glyphs for the supported languages to the font went quite well.

Problem is: how do I do this for languages with a huge number of characters, such as Chinese and Japanese? I have received a few requests from Chinese players who would love to play the game in their language and I always had to turn them down...

The only thing I came up with would be creating a buttload of sprites and even more if/switch statements throughout my code to display an image instead of a line of text. Needless to say, this is tedious and is just asking for problems...

So does anybody have experience with this kind of thing?
 
seanm

Guest
It should in theory be the same as displaying the Roman alphabet, as characters don't join together like in Arabic.

I would seriously consider just finding a different font for the Chinese text. No one is really going to play both versions, so they aren't going to notice you're using a different font.

This looks like it has Chinese and Japanese pixel characters.
https://github.com/SolidZORO/zpix-pixel-font
edit: nope, costs $500 for commercial use.

I believe the default fonts on most computers should be able to display the characters.
 

Phil Strahl

Member
I don't think the font is the problem. I am a bit worried about having 2000+ glyphs in texture pages for different fonts and different sizes... Also, I don't know how (well?) GM handles UTF-16 encoded JSON files, etc...
 

YanBG

Member
Were you able to add the characters to the font?

Is your problem that English-based strings won't match the Chinese words? You can use separate strings for names only and for full lines of dialogue.
 

FrostyCat

Redemption Seeker
There are three ways to do fonts in GMS 1.x:
  • Texture-based fonts added from the IDE
  • Sprite fonts from font_add_sprite()
  • Raw TTF fonts from font_add() in native platforms

For anything beyond a few hundred fixed characters, the first two approaches are trash. Include the TTF file with the game and use font_add(). Fonts loaded this way can render glyphs on-demand, and are the only way to reasonably do general CJK right. As complex as CJK looks to a non-speaker, it does not need complex text layout (CTL), so the standard character-by-character assumption in OpenType still holds.

For the encoding, there is no reason for you to use UTF-16. GMS only has support for UTF-8, which is enough for CJK already. Your characters aren't coming out right because it's garbage-in-garbage-out.
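If the translation file arrives as UTF-16, re-encoding it to UTF-8 before shipping is a small one-off step. Here is a minimal Python sketch of that conversion, run outside the game as part of the build. The function name `to_utf8` is made up for illustration, and the sketch assumes the source file is either UTF-16 with a byte order mark or already UTF-8.

```python
import codecs
from pathlib import Path


def to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a text file as UTF-8 without a BOM (hypothetical helper)."""
    raw = src.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # Assumption: anything else is UTF-8; "utf-8-sig" strips a BOM if present.
        text = raw.decode("utf-8-sig")
    dst.write_bytes(text.encode("utf-8"))
```

Calling `to_utf8(Path("zh.utf16.json"), Path("zh.json"))` (both file names are placeholders) would leave a UTF-8 copy next to the original.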

For the font face, regular Western fonts aren't likely to save you; they probably won't have the right glyphs (check Character Map and you'll see). Get a native speaker to help pick a font with matching aesthetics. If you want a general-purpose font for Chinese, Arial Unicode MS, DFKai-SB, Batang, Dotum are commonly available standard picks. Artsy fonts are also available and can be searched for separately. They usually also support character sets in common lower ranges (ASCII is a given, sometimes Cyrillic and/or Greek), so if you can find a good stand-in for what you already use and need, go for it.

A key consideration is whether a font may be used to print arbitrary CJK input, such as user input and online downloads. For fonts that won't be, enumerate the characters that they will be used to print, then add them to texture-based fonts in the IDE. Make sure to update them every time you edit strings that will be printed with them. Save the third method for fonts used to print arbitrary CJK input or large amounts of well-varied CJK text.
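Enumerating the characters by hand gets old fast, so it is worth scripting. The sketch below is one possible way to do it in Python, outside the game. It assumes the localization file is JSON whose values are strings, possibly nested in objects or arrays, as in the original post. The names `collect_strings` and `used_characters` are hypothetical.

```python
import json


def collect_strings(node):
    """Yield every string value (keys are ignored) in a parsed JSON tree."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_strings(item)


def used_characters(json_text):
    """Return each distinct printable character once, sorted by code point."""
    chars = set()
    for text in collect_strings(json.loads(json_text)):
        # isprintable() keeps the space but drops newlines and other controls.
        chars.update(ch for ch in text if ch.isprintable())
    return "".join(sorted(chars))
```

Writing the result to a UTF-8 text file gives a character list that can be re-generated every time a translator sends an updated file.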

Speaking of CJK input, there are GMS-specific concerns as to what you can no longer use. keyboard_string-based approaches, pretend on-screen keyboards and most of the glitzy stuff (i.e. not system UI like get_string_async()) will all come apart. Solving the font problem is only half the fight.

PS: This community needs a guide on how to do CJK properly, if nobody else wants to do it then I'll try. From my interactions with YoYo regarding i18n matters, they seem blissfully apt to toot the inadequate methods' horns. Maybe it's because I'm a native Chinese speaker and they're not.
 
seanm

Guest
@FrostyCat It's my understanding that both fonts loaded internally, and font_add, will draw out all of the characters onto a texture page. What's the difference? I feel like I'm missing something here.
 

Mike

nobody important
GMC Elder
If you use font_add() to load a TTF font, then characters are cached as you use them, and will be flushed if it runs out of space. I don't know offhand how large the cache is, but this is there for exactly this reason.
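To make the difference from a pre-baked texture page concrete, here is a toy Python model of an on-demand glyph cache. It is only an illustration of the general idea, not GameMaker's actual implementation: the class name, the `rasterize` callback and the least-recently-used eviction rule are all assumptions, and the real cache may simply flush everything when it fills up.

```python
from collections import OrderedDict


class GlyphCache:
    """Toy on-demand glyph cache with least-recently-used eviction."""

    def __init__(self, capacity, rasterize):
        self.capacity = capacity    # how many glyphs fit in the cache
        self.rasterize = rasterize  # callback that renders one character
        self.slots = OrderedDict()  # character -> rendered glyph

    def get(self, ch):
        if ch in self.slots:
            # Cache hit: mark the glyph as recently used and reuse it.
            self.slots.move_to_end(ch)
            return self.slots[ch]
        if len(self.slots) >= self.capacity:
            # Cache full: drop the glyph that has gone unused the longest.
            self.slots.popitem(last=False)
        glyph = self.rasterize(ch)
        self.slots[ch] = glyph
        return glyph
```

The point is that only characters that are actually drawn ever take up space, which is why a font with thousands of CJK glyphs stays manageable.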

Please note: if you include a TTF font inside your game YOU MUST HAVE A LICENSE TO DISTRIBUTE THAT FONT.

You don't need that license to create a texture page, only if you want to distribute the actual font.
 

Phil Strahl

Member
Thanks heaps for your comprehensive answer, Frosty! That clears it all up (for now). I don't have that much text to render and luckily don't rely on keyboard input apart from some F-keys, but I am reading usernames from Steam's leaderboards. I'll see if I can get it working with some garbage text and then try to find a native Chinese localization service.

If you use font_add() to load a TTF font, then characters are cached as you use them, and will be flushed if it runs out of space. I don't know offhand how large the cache is, but this is there for exactly this reason.
That's good to know, thanks. The documentation talks about "a limitation of around 200 glyphs that can be rendered in a single frame for a particular font," so I have to run some tests to see whether I run into any issues there.

Again, thanks a lot! You guys are wonderful!
 
This might be a waste to mention, but I have been thinking about this a great deal. If all you are doing is displaying messages or instructions to the player, why couldn't you just select only the symbols you need, or create sprites of the text and then have the computer draw each individual symbol, or have a single sprite for the whole line of text?

I suspect it's because it's much more trouble than it's worth, but I'm just trying to understand what the optimal situations would be.

Please disregard my question if the player is actually inputting text.

Thank you.
 

YanBG

Member
He doesn't have the translation yet, but selecting only the needed characters in the font is a good idea. I made a Cyrillic localization and there are many unused letters in the generic fonts.
 

Phil Strahl

Member
This might be a waste to mention, but I have been thinking about this a great deal. If all you are doing is displaying messages or instructions to the player, why couldn't you just select only the symbols you need, or create sprites of the text and then have the computer draw each individual symbol, or have a single sprite for the whole line of text?
Unfortunately, I have some dynamic text based on the player's Steam name, and I noticed quite a few with Chinese characters. On the other hand, the game will occasionally get updates, so I can't know for sure which characters I might need in the future. On top of that, I already have a very well working localization for seven languages and I would hate to tear it all up and insert special cases in 30k+ lines of code ;)
 

FrostyCat

Redemption Seeker
If you don't have control over what characters could show up, then you have to go with the third option of loading from TTFs. If you don't want to revamp your existing code this way, delete the texture-based original and use globalvar with font_add() to set up a new font under the old name.

If you do have source-level control over what characters could show up (e.g. localization files), take advantage of the From File button in the window that adds character ranges. Whenever you are notified of a file change from a translator, use that button on all fonts that will be used to draw text from it.
 
Tirous

Guest
I would try to limit the amount of text in the game, and only use the glyphs that are actually used within the game itself.

Not saying that ain't obvious, nor that it's easy, just saying that when put up against a wall like this, that's what I would do.

Also, I would take the opportunity to use the localization to reduce the amount of text in the game. Once it's translated, you can in one way or another change it to be as small as possible. Again, not sure how much work that would be to get done, but I'm just saying.

Hope this helps a tad ;)
 

RekNepZ

GMC Historian
I think most people in China (or at least, most who own a computer) know how to read Latin characters. It would probably be easiest to just use the romanized forms of such languages.

As for names though, I'm really not sure. Would it be possible to use ASCII codes?
 
Unfortunately, I have some dynamic text based on the player's Steam name, and I noticed quite a few with Chinese characters. On the other hand, the game will occasionally get updates, so I can't know for sure which characters I might need in the future. On top of that, I already have a very well working localization for seven languages and I would hate to tear it all up and insert special cases in 30k+ lines of code ;)
Might I be able to ask how you did that? I'm still trying to learn a whole lot of things, and I haven't even been able to figure out how to display text unless it's drawn in or you use the fonts loaded into the game.

thank you.
 

FrostyCat

Redemption Seeker
I would try to limit the amount of text in the game, and only use the glyphs that are actually used within the game itself.

Not saying that ain't obvious, nor that it's easy, just saying that when put up against a wall like this, that's what I would do.

Also, I would take the opportunity to use the localization to reduce the amount of text in the game. Once it's translated, you can in one way or another change it to be as small as possible. Again, not sure how much work that would be to get done, but I'm just saying.
That's why I advised figuring out which fonts will be taking arbitrary input and which ones won't. The ones that won't can and should stay in texture-based form. Then you can let OpenType pick characters out of a TTF file at runtime for small exceptions like online Steam user info.
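The split between "known" and "arbitrary" text can be expressed as a simple coverage check. The Python sketch below only illustrates the decision logic, which the game code would mirror in its own language. The function name `needs_runtime_font` and the `baked_chars` set are made up for the example.

```python
def needs_runtime_font(text, baked_chars):
    """Return True if text has a printable character the baked font lacks.

    baked_chars is the set of characters added to the texture-based font.
    """
    return any(ch not in baked_chars for ch in text if ch.isprintable())


# Hypothetical character set baked into a texture-based font.
baked = set("Score: 0123456789分数")

print(needs_runtime_font("分数: 10", baked))  # localized UI string
print(needs_runtime_font("囧rz", baked))      # Steam user name
```

A localized UI string that stays inside the baked set can keep using the texture-based font, while a user name containing something like 囧 falls through to the font loaded from the TTF.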

I think most people in China (or at least, most who own a computer) know how to read Latin characters. It would probably be easiest to just use the romanized forms of such languages.

As for names though, I'm really not sure. Would it be possible to use ASCII codes?
That's silly. Chinese is never written in romanized form for native speakers. There's Pinyin, but that's a phonetic aid, not an orthography.
 

RekNepZ

GMC Historian
That's silly. Chinese is never written in romanized form for native speakers. There's Pinyin, but that's a phonetic aid, not an orthography.
Why does it need to use Chinese orthography? Native speakers can read the romanized form. It might look a bit weird to them, but I doubt it would be much of a problem.
 

FrostyCat

Redemption Seeker
I am a native speaker and can certainly read Pinyin just fine, but it's not just a bit weird, it's completely out of this world. Nothing I've ever read in Chinese over the past two decades is written entirely in Pinyin. If you aren't a native speaker, I suggest that you save the conclusions for people who are.

And if you look at the context of this topic, you'd notice another problem with your suggestion. The Chinese characters are showing up in user titles on Steam, which is a prime spot for making faces with them (e.g. 囧). In these cases, nothing but the original orthography would suffice.
 

Phil Strahl

Member
Status update: Finding a font that works is hard! Most of them are OpenType fonts, which can't easily be converted to TTF. Then, even the TrueType fonts are hit and miss, since apparently there are a lot of different ways of storing Unicode glyphs and only a fraction of them get properly rendered by GM:S. Not to mention trying to reach the rights holders for some fonts... But I love a good challenge :)
 