Strange cheaper alternative to encoding

Joe Ellis

Member
Today I discovered that if you write 2 specific bytes (255, 254) at the start of any file, all the characters turn into Chinese symbols when the file is opened in a text editor like Notepad or WordPad.

I expect it's something to do with how the binary data is translated into characters, but I don't know why this happens at all.

This is good for me, as the files my engine creates can easily be encoded this way with no lag, instead of the entire buffer having to be encoded.
I'm mainly wondering if anyone knows about this and, most importantly, whether it's a bad idea to use it as a form of encoding.
 
If you haven't found the answer already, here's what I discovered. Knowing that 255, 254 in hex is FF FE, I googled around a bit and found that these bytes at the start of a text file indicate that the file's byte order is little-endian (they are the byte order mark for UTF-16 little-endian).

See here: https://en.wikipedia.org/wiki/Endianness

Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF; a little-endian should start with FF FE 00 00.
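To make the quoted BOM description concrete, here is a minimal sketch in Python (used purely as an illustration language, not what the engine uses) that checks which BOM, if any, a buffer starts with, using the BOM constants from the standard codecs module. The function name sniff_bom is hypothetical. Note that the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark, so it has to be tested first.

```python
import codecs
from typing import Optional, Tuple

# Longest marks first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so it must be checked before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec name, BOM length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    raw = bytes([255, 254]) + b"shd_"  # the two bytes from the first post + ASCII text
    name, length = sniff_bom(raw)
    print(name, length)                # utf-16-le 2
    # The ASCII bytes now get paired into 16-bit units (CJK ideographs).
    print([hex(ord(c)) for c in raw[length:].decode(name)])
```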
 

Joe Ellis

Member
Cool, that's the kind of thing I was imagining. As it stands, I think this is as effective as any encoding option: it basically makes the file unreadable in Notepad, which stops casual editing.
 
If you want to encode your text files, write them backwards as a binary file and read them backwards to translate them back into a text file. This means the first byte in your binary file will be the last byte (EOF) of your text file, and the last byte of your binary file is actually the first byte of the original text file. The part of the program that reads the file reads the bytes backwards, from the last byte to the first, to translate them back into a text file. After the translation, you can use the text file that was derived from the binary file. The same routine that writes backwards can also be used to read backwards, since it starts at the EOF byte position of the file.

"0\l\r\ !dlroW olleH"

Late edit: I meant to use the word encryption, not encoding (I mixed the two up in my train of thought).
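The backwards scheme described in this post amounts to reversing the byte order of the whole file. Below is a minimal Python sketch with hypothetical function names. Reversal is its own inverse, which is why one routine works for both writing and reading. Like the BOM trick, this is obfuscation rather than encryption: anyone who reverses the bytes gets the text back.

```python
import tempfile
from pathlib import Path


def reverse_bytes(data: bytes) -> bytes:
    """Last byte first. Applying this twice gives back the original bytes."""
    return data[::-1]


def write_reversed(path: Path, text: str, encoding: str = "utf-8") -> None:
    # Encode first, then reverse, so multi-byte characters round-trip exactly.
    path.write_bytes(reverse_bytes(text.encode(encoding)))


def read_reversed(path: Path, encoding: str = "utf-8") -> str:
    # Undo the reversal on the raw bytes, then decode.
    return reverse_bytes(path.read_bytes()).decode(encoding)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "greeting.bin"
        write_reversed(target, "Hello World!\r\n")
        print(target.read_bytes())     # b'\n\r!dlroW olleH'
        print(read_reversed(target))   # Hello World! (followed by CR LF)
```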
 

TsukaYuriko

☄️
Forum Staff
Moderator
Are we talking about encryption here? If so, the presented approach is not suitable for your target application. It is not by any means encryption. It is also conceptually different from encoding. Encoding attempts to ensure data is readable, while encryption intends to prevent it, at least for prying eyes.

You may think you're encrypting the data, but you're really just (accidentally!) instructing common text processing software to display the (unchanged) data a certain way - or, in other words, messing up the encoding. It's a bit like opening an image in a text editor and concluding that because you can't see an image, people won't be able to steal your graphics. Actual encryption involves changing the data, not the display of said data, and actually offers security to a certain extent. Open one "encrypted" (or not so encrypted) file you created this way and manually change the encoding (actually encoding) to the proper one, and you'll see exactly what you put there... some strange characters at the start, followed by the original data.
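To illustrate the point about switching back to the proper encoding, here is a small Python sketch. It assumes the file is the two bytes FF FE followed by ordinary ASCII/UTF-8 text, as in the first post; recover_text is a hypothetical helper. Dropping the two bytes, or even decoding the whole file as UTF-8 and letting the two stray bytes become replacement characters, shows the original data unchanged.

```python
def recover_text(data: bytes) -> str:
    """Undo the 'BOM trick': drop a leading FF FE and decode the rest as UTF-8."""
    if data.startswith(b"\xff\xfe"):
        data = data[2:]
    return data.decode("utf-8")


if __name__ == "__main__":
    disguised = b"\xff\xfe" + b"shd_default Lofty Castle"
    print(recover_text(disguised))                  # shd_default Lofty Castle
    # Forcing UTF-8 without stripping anything: the two stray bytes become U+FFFD.
    print(disguised.decode("utf-8", errors="replace"))
```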
 

Joe Ellis

Member
I don't know why you think I'm talking about encryption; I've said encoding throughout the whole thread. I know that this method doesn't make the file unopenable or uneditable, but the point was that it makes it unreadable in a text editor, just like any encoding method, and it's just as easy for someone to write a small program that removes the first 2 bytes as it is to decode from base64. The difference is that it doesn't cause any lag, because it doesn't have to process the entire file.
So my point still stands: it is an effective alternative to encoding.

I'm basically going off what the manual says about encoding: https://docs2.yoyogames.com/source/_build/3_scripting/4_gml_reference/file handling/index.html

So you can't really blame me if it's wrong
 
@Joe Ellis IMHO, you're confusing the idea of encoding with the idea of encryption. It's very easy to confuse the two concepts. If your intention is that you don't want people to read your text files, then what you're describing in your posts is encryption. Encoding is the method by which data is written so that it can be read later. For instance, ASCII (7-bit) and EBCDIC (8-bit) are two different methods of encoding text so that it can be read, but neither is encrypted. Encryption is a form of encoding, but not the other way around.
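As an illustration of encoding without encryption, here is a short Python sketch that encodes the same word with ASCII and with an EBCDIC code page. Python's cp037 codec (EBCDIC US/Canada) is assumed as a representative EBCDIC variant, since EBCDIC has several code pages; encode_both is a hypothetical helper. The bytes differ completely, yet either form is readable by anyone who knows which encoding was used, because no key is involved.

```python
from typing import Tuple


def encode_both(text: str) -> Tuple[bytes, bytes]:
    """Return the ASCII bytes and the EBCDIC (code page 037) bytes for the same text."""
    return text.encode("ascii"), text.encode("cp037")


if __name__ == "__main__":
    ascii_bytes, ebcdic_bytes = encode_both("Hello")
    print(ascii_bytes.hex(" "))          # 48 65 6c 6c 6f
    print(ebcdic_bytes.hex(" "))         # c8 85 93 93 96
    # Decoding with the matching codec restores the text.
    print(ebcdic_bytes.decode("cp037"))  # Hello
```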
 

Joe Ellis

Member
I understand what you're saying, but I'm sure I'm not confusing the two things. I've known the difference for ages and have made quite a few encryption methods. But people do use encoding to make things unreadable too; it's just that encoding algorithms are easily reversible and are supposed to be, and the thing I'm talking about is as well.

It's mainly for aesthetics, not security.
桳彤敤慦汵t楈桧倠慥獫䐀晥畡瑬Ȁ敓瑴湩獧ࠀ䈀捡杫潲湵⁤潃潬r潃潬r㤲‬〷‬〹䌀汥楓敺刀湡敧㔀〰‬㔲ⰰ㔠〰0楖睥删摡畩s慒杮e〲〰‬ⰱ㘠〰〰倀楯瑮删摡畩s慒杮e〵ⰰㄠ‬〵〰䄀楮潳牴灯捩䈀潯l牴敵䘀汩整⁲敔瑸牵獥䈀潯l牴敵䘀汩整⁲灓楲整s潂汯昀污敳䄀灬慨吠獥⁴敒f慒杮e㈱ⰸ〠‬㔲5汇扯污唠楮潦浲s畓楄敲瑣潩n慙灷瑩档㘀ⰰ㘠ⰰㄠ匀湵䌠汯牯刀执㈀㔵‬㔲ⰵ㈠㔵䄀扭敩瑮䌠汯牯刀执㌀ⰸ㌠ⰷ㐠6潆⁧潃潬r杒bⰰ〠‬0潆⁧瑓牡t慒杮e〳ⰰ〠‬〱〰〰0潆⁧敌杮桴刀湡敧㐀〰ⰰ〠‬〱〰〰0潆⁧汁桰a慒杮eⰱ〠‬1湉瑳湡散漀橢摟晥畡瑬̀敇敮慲l扏敪瑣䤠摮硥伀橢捥t扯彪敤慦汵t潍敤l潍敤l楈桧倠慥獫䄀楮慭楴湯䠀摩敤n䘀慲敭䤠摮硥䠀摩敤n䘀慲敭匠数摥䠀摩敤n䄀瑣癩e潂汯琀畲e楖楳汢e潂汯琀畲e⁘潐s敒污 夀倠獯刀慥l0⁚潐s敒污 夀睡刀慥l0楐捴h敒污 刀汯l敒污 匀慣敬刀慥l⸱〰䴀獡k潍敤䜀楌瑳洀獡彫潭敤孳崴嘀牡慩汢獥 䤀獮慴据e扯彪汰祡牥晟獰̀敇敮慲l扏敪瑣䤠摮硥伀橢捥t扯彪汰祡牥晟獰䴀摯汥䴀摯汥 0湁浩瑡潩n楈摤湥 牆浡⁥湉敤x楈摤湥 牆浡⁥灓敥d楈摤湥 捁楴敶䈀潯l牴敵嘀獩扩敬䈀潯l牴敵堀倠獯刀慥l㜭〰〮7⁙潐s敒污㈀㔲㌮9⁚潐s敒污 夀睡刀慥l0楐捴h敒污 刀汯l敒污 匀慣敬刀慥l⸱〰䴀獡k潍敤䜀楌瑳洀獡彫潭敤孳崲刀摡畩s慒杮e〵‬ⰱㄠ〰〰嘀牡慩汢獥฀猀整彰癥湥t捓楲瑰瀀慬敹彲灦彳瑳灥洀癯彥灳敥d敒污㄀2捡散l敒污 ㄮ0敤散l敒污 ㄮ0牧癡刀慥l⸰〲樀浵彰灳敥d敒污㔀〮0畴湲獟数摥刀慥l6畴湲獟潭瑯h慒杮e⸰ㄳ‬⸰㄰‬1潬歯灟瑩档刀湡敧 ‬㤭ⰰ㤠0灨刀慥l〱0慭彸敨污桴刀慥l〱0牡潭牵刀慥l〱0慭彸牡潭牵刀慥l〱0慣敭慲穟潟晦敳t敒污ⴀ㌷

Looks better than
shd_default Lofty Castle Default Settings Background Color Color 29, 70, 90 Cell Size Range 25, 1, 5000 Collision Radius Range 1, 1, 60000 View Radius Range 2000, 1, 60000 Point Radius Range 500, 1, 5000 Anisotropic Bool true Filter Textures Bool true Filter Sprites Bool false Alpha Test Ref Range 128, 0, 255 Global Uniforms Sun Direction Yawpitch 60, 60, 1 Sun Color Rgb 255, 255, 255 Ambient Color Rgb 38, 37, 46 Fog Color Rgb 0, 0, 0 Fog Start Range 300, 0, 1000000 Fog Length Range 4000, 0, 1000000 Fog Alpha Range 1, 0, 1 Instance obj_default General Object Index Object obj_default Model Model Lofty Castle Animation Hidden Frame Index Hidden Frame Speed Hidden Active Bool true Visible Bool true X Pos Real 0 Y Pos Real 0 Z Pos Real 0 Yaw Real 0 Pitch Real 0 Roll Real 0 Scale Real 1.00 Mask Mode Gl List mask_modes[4] Variables Instance obj_player_platformer General Object Index Object obj_player_platformer Model Model Default Animation Hidden Frame Index Hidden Frame Speed Hidden Active Bool true Visible Bool true X Pos Real -2450 Y Pos Real 2600 Z Pos Real 250 Yaw Real 0 Pitch Real 0 Roll Real 0 Scale Real 0.25 Mask Mode Gl List mask_modes[2] Radius Range 12.50, 1, 10000 Variables step_event Script player_platformer_step move_speed Real 7 accel Real 0.05 decel Real 0.15 grav Real 0.20 jump_spd Real 5.00 turn_spd Real 6 max_zspeed Real 30 turn_smooth Range 0.1, 0.01, 1 hp Real 100 max_health Real 100 armour Real 100 max_armour Real 100 camera_zoom Real 221 camera_z_pos Real 73
 

GMWolf

aka fel666
Reading / writing from a file is going to be slower than whatever transformation you do with it.
I would recommend you use base64 encoding if all you want is to stop people from just opening with a text editor, but don't care about real security.

If you want performance, then write it in chunks: convert a chunk to base64, then asynchronously write it to the buffer, in a loop.
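A minimal Python sketch of the chunked approach, assuming the whole buffer is already in memory. The function names and chunk size are hypothetical, and the write here is synchronous (GameMaker's own async buffer saving is not shown). The chunk length must be a multiple of 3, because base64 turns every 3 input bytes into 4 characters: any other size would put "=" padding in the middle of the output, and the concatenated chunks would no longer match base64 of the whole buffer.

```python
import base64
import io
from typing import BinaryIO, Iterator

DEFAULT_CHUNK = 3 * 64 * 1024  # 192 KiB; any positive multiple of 3 works


def base64_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK) -> Iterator[bytes]:
    """Yield base64 text for successive slices of data."""
    if chunk_size <= 0 or chunk_size % 3:
        raise ValueError("chunk_size must be a positive multiple of 3")
    for start in range(0, len(data), chunk_size):
        yield base64.b64encode(data[start:start + chunk_size])


def write_base64_chunks(out: BinaryIO, data: bytes, chunk_size: int = DEFAULT_CHUNK) -> None:
    # Each piece is written as soon as it has been encoded.
    for piece in base64_chunks(data, chunk_size):
        out.write(piece)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    sink = io.BytesIO()
    write_base64_chunks(sink, payload)
    print(sink.getvalue() == base64.b64encode(payload))  # True
```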



But remember, if it's something the users may like to edit, but you don't care about doing full encryption, then leave it as plain text and let them modify it easily!
 

Joe Ellis

Member
My files are written with a buffer: a mix of binary values (u8, u16, u32 and f32) and strings for variable names and such, so they're not really editable in a text editor anyway. I just wanted to make all of it unreadable because it looks better than a bunch of non-letters and words mixed together.
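To show why such a file looks like a bunch of non-letters and words mixed together, here is a Python sketch of one record in a mixed binary buffer. The layout (u8, u16, u32, f32, then a NUL-terminated UTF-8 string, little-endian) and the function names are assumptions for illustration, not the actual warp3d file format. The string bytes stay readable in a text editor, while the numbers show up as arbitrary symbols.

```python
import struct
from typing import Tuple

HEADER = struct.Struct("<BHIf")  # u8, u16, u32, f32; '<' = little-endian, no padding


def pack_record(flag: int, count: int, offset: int, value: float, name: str) -> bytes:
    """Numbers first, then the name as UTF-8 with a terminating zero byte."""
    return HEADER.pack(flag, count, offset, value) + name.encode("utf-8") + b"\x00"


def unpack_record(data: bytes) -> Tuple[int, int, int, float, str]:
    flag, count, offset, value = HEADER.unpack_from(data, 0)
    end = data.index(b"\x00", HEADER.size)  # find the string terminator
    name = data[HEADER.size:end].decode("utf-8")
    return flag, count, offset, value, name


if __name__ == "__main__":
    record = pack_record(1, 300, 70000, 0.25, "move_speed")
    print(len(record))   # 22: 11 bytes of numbers + 10 characters + terminator
    print(record)        # the name is readable, the numbers are not
    print(unpack_record(record))
```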

But if I can achieve this either by encoding it with base64 or by just writing a 2-byte BOM at the start, I'd rather do the second, because the saving and loading time will be much less.

Anyway, all the source code the engine uses for loading and saving the different files is available for people to use in the project (warp3d).
So I'm not bothered about the files being editable; in fact, I'd be glad if people made programs that can edit them, similar to FBX, OBJ, etc.
And having encryption on every file would completely ruin people's ability to share files (levels and models) with each other.

The only time I'm concerned with encryption is when someone has completed a game and wants all the files to be encrypted. For that, I'm thinking of making a separate tool that encrypts them with a certain key, and the key would have to be written somewhere in the project's GML code and compiled into the exe. I'm guessing this still wouldn't be that hard for a hacker to retrieve, especially if they had access to the source code. But I don't really know how important this would be to most developers anyway, because every game ends up being hacked if someone really wants to get into it.
I'm definitely going to be working on the best encryption method I can think of, but that has nothing to do with this thread lol
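For the key-based tool idea, the simplest thing that fits the description is a repeating-key XOR, sketched below in Python with a hypothetical xor_with_key helper. It is shown only to illustrate the concern raised in this post: the same key both scrambles and unscrambles, so once the key is pulled out of the exe the files are open. This is obfuscation, not real encryption; anything that genuinely needs protecting should use an established cipher from a vetted library instead.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key, repeating the key as needed.

    Running it a second time with the same key restores the input.
    Toy obfuscation only: NOT secure encryption.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


if __name__ == "__main__":
    secret = b"hypothetical-key"  # placeholder; in the post's scenario it would be compiled into the game
    scrambled = xor_with_key(b"level data", secret)
    print(scrambled)
    print(xor_with_key(scrambled, secret))  # b'level data'
```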


Also, I thought of a way to rephrase what I said at first to hopefully make it as straightforward as possible:

Some people would want to encode a buffer before saving it so that none of the strings in it are readable by the human eye.
But doing this slows down the saving and loading process, especially decoding it back when loading.
So instead, they can just write a byte order mark at the start of the file.
It achieves the same goal of making the strings unreadable to the human eye, but is a heck of a lot faster, and they don't have to decode anything when loading; they just skip the first 2 bytes.
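The save/load steps listed in this post could look like this in Python. save_with_bom and load_with_bom are hypothetical names, and the payload stands in for whatever bytes the engine already produces. The only extra work is two bytes on save and one comparison on load; the rest of the file is written and read exactly as before, which is also why the original data is still sitting there in plain form.

```python
import tempfile
from pathlib import Path

BOM = bytes([255, 254])  # FF FE: the UTF-16 little-endian byte order mark


def save_with_bom(path: Path, payload: bytes) -> None:
    # Prepend the two bytes; the payload itself is untouched.
    path.write_bytes(BOM + payload)


def load_with_bom(path: Path) -> bytes:
    # Skip the first two bytes if they are the BOM, otherwise return the data as-is.
    data = path.read_bytes()
    return data[len(BOM):] if data.startswith(BOM) else data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "level.dat"
        save_with_bom(target, b"shd_default Lofty Castle")
        print(target.read_bytes()[:4])    # b'\xff\xfesh'
        print(load_with_bom(target))      # b'shd_default Lofty Castle'
```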

I also want to state that this method is even less secure than using base64, so it's purely about how a file looks when opened in Notepad :)
 

TsukaYuriko

☄️
Forum Staff
Moderator
I assumed you're talking about encryption because the end result of encryption (dissuading people from reading or editing the data) seemed to be the effect you're going for, and because the immediately noticeable effect of messing up the BOM sort of looks like the data is encrypted, so I figured you may have mistaken it to be that. If that's not the case, apologies for jumping to conclusions.

So let's try that once again, but this time going fully with encoding:

You are correct that encoding tends to make data unreadable, or at least not readable as easily as the original data. The form of "encoding" you're presenting is not encoding data per se, though, as the original data is left intact, but rather messing up the byte-order mark (BOM) that indicates the endianness of the data following it. The BOM being present at all also implies that the encoding is Unicode, which is why the data ends up being displayed as mojibake (as you're now reading two bytes per character instead of one).

In other words, you're changing the "use this encoding and endianness by default" setting. This causes text editors (that are aware of it) to misrepresent the data, as it is being read the wrong way as a result.
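The "two bytes per character" effect can be checked with a few lines of Python; utf16_units is a hypothetical helper. It pairs up ASCII bytes the way a UTF-16 reader does, in either byte order. With little-endian order, "sh" becomes the single unit 0x6873 and "d_" becomes 0x5F64, which is how a file starting with "shd_default" turns into CJK symbols.

```python
from typing import List


def utf16_units(data: bytes, byteorder: str = "little") -> List[int]:
    """Group bytes in pairs and read each pair as one 16-bit code unit.

    A trailing odd byte is ignored here (a real decoder would report an error).
    """
    return [int.from_bytes(data[i:i + 2], byteorder) for i in range(0, len(data) - 1, 2)]


if __name__ == "__main__":
    raw = b"shd_"
    print([hex(u) for u in utf16_units(raw, "little")])  # ['0x6873', '0x5f64']
    print([hex(u) for u in utf16_units(raw, "big")])     # ['0x7368', '0x645f']
    print(raw.decode("ascii"), raw.decode("utf-16-le"))  # one byte per char vs two
```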


What I meant regarding it not being suitable for the purpose of making it harder to read is that it's really just a matter of how a program reads the data.
If the text editor you're using is not aware of Unicode or endianness, the effect of doing what you're describing is void.
That aside, reversing the "encoding" is as simple as opening the file with UTF-8 encoding. You can also open it in a hex editor, where everything past the first two bytes will look as it always did.
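For the hex-editor view, here is a rough Python stand-in (hex_dump is a hypothetical helper, with a layout loosely modelled on common hex-dump tools). Printing bytes as offset, hex and ASCII columns shows ff fe at the start and the original text, byte for byte, right after it.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Offset, hex bytes and printable-ASCII columns, one line per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_dump(b"\xff\xfe" + b"shd_default Lofty Castle"))
```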

To summarize: Yes, you're making it look like garbage when opened via the default "open in Notepad". But that's about it, as it won't stop anyone who has ever used a hex editor from realizing it's still plain text.
 

Yal

🐧 *penguin noises*
GMC Elder
This is trivially easy to reverse... a lot of programs aren't Unicode-aware, and Notepad is so simple that it shouldn't be either; it's just pretending it understands it.
[Screenshot: upload_2019-10-10_0-23-4.png]

A lot of programs can analyze file contents more thoroughly, and they'll see through your ruse.
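To illustrate how a program that is not Unicode-aware sees such a file: reading the bytes with a single-byte code page maps every byte to exactly one character, so the BOM just shows up as two odd characters in front of the untouched text. Windows-1252 is assumed here as a typical Western "ANSI" code page; view_single_byte is a hypothetical helper.

```python
def view_single_byte(data: bytes, codepage: str = "cp1252") -> str:
    """Decode one byte per character, as a non-Unicode-aware program would."""
    # errors="replace" covers the few byte values cp1252 leaves undefined.
    return data.decode(codepage, errors="replace")


if __name__ == "__main__":
    print(view_single_byte(b"\xff\xfe" + b"shd_default"))  # ÿþshd_default
```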
 

Sam (Deleted User)

Guest
It's a problem with how Windows is designed at its core. You have to write and use helper functions to convert string formats and use the wide variants of every Win32/Posix function, otherwise it'll only recognize ANSI encoding.

Mac and Linux are superior for this reason: everything is already in UTF-8 without the need to jump through any extra hoops. I argued with two Linux fans about this. They didn't understand that this is a problem with Windows and told me it was a bug in the MinGW compiler, but the reality is that the ANSI function variants inherently do not support UTF-8 no matter what you do, and this will be the case regardless of the compiler you are using, whether MinGW or Visual Studio's compiler.
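As an illustration of the string conversion described in this post, here is a Python sketch of what a UTF-8 to "wide" helper does. On Windows the real work would be done by Win32 functions such as MultiByteToWideChar; the Python names below are hypothetical. The wide Win32 functions take UTF-16LE text, while an ANSI code page (assumed to be Windows-1252 here, since it actually depends on the system locale) cannot represent most non-Western characters at all.

```python
def utf8_to_wide(data: bytes) -> bytes:
    """UTF-8 bytes -> UTF-16LE bytes, the wchar_t form the wide ("W") Win32 functions expect."""
    return data.decode("utf-8").encode("utf-16-le")


def fits_ansi(text: str, codepage: str = "cp1252") -> bool:
    """True if the text can be encoded in the given ANSI code page."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(utf8_to_wide("café".encode("utf-8")).hex(" "))  # 63 00 61 00 66 00 e9 00
    print(fits_ansi("café"), fits_ansi("日本語"))          # True False
```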
 