Edit: part of my comment is corrected by the comment below. Thanks, openasocket!
Another comment about the content of this article:
Three quarters down the wiki page there is code for "adding foreign language" to the code. The options are to add code comments in Arabic/Chinese/Russian/Korean/Farsi. My gut reaction is that the purpose of this added language is to obfuscate the true source of the code, i.e. the code has Chinese comments in it, so it must be from China. Ahh. I guess this makes sense to do. The only problem now is that the Chinese/Russian/Farsi/etc. characters they included in their code are now public. (Obviously the CIA will now change the foreign language words they insert.)
I'd posit that if someone had an X-year-old (e.g. X=7) copy of some malware, and the malware had these specific foreign language comments as shown by the article, there's a good possibility the source of the malware would be the US government.
This is for obfuscating string constants; the foreign languages included are a red herring. The reason is that nontrivial code often has string constants in it, and those strings are stored in the ELF/PE file in a manner that makes them trivial to extract. Since these strings often reveal a lot about the malware (e.g. a string constant "Your computer has been infected with ransomware. Please deposit %d bitcoins to address %s"), antivirus signatures often use them to detect specific kinds of malware, and reverse engineers find them useful in determining what a binary does. This framework scrambles the string contents (using techniques like XOR-ing every character against a random key) and injects some code into the executable so that the strings are unscrambled on startup. They just have foreign languages in the example to demonstrate that the framework correctly handles Unicode.
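To make the XOR idea concrete, here is a rough Python sketch. It is not the framework's actual code; the function names `scramble` and `unscramble`, the 16-byte key length, and the sample strings are all made up for illustration. It only shows that XOR-ing the UTF-8 bytes against a key round-trips cleanly for non-ASCII text, which is presumably what the foreign-language test cases are checking.

```python
import os


def scramble(text: str, key: bytes) -> bytes:
    """XOR the UTF-8 bytes of `text` against a repeating key (hypothetical helper)."""
    data = text.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def unscramble(blob: bytes, key: bytes) -> str:
    """Reverse `scramble`: XOR again with the same key, then decode as UTF-8."""
    data = bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))
    return data.decode("utf-8")


# Illustrative strings only; the Russian and Chinese ones stand in for the
# kind of Unicode test input discussed above.
samples = [
    "Please deposit %d bitcoins to address %s",
    "Привет, мир",
    "你好，世界",
]

if __name__ == "__main__":
    key = os.urandom(16)  # assumed key length, picked arbitrarily
    for s in samples:
        blob = scramble(s, key)
        # The scrambled bytes are what would sit in the binary instead of the
        # readable string; the original comes back only after unscrambling.
        print(blob.hex(), "->", unscramble(blob, key))
```

Because XOR is its own inverse, the same operation scrambles and unscrambles. Working on encoded bytes rather than characters is what keeps multi-byte text intact.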
Analysts never use the language of the code comments for attribution, because such things are trivial to forge.
Considering that debug symbols, comments in code, and Cyrillic characters in file metadata are being used as solid evidence that Russia hacked the DNC, I'd say that it's probably still a useful tool.
Source? I've read the stuff CrowdStrike and Mandiant have put out, and they mentioned none of those as evidence. Just binary analysis and network indicators from what I've seen.
Thanks for this insight! I'll edit my comment to credit you, but I won't delete it since someone might have the same thought process as me.
My comment:
So I see now (thanks to you) that it is just showing test cases (test warbles) to demonstrate that these scrambling techniques work with foreign languages. However, why would the US gov need to make sure that this program can successfully obfuscate Unicode strings in Chinese/Russian/Arabic/Farsi?
My gut reaction: while code comments would be trivial to forge, it appears the US gov is still using foreign language strings in some way, maybe having just one string constant originally in a foreign language that is then obfuscated/scrambled (such as by XORing every char against a random key).
Just FYI: those Chinese characters are really, really, really rarely used in any writing. In fact, anyone with Chinese reading comprehension will tell you those are gibberish words, and none of them make any sense.