I don't buy this. LLMs are basically just fancy text completion based on training data. "Binary data from a proprietary industrial machine" sounds about as far as you can get from anything that would have been in the training data. How can you possibly trust its output if it's never seen anything like it before?
The only reason I say this is that I have tried. I asked an LLM to decode a variety of base64 strings, and every single time it said the decoded ASCII was "Hello, world!"
This doesn't come as a surprise to me. Unless it was trained on a dataset that included a mapping for every base64 character, it's just going to pattern-complete on base64-looking character sequences and assume they translate to "Hello, world!" from some programming tutorial it was trained on.
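If anyone wants to repeat that test more fairly, here's a quick sketch (Python stdlib only) that generates strings with a known ground truth, instead of prompts the model may have memorized:

    import base64, random, string

    # Encode random ASCII strings so there's a known ground truth
    # to compare the model's answer against.
    for _ in range(3):
        plain = "".join(random.choices(string.ascii_letters + " ", k=16))
        encoded = base64.b64encode(plain.encode()).decode()
        print(f"{encoded}  ->  {plain}")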
That's still kinda cool. Now I'm curious if it can decode all the figlet fonts too. Size can be controlled with HTML, since some fonts are easier for a human to read when rendered smaller.
[Edit] - This might make one's eyes bleed, but I am curious if it can read this [1]. If you install figlet, run showfigfonts to see examples of all the installed fonts. More can be installed [2] into /usr/share/figlet/fonts/.
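If you'd rather not install figlet itself, a rough sketch using the pyfiglet port works too (the font names below are just examples of what it ships with):

    from pyfiglet import Figlet  # assumes: pip install pyfiglet

    # Render the same string in a few fonts to paste into the model.
    for font in ["standard", "slant", "banner"]:
        print(Figlet(font=font).renderText("Hello"))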
That kind of decoding is a bit different though. For one, the tokenization process makes character-level encodings difficult to handle (unless the model is trained on a lot of encoded/decoded pairs).
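You can see the problem directly with OpenAI's tiktoken library; this sketch assumes the cl100k_base encoding that GPT-4 uses:

    import tiktoken  # assumes: pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    s = "SGVsbG8sIHdvcmxkIQ=="
    # Token boundaries rarely line up with base64's 4-character groups,
    # so the model never sees clean 6-bit symbols.
    print([enc.decode_single_token_bytes(t) for t in enc.encode(s)])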
This would be more akin to asking ChatGPT to help build a black-box parser for base64 than asking it to decode the strings itself.
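i.e. something like this toy decoder, which is exactly the kind of code an LLM is good at helping write (a minimal sketch that ignores URL-safe variants and invalid input):

    import string

    B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
    LOOKUP = {c: i for i, c in enumerate(B64_ALPHABET)}

    def b64_decode(s: str) -> bytes:
        s = s.rstrip("=")  # padding carries no data
        bits = "".join(f"{LOOKUP[c]:06b}" for c in s)
        # every full 8-bit group is one output byte; leftover bits are padding
        return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits) - 7, 8))

    print(b64_decode("SGVsbG8sIHdvcmxkIQ=="))  # b'Hello, world!'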
GPT-4 can absolutely decode base64. Early jailbreaks worked by base64-encoding a Python-based jailbreak prompt to get it to output whatever you wanted, and OpenAI later added a patch to filter base64 outputs so they follow their rules.