Introduction:
Finite Field Assembly is a programming language that lets you emulate GPUs on CPUs.
It's a CUDA alternative that uses finite field theory to convert GPU kernels to prime number fields.
Finite Field is the primary data structure: FF-asm is a CUDA alternative designed for computations over finite fields.
Recursive computing support: not cache-aware vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation.
Extension of C89 - runs everywhere gcc is available.
Context: I'm getting my math PhD and I built this language around my area of expertise, Number Theory and Finite Fields.
I guess it's not clear to me why it's even interesting to talk about their LinkedIn or their PhD in the first place. It's not like having or not having a PhD makes the work any more or less true. Wouldn't it be more interesting to discuss the merits of the post? There's really little point in arguing that their LinkedIn has different info than the comment, therefore the submission is invalid.
But suppose I did actually hold that belief for some reason; then it would seem fairly intellectually dishonest to withhold relevant info in my pointed inquisition, wherein I just characterize them as someone lacking mathematical experience at all, let alone from a world-class university. But maybe that's just me!
I think they're pointing it out because stating you're working towards a PhD in something when you've just graduated and don't seem (as far as we can tell) to be enrolled in a PhD program is misleading. Note that the parent isn't the one who brought up the PhD; the author is, presumably to head off the big question marks everyone reading this had about what it is.
It's unclear whether this page is something that could be useful, and deserves attention. The fact that the author is at best making misleading statements is useful in determining whether you should take their claims at face value.
They claim "Finite Field Assembly is a programming language that lets you emulate GPUs on CPUs".
It's not a programming language, it's a handful of C macros, and it doesn't in any way emulate a GPU on the CPU. I'll be honest: I think the author is trying to fake it till they make it. They seem interested in mathematics, but their claims are far beyond what they've demonstrated, and their post history reveals a series of similar submissions. Insofar as they're curious and want to experiment, I think it's reasonable to encourage them, but they're also asking for money and don't seem to be delivering much.
Why would they post the 4th article in a series where the previous ones require you to pay?
> I guess it's not clear to me why it's even interesting to talk about their LinkedIn or their PhD in the first place?
Am I taking crazy pills? I didn't bring it up; the guy himself, at the top of this very thread branch, explicitly wrote that he's a PhD student working on number theory.
> Wouldn't it be more interesting to discuss the merits of the post?
There is no merit, nothing to discuss. I linked the corresponding GitHub below so you can judge for yourself.
Depending on what properties they sold, they certainly could have gotten valuable real-world expertise with finite fields. It's certainly easier to sell them than infinite ones!
This is phrased in a kind of demanding way to an author who has been kind enough to share their novel work with us. Are you sure you spent enough time trying to understand?
It seems that pretty much everybody here is confused by this article. One user even accused it of LLM plagiarism, which is pretty telling in my opinion.
I for one have no clue what anything I read in there is supposed to mean. Emulating a GPU's semantics on a CPU is a topic which I thought I had a decent grasp on, but everything from the stated goals at the top of this article to the example code makes no sense to me.
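If I had to guess at what "convert GPU kernels to prime number fields" could mean, the nearest standard trick I know is Chinese-Remainder-style packing: pick pairwise coprime moduli, store one residue per "lane" in a single integer, and one arithmetic op then acts on all lanes at once. A minimal sketch of that idea, which is purely my reconstruction and not anything the author's macros are confirmed to do:

    #include <stdio.h>
    #include <stdint.h>

    /* Lanes live in Z/3, Z/5, Z/7 (pairwise coprime), so one integer
     * mod 105 encodes all three "SIMD lanes" at once (CRT). */
    static const uint64_t M[3] = {3, 5, 7};
    #define MOD 105u  /* 3*5*7 */

    /* Pack lane values by brute-force search for the CRT representative
     * (fine for a demo; real code would precompute CRT coefficients). */
    static uint64_t crt_pack(const uint64_t v[3]) {
        uint64_t x;
        for (x = 0; x < MOD; x++)
            if (x % M[0] == v[0] && x % M[1] == v[1] && x % M[2] == v[2])
                return x;
        return 0; /* unreachable for valid lane values */
    }

    int main(void) {
        uint64_t a[3] = {1, 2, 3}, b[3] = {2, 2, 2};
        uint64_t c = (crt_pack(a) + crt_pack(b)) % MOD; /* one add, three lanes */
        int i;
        for (i = 0; i < 3; i++)
            printf("lane %d: %llu\n", i, (unsigned long long)(c % M[i]));
        return 0; /* prints 0, 4, 5: i.e. (1+2)%3, (2+2)%5, (3+2)%7 */
    }

Whether that has anything to do with "emulating a GPU" is another question; a few lanes of modular arithmetic is a very long way from CUDA semantics.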
Not to mention the part where adding compressors like this somewhat defeats the purpose of using a simple format like QOI (although at least zstd is faster than gzip, let alone 7zip).
But if we're modifying things like that, then they might as well use Nigel Tao's improved QOIR format and replace the LZ4 compressor it uses with zstd. That's probably faster and likely compresses better than QOI.
So to clarify: my suggested point of comparison was replacing the GP's QOI + 7zip with QOIR + zstd. QOIR already compresses better than QOI before the LZ4 pass, and zstd compresses faster than 7zip and often better. On top of that, you can serve zstd via the Content-Encoding header when streaming data to a browser, so you don't need to grow the JS bundle or whatever if the use case is the web. So that's basically a guaranteed net improvement all around.
Second of all, the speed/compression trade-off with zstd can be tuned a lot. The "half as fast as LZ4" stat is for the fastest setting, but against the proposed comparison point of 7zip, a slower setting with a better compression ratio is likely perfectly fine.
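For scale, the extra zstd stage is a few lines against libzstd. A sketch, assuming the input buffer already holds an encoded QOI/QOIR image (level 19 standing in for the slow "7zip-like" end of the trade-off, the default level 3 for the fast end):

    #include <stdlib.h>
    #include <zstd.h>   /* link with -lzstd */

    /* Compress an already-encoded QOI buffer with zstd. Returns a
     * malloc'd buffer and its size via out_len, or NULL on failure. */
    static void *zstd_pass(const void *qoi_buf, size_t qoi_len,
                           size_t *out_len, int level) {
        size_t cap = ZSTD_compressBound(qoi_len);
        void *dst = malloc(cap);
        if (!dst) return NULL;
        size_t n = ZSTD_compress(dst, cap, qoi_buf, qoi_len, level);
        if (ZSTD_isError(n)) { free(dst); return NULL; }
        *out_len = n;
        return dst;
    }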
I saw something similar in Bit Twiddling Hacks. Out of utter curiosity, when would you need to interleave bits in prod? Is it something a SaaS dev would be doing, or maybe somebody in embedded programming?
To expand on azornathogron's answer: when you are working with 2D data (it generalises to 3D too!), you often want to filter pixels in a rectangular area, commonly for bilinear filtering or some kind of convolution kernel.
If you don't interleave the bits and have large textures (think 4096 pixels wide) where each pixel is 4 bytes, there is a distance of 16 KB between a pixel and the pixel below it.
This is super bad for caches, and especially for the TLB in the MMU, which is usually way smaller than the data caches.
In GPU literature you'll see this called "tiling" (again, as azornathogron said, it's not always pure Morton order). Intel document their tiling layout; here's an older layout doc:
If you're doing low level graphics programming (processing pixel data) or in some other way dealing with 2D raster type data, you might want to work with data in Morton order or with Morton order tiles or something similar. Interleaving the bits of the x & y coordinate values helps to put pixels that are close together in your 2D space also close together in memory, which can help with making best use of caches.
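For the curious, here's the usual magic-number version of the 2D interleave, essentially what Bit Twiddling Hacks lists, assuming 16-bit coordinates:

    #include <stdint.h>

    /* Spread the low 16 bits of v so that bit i lands at bit 2i. */
    static uint32_t part1by1(uint32_t v) {
        v &= 0x0000FFFFu;
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    }

    /* Morton (z-order) index: x in the even bits, y in the odd bits. */
    static uint32_t morton2d(uint32_t x, uint32_t y) {
        return part1by1(x) | (part1by1(y) << 1);
    }

With this, pixels (x, y) and (x, y+1) differ in a low bit of the index instead of being a whole row apart in memory.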
Keccak (and other ciphers that use only bit rotations and bitwise ops) can use bit interleaving to avoid slow 64-bit rotations on 32-bit hardware, replacing one 64-bit rotation with two independent 32-bit rotations on the interleaved words [1, §2.1].
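Concretely, once the 64-bit word is stored as its even-indexed and odd-indexed bit halves, a rotation by n factors into two 32-bit rotations. A sketch of the trick (my paraphrase, not the paper's code):

    #include <stdint.h>

    /* Bit-interleaved 64-bit word: 'even' holds bits 0,2,4,...,
     * 'odd' holds bits 1,3,5,... */
    typedef struct { uint32_t even, odd; } ilword;

    static uint32_t rol32(uint32_t v, unsigned n) {
        return (v << (n & 31)) | (v >> ((32 - n) & 31));
    }

    /* Rotate-left of the represented 64-bit word by n, 0 <= n < 64. */
    static ilword rol64_interleaved(ilword w, unsigned n) {
        ilword r;
        if (n & 1) {            /* odd amounts swap the halves */
            r.even = rol32(w.odd,  (n + 1) / 2);
            r.odd  = rol32(w.even, n / 2);
        } else {
            r.even = rol32(w.even, n / 2);
            r.odd  = rol32(w.odd,  n / 2);
        }
        return r;
    }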
Two-bit values are common in bioinformatics, and I’ve found the ability to efficiently convert between packed arrays of 1- and 2-bit values to be valuable in that ___domain.
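The widening direction there is the same spread used for Morton codes (interleave with zeros), and narrowing is its inverse. A sketch with illustrative names, not any particular library's API:

    #include <stdint.h>

    /* Spread 16 packed 1-bit values into the low bit of 16 2-bit lanes. */
    static uint32_t widen_1to2(uint16_t bits) {
        uint32_t v = bits;
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    }

    /* Inverse: keep the low bit of each 2-bit lane and compact. */
    static uint16_t narrow_2to1(uint32_t lanes) {
        uint32_t v = lanes & 0x55555555u;
        v = (v | (v >> 1)) & 0x33333333u;
        v = (v | (v >> 2)) & 0x0F0F0F0Fu;
        v = (v | (v >> 4)) & 0x00FF00FFu;
        v = (v | (v >> 8)) & 0x0000FFFFu;
        return (uint16_t)v;
    }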