Apparently, every computer science problem has an easy-to-understand solution that’s easy to implement but awfully slow in practice.
This is one of them.
Rotating images that way isn’t a CPU-bound problem. It’s a RAM-bandwidth and RAM-latency-bound problem. The optimal data structure for that is not recursive; it’s a 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks. If they are RGB, one good format for those images, optimized for rotations, is BC7 https://www.opengl.org/registry/specs/ARB/texture_compressio... with the decompressor implemented right in the graphics hardware.
> If we were writing an image manipulation application, we’d provide much snappier behaviour using coloured quadtrees to represent images on screen.
That statement was true 30 years ago; it’s no longer the case. Branch misprediction is expensive, memory latency is huge, and RAM is a block device now (the block size typically being 16 bytes for dual-channel DDR). In addition, some displays now show 8+ megapixels, and most show 16M colors.
For an image manipulation application, that recursive thing ain’t good at all.
>> we’d provide much snappier behaviour using coloured quadtrees
> That statement was true 30 years ago, no longer a case.
The colored quadtrees are an optimization over the plain quadtrees earlier in the article, reducing the number of comparisons for some images. That will be just as much of an improvement 300 years from now as it was 30 years ago. In context it is perfectly obvious that "much snappier" does not refer to the totally different techniques used by actual graphics software.
It's a toy implementation used as an example, if you get hung up on the example, you miss the whole point.
By that argument, any blog post that does not contain a comprehensive software development education can be criticized.
You should ask whether these techniques are useful in writing correct code and if they save developer effort, rather than picking apart irrelevant aspects of the toy example.
I don't think modern graphics code represents individual pixels as strings either, would you dismiss the post for that reason?
> The optimal data structure for that is not recursive, it’s 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks.
If the image is very large and contains large contiguous regions of solid colors, then a quadtree that doesn't subdivide homogeneous regions can be a much faster approach than a flat pile-o-pixels.
Either way, this article isn't about rotating images.
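To make the parent’s point concrete, here is a toy sketch (mine, not the article’s JavaScript) of why a quadtree that leaves homogeneous regions unsubdivided can win: a solid leaf rotates to itself in O(1), no matter how many pixels it covers.

```python
# Toy colored quadtree: Leaf = solid region, Node = four quadrants.
class Leaf:
    def __init__(self, color):
        self.color = color

class Node:
    def __init__(self, ul, ur, lr, ll):
        self.ul, self.ur, self.lr, self.ll = ul, ur, lr, ll

def rotate(tree):
    """Rotate 90 degrees clockwise: reposition the quadrants and recurse.
    A solid leaf is its own rotation, however large the region it covers."""
    if isinstance(tree, Leaf):
        return tree
    return Node(rotate(tree.ll), rotate(tree.ul),
                rotate(tree.ur), rotate(tree.lr))
```

The cost is proportional to the number of tree nodes rather than the number of pixels, which is the whole trick.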
Well, hierarchical representation of those images is indeed a good idea for some data. Saves both RAM space and bandwidth, and CPU time.
Just not the quad-tree way described in the article.
There are ways to implement hierarchical structures that work well on modern hardware. One is hashmaps holding dense 2-bits-per-pixel blocks (again, 16-64 bytes per block), with the values saying filled/empty/mixed. Another is B-trees or B+ trees.
I am not convinced hashlife is still the best of them.
The invention is from the early ’80s. Many old heuristics about caching versus calculation are no longer true today.
Here’s my article on the related topic, lookup tables: https://github.com/Const-me/LookupTables
If I wanted to implement an efficient Life simulator, I’d probably use the fact that with AVX2, it’s possible to count those neighbors for an 8x8 block of pixels in a single __m256i register (with 8 neighbors max, 4 bits/value is more than enough). Might be faster than an L3 cache access, and will definitely be much faster than a main memory access.
That’s why I’d put those pixels in 8x8 blocks, single uint64_t value per block.
Putting those blocks in a hash table takes care of the infinite field. std::unordered_map will work, but if you care about performance, prefer CAtlMap from Microsoft, dense_hash_map from Google, or something similar.
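A minimal sketch of that layout (names and conventions are my own assumptions): a dict maps block coordinates to one 64-bit integer per 8x8 block, and absent keys are implicitly empty, which is what makes the field effectively infinite.

```python
# Unbounded bit field as a hash table of 8x8 blocks,
# one 64-bit integer per block (bit i*8+j = row i, column j).
field = {}  # (block_x, block_y) -> int

def set_cell(x, y):
    bx, by = x >> 3, y >> 3   # which block (works for negative coords too)
    i, j = y & 7, x & 7       # position inside the block
    field[(bx, by)] = field.get((bx, by), 0) | (1 << (i * 8 + j))

def get_cell(x, y):
    bx, by = x >> 3, y >> 3
    i, j = y & 7, x & 7
    return (field.get((bx, by), 0) >> (i * 8 + j)) & 1
```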
I believe most modern implementations, including hybrid implementations of HashLife, take advantage of register and cache sizes. HashLife as I might implement it in JavaScript, is really a very naïve approach. I think Bill Gosper even had versions in the 1970s that did things with registers and assembly, not just doing everything in lists.
But the big issue with HashLife is not whether you can write a faster implementation, it's whether the particular pattern(s) you are exploring are amenable to HashLife. It wins for highly repetitive patterns, but is much slower for patterns with a very large amount of entropy in time and space, like methuselahs.
HashLife can be fabulous for exploring purpose-built machines that cycle though regular states.
Looking at golly-2.8-src: at first sight, it does not use SIMD. It only uses 32-bit registers in 32-bit builds, 64-bit registers in 64-bit builds.
All x86 CPUs manufactured after 2003 have at least 16 128-bit SIMD registers. Using them for calculations is the only way to get anywhere near advertised CPU performance.
For languages other than C and C++, like JavaScript, where there are no SIMD instructions, calculations are generally slower, but the hash tables are provided by the platform and hence implemented in highly optimized C code, so HashLife should be perfectly fine.
For HashLife, I think the chief problems are around an efficient cache, and especially efficient canonicalization. The algorithm is so fast for universes that suit its ___domain that I think most other considerations drop by the wayside.
However, besides the fact that not all problems have enough redundancy in time+space to suit it, it also suffers from the fact that it is most efficient when computing massive jumps in generations. So it’s as not as performant for creating animations as it is for doing things like finding the maximum size of a pattern, calculating its growth rate, and so forth.
For JavaScript you might do quite well with a 32-way bit-slice implementation, like http://fanf.livejournal.com/93032.html
That is, if you can't do it on the GPU with WebGL :-)
It's an entirely plausible scenario. It will rarely be the case with photographs, but graphical images often feature large solid-color blocks. It's a large part of why the PNG file format exists.
You might still want the precision and detail at the boundaries of the shapes that a raster representation would give you. Think big blotchy cow spots or something along those lines.
I don't think the point of the article was to present an efficient and novel solution for image rotation. It was just a nicely visual example of a recursive data structure.
We're talking about several orders of magnitude here.
Is a web page that loads in fifty seconds 'correct' or is it wrong? Is a video format that can only decode 2 frames per second correct, or wrong?
Often, timeliness is an unstated part of the requirements, in which case an academically pleasing solution that isn't practical doesn't fulfill the requirements.
I think you're missing the point.
A quadtree is a compact way to store a partition of two-dimensional space, much like you'd store a binary heap in an array.
The implementation of a recursive data structure or algorithm can usually take advantage of tail recursion or trampolining. The practical implementation of a theoretical algorithm may look extremely different, much like you wouldn't literally translate abstract pseudocode to the language of your choice.
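As an illustration, a minimal trampoline (a sketch, not tied to any particular language implementation): the recursive step returns a thunk instead of calling itself, and a small driver loop keeps the stack flat.

```python
# Trampolining: recursion expressed as "return the next step",
# so the driver loop replaces nested stack frames.
def trampoline(fn, *args):
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def countdown(n, acc=0):
    if n == 0:
        return acc
    return lambda: countdown(n - 1, acc + n)  # a thunk, not a call
```

Here `trampoline(countdown, 100000)` sums 1..100000 without overflowing the call stack, even though `countdown` is written recursively.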
> Is a web page that loads in fifty seconds 'correct' or is it wrong?
Neither: it's probably using JavaScript to perform an AJAX request to get some data in a bespoke markup format, then using JavaScript to parse that markup format and build DOM entries, all because the developer thinks HTML is 'so 90s, dude!'
> Is a web page that loads in fifty seconds 'correct' or is it wrong?
Of course it is correct - this is how the internet was/is with a 14k or 56k dialup modem, I don't blame you for not remembering/not experiencing it. I remember watching JPEGs render progressively from a blurry mess one at a time, or sometimes not at all. It took way more than 50 seconds to load and render all elements on a typical website.
Depends on the business constraints. Wrong is not just "returns invalid results".
There's also the notion of wrong or right solution for a particular problem (which the phrase above already assumes). E.g. for a game where things have to run in 60hz, a slow algorithm would be the wrong solution to use.
The reverse is also true: a solution can return wrong results in the absolute sense (e.g. false positives) and still be correct for our business case. This is the case with things like Bloom filters for example.
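A toy sketch of that asymmetry (sizes and hash scheme are arbitrary): the filter may report a false positive, but it can never report a false negative, and for many business cases that is exactly the right kind of "wrong".

```python
import hashlib

# Toy Bloom filter: k hash positions over an m-bit field.
class Bloom:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> p) & 1 for p in self._positions(item))
```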
> Apparently, every computer science problem has easy to understand solution that’s easy to implement but awfully slow in practice.
I have been doing some data structures/algorithms review as I haven't really used much of that knowledge from my CS degree and, frankly, wasn't that great of a student when I was learning it. I've obviously focused mostly on understanding theoretical performance (Big-O) as I've been implementing and playing with things, but I'd like to know about good resources for things to consider about performance on real systems. I hear about things like cache misses, memory layouts and branch mispredictions, but don't really know how to apply that knowledge or, maybe more importantly, measure the effects of optimizing for some of those things over others. I'd appreciate any advice anyone has to offer.
That Big-O thing is mostly theoretical. Its most useful application is job interviews.
It might stop you from doing some things, like bubble-sorting 1GB datasets, or inserting many elements to the start of a large array.
However, in other cases it’s just fine to bubble-sort (for very small collections), or insert many elements to the start of the array (when you’re reading much more often so the profit from memory locality while reading outweigh the losses from relocating those elements while inserting).
Modern hardware is extremely complex. No amount of knowledge will help you 100% predict the performance effects of your changes. Two pieces of advice:
(1) Learn to profile your code. Both externally, using a sampling profiler that will tell you where you’re spending the CPU time, and internally, using e.g. std::chrono::high_resolution_clock, which will tell you the performance effect of your changes.
(2) Don’t forget about the rollback feature of your version control, and don’t hesitate to use it as necessary. I routinely discover that some of the changes I viewed as performance optimizations actually decreased performance. The reasons include branch misprediction, compiler optimization issues, cores competing for L3 cache, RAM bandwidth, thermal issues…
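On point (1), a tiny sketch of the internal measurement (Python’s perf_counter standing in for std::chrono::high_resolution_clock): taking the best of several runs reduces the noise from exactly the effects listed in (2).

```python
import time

def timed(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best  # best-of-N filters out scheduler and cache noise
```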
I don’t think the engineers working at Ericsson and Apple (the companies listed in the copyright section of that source file) forgot about big-O. What I think happened is that when they wrote it, their assumption was that people wouldn’t be using those to pass 16MB blobs. And for small messages, O(n^2) works just fine.
I can understand those people. HTTP is better than Web Sockets for large chunks of data. With HTTP, servers and proxies may cache, clients may resume downloads, clients may check for changes with if-modified-since header, and so on.
The problem here isn’t with big-O, it’s with changing world.
If in the future someone sends gigabytes in that event, the current version will break despite O(n), because RAM is finite. Relatively easy to fix: you just need to save data to a hard drive instead of RAM.
Does it mean it’s a good idea to do so now? Don’t think so, it would be over-engineering and premature optimization, and the current version is currently fine. Just like the original O(n^2) code was fine for small messages.
At face value, I don't actually see where you contradict the article. The recursive nature of this solution is akin to recursive matrix multiplication. Which can be a way to setup memory fetches to stay optimal. (Could... I have no data to say it is.)
Any chance you know of any benchmarks exploring this?
To be clear, I can see why coding it purely recursively would be a bad idea. Same for using linked structures, if that was the concern with the quad tree.
However, it seems that you could use the general recursive definition to find some optimizations that one could do. For example, define a recursive structure to break down a problem until it is small enough that a more traditional method works. (So, don't drop down to the size of a pixel, but do drop down till you have things at cache line size?)
Perusing the Wikipedia page for Strassen, I see that it scoffs at a ~10% benefit for large matrices. Curious how that holds in light of "deep learning" environments. Especially since 1k x 1k matrices are probably not that uncommon nowadays. Seems like every little bit would help.
Edit to add: Thinking of the context of the original post... yeah, probably not needed for anything in a "snappy" user application. I was thinking more of very large data sets.
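A sketch of that "break down until a traditional method works" idea (plain divide-and-conquer with 8 subproducts, not Strassen’s 7; all names mine): recurse until the blocks drop below a cutoff, then switch to the naive loop.

```python
# Divide-and-conquer matrix multiply with a cutoff: the recursion sets up
# cache-sized subproblems, the naive triple loop does the actual work.
# Assumes n is a power of two.
def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def quadrants(M):
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def matmul(A, B, cutoff=32):
    n = len(A)
    if n <= cutoff:  # small enough: traditional method
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    C11 = madd(matmul(A11, B11, cutoff), matmul(A12, B21, cutoff))
    C12 = madd(matmul(A11, B12, cutoff), matmul(A12, B22, cutoff))
    C21 = madd(matmul(A21, B11, cutoff), matmul(A22, B21, cutoff))
    C22 = madd(matmul(A21, B12, cutoff), matmul(A22, B22, cutoff))
    return ([a + b for a, b in zip(C11, C12)] +
            [a + b for a, b in zip(C21, C22)])
```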
The mention of "around 10% or less" is based on a 2005 paper, and even in the past ten years there's been a lot of change in computer architecture (the machines used in the paper were mostly ultrasparcs, and they write "Note that for the (pentium 3 system), we could find no problem size for which Strassen’s is faster than ATLAS’s").
That is an interesting note. I remember reading that the lower bound on multiplications was lower due to swapping a lot of them for additions. Knuth indicated that this did not matter in practice, since few people are multiplying large matrices.
My interest here is really just if that is still a good assumption. With the multiplications that happen in "deep" fields, I'm curious if there are now reasons to look to these algorithms.
We might be talking at cross-purposes, my angle is that of dense linear algebra, single or double precision, on NUMA architectures with vector operations, and all respect to Knuth but it requires a different model for the machine.
If that's changed, like for example if you're computing over a different ring where arithmetic ops are expense as compared to cache misses and there's no stability issues, then yes the issue is back open and Strassen is put back on the table.
Knuth seems to be in agreement with you, from my understanding. My comments are only of a general interest and curiosity, not of knowledge. (If that makes sense.)
That is, I certainly was not trying to second guess what you are saying. Just noting my curiosity on whether or not this has, or will, change. Knuth's view that I'm quoting is from Vol 2, which is pretty old at this point. I'd be a little surprised if it was outdated. But not shocked.
Very large matrices means >50k x 50k (maybe even bigger). Strassen is then just a tiny bit better, when both are implemented cache- and allocation-aware. (Tested in Fortran.)
It is always interesting to see just how valuable a "tiny bit" can be, though.
Granted, I confess most is almost always navel gazing. Increases in the "best" sorting/map algorithm have probably not lead to any noticeable improvements in modern codebases. Usually, those benefit more from being the low hanging fruit, not most computationally complex. (That is, I'm sure in these large matrix problems, there are other low hanging fruit that will yield better improvements.)
Recursion is oftentimes less efficient than a well-designed, concrete imperative solution, even if the imperative solution uses more memory. This is because in most languages, the stack is not free. There's a lot of metadata and housekeeping that goes on when you invoke a function, to support things like stack traces and introspection.
The provided example (at least to me) is also convoluted and fails to illustrate the elegance of using a recursive data structure.
In general parlance, an affine transform or rotation matrix would both be more elegant and readable.
Although I'm satisfied with the choice I made for what to put in my essay, I want to thank you for pointing out the gaps between my essay and the real world, and for doing so in a forceful way without crossing into being querulous.
This kind of discourse is, IMO, an example of Hacker News at its finest.
Are you aware of a programming language/compiler that optimizes on recursively written code? I'm finding it difficult to imagine a PL where stack _isn't_ free except tail-recursive code.
My understanding is that it's the opposite -- function calls expend a cost to save (and later restore) the registers and push the locals onto the stack. Tail recursive functions on the other hand can be implemented as a jump to the top of the function, right after all the stack bookkeeping, and so saves on that expense.
Tail-merging doesn't only apply to recursive functions. As soon as you don't need to store intermediate result, you can jump to another function without allocating a frame.
Haskell (sometimes) does this kind of thing much more efficiently - it doesn't specify or always provide a stack, it models the program as a graph to be evaluated by reduction. And any language where you can implement continuations allows you to write this kind of thing efficiently (e.g. https://github.com/Atry/memcontinuationed ).
(Ultimately you can do "userspace continuations" in any programming language by explicitly transforming your code into CPS, using an explicit FSM if the language doesn't support tail-calls as such, the only price being developer sanity)
I do think there's a real gap in many languages between the natural way to express graph/tree transformations (recursively) and a cache-friendly/mechanical-sympathy-style runtime - see e.g. https://medium.com/@gkossakowski/kentucky-mule-limits-of-sca... . And really this gap shouldn't be there - the language implementation should be able to transform and fuse this kind of tree-walking operation into an efficient implementation - but at the same time it's hard to see how you'd ever make the language spec permit this without resorting to a full-Haskell "everything is lazy everywhere and the language implementation is allowed to evaluate whatever it likes however many times it likes (including 0) in whatever order".
Maybe an actual stack language like Forth could do so? The stack overhead of calling a word is a lot lower to begin with, and ColorForth already has tail call optimisation[0]. More advanced concatenative languages like Factor[1] or Kitten[2] might be able to perform more interesting optimisations.
I really enjoyed this article because it introduced me to a new data structure and some very elegant and (subjectively) beautiful algorithms. I came onto HN expecting some insightful commentary and instead got 80 comments griping about how it wasn't 100% optimized.
The algorithm to rotate images can be illustrated by running
/usr/lib/xscreensaver/blitspin
blitspin - rotate a bitmap in an interesting way
The blitspin program repeatedly rotates a bitmap by 90 degrees by using logical operations: the bitmap is divided into quadrants, and the quadrants are shifted clockwise. Then the same thing is done again with progressively smaller quadrants, except that all sub-quadrants of a given size are rotated in parallel. So this takes O(16×log2(N)) blits of size N×N, with the limitation that the image must be square, and the size must be a power of 2.
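If I'm reading that description right, the whole algorithm is a few lines over a list-of-lists bitmap (a sketch using row copies instead of actual blits): shift the four quadrants clockwise, then repeat at half the block size, with all blocks of a given size handled in the same pass.

```python
def blit_rotate(img):
    """Rotate a square 2^k x 2^k image 90 degrees clockwise by
    repeatedly shifting quadrants, blitspin-style."""
    n = len(img)
    img = [row[:] for row in img]
    s = n // 2
    while s >= 1:
        for y in range(0, n, 2 * s):
            for x in range(0, n, 2 * s):
                # grab the four s-by-s sub-quadrants of this block
                ul = [row[x:x + s] for row in img[y:y + s]]
                ur = [row[x + s:x + 2 * s] for row in img[y:y + s]]
                ll = [row[x:x + s] for row in img[y + s:y + 2 * s]]
                lr = [row[x + s:x + 2 * s] for row in img[y + s:y + 2 * s]]
                for i in range(s):   # shift them clockwise
                    img[y + i][x:x + s] = ll[i]
                    img[y + i][x + s:x + 2 * s] = ul[i]
                    img[y + s + i][x + s:x + 2 * s] = ur[i]
                    img[y + s + i][x:x + s] = lr[i]
        s //= 2
    return img
```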
Movie here, for those without XScreenSaver installed:
Quadtrees are neat. Though I guess general purpose image processing would be better served by either raster or vector representation. Maybe a better showcase would be something like collision detection or geospatial indexing.
I have a bit-rotting ray tracing library where I put all geometric primitives in an oct-tree. When you fire a ray through a 2x2x2 cube it intersects at most 4 of the child nodes. You can see that because the ray must originate in one octant, and it must cross one of the 3 partition planes to reach another octant and it can only cross each one once. There are lots of other interesting properties.
One of my favorite algorithms is one that takes an oct-tree and a triangle and inserts the triangle into the tree. I allow primitives to occupy more than one tree node, so first I decide what size node they will occupy and then I use a recursive algorithm to "voxelize" the triangle. I found a heuristic for determining optimal node size.
The ability for a node to contain an arbitrary number of primitives, and for each primitive to be in an arbitrary number of nodes, resulted in a very interesting data structure that allows fast deletes and inserts of the primitives. But I'm really starting to ramble now.
I've been hoping to put the code online for a long time, but I'm suffering from "it's not cleaned up enough" syndrome. But I'll talk a little here.
I'd been after real time ray tracing since the 1990's so I had a goal of putting both static environments and dynamic objects/players in the same acceleration structure. A lot of the old BSP-tree stuff or other BVH are used only for static geometry and I didn't like that. The problem with those structures is that if you move some things around it may require regeneration of the entire structure. Or at the very least, local updates that result in a different structure than if you regenerated the entire thing.
Part of the reason for that is trying to partition the geometric primitives into roughly two groups, so a partition roughly cuts the scene in half. I decided that with a regular octree structure, which nodes a primitive is contained in could be independent of the other primitives. They may share tree nodes, but I never decide whether to split a node based on content - instead each primitive is inserted into the nodes that make sense for it. This does lead to a few nodes having a lot of content - think of the node that encloses a triangle fan - but those are rare and you won't typically be shooting a lot of rays through them.
Also, with the primitives deciding what nodes they occupy, we can now do local deletes and inserts without having to regenerate the whole world structure.
For inserting a primitive we take its bounding box. If that is outside the root box of the world, I repeatedly double the size of the world by adding a new root node and placing the existing root's children and some new parents inside it - that's a neat recursion too. Next, we traverse the tree from the root, descending only into the children that overlap the bounding box, until we reach the node size that was requested. This descent is trivial for bounding boxes, but doing it with high precision specifically for triangles was harder.
Once the nodes that need to contain a primitive are reached, it's a matter of adding that primitive to a linked list off the tree node. But each primitive will need to exist in multiple lists. This means not putting the primitives directly in a list, but having a list of little link structures that are added to the nodes list and point back to the primitives.
For deletions, each primitive will need a list of link nodes that point back to it. Deletion consists of following this list of links and deleting them from the list that contains them.
So the links themselves will each be a part of two lists. They are part of a list coming off a primitive and they are part of a list coming off the tree node. It took a lot of thought to shrink the link node data structure, but at present it is 4 pointers in size. It needs to contain two "next" pointers, a "previous" pointer (to allow deletion from the tree nodes list of primitives), and it also has to have pointers to both the tree node AND the primitive.
The reasons for those last two pointers is thus: when checking which primitives a ray intersects, we want to start at the tree node that contains it and quickly get to the primitives for ray/object intersection tests. This is done by following the list of links via their "next" pointer which also following their links to "primitive". For deletions, we start at the primitive and follow the other set of "next" pointers and do deletes from the other list by using both the next and previous pointers. We never need a previous pointer for the list off the primitive because that entire list is always deleted, where the other list may still contain other primitives. There remains a problem that when an Octree node becomes empty, I need to prune back the tree - this is why the links need to also contain a pointer back to the tree node that contained the link. So now we're looking at a link structure that contains 5 pointers {2 nexts, one previous, one primitive, and one treenode}
Then I had the insight that no matter which list we're traversing, we always have a pointer to either the treenode or the primitive. So those two pointers can be combined via sum or XOR into a single value. If you're deleting a primitive, you just XOR a pointer to the primitive with that value to recover a pointer to the tree node. When you're doing intersection tests, you XOR the treenode pointer with that value to recover a pointer to the primitive. This resulted in no significant performance difference while reducing the structure to a nice size of 4 pointers.
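The XOR trick above, sketched with integer handles standing in for raw pointers (a toy, not the actual C++): the link stores a single word, and whichever object you arrived from is the key that recovers the other one.

```python
# One stored word serves as two pointers: the XOR of the tree-node handle
# and the primitive handle. Knowing either one recovers the other.
class Link:
    def __init__(self, node_id, prim_id):
        self.both = node_id ^ prim_id

    def primitive(self, node_id):
        # traversing the tree node's list: we hold node_id
        return self.both ^ node_id

    def node(self, prim_id):
        # deleting via the primitive's list: we hold prim_id
        return self.both ^ prim_id
```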
Another problem was that having a linked list with a previous pointer meant the octree node needed to always contain one of these link structures, because deletion depended on that "back" pointer pointing at another link. Turns out you can have the back pointer point directly to the "next" pointer of the previous link. That means the octree node now only needs a single pointer to the first "link" node in the list, and that link's back pointer points directly to that pointer.
So for fitting an arbitrary number of things (primitives) each into an arbitrary number of boxes (tree nodes) we add a single pointer to the thing structure, a single pointer to the box structure, and then use these little link structures that are the size of just 4 pointers.
It was a long journey to arrive at that data structure, but for a while it was real point of pride for me. The structures are nice, but the code is still a bit messy with the remains of several experiments still around. Also notice that the list in one direction does not have a back pointer, so you can never delete a box directly - well you could but it's not as easy. There isn't complete symmetry between the boxes and things.
Before reading this, I would have said that we need to use recursion to model constructs like programming languages that are specifically designed to have a recursive structure. They are naturally recursive because someone designed the language that way. (By contrast, building a basic database-backed website doesn't require any recursion, unless you're modeling a hierarchy.)
But with a compiler, there is the same pattern of converting to a recursive data structure (parsing), performing manipulations, and then converting back to a non-recursive structure (object code generation).
Isn’t `multirec` a function for making hylomorphisms? The recursive dividing is the anamorphism, and the combining of the results is the catamorphism.
As written, however, `multirec` does not efficiently compose. For two constructions to be compatible, they’d have to share their decomposition strategy, including `indivisible` and `divide`. `value` probably composes neatly, but `combine` is a little thorny as the current implementation does two jobs: Processing the data and reconstructing the form of the data.
A composable `multirec` does sound interesting, however. Hmmm...
It's exactly a hylomorphism. Here's a possibly more familiar-looking Haskell form:
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE LambdaCase #-}

import Data.Functor.Foldable
import Data.List.Split

data QuadTreeF a r =
    NodeF r r r r
  | LeafF a
  | EmptyF
  deriving Functor

builder :: [a] -> QuadTreeF a [a]
builder = \case
  []  -> EmptyF
  [x] -> LeafF x
  xs  -> NodeF a b c d where
    [a, b, c, d] = chunksOf (length xs `div` 4) xs

consumer :: QuadTreeF a [a] -> [a]
consumer = \case
  EmptyF  -> []
  LeafF a -> [a]
  NodeF ul ur lr ll -> concat [ll, ul, ur, lr]

rotate :: [a] -> [a]
rotate = hylo consumer builder
I liked this article because I enjoy reading about data structures that can be used in places where the "obvious" implementation is a list of lists. A lists of lists data structure is general enough for many applications and readily available in many languages, but when using it, I often get the sense that I am spinning wheels writing far more code than necessary to do the desired manipulations. It makes me happy to read about more novel solutions.
The argument would be stronger if he weren't swapping an O(n) algorithm for O(n log(n)). With the recursive implementation you have to touch every element once for each recursion level.
Quadtrees come into their own when there are optimizations that fit the problem ___domain, such as the coloured quadtrees explained at the end of the post: they are much faster for images with large blank areas.
They aren’t discussed in the post, but quadtrees are also very amenable to memoizing common operations.
The footnotes say that all the functions can be adjusted to deal with non–power-of-2 squares, but it's not obvious how this is done when working with region quadtrees.
One way is to represent squares as a square-that-is-a-power-of-two, plus a “clipping size” that is used when the square is rasterized. If you’re using a lot of optimizations for blank regions, the extra padding has a negligible effect on performance.
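A toy sketch of that idea using a flat list-of-lists rather than a real quadtree (all names mine): the clipping size records the real extent, and a rotation rotates the padded square, cycles the data back into the top-left corner, and swaps width and height.

```python
# Hypothetical "clipping size" wrapper: real data occupies the top-left
# w-by-h corner of an n-by-n power-of-two square; the rest is padding.
class ClippedSquare:
    def __init__(self, square, w, h):
        self.square, self.w, self.h = square, w, h

    def rotate_cw(self):
        n = len(self.square)
        # rotate the whole padded square 90 degrees clockwise
        rotated = [list(row) for row in zip(*self.square[::-1])]
        # the data region landed in the top-right; cycle it back left
        shifted = [row[n - self.h:] + row[:n - self.h] for row in rotated]
        return ClippedSquare(shifted, self.h, self.w)  # w and h swap
```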
I suppose that would work, but all the operations you do on the region quadtree have to be aware of the "clipping size" as well. For example, the rotation transformation that this article implements would also have to rotate that "clipping size" as the unwanted portion of the square is now in a different ___location.
One can use recursive properties better if the structure divides equally. So if not having a strict quadtree is okay, the square can be divided into 3x3, 5x5, etc.
Another approach would be to allow a cell's value to be null, meaning it should be clipped, but then you run the risk of having transforms on the region quadtrees that cause the null cells to migrate inwards into the square (such that they cannot simply be clipped off later).
Yes, that made me lose a lot of confidence in the article. This shouldn't have been a footnote. It should have been stated in the beginning that the author was only considering powers of two, and the footnote should have been a reference to ways to deal with other sizes - my very first question, before I got to the footnotes, was "How would I deal with a 3x3 square?"
JavaScript is the language most likely to be familiar to the audience I'm trying to reach. But at a certain point, it gets in the way of explaining an idea rather than making it easier.
I think this essay stays on the right side of that trade-off.
Wait, the way he defines the quad tree color property requires that you manually input the color for each region. If you wanted this to be part of a general purpose algorithm in real software then you'd need another algorithm that determines the color of each region - a very expensive operation if you wanted the maximum benefit.
I stopped reading at the first sentence when he improperly used `isomorphic`. I thought it made no sense at all so I looked up the footnote... just to get a big middle finger from the author.
> When someone else uses it in conversation, it's absolutely guaranteed that the person is a pretentious asshole.
I don't get your angle here. You realise an isomorphism is actually a well-defined mathematical concept, right? Are you advocating we use another, plainer word for an isomorphism, or do you just naturally get angry when you hear words you don't understand?
Considering that TFA pokes fun at the word in the very first footnote, I took this comment to be mostly in jest. But if we’re going to be serious about it, technical jargon is really only appropriate when the speaker is 100% sure the audience 100% understands it.
There is sometimes a problem (I wouldn't use personal terms to describe it, but still) where technical terms are slung about without regard for whether people understand them. That’s poor communication, and as a bad side-effect, it encourages people to misuse the words in ways that don’t match the formal definitions.
In any event, if a comment is criticizing the article for using the word “isomorphism,” we could quite fairly ask if there is a bijection between algorithm and data structure, and whether the morphisms preserve behaviour.
> But if we’re going to be serious about it, technical jargon is really only appropriate when the speaker is 100% sure the audience 100% understands it.
I disagree. Half the reason I come here is to be exposed to new ideas or concepts. This isn't a lecture, I'll gladly take time to learn about a new concept if it seems relevant.
Fair enough, and here is an interesting thing: In a talk, if you stop to explain every technical term, you break up the flow and make it tedious for those who know the words.
But in a web article, we have hyperlinks, footnotes, asides, and other devices for allowing some readers to skim and others to dive in.
So to me, the choice for whether to use a word in an essay is different than the choice for whether to use a word in a talk at a conference, which is different than the choice for whether to use a word in a conversation where I am familiar with the participants.
Since we're on the subject of recursion, there is a problem when somebody uses a term that they only understand 90%. Somebody else reads that and understands 90% of that, and then they use it, and then another person reads it...
And then we arrive in the current situation where people use terms like "rainbow table" or "HFT" to describe who knows what.
With that standard, we'd all be stuck at grunting.
Words exist for a reason. They allow precision, brevity, and – as Reginald so often shows – beauty. I expect readers to be willing to right-click an unknown word and quickly learn its meaning.
I'd also expect programmers to know what isomorphic means. Heck, you could even guess it: iso, like the standards. Morph as in morphology, "shape".
Pretentious or not, the term is used erroneously here.
Recursive data structures and recursive algorithms are, quite plainly, not isomorphic. They are not two equivalent representations of the same thing.
Something that is isomorphic to a recursive function that works on a quadtree would be an imperative function that works on a quadtree with identical functionality.
I don’t know that an algorithm and a data structure have to be equivalent representations of the same thing to be isomorphic. In the case of the algorithm, is it the code that must be equivalent? Or the runtime behaviour?
The runtime behaviour of the ‘whirling regions’ algorithm clearly is not an equivalent representation of the two-dimensional array representation of an image. However, the runtime behaviour of the quadtree algorithm exactly matches the quadtree data structure.
This is my proposition: That what matters is whether the algorithmic behaviour is equivalent. I would say the same thing about `binrec` matching a binary tree.
...Or at least, I would have, up until today. I am open to being disabused of this notion.
Isomorphic means "have the same shape". It means that there is some structural property that the two objects have in common. The more interesting/complex the property is (and the more superficially different the two objects look), the more defensible the use of the word is.
I am following the mathematical definition of isomorphism:
The two sets are Quadtree instances and calls of `myself`. The map from calls to instances is "instance that was given as argument". This map is obviously bijective. The structure that is preserved is "is a child of".
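That bijection is easy to check mechanically: count the calls of `myself` and count the `Quadtree` instances. A toy sketch (my own code, using the names from this thread):

```python
class Quadtree:
    """Toy quadtree node: up to four children, or a leaf colour."""
    def __init__(self, ul=None, ur=None, ll=None, lr=None, color=None):
        self.children = [c for c in (ul, ur, ll, lr) if c is not None]
        self.color = color

calls = 0

def myself(node):
    # Exactly one call per Quadtree instance, and a child's call happens
    # inside its parent's call, preserving "is a child of".
    global calls
    calls += 1
    for child in node.children:
        myself(child)

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)

tree = Quadtree(
    Quadtree(color="black"),
    Quadtree(color="white"),
    Quadtree(color="black"),
    Quadtree(Quadtree(color="white"), Quadtree(color="black")),
)

myself(tree)
# The map "call -> the instance it received" pairs calls and nodes 1:1.
```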
> They are not two equivalent representations of the same thing.
Algorithms encoded as a program end up as an abstract syntax tree in the compiler. Algorithm -> Data structure. The compiler then turns that ast into some other representation (e.g. machine instructions, JavaScript) which is again the algorithm. Data structure -> algorithm. Bijection identified.
An isomorphism is more than a bijection, unless it is between sets and the category is the one of all functions from one set to another.
Isomorphism is about preserving relations as well. For example, consider the ring of integers and the field of rational numbers. One can construct a bijection between one to the other (they both have the same cardinality as the natural numbers), but they are significantly different constructs. The bijection would not be able to preserve the operations - in fact, the rational numbers as a ring is significantly different, and unless one wants to use information about the cardinality, a bijection between the two is otherwise worthless.
Correct. A bijection implies isomorphism in the category of sets. Thus an isomorphism exists. I suppose we need to convince ourselves that algorithms and abstract syntax trees do indeed form sets. (Exclude such things as "the algorithm that computes the set of all algorithms", etc.)
But that is typically not how the word is used, because substituting "isomorphism" for "bijection" only makes sense when talking about cardinality - and that is not how you used it.
Even with a non-mathematical definition such as yours, I don't see the isomorphism. Recursive code that acts on a quadtree does not itself take the shape of a quadtree. Not the text, not the AST, not the machine code.
The code is treelike, but that's because all code is treelike(1).
As a counterexample, consider code that acts on arbitrary graphs. DFS on an arbitrary graph doesn't itself take the form of an arbitrary graph. It is still treelike.
(1) My old Atari 800 BASIC programs notwithstanding.
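To make the counterexample concrete, here's a sketch of DFS over a cyclic graph (my own toy code): the data contains a cycle, but the productive calls still form a tree, because the visited set cuts the back edge.

```python
def dfs(graph, node, visited=None, order=None):
    """Depth-first search over an adjacency dict.

    The visited check prunes cycles, so the calls that do real work always
    form a tree, even when the data structure is an arbitrary graph.
    """
    if visited is None:
        visited, order = set(), []
    if node in visited:
        return order  # back edge: this call does no work
    visited.add(node)
    order.append(node)
    for neighbour in graph.get(node, []):
        dfs(graph, neighbour, visited, order)
    return order

# A graph with a cycle: a -> b -> c -> a
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

The call structure here is a tree with one pruned branch; nothing about it is "shaped like" the cyclic graph it traverses.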
The thing that has the shape of a quadtree is the callgraph. That means there is one call for every instance of the data class, which makes it easy to reason about the code.
OK so the callgraph is isomorphic in some sense when using recursion. If this was the intent of the author, it was not clear.
I would still argue that call-graph isomorphism to the data structures they work on is of little relevance to software practitioners.
1. Multithreaded algorithms acting on trees do not necessarily have callgraphs that are isomorphic to the trees.
2. As I mentioned elsewhere, arbitrary graph data structures, or even the more constrained subset of DAGS, are not isomorphic to call graphs of the algorithms that work on them. Recursive algorithms have the same exact applicability to these structures as they do quadtrees.
3. Also as I mentioned elsewhere, recursive algorithms can be implemented with loops and an explicit stack without any function calls at all. This is at times the best way to implement algorithms on trees (particularly where trees are deep and the platform has a small stack - yes, I've had to rewrite recursive code because it overflowed a small, non-configurable stack).
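Point 3 as a generic sketch (my own toy code, not tied to quadtrees): the explicit stack keeps the call stack at one frame deep, so even a pathologically deep tree is safe.

```python
def count_leaves_iterative(tree):
    """Count leaves of an arbitrarily deep tree without recursion.

    Nodes are (left, right) tuples; anything else is a leaf. The explicit
    list-as-stack replaces the call stack, so a tree deeper than any
    recursion limit won't overflow anything.
    """
    stack, leaves = [tree], 0
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            stack.extend(node)
        else:
            leaves += 1
    return leaves

# A degenerate tree far deeper than a typical default recursion limit
deep = "leaf"
for _ in range(100_000):
    deep = (deep, "leaf")
```

The equivalent naive recursive function would blow the default stack on `deep` long before finishing.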
I was speaking to the relationship between the code's runtime behaviour and the structure of the data.
When I say the word "algorithm," am I talking about the way the code is noted in a particular language? Or the steps one takes to perform the operation?
In some other languages, even the notation itself has obvious parallels between data definition and process definition. Consider a list and `map` in Standard ML:
    datatype 'a list =
        Nil
      | Cons of 'a * 'a list;

    val rec map : ('a -> 'b) -> 'a list -> 'b list =
      fn f => fn xs =>
        case xs of
          Nil => Nil
        | Cons(x, xs') => Cons(f x, map f xs');
The case analysis of a list is obviously isomorphic to the definition of a list, in the sense that the data and the process have the same shape: a list is a `Nil` or a `Cons(item, more list)`, while a function that traverses a list checks if a list is `Nil` and does something, otherwise it's a `Cons(...)` and it does something else. The linear recursive nature of the list definition lends itself naturally to algorithms which are similarly linearly recursive. (`map` itself can be said to create homomorphisms, but that's beside the point here; I use this particular example only because it happens to be the simplest non-trivial one I could think of).
Okay, so perhaps this isn't quite what "isomorphic" means in category theory, or topology, or graph theory, or set theory, or what have you. Even in mathematics, though, it's obvious that terminology gets borrowed between the branches when there are concepts that are somehow related. Why should programming be any different? So, I argue that it's not wrong to use "isomorphic" in the sense that you have—in fact, it's a term that I've seen used in the same sense on numerous occasions, and something that I think experienced programmers have internalized. In this case, the structure of the data, that is, a linearly recursive collection of elements, precisely matches the structure of the process, that is, a linearly recursive traversal of elements. Whether or not there actually exists an isomorphism between the two is irrelevant to the notion that the two are isomorphic, that is, they have "equal shapes".
I think a habit of thinking of words in an etymological way leads naturally to borrowing those words for similar concepts. This is neither the first time such a thing has happened, nor will it be the last. To knowledgeable parties, your use of the word communicated a characteristic of your program in a succinct way that you continued to illustrate—that the shape of the quad tree data structure reflects the shape of the algorithms used to process instances of said structure.
One might say that the various meanings of the term "isomorphic" are... isomorphic ¬_¬
> You realise an isomorphism is actually a well-defined mathematical concept, right?
I think a prime example of how words lose their precise mathematical meaning is the word "function". It has a precise definition in maths, but we programmers often use it with a different meaning. We used to have the terms "subroutine" and "procedure" for what we use "function" for nowadays. We even came up with the term "pure function" to refer to the original mathematical meaning of "function".
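A trivial illustration of the distinction (my own example):

```python
counter = 0

def impure_next():
    """A 'function' in the programmer's sense: it reads and mutates
    external state, so two calls with identical (empty) argument lists
    return different results."""
    global counter
    counter += 1
    return counter

def pure_add(a, b):
    """A function in the mathematical sense: the result depends only on
    the arguments, and there are no observable side effects."""
    return a + b
```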
I found the article to be very pretentious too. Although the author doesn't really try to convince you otherwise, if you read the footnotes: "In computer science, two things are isomorphic if the person explaining a concept wishes to seem educated." But you know, if you claim that the isomorphism makes the code easier to understand, support the claim with at least something. If anything, the code with the "isomorphism" looks unnecessarily complicated and harder to understand.
Using the same didactic style as the author, I would like to show how it is possible to replace the image rotation algorithm with a NOOP. First, look at any of the pictures shown. Next, turn your head to the side 90 degrees. Although the computer's output hasn't changed, by redefining the requirements we can achieve the desired change. The fastest way to rotate a square is a noop (or an O(1) message to the user): turning your head. Never forget it!