Hacker News new | past | comments | ask | show | jobs | submit login
Why recursive data structures? (raganwald.com)
404 points by udfalkso on Jan 2, 2017 | hide | past | favorite | 121 comments



Apparently, every computer science problem has easy to understand solution that’s easy to implement but awfully slow in practice.

This is one of them.

Rotating images that way isn’t CPU bound problem. It’s RAM bandwidth and RAM latency bound problem. The optimal data structure for that is not recursive, it’s 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks. If they are RGB, one good format for those images, optimized for rotations, is BC7 https://www.opengl.org/registry/specs/ARB/texture_compressio... the decompressor being implemented right in the graphics hardware.

> If we were writing an image manipulation application, we’d provide much snappier behaviour using coloured quadtrees to represent images on screen.

That statement was true 30 years ago, no longer a case. Branch misprediction is expensive, memory latency is huge, RAM is a block device now (the block size typically being 16 bytes for dual-channel DDR). In addition, some displays now show 8+ megapixels, and most show 16M colors.

For an image manipulation application, that recursive thing ain’t good at all.


>> we’d provide much snappier behaviour using coloured quadtrees

> That statement was true 30 years ago, no longer a case.

The colored quadtrees are an optimization over the plain quadtrees earlier in the article, reducing the number of comparisons for some images. That will be more efficient 300 years from now just as much as it was 30 years ago. In context it is perfectly obvious that "much snappier" does not refer to the totally different techniques used by actual graphics software.

It's a toy implementation used as an example, if you get hung up on the example, you miss the whole point.


>It's a toy implementation used as an example, if you get hung up on the example, you miss the whole point.

When to use such techniques, and whether they make sense in the real world, performance wise, should have been part of the "whole point" all along.


By that argument, any blog post that does not contain a comprehensive software development education can be criticized.

You should ask whether these techniques are useful in writing correct code and if they save developer effort, rather than picking apart irrelevant aspects of the toy example.

I don't think modern graphics code represents individual pixels as strings either, would you dismiss the post for that reason?


> The optimal data structure for that is not recursive, it’s 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks.

If the image is very large and contains large contiguous regions of solid colors, then a quadtree that doesn't subdivide homogeneous regions can be a much faster approach than a flat pile-o-pixels.

Either way, this article isn't about rotating images.


Well, hierarchical representation of those images is indeed a good idea for some data. Saves both RAM space and bandwidth, and CPU time.

Just not the quad-tree way described in the article.

There are ways to implement hierarchical structures that work well on modern hardware. One is hashmaps keeping dense 2 bits per pixel blocks (again, 16-64 bytes per block) with the values saying filled/empty/mixed. Another one is B-trees or B+ trees.


How would you improve on the quad-tree representation used to implement hashlife? http://www.drdobbs.com/jvm/an-algorithm-for-compressing-spac...


I wouldn’t.

I am not convinced hashlife is still the best of them. The invention is from early 80-s. Many old heuristics about caching versus calculation is no longer true today. Here’s my article on the related topic, lookup tables: https://github.com/Const-me/LookupTables

If I wanted to implement efficient life simulator, I’d probably used the fact that with AVX2, it’s possible to count those neighbors for 8x8 block of pixels in a single __m256i register (with 8 neighbors max, 4 bits / value is more than enough). Might be faster than L3 cache access, and will definitely be much faster than main memory access.

That’s why I’d put those pixels in 8x8 blocks, single uint64_t value per block.

Putting those blocks in a hash table takes care about the infinite field. std::unordered_map will work, but if you care about performance prefer CAtlMap from Microsoft, dense_hash_map from Google, or something similar.


I believe most modern implementations, including hybrid implementations of HashLife, take advantage of register and cache sizes. HashLife as I might implement it in JavaScript, is really a very naïve approach. I think Bill Gosper even had versions in the 1970s that did things with registers and assembly, not just doing everything in lists.

But the big issue with HashLife is not whether you can write a faster implementation, it's whether the particular pattern(s) you are exploring are amenable to HashLife. It wins for highly repetitive patterns, but is much slower for patterns with a very large amount of entropy in time and space, like methuselahs.

HashLife can be fabulous for exploring purpose-built machines that cycle though regular states.


Looking at golly-2.8-src. At the first sight, it does not use SIMD. Only uses 32-bit registers in 32-bit builds, 64-bit registers in 64-bit builds.

All x86 CPUs manufactured after 2003 have at least 16 128-bit SIMD registers. Using them for calculations is the only way to get anywhere near advertised CPU performance.

For languages other than C and C++, like JavaScript, where there’s no SIMD instructions, calculations are generally slower, but the hash tables are provided by the platform and hence implemented in highly-optimized C code, HashLife should be perfectly fine.


For HashLife, I think the chief problems are around an efficient cache, and especially efficient canonicalization. The algorithm is so fast for universes that suit its ___domain that I think most other considerations drop by the wayside.

However, besides the fact that not all problems have enough redundancy in time+space to suit it, it also suffers from the fact that it is most efficient when computing massive jumps in generations. So it’s as not as performant for creating animations as it is for doing things like finding the maximum size of a pattern, calculating its growth rate, and so forth.


For JavaScript you might do quite well with a 32-way bit-slice implementation, like http://fanf.livejournal.com/93032.html That is, if you can't do it on the GPU with WebGL :-)


>If the image is very large and contains large contiguous regions of solid colors

In other words, if the image is not very image-y? (but more vector-y).


It's an entirely plausible scenario. It will rarely be the case with photographs, but graphical images often feature large solid color blocks. It's large reason why the PNG file format exists.


You might still want the precision and detail at the boundaries of the shapes that a raster representation would give you. Think big blotchy cow spots or something along those lines.


I don't think the point of the article was to present an efficient and novel solution for image rotation. It was just a nicely visual example of a recursive data structure.


   For every complex problem there is an answer that is clear, simple, and wrong.


A slow solution isn't wrong. Just inefficient.


We're talking about several orders of magnitude here.

Is a web page that loads in fifty seconds 'correct' or is it wrong? Is a video format that can only decode 2 frames per second correct, or wrong?

Often, timeliness is an unstated part of the requirements, in which case an academically pleasing solution that isn't practical doesn't fulfill the requirements.


I think you're missing the point. A quadtree is a compact way to store a partition of two-dimensional space, much like you'd store a binary heap in an array.

The implementation of a recursive data structure or algorithm can usually take advantage of tail recursion or trampolining. The practical implementation of a theoretical algorithm may look extremely different, much like you wouldn't literally translate abstract pseudocode to the language of your choice.


>Is a web page that loads in fifty seconds 'correct' or is it wrong? Is a video format that can only decode 2 frames per second correct, or wrong?

If your condition of satisfaction is "I am presenting information to someone who will read it", then "loads in fifty seconds" is wrong, yes.


> Is a web page that loads in fifty seconds 'correct' or is it wrong?

Neither: it's probably using JavaScript to perform an AJAX request to get some data in a bespoke markup format, then using JavaScript to parse that markup format and build DOM entries, all because the developer thinks HTML is 'so 90s, dude!'


> Is a web page that loads in fifty seconds 'correct' or is it wrong?

Of course it is correct - this is how the internet was/is with a 14k or 56k dialup modem, I don't blame you for not remembering/not experiencing it. I remember watching JPEGs render progressively from a blurry mess one at a time, or sometimes not at all. It took way more than 50 seconds to load and render all elements on a typical website.


Depends on the business constraints. Wrong is not just "returns invalid results".

There's also the notion of wrong or right solution for a particular problem (which the phrase above already assumes). E.g. for a game where things have to run in 60hz, a slow algorithm would be the wrong solution to use.

The reverse is also true: a solution can return wrong results in the absolute sense (e.g. false positives) and still be correct for our business case. This is the case with things like Bloom filters for example.


> Apparently, every computer science problem has easy to understand solution that’s easy to implement but awfully slow in practice.

I have been doing some data structures/algorithms review as I haven't really used much of that knowledge from my CS degree and, frankly, wasn't that great of a student when I was learning it. I've obviously focused mostly on understanding theoretical performance (Big-O) as I've been implementing and playing with things, but I'd like to know about good resources for things to consider about performance on real systems. I hear about things like cache misses, memory layouts and branch mispredictions, but don't really know how to apply that knowledge or, maybe more importantly, measure the effects of optimizing for some of those things over others. I'd appreciate any advice anyone has to offer.


That Big-O thing is mostly theoretical. Most useful application is job interviews.

It might stop you from doing some things, like bubble-sorting 1GB datasets, or inserting many elements to the start of a large array.

However, in other cases it’s just fine to bubble-sort (for very small collections), or insert many elements to the start of the array (when you’re reading much more often so the profit from memory locality while reading outweigh the losses from relocating those elements while inserting).

Modern hardware is extremely complex. No amount of knowledge will help you to 100% predict the performance effects of your changes. Two advices.

(1) Learn to profile your code. Both externally, using a sampling profiler that will tell you where’re you spending the CPU time, and internally using e.g. std::chrono::high_resolution_clock that will tell you the performance effect of your changes.

(2) Don’t forget about rollback feature of your version control, and don’t hesitate using that as necessary. I routinely discover some of my changes I viewed as performance optimizations actually decreased performance. The reasons include branch misprediction, compiler optimization issues, cores competition for L3 cache, RAM bandwidth, thermal issues…


For a dose of real-world big-O I recommend following http://accidentallyquadratic.tumblr.com


I don’t think engineers working at Ericsson and Apple (the companies listed in the copyright section of that source file) forgot about big-O. What I think happened, when they wrote that, their assumption was people won’t be using those to pass 16MB blobs. And for small messages, O(n^2) works just fine.

I can understand those people. HTTP is better than Web Sockets for large chunks of data. With HTTP, servers and proxies may cache, clients may resume downloads, clients may check for changes with if-modified-since header, and so on.

The problem here isn’t with big-O, it’s with changing world.

If in the future someone will send gigabytes in that event, the current version will break despite O(n), because RAM is finite. Relatively easy to fix, you just need to save data to a hard drive instead or RAM.

Does it mean it’s a good idea to do so now? Don’t think so, it would be over-engineering and premature optimization, and the current version is currently fine. Just like the original O(n^2) code was fine for small messages.


At face value, I don't actually see where you contradict the article. The recursive nature of this solution is akin to recursive matrix multiplication. Which can be a way to setup memory fetches to stay optimal. (Could... I have no data to say it is.)

Any chance you know of any benchmarks exploring this?


Recursive matrix multiplication is in general a bad idea (Strassen, for example, loses a little bit of stability and is not cache-friendly)


To be clear, I can see why coding it purely recursively would be a bad idea. Same for using linked structures, if that was the concern with the quad tree.

However, it seems that you could use the general recursive definition to find some optimizations that one could do. For example, define a recursive structure to break down a problem until it is small enough that a more traditional method works. (So, don't drop down to the size of a pixel, but do drop down till you have things at cache line size?)

Perusing the wikipedia for Strassen, I see that it scoffs at a ~10% benefit for large matrices. Curious how that holds in light of "deep learning" environments. Especially since 1k x 1k matrices are probably not that uncommon nowdays. Seems like every little bit would help.

Edit to add: THinking of the context of the original post. Yeah... probably not needed for anything in a "snappy" user application. I was thinking more for very large data sets.


The mention of "around 10% or less" is based on a 2005 paper, and even in the past ten years there's been a lot of change in computer architecture (the machines used in the paper were mostly ultrasparcs, and they write "Note that for the (pentium 3 system), we could find no problem size for which Strassen’s is faster than ATLAS’s").

See also : https://www.quora.com/What-algorithm-does-BLAS-use-for-matri... https://github.com/flame/how-to-optimize-gemm/wiki#notice-on... http://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf


That is an interesting note. I remember reading that the lower bound on multiplications was lower due to swapping a lot of them for additions. Knuth indicated that this did not matter in practice, since few people are multiplying large matrices.

My interest here is really just if that is still a good assumption. With the multiplications that happen in "deep" fields, I'm curious if there are now reasons to look to these algorithms.


We might be talking at cross-purposes, my angle is that of dense linear algebra, single or double precision, on NUMA architectures with vector operations, and all respect to Knuth but it requires a different model for the machine.

If that's changed, like for example if you're computing over a different ring where arithmetic ops are expense as compared to cache misses and there's no stability issues, then yes the issue is back open and Strassen is put back on the table.


Knuth seems to be in agreement with you, from my understanding. My comments are only of a general interest and curiosity, not of knowledge. (If that makes sense.)

That is, I certainly was not trying to second guess what you are saying. Just noting my curiosity on whether or not this has, or will, change. Knuth's view that I'm quoting is from Vol 2, which is pretty old at this point. I'd be a little surprised if it was outdated. But not shocked.


Very large matrices is >50kx50k (maybe even bigger). Strassen is then just a tiny bit better when implementing both cache and allocation aware.(Tested in Fortran)

At least it's at a non practical size.


It is always interesting to see just how valuable a "tiny bit" can be, though.

Granted, I confess most is almost always navel gazing. Increases in the "best" sorting/map algorithm have probably not lead to any noticeable improvements in modern codebases. Usually, those benefit more from being the low hanging fruit, not most computationally complex. (That is, I'm sure in these large matrix problems, there are other low hanging fruit that will yield better improvements.)


Recursion is often times less efficient than a well designed and concrete imperative solution, even if the imperative solution uses more memory. This is because most languages, the stack is not free. There's a lot of meta data and house keeping that goes on when you invoke a function to support things like stack traces and introspection.

The provided example (at least to me) is also convoluted and fails to illustrate the elegance of using a recursive data structure.

In general parlance, an affine transform or rotation matrix would both be more elegant and readable.


Although I'm satisfied with the choice I made for what to put in my essay, I want to thank you for pointing out the gaps between my essay and the real world, and for doing so in a forceful way without crossing into being querulous.

This kind of discourse is, IMO, an example of Hacker News at its finest.


Are you aware of a programming language/compiler that optimizes on recursively written code? I'm finding it difficult to imagine a PL where stack _isn't_ free except tail-recursive code.


My understanding is that it's the opposite -- function calls expend a cost to save (and later restore) the registers and push the locals onto the stack. Tail recursive functions on the other hand can be implemented as a jump to the top of the function, right after all the stack bookkeeping, and so saves on that expense.


Tail-merging doesn't only apply to recursive functions. As soon as you don't need to store intermediate result, you can jump to another function without allocating a frame.


Yes, this is right


Haskell (sometimes) does this kind of thing much more efficiently - it doesn't specify or always provide a stack, it models the program as a graph to be evaluated by reduction. And any language where you can implement continuations allows you to write this kind of thing efficiently (e.g. https://github.com/Atry/memcontinuationed ).

(Ultimately you can do "userspace continuations" in any programming language by explicitly transforming your code into CPS, using an explicit FSM if the language doesn't support tail-calls as such, the only price being developer sanity)

I do think there's a real gap in many languages between the natural way to express graph/tree transformations (recursively) and a cache-friendly/mechanical-sympathy-style runtime - see e.g. https://medium.com/@gkossakowski/kentucky-mule-limits-of-sca... . And really this gap shouldn't be there - the language implementation should be able to transform and fuse this kind of tree-walking operation into an efficient implementation - but at the same time it's hard to see how you'd ever make the language spec permit this without resorting to a full-Haskell "everything is lazy everywhere and the language implementation is allowed to evaluate whatever it likes however many times it likes (including 0) in whatever order".


Maybe an actual stack language like Forth could do so? The stack overhead of calling a word is a lot lower to begin with, and ColorForth for already has tail call optimisation[0]. More advanced concatenative languages like Factor[1] or Kitten[2] might be able to perform more interesting optimisations.

[0] http://wiki.c2.com/?TailCallOptimization

[1] https://factorcode.org/

[2] http://kittenlang.org/


I believe even GCC optimizes tail recursion in some cases.


I really enjoyed this article because it introduced me to a new data structure and some very elegant and (subjectively) beautiful algorithms. I came onto HN expecting some insightful commentary and instead got 80 comments griping about how it wasn't 100% optimized.

Sigh.


Author here. Nevertheless, there's learning gold in those comments. Take the gold, leave the mud.


  maxIndex = size - 1
  for x in range(0, size):
    for y in range(0, size):
      # swap diagonally, invert horizontally
      out[x][y] = in[maxIndex - y][x]


The algorithm to rotate images can be illustrated by running

    /usr/lib/xscreensaver/blitspin
blitspin - rotate a bitmap in an interesting way

The blitspin program repeatedly rotates a bitmap by 90 degrees by using logical operations: the bitmap is divided into quadrants, and the quad‐ rants are shifted clockwise. Then the same thing is done again with progressively smaller quadrants, except that all sub-quadrants of a given size are rotated in parallel. So this takes O(16×log2(N)) blits of size NxN, with the limitation that the image must be square, and the size must be a power of 2.

Movie here, for those without XScreenSaver installed:

https://www.youtube.com/watch?v=UTtcwb-UWW8


Thanks, I have included the video and a credit to you in the essay.


Quadtrees are neat. Though I guess general purpose image processing would be better served by either raster or vector representation. Maybe a better showcase would be something like collision detection or geospatial indexing.


I have a bit-rotting ray tracing library where I put all geometric primitives in an oct-tree. When you fire a ray through a 2x2x2 cube it intersects at most 4 of the child nodes. You can see that because the ray must originate in one octant, and it must cross one of the 3 partition planes to reach another octant and it can only cross each one once. There are lots of other interesting properties.

One of my favorite algorithms is one that takes an oct-tree and a triangle and inserts the triangle into the tree. I allow primitives to occupy more than one tree node, so first I decide what size node they will occupy and then I use a recursive algorithm to "voxelize" the triangle. I found a heuristic for determining optimal node size.

The ability for node to contain and arbitrary number of primitives and for each primitive to be in an arbitrary number of nodes resulted in a very interesting data structure that allows fast deletes and inserts of the primitives. But I'm really starting to ramble now.


Could you go on a bit more (or post a link to somewhere you do)?


I've been hoping to put the code online for a long time, but I'm suffering from "it's not cleaned up enough" syndrome. But I'll talk a little here.

I'd been after real time ray tracing since the 1990's so I had a goal of putting both static environments and dynamic objects/players in the same acceleration structure. A lot of the old BSP-tree stuff or other BVH are used only for static geometry and I didn't like that. The problem with those structures is that if you move some things around it may require regeneration of the entire structure. Or at the very least, local updates that result in a different structure than if you regenerated the entire thing.

Part of the reason for that is trying to partition the geometric primitives roughly in two groups so a partition roughly cuts the scene in half. I decided that with a regular octtree structure, which nodes a primitive is contained in could be independent of the other primitives. They may share tree nodes, but I never decide whether to split a node based on content - instead each primitive is inserted into the nodes that make sense for it. This does lead to a few nodes having a lot of content - thing of the node that encloses a triangle fan - but those are rare and you won't typically be shooting lot of rays through them.

Also, with the primitives deciding what nodes they occupy, we can now do local deletes and inserts without having to regenerate the whole world structure.

For inserting a primitive we take its bounding box. If that is outside the root box of the world, I repeatedly double the size of the world by adding a new root node and placing the existing roots children and some new parents inside it - that's a neat recursion too. Next, we traverse the tree from the root calling on only the children that overlap the bounding box until we reach the node size that was requested. This decent is trivial for bounding boxes, but doing it with high precision specifically for triangles was harder.

Once the nodes that need to contain a primitive are reached, it's a matter of adding that primitive to a linked list off the tree node. But each primitive will need to exist in multiple lists. This means not putting the primitives directly in a list, but having a list of little link structures that are added to the nodes list and point back to the primitives.

For deletions, each primitive will need a list of link nodes that point back to it. Deletion consists of following this list of links and deleting them from the list that contains them.

So the links themselves will each be a part of two lists. They are part of a list coming off a primitive and they are part of a list coming off the tree node. It took a lot of thought to shrink the link node data structure, but at present it is 4 pointers in size. It needs to contain two "next" pointers, a "previous" pointer (to allow deletion from the tree nodes list of primitives), and it also has to have pointers to both the tree node AND the primitive.

The reasons for those last two pointers is thus: when checking which primitives a ray intersects, we want to start at the tree node that contains it and quickly get to the primitives for ray/object intersection tests. This is done by following the list of links via their "next" pointer which also following their links to "primitive". For deletions, we start at the primitive and follow the other set of "next" pointers and do deletes from the other list by using both the next and previous pointers. We never need a previous pointer for the list off the primitive because that entire list is always deleted, where the other list may still contain other primitives. There remains a problem that when an Octree node becomes empty, I need to prune back the tree - this is why the links need to also contain a pointer back to the tree node that contained the link. So now we're looking at a link structure that contains 5 pointers {2 nexts, one previous, one primitive, and one treenode}

Then I had the insight that no matter which list we're traversing, we always have a pointer to either the treenode or the primitive. So those two pointers can be combined via sum or XOR into a single value. If your deleting a primitive you just XOR a pointer to the primitive with that value to recover a pointer to the tree node. When you're doing intersection tests, you XOR the treenode pointer with that value to recover a pointer to the primitive. This change resulted in no significant performance change while reducing the structure to a nice size of 4 pointers.

Another problem was that having a linked list with a previous pointer meant the octree node needed to always contain one of these link structures because deletion depended on that "back" pointer pointing at another link. Turns out you can have the back pointer point directly to the "next" pointer of the previous link. That means the octree node now only needs a single pointer to the first "link" node in the list and that links back pointer points directly to that pointer.

So for fitting an arbitrary number of things (primitives) each into an arbitrary number of boxes (tree nodes) we add a single pointer to the thing structure, a single pointer to the box structure, and then use these little link structures that are the size of just 4 pointers.

It was a long journey to arrive at that data structure, but for a while it was real point of pride for me. The structures are nice, but the code is still a bit messy with the remains of several experiments still around. Also notice that the list in one direction does not have a back pointer, so you can never delete a box directly - well you could but it's not as easy. There isn't complete symmetry between the boxes and things.

Is that enough rambling for now? ;-)


Nice puzzle! I think you can allow both kinds of deletions while still using four pointers per link node:

* prev1 XOR next1

* prev2 XOR next2

* prev1 XOR prev2

* data1 XOR data2

Because while traversing you always know one of {prev1, prev2} and one of {data1, data2}. Right?


If you really want to show quadtrees off...

http://raganwald.com/hashlife/

But yes, I’d love to read more about things like collision detection and geospatial indexing.


(You probably know this but) quadtrees are indeed used extensively for both of those things.

Whole buncha programs can use them (though there are other good options as well) for geospatial:

https://www.elastic.co/guide/en/elasticsearch/reference/curr...


Before reading this, I would have said that we need to use recursion to model constructs like programming languages that are specifically designed to have a recursive structure. They are naturally recursive because someone designed the language that way. (By contrast, building a basic database-backed website doesn't require any recursion, unless you're modeling a hierarchy.)

But with a compiler, there is the same pattern of converting to a recursive data structure (parsing), performing manipulations, and then converting back to a non-recursive structure (object code generation).


Reginald is definitely hinting at hylomorphisms here. I'd be curious to see if stream fusion could be achieved in JS for a hylomorphism.


Isn’t `multirec` a function for making hylomorphisms? The recursive dividing is the anamorphism, and the combining of the results is the catamorphism.

As written, however, `multirec` does not efficiently compose. For two constructions to be compatible, they’d have to share their decomposition strategy, including `indivisible` and `divide`. `value` probably composes neatly, but `combine` is a little thorny as the current implementation does two jobs: Processing the data and reconstructing the form of the data.

A composable `multirec` does sound interesting, however. Hmmm...


It's exactly a hylomorphism. Here's a possibly more familiar-looking Haskell form:

    {-# LANGUAGE DeriveFunctor #-}
    {-# LANGUAGE LambdaCase #-}

    import Data.Functor.Foldable
    import Data.List.Split

    data QuadTreeF a r =
        NodeF r r r r
      | LeafF a
      | EmptyF
      deriving Functor

    builder :: [a] -> QuadTreeF a [a]
    builder = \case
      []  -> EmptyF
      [x] -> LeafF x
      xs  -> NodeF a b c d where
        [a, b, c, d] = chunksOf (length xs `div` 4) xs

    consumer :: QuadTreeF a [a] -> [a]
    consumer = \case
      EmptyF            -> []
      LeafF a           -> [a]
      NodeF ul ur lr ll -> concat [ll, ul, ur, lr]

    rotate :: [a] -> [a]
    rotate = hylo consumer builder


This is worth it, if only for the footnotes. Don't skip them!


I liked this article because I enjoy reading about data structures that can be used in places where the "obvious" implementation is a list of lists. A lists of lists data structure is general enough for many applications and readily available in many languages, but when using it, I often get the sense that I am spinning wheels writing far more code than necessary to do the desired manipulations. It makes me happy to read about more novel solutions.


The argument would be stronger if he wasn't swapping an O(n) algorithm for O(n log(n)). With the recursive implementation you have to touch every element once for each recursion level.


For quadtrees come into their own when there are optimizations that fit the problem ___domain, such as the coloured quadtrees explained at the end of the post: They are much faster for images with large blank areas.

They aren’t discussed in the post, but quadtrees are also very amenable to memoizing common operations.


The footnotes say that all the functions can be adjusted to deal with non–power-of-2 squares, but it's not obvious how this is done when working with region quadtrees.


One way is to represent squares as a square-that-is-a-power-of-two, plus a “clipping size” that is used when the square is rastorized. If you’re using a lot of optimizations for blank regions, the extra padding has a negligible effect on performance.


I suppose that would work, but all the operations you do on the region quadtree have to be aware of the "clipping size" as well. For example, the rotation transformation that this article implements would also have to rotate that "clipping size" as the unwanted portion of the square is now in a different ___location.


That depends If you can live with evenly-sized squares, you can always center the clipping square, and you’re good for rotations, flips, and so on.

If you want odd-sized squares, or other shapes, then yes, you’ll need to know how to transform the clipping path/description as well.

All in a day’s work!


One can use recursive properties better if the structure divides equally. So if not having a strict quadtree is okay, they can divide the square into 3x3, 5x5, etc


For odd size squares, simply duplicate the center row and column before rotating, then deduplicate after.


Another approach would be to allow a cell's value to be null, meaning it should be clipped, but then you run the risk of having transforms on the region quadtrees that cause the null cells to migrate inwards into the square (such that they cannot simply be clipped off later).


Yes, that made me lose a lot of confidence in the article. This shouldn't have been a footnote. It should have been stated in the beginning that the author was only considering powers of two, and the footnote should have been a reference to ways to deal with other sizes - my very first question, before I got to the footnotes, was "How would I deal with a 3x3 square?"


The 'multirec' function here looks to have a lot in common with Clojure's transducers. Can anyone more knowledgeable do a compare and contrast?


`multirec` is a combinator, it transforms functions into other functions. `transducers` are "composable algorithmic transformations.”

`multirec` as presented here is not composable the way transducers are composable. It is, however, an extremely stimulating thing to consider.

https://news.ycombinator.com/item?id=13305176


Today I learned a different, much more harsh meaning for "bespoke"

I've never picked up on a hipster connotation from the word, though I did feel it was coming back into vogue.


It's been popular in the derivatives space for a while now.


FYI, this article on recursive combinators from 2008:

http://www.lazutkin.com/blog/2008/06/30/using-recursion-comb...


Got an opportunity to do this recently using a graph mapped on to a K^2 Tree to find out if a relationship existed or not. See https://maxdemarzi.com/2016/12/27/connected-2/


Interesting article. While reading it I wrote a version of the code in Racket. YMMV but I think it turns out cleaner, especially towards the end.

https://gist.github.com/bon/801fdb6dbced0efdb83a358bb4f6285f


Umm... or, I could just rotate a square a quarter-turn by iterating over the square's elements:

for (x = 0; x < N; x ++) { for (y = 0; y < N; y ++) { newSquare[x][y] = oldSquare[y][N-1-x]; } }

Of course, in certain representations of the square (such as using a 1-d array) even simpler algorithms can be used.


As a web developer, I appreciate this rare use of JavaScript in a CS lesson.


JavaScript is the language most likely to be familiar to the audience I'm trying to reach. But at a certain point, it gets in the way of explaining an idea rather than making it easier.

I think this essay stays on the right side of that trade-off.


Raganwald is the poet of code.


But why recursive data structures?

(Both a recursion joke and a Zoolander joke)



Wait, the way he defines the quad tree color property requires that you manually input the color for each region. If you wanted this to be part of a general purpose algorithm in real software then you'd need another algorithm that determines the color of each region - a very expensive operation if you wanted the maximum benefit.


I stopped reading at the first sentence when he improperly used `isomorphic`. I thought it made no sense at all so I looked up the footnote... just to get a big middle finger from the author.

What a con artist. Moving on.


If you suspend your disbelief until the end, it's an inductive anti-hypothesis. Induction is recursion under intuitionism.


It's a great read, you shouldn't be so thin skinned


I love the word "isomorphic". It's by far the most efficient word in the English language.

When someone else uses it in conversation, it's absolutely guaranteed that the person is a pretentious asshole.

When you use it in a conversation, the same is also true.

It's only ever acceptable in written contexts or if you happen to be Douglas Hofstadter.


> When someone else uses it in conversation, it's absolutely guaranteed that the person is a pretentious asshole.

I don't get your angle here. You realise an isomorphism is actually a well-defined mathematical concept, right? Are you advocating we use another, plainer word for an isomorphism, or do just naturally get angry when you hear words you don't understand?


Considering that TFA pokes fun at the word in the very first footnote, I took this comment to be mostly in jest. But if we’re going to be serious about it, technical jargon is really only appropriate when the speaker is 100% sure the audience 100% understands it.

There is sometimes a problem (I wouldn't use personal terms to describe it, but still) where technical terms are slung about without regard for whether people understand them. That’s poor communication, and as a bad side-effect, it encourages people to misuse the words in ways that don’t match the formal definitions.

In any event, if a comment is criticizing the article for using the word “isomorphism,” we could quite fairly ask if there is a bijective between algorithm and data structure, and whether the morphisms preserve behaviour.

If not... Perhaps TFA is at fault.


> But if we’re going to be serious about it, technical jargon is really only appropriate when the speaker is 100% sure the audience 100% understands it.

I disagree. Half the reason I come here is to be exposed to new ideas or concepts. This isn't a lecture, I'll gladly take time to learn about a new concept if it seems relevant.


Fair enough, and here is an interesting thing: In a talk, if you stop to explain every technical term, you break up the flow and make it tedious for those who know the words.

But in a web article, we have hyperlinks, footnotes, asides, and other devices for allowing some readers to skim and others to dive in.

So to me, the choice for whether to use a word in an essay is different than the choice for whether to use a word in a talk at a conference, which is different than the choice for whether to use a word in a conversation where I am familiar with the participants.


Since we're on the subject of recursion, there is a problem when somebody uses a term that they only understand 90%. Somebody else reads that and understands 90%, and then they use, and then another person reads it...

And then we arrive in the current situation where people use terms like "rainbow table" or "HFT" to describe who knows what.


Isn't that just the general nature of human communication, though?

I mean, yeah, maybe one could argue about math, but even math (proofs, especially[1]) can get a bit fuzzy.

[1] Which one could argue are a prerequisite for true understanding of a mathematical concept.


With that standard, we'd all be stuck at grunting.

Words exist for a reason. They allow precision, brevity, and – as Reginald so often shows – beauty. I expect readers to be willing to right-click an unknown word and quickly learn its meaning.

I'd also expect programmers to know what isomorphic means. Heck, you could even guess it: iso, like the standards. Morph as in morphology, "shape".


You're correct. I was trying to make a joke.

Didn't go over very well. So it goes.


There was this one time I made an enormously racist joke on HN... No, that didn’t go over well either. So maybe I know how you feel!


Pretentious or not, the term is used erroneously here.

Recursive data structures and recursive algorithms are, quite plainly, not isomorphic. They are not two equivalent representations of the same thing.

Something that is isomorphic to a recursive function that works on a quadtree would be an imperative function that works on a quadtree with identical functionality.

tl;dr: I think the author meant "synergistic".


I don’t know that an algorithm and a data structure have to be equivalent representations of the same thing to be isomorphic. In the case of the algorithm, is it the code that must be equivalent? Or the runtime behaviour?

The runtime behaviour of the ‘whirling regions’ algorithm clearly is not an equivalent representation of the two-dimensional array representation of an image. However, the runtime behaviour of the quadtree algorithm exactly matches the quadtree data structure.

This is my proposition: That what matters is whether the algorithmic behaviour is equivalent. I would say the same thing about `binrec` matching a binary tree.

...Or at least, I would have, up until today. I am open to being disabused of this notion.


Isomorphic means "have the same shape". It means that there is some structural property that the two objects have in common. The more interesting/complex the property is (and the more superficially different the two objects look), the more defensible the use of the word is.


I am following the mathmatical definiton of isomorphism:

The two sets are Quadtree instances and calls of `myself`. The map from calls to instances is "instance that was given as argument". This map is obviously bijective. The structure that is preserved is "is a child of".

QED :)


> They are not two equivalent representations of the same thing.

Algorithms encoded as a program end up as an abstract syntax tree in the compiler. Algorithm -> Data structure. The compiler then turns that ast into some other representation (e.g. machine instructions, JavaScript) which is again the algorithm. Data structure -> algorithm. Bijection identified.


An isomorphism is more than a bijection, unless it is between sets and the category is the one of all functions from one set to another.

Isomorphism is about preserving relations as well. For example, consider the ring of integers and the field of rational numbers. One can construct a bijection between one to the other (they both have the same cardinality as the natural numbers), but they are significantly different constructs. The bijection would not be able to preserve the operations - in fact, the rational numbers as a ring is significantly different, and unless one wants to use information about the cardinality, a bijection between the two is otherwise worthless.


Correct. A bijection implies isomorphism in the category of sets. Thus an isomorphism exists. I suppose we need to convince ourselves that algorithms and abstract syntax trees do indeed form sets. (Exclude such things as "the algorithm that computes the set of all algorithms", etc.)


But that is typically not how the word is used because to substitute bijection with isomorphism, it would only make sense when talking about cardinality - that is not how you used it.


"Isomorphic" literally means "of the same shape".


Even with a non-mathematical definition such as yours, I don't see the isomorphism. Recursive code that acts on a quadtree does not itself take the shape of a quadtree. Not the text, not the AST, not the machine code.

The code is treelike, but that's because all code is treelike(1).

As a counterexample, consider code that acts on arbitrary graphs. DFS on an arbitrary graph doesn't itself take the form of an arbitrary graph. It is still treelike.

(1) My old Atari 800 BASIC programs notwithstanding.


The thing that has the shape of a quadtree is the callgraph. That means there is one call for every instance of the data class, which makes it easy to reason about the code.

See also my answer to the parent: https://news.ycombinator.com/item?id=13306762.


Reading through some ancient conversations...

OK so the callgraph is isomorphic in some sense when using recursion. If this was the intent of the author, it was not clear.

I would still argue that call-graph isomorphism to the data structures they work is of little relevance to software practitioners.

1. Multithreaded algorithms acting on trees do not necessarily have callgraphs that are isomorphic to the trees.

2. As I mentioned elsewhere, arbitrary graph data structures, or even the more constrained subset of DAGS, are not isomorphic to call graphs of the algorithms that work on them. Recursive algorithms have the same exact applicability to these structures as they do quadtrees.

3. Also as I mentioned elsewhere, recursive algorithms can be implemented with loops and an explicit stack without any function calls at all. This is at times the best way to implement algorithms on trees (particularly where trees are deep and the platform has a small stack - yes I've had to do rewrite recursive code because it overflowed a small, non-configurable stack).


> The thing that has the shape of a quadtree is the callgraph

Thanks, I have borrowed your succinct explanation and added it to the essay. MuchIappreciate the insight.


I was speaking to the relationship between the code's runtime behaviour and the structure of the data.

When I say the word "algorithm," am I talking about the way the code is noted in a particular language? Or the steps one takes to perform the operation?


In some other languages, even the notation itself has obvious parallels between data definition and process definition. Consider a list and `map` in Standard ML:

    datatype 'a list =
      Nil
    | Cons of 'a * 'a list;
    
    val rec map : ('a -> 'b) -> 'a list -> 'b list =
      fn f => fn xs =>
        case xs of
          Nil          => Nil
        | Cons(x, xs') => Cons(f x, map f xs');
The case analysis of a list is obviously isomorphic to the definition of a list, in the sense that the data and the process have the same shape: a list is a `Nil` or a `Cons(item, more list)`, while a function that traverses a list checks if a list is `Nil` and does something, otherwise it's a `Cons(...)` and it does something else. The linear recursive nature of the list definition lends itself naturally to algorithms which are similarly linearly recursive. (`map` itself can be said to create homomorphisms, but that's beside the point here; I use this particular example only because it happens to be the simplest non-trivial one I could think of).

Okay, so perhaps this isn't quite what "isomorphic" means in category theory, or topology, or graph theory, or set theory, or what have you. Even in mathematics, though, it's obvious that terminology gets borrowed between the branches when there are concepts that are somehow related. Why should programming be any different? So, I argue that it's not wrong to use "isomorphic" in the sense that you have—in fact, it's a term that I've seen used in the same sense on numerous occasions, and something that I think experienced programmers have internalized. In this case, the structure of the data, that is, a linearly recursive collection of elements, precisely matches the structure of the process, that is, a linearly recursive traversal of elements. Whether or not there actually exists an isomorphism between the two is irrelevant to the notion that the two are isomorphic, that is, they have "equal shapes".

I think a habit of thinking of words in an etymological way leads naturally to borrowing those words for similar concepts. This is neither the first time such a thing has happened, nor will it be the last. To knowledgeable parties, your use of the word communicated a characteristic of your program in a succinct way that you continued to illustrate—that the shape of the quad tree data structure reflects the shape of the algorithms used to process instances of said structure.

One might say that the various meanings of the term "isomorphic" are... isomorphic ¬_¬


> You realise an isomorphism is actually a well-defined mathematical concept, right?

I think a prime example of how words loose their precise mathematical meaning is the word "function". It has a precise definition in maths but we programmers use it often with a different meaning. We used to have the terms "subroutine" and "procedure" for what we use "function" for nowadays. We even came up with the term "pure function" to refer to the original mathematical meaning of "function".


I found the article to be very pretentious too. Although the author doesn't really try to convince you otherwise, if you read the footnotes: "In computer science, two things are isomorphic if the person explaining a concept wishes to seem educated.". But you know, if you claim that isomorphism makes the code easier to understand, support the claim by at least something. If anything the code with the "isomorphism" looks unnecessary complicated and harder to understand.


That was a lot of words to say "because the languages we use to solve problems is naturally recursive."


true, but sometimes people need to be convinced by a preponderance of evidence.


Using the same didactic style as the author, I would like to show how it is possible to replace the image rotation algorithm with a NOOP. First, look at any of the pictures shown. Next, turn your head to the side 90 degrees. Although the computer's output hasn't chamged, by redefining requirements we can achieve the the desired change. The fastest way to rotate a square is a noop (or O(1) message to the user): turning your head. Never forget it!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: