Generally, if you care about performance, you're using a collection that doesn't malloc individual nodes, and allows random access without crawling pointer chains and tearing up your cache.
Instead of wasting more energy on improving the (overused) linked list, write a little resizable array library that doubles size on each fill-up. It's easy and much faster.
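A minimal sketch of what I have in mind (no error handling beyond the realloc check; the names are made up):

    #include <stdlib.h>

    /* Tiny growable array of ints: capacity doubles whenever it fills up. */
    struct vec {
        int    *data;
        size_t  len;
        size_t  cap;
    };

    static int vec_push(struct vec *v, int x)
    {
        if (v->len == v->cap) {
            size_t ncap = v->cap ? v->cap * 2 : 8;
            int *p = realloc(v->data, ncap * sizeof *p);
            if (!p)
                return -1;      /* out of memory; caller decides what to do */
            v->data = p;
            v->cap  = ncap;
        }
        v->data[v->len++] = x;
        return 0;
    }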
Don't do this.
Doubling the size of your allocation whenever you expand can easily lead to large amounts of wasted storage in common scenarios. "Yes," you may say. "But memory is cheap." Which it is at the small scale, but when you're allocating 1GB to store a 512MB array you have a problem.
A vastly better approach is to use some form of adaptive rescaling which avoids excessive overcommits.
Here, for example, is the code (and explanatory comment) Python uses to determine the new allocation size for an expanding list:
    /* This over-allocates proportional to the list size, making room
     * for additional growth. The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
    if (newsize == 0)
        new_allocated = 0;
Note that you will also want to avoid using the same calculation for expanding and contracting the allocation size, to avoid thrashing allocations when the list size varies around the pivot point. (i.e., add one element, triggering an expansion; remove the element, triggering a contraction; repeat.)
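A sketch of that hysteresis (the thresholds here are illustrative, not taken from Python or anywhere else): grow as soon as the length exceeds capacity, but only shrink once the length falls well below it.

    #include <stdlib.h>

    struct vec { int *data; size_t len, cap; };   /* hypothetical resizable array */

    /* Grow at 100% full, shrink only below 25% full (and never below a small
     * minimum), so adding and removing one element near the boundary doesn't
     * bounce between realloc() calls. */
    static int vec_resize_for(struct vec *v, size_t newlen)
    {
        size_t newcap = v->cap;

        if (newlen > v->cap)
            newcap = newlen + (newlen >> 3) + 8;    /* mild over-allocation */
        else if (v->cap > 16 && newlen < v->cap / 4)
            newcap = v->cap / 2;                    /* lazy contraction */

        if (newcap != v->cap) {
            int *p = realloc(v->data, newcap * sizeof *p);
            if (!p && newcap > v->cap)
                return -1;                          /* growth failed */
            if (p) {
                v->data = p;
                v->cap  = newcap;
            }
        }
        v->len = newlen;
        return 0;
    }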
This is a great comment and I guess it's good advice, but doubling vectors have never caused a problem for me. I profile, and I've had them handling data sets like "each side of every TCP connection traversing 1/8th of an entire tier 1 ISP backbone".
I'll steal the Python trick, because it's neat, but I don't think worrying about your resize function is a good reason to procrastinate moving away from linked lists, which are mostly evil.
It is not possible to argue about data structures without the use case at hand. Linked lists have O(1) addition and removal of elements at the head/tail, and you can't have this with resizable arrays. So there are good use cases for both.
Instead, the trick you mention, allocating multiple elements per node, is actually a linked list of fixed-size arrays: there you are trading a constant factor for storage efficiency. This is a good approach for many use cases, as you are still O(1) and the cache locality will likely help to compensate for the additional operations you need to perform.
Still, it's worth noting that such a data structure needs extra handling to make sure that, in the case of deletions in the middle, adjacent nodes left holding just a few elements are "compacted" from time to time while traversing the data structure.
An array with invalidation instead of deletion doesn't have to be O(n) on deletion-from-the-middle. I think anyone would have to concede that if you routinely need insertion in the middle, you can't use a flat data structure.
If the order of elements isn't important (fairly common) then you can do even better than invalidation - just swap the element you want deleted with the element on the end, and pop it off.
Of course, you have to be a little careful - if you're doing a `remove_if` loop, you have to remember to process the element that used to be on the end, and you need to make sure you don't iterate past the end of the (now shorter) array.
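Something like this, against a hypothetical vec of ints:

    #include <stddef.h>

    struct vec { int *data; size_t len, cap; };   /* hypothetical resizable array */

    /* O(1) removal from an unordered array: overwrite slot i with the last
     * element and shrink.  Order is not preserved. */
    static void vec_swap_remove(struct vec *v, size_t i)
    {
        v->data[i] = v->data[--v->len];
        /* If you're removing while iterating, re-examine index i: it now
         * holds what used to be the last element. */
    }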
As for insertions in the middle, a vector may still be better than a linked list. If the vector is short you could win on that alone, but even if it's long it can be worth it.
Consider that the program you're writing probably doesn't consist entirely of middle-insertions into these data structures. If you iterate over them about as often as you insert into the middle, you're going to be O(n) there no matter what. The O(n) + O(1) = O(n) of the linked list iterate/insert may well be far slower than the O(n) + O(n) = O(n) of the vector iterate/insert for any n.
(And with binary search, your O(log n) + O(n) = O(n) may be faster again...)
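For the ordered case, the insert itself is just a memmove after the search; a sketch (capacity is assumed to have been grown already):

    #include <string.h>

    struct vec { int *data; size_t len, cap; };   /* hypothetical resizable array */

    /* Shift the tail up one slot and drop x into position pos.
     * O(n) worst case, but it's one contiguous memmove, not n pointer hops. */
    static void vec_insert_at(struct vec *v, size_t pos, int x)
    {
        memmove(&v->data[pos + 1], &v->data[pos],
                (v->len - pos) * sizeof v->data[0]);
        v->data[pos] = x;
        v->len++;
    }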
Did you miss the point that these lists aren't being crawled? That's not the slow part.
There are lots of reasons to keep track of things even if finding them is actually rare.
And an array doesn't solve any problem. Searching it is still linear, you're still tearing up the cache doing so because in a case like this you're storing pointers in the array, not data, and now you have another memory allocation that can fail. Structs with builtin links can always be added to a list without fail; growing an array, not so much.
Maybe. I tend to believe stuff in the Linux kernel has been aggressively profiled (as opposed to C code in general, which the article also claims to be about). Why do you think so, though?
Because deleting from an array is linear, not constant.
Every time a process exits, you have to remove it from the parent's children. If it was the first of 1000 children, that's a lot of memmove.
Also, another disadvantage: Once you start storing data in resizable arrays, you can never, ever, let anyone keep a pointer to an element (such as in a different tree or list). One resize later and boom, that pointer is broken.
Deleting from the tail of an array is constant. But I agree, of course, that there are times that lists make sense.
The pointer problem is straightforwardly addressed by storing offsets instead of pointers; high(er)-performance encodings of the list (and, esp. tree) ADTs already do that.
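By way of illustration (the ref type and names are mine): an offset survives a realloc because you resolve it against the current base only at the point of use.

    #include <stddef.h>

    struct vec { int *data; size_t len, cap; };   /* hypothetical resizable array */

    /* A "pointer" into the array that survives realloc: store the index,
     * resolve it against the current base only when you dereference it. */
    struct ref { size_t idx; };

    static int *ref_get(struct vec *v, struct ref r)
    {
        return &v->data[r.idx];
    }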
I don't like offsets because I don't think they are that straightforward. First, the offset is useless without the base pointer. So that either needs to be global or you need to store it too (and now you have fat pointers). But the base can change, so you really need to store a reference to it. Yuck.
And you still can't delete anything when using offsets. Forgetting about random deletions (which are important) an array works for a stack, but not a fifo, and fifos are at least as common.
It's not as if the design alternative you're suggesting --- ie, storing "struct-proc-star" all over the place --- is much safer. You still have to track down and invalidate every stale "struct-proc-star" when you delete from the list. And this notion of having to store a base pointer and the pointer --- you're storing the identity of the collection somewhere. I think this is a straw man.
If you have a collection where random deletion is central to the design, then by all means, use a list. The process table is perhaps one example.
But just because you might at some point need random deletion doesn't mean it's a win to use a linked list. The process table will constantly have random deletions, but pid 47481 is far more likely to die at any moment than pid 10. Do you know how not-expensive it is to memcpy 1-3 "struct procs"? How often is the process table indexed or traversed? Roughly every 10th scheduling quantum? Are you sure you're optimizing for the right thing?
Incidentally, if you need FIFO behavior, a ring buffer is a nice thing to have in your toolchest.
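A minimal sketch of one, if it helps (power-of-two capacity so the wrap-around is a mask, and free-running head/tail counters so full vs. empty is unambiguous):

    /* Fixed-capacity FIFO ring buffer.  Capacity is a power of two so the
     * wrap-around is a mask; head and tail are free-running counters, so
     * tail - head is the current element count. */
    #define RING_CAP 1024u

    struct ring {
        int      buf[RING_CAP];
        unsigned head;   /* next slot to read  */
        unsigned tail;   /* next slot to write */
    };

    static int ring_push(struct ring *r, int x)
    {
        if (r->tail - r->head == RING_CAP)
            return -1;                            /* full */
        r->buf[r->tail++ & (RING_CAP - 1)] = x;
        return 0;
    }

    static int ring_pop(struct ring *r, int *out)
    {
        if (r->tail == r->head)
            return -1;                            /* empty */
        *out = r->buf[r->head++ & (RING_CAP - 1)];
        return 0;
    }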
I don't think this is a good suggestion. Sure, it's commonly true, but it completely depends on your use case.
This is a very specific article about optimizing the linked lists as used in the Linux kernel, with wonderful data about real world usage. I find it highly doubtful that the Linux kernel would benefit from dropping linked lists in favor of resizable arrays. Possible of course, but I'd want to see the data before making any recommendations.
In general, it would depend on what's in your list and what you do with it. If each item is the size of a cache line, you're probably not trashing your cache[1]. If the list is large and you are keeping it ordered, you might benefit from easy insertions and deletions. If you've got millions of little lists of non-power-of-2 sizes, maybe doubling the size on a fill up would be problematic.
Obviously you know this and I'm not giving you any new information, but I'm confused why you'd make this recommendation. Is it aimed at Rusty? Or do you truly feel that the linked list has no place in the world?
[1] Well, maybe, but this depends on how often you need to do a full scan through the list. If you're storing unordered, you probably are scanning the array; if ordered, you'll be shuffling memory a lot on inserts. Again, use case matters.
When modification happens often relative to accesses, and occurs in the middle of the container, lists make sense. When else do they make sense?
(In this article, modification is the most frequent operation, but "replace" is the least frequent; for some of those lists, tail insertion might be the most frequent use case).
For me, the question is, what's the right go-to collection for general purposes? For most C programmers, that's the linked list, and I think that's a bad default.
They make sense where you want essentially constant-time additions and deletes regardless of list size. They also play well with parallel systems, as there exist lock-free implementations.
I'm happy to buy the argument that they aren't a great default for cases where you know you will be dealing with a small unordered collection. Storing HTTP headers? Don't use a linked list. Large enough to benefit from binary search? Don't use a linked list.
In general I'd agree with you that a simple table is a better first approach. Brute force and realloc can be surprisingly efficient. Others have made the argument that a hash beats a table even for very small numbers, and thus is a better default: http://news.ycombinator.com/item?id=1860917
The Naggum article (parent of the linked comment) is a fun rant in favor of linked lists over hash tables, although I think Naggum would probably also agree with the argument that you shouldn't use even a linked list when a simple array will do.
What's "large"? Even at thousands or tens of thousands, vector-style lists can be a win simply because they're faster to iterate through, they optimize locality, and they don't require pointer chasing.
At very large sizes, deletions from the interior can be painful, but at very large sizes you need to be customizing for performance (if that matters) anyways. Instead of deleting, for instance, you can invalidate, and then amortize the memcpy across many deletions; a straightforward time/space tradeoff.
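A sketch of that tradeoff, using a sentinel value as the tombstone (which works when some value can't occur in real data):

    #include <stddef.h>

    struct vec { int *data; size_t len, cap; };   /* hypothetical resizable array */

    #define TOMBSTONE (-1)   /* sentinel marking a logically deleted slot */

    /* O(1) "delete": just mark the slot dead. */
    static void vec_invalidate(struct vec *v, size_t i)
    {
        v->data[i] = TOMBSTONE;
    }

    /* One O(n) pass squeezes out the dead slots; run it occasionally so the
     * cost is amortized over many deletions. */
    static void vec_compact(struct vec *v)
    {
        size_t out = 0;
        for (size_t i = 0; i < v->len; i++)
            if (v->data[i] != TOMBSTONE)
                v->data[out++] = v->data[i];
        v->len = out;
    }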
The big problem I have with "performant" linked lists is that malloc is death to performance. The single biggest profiler payoffs I've ever gotten --- order of magnitude improvements --- have been from getting rid of calls to malloc. Yes, you can custom-alloc a linked list, but you're then getting to a place where lists and vectors are starting to converge on each other.
I think C programmers use linked lists instead of "vectors" because realloc resizing is scary, and because every CS student learns linked-lists in their first month of class.
The C-style linked lists that TFA discusses don't involve any dynamic allocation. They're intrusive structures that link locations together, not containers that are responsible for memory ownership and management.
Intrusive data structures are actually very malloc friendly because you can include multiple lists in structures that are allocated sequentially, and insert/remove at will without any allocation or deallocation whatsoever. This is of course why C programmers use linked lists heavily.
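A simplified sketch of the shape of it (modeled loosely on the kernel's list.h; the conn struct is just an example):

    /* Intrusive doubly-linked list: the links live inside the containing
     * struct, so putting an object on (or taking it off) a list allocates
     * nothing.  One allocation can sit on several lists at once. */
    struct list_node {
        struct list_node *next, *prev;
    };

    struct conn {                     /* example payload struct */
        int              fd;
        struct list_node by_state;    /* membership in a per-state list */
        struct list_node by_timeout;  /* membership in a timeout list, simultaneously */
    };

    static void list_init(struct list_node *head)
    {
        head->next = head->prev = head;
    }

    static void list_add(struct list_node *head, struct list_node *n)
    {
        n->next = head->next;
        n->prev = head;
        head->next->prev = n;
        head->next = n;
    }

    static void list_del(struct list_node *n)
    {
        n->prev->next = n->next;
        n->next->prev = n->prev;
    }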
If those CS students pay attention in their algorithms class, they'll learn that the amortized cost of insertions and removals at the end of a load-based resizable array is still constant anyway. As you say, the overhead of a malloc tremendously dominates a few instructions' worth of logic for capacity checks.
Well it depends on your patterns of access. If you remove nodes a lot then you might end up moving large blocks of indices around. Unless you use a sentinel value, but then before you know it, you are implementing a hash table.
To reduce memory allocation time penalty you can use a memory pool.
However if memory size is a problem (maybe another 2 pointers per element) or if you do mostly adds and random access of elements an array could be a good choice.
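Something as dumb as this already takes malloc off the hot path (fixed capacity, free-list recycling; the sizes are arbitrary):

    #include <stddef.h>

    /* Trivial fixed-capacity node pool: nodes come out of one slab and get
     * recycled through a free list, so list churn never touches malloc(). */
    struct node {
        int          value;
        struct node *next;
    };

    struct pool {
        struct node  slab[4096];
        struct node *free_list;
        size_t       used;
    };

    static struct node *pool_alloc(struct pool *p)
    {
        if (p->free_list) {
            struct node *n = p->free_list;
            p->free_list = n->next;
            return n;
        }
        if (p->used < sizeof p->slab / sizeof p->slab[0])
            return &p->slab[p->used++];
        return NULL;                              /* pool exhausted */
    }

    static void pool_free(struct pool *p, struct node *n)
    {
        n->next = p->free_list;
        p->free_list = n;
    }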
Not necessarily all evil. Only textbook implementations require that one allocate individual nodes. Most implementations I have seen in Linux and FreeBSD can pool-allocate the nodes. One can also pull off neat tricks like appending whole link chains to expand the list, without the reallocation that arrays need.
Pool allocation only addresses one of the problems with linked list; it still leaves you chasing pointers to iterate, it still decreases locality, and it still adds 4-16 bytes to every element in the container.
Meanwhile: you aren't really saying most people use pool-allocated lists, right? I see people hand-hacking malloc'd lists all the time. I rarely see custom pool allocators. It's what people were taught to do.
How does deletion work in ring-style lists when there's only one element in the list? Wouldn't that break a branchless implementation? How is the empty ring represented, anyway?
As for his test purportedly showing that A is better than B on Linux... It's only right if you assume that Linux can only use one type of list in all cases.
An empty ring never needs to be represented. The point about the Linux list is that it lives inside another struct. If there are no instances of that struct, then there is no need to represent an empty list at all!
I think you're confused. The list head exists outside of the structs that are in the list. The list links (next and prev) live in the struct, but not the head.
To answer the original question, you decide a list is empty when prev == next.
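Roughly, as I remember the kernel's list.h doing it, with the head being a real node that is always present:

    struct list_node { struct list_node *next, *prev; };

    /* The head is a standalone node, so an empty list is just the head
     * pointing at itself.  A one-element list has head->next pointing at
     * the element, not at the head, so it correctly tests non-empty.  And
     * because the head is always there, deleting any element is the same
     * two unconditional pointer updates, with no special case for the
     * first or last element. */
    static int list_is_empty(const struct list_node *head)
    {
        return head->next == head;
    }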
But isn't that true if the list has one element? Then A->A, thus prev==next, but the list still isn't empty.
And if the head lives outside of the list proper, then you would need special-case code to handle deletion of the first element in the list (since the head contains a pointer to it). It wouldn't be branchless.
Yeah, I thought about that. But he mentions "emptiness testing" a few times in the rest of the post. If there's no empty list representation, what's the point?