> ...functional programming is mostly about avoiding mutation at all costs
Slightly different perspective from Grokking Simplicity[1]: functional programming is not about avoiding mutation because "mutation is bad." In fact, mutation is usually the desired result, but care must be taken because mutation depends on when and how many times it's called.
So good FP isn't about avoiding impure functions; instead it's about giving extra care to them. After all, the purpose of all software is to cause some type of mutation/effect (flip pixels on a screen, save bits to storage, send email, etc). Impure functions like these depend on the time they are called, so they are the most difficult to get right.
So Grokking Simplicity would probably say this:
1. Avoid premature optimization. The overhead from FP is usually not significant given the speed of today's computers, and performance gains unlocked by FP may offset any losses.
2. If optimization via mutation is required, push it as far outside and as late as possible, keeping the "core" functionally pure and immutable.
This is similar to Functional Core, Imperative Shell[2]; and perhaps similar to options 1 or 2 from the article.
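A minimal sketch of that split (in Rust, with made-up names, not from the book): the pure "core" computes everything, and the thin imperative "shell" at the edge performs the one effect, last.

    // Pure core: no I/O, no shared mutable state; trivially testable.
    fn total_owed(line_items: &[(String, f64)]) -> f64 {
        line_items.iter().map(|item| item.1).sum()
    }

    // Imperative shell: all effects (here, printing) live at the edge.
    fn main() {
        let items = vec![
            ("widget".to_string(), 9.99),
            ("gadget".to_string(), 24.50),
        ];
        // The effect happens last, after the pure computation.
        println!("total: {:.2}", total_owed(&items));
    }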
> A lot of people think that functional programming is mostly about avoiding mutation at all costs.
People should try to stop thinking of mutation as something to be avoided, and start thinking of it as something to be managed.
Mutating state is good. That's usually the whole point.
What's bad is when you create "accidental state" that goes unmanaged or that requires an infeasible effort to maintain. What you want is a source of truth for any given bit of mutable state, plus a mechanism to update anything that depends on that state when it is changed.
The second part is where functional programming shines -- you have a simple way to just recompute all the derived things. And since it's presumably the same way the derived things were computed in the first place, you don't have to worry about its logic getting out-of-sync.
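A tiny sketch of that idea (Rust, hypothetical names): the derived value is never stored, only recomputed from the single source of truth by the same pure function, so it cannot go stale.

    struct Cart {
        prices: Vec<f64>, // the single source of truth
    }

    impl Cart {
        // Derived data is recomputed on demand rather than cached,
        // so it is always consistent with the source of truth.
        fn total(&self) -> f64 {
            self.prices.iter().sum()
        }
    }

    fn main() {
        let mut cart = Cart { prices: vec![1.0, 2.5] };
        assert_eq!(cart.total(), 3.5);
        cart.prices.push(4.0);         // mutate the source of truth...
        assert_eq!(cart.total(), 7.5); // ...and the derived value follows.
    }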
I like to say that immutability was a really good idea in the 1990s, especially considering how counterculture it would have been at the time. I don't mean that to be dismissive or patronizing; I'm serious. It was a good, cutting-edge idea.
However, nobody had any experience with it. Now we do. And I think what that experience generally says is that it's a bit overkill. We can do better. Like Rust. Or possibly linear types, though that is I think much more speculative right now. Or other choices. I like mutable islands in a generally immutable/unshared global space as a design point myself. Mutability's main problem is that it gets exponentially more complicated to deal with as the ___domain of mutability grows, but if you confine mutability to lots of little domains that don't cross (ideally enforced by the compiler, but not necessarily), it really isn't that scary.
It was a necessary step in the evolution of programming ideas, but it's an awful lot to ask that it be The One True Idea for all time, in all places, and that nobody in the intervening decades could come up with anything that was in any way an improvement in any problem space.
> I like to say that immutability was a really good idea in the 1990s, especially considering how counterculture it would have been at the time. I don't mean that to be dismissive or patronizing; I'm serious. It was a good, cutting-edge idea.
> However, nobody had any experience with it. Now we do.
For those of us working intimately with C++ in the '90s, immutability as a concept was neither considered counterculture nor without significant experience employing it. At that time, and in C++, it was commonly known as "const correctness" and was a key code review topic.
Go back another decade or two when K&R C ruled the land and that's a different story ;-).
The funniest thing is that to compile Rust and enable various compiler features (like the borrow checker), the code has to be transmuted into non-mutating Static Single Assignment form :V
And in the functional world, Lean 4 achieves language-level support for local mutation, early return, etc. in the same way, by converting it into non-mutating code.
All this immutability discussion is more about going against "old school object orientation" which promoted keeping state spread out around a lot of mutable object instances.
This is a hill I will die on. Probably literally. If you are writing in a metaphor of equations, then yes, mutation is almost certainly going to bite you. If you are writing in a metaphor of process, you almost certainly want to manage it, as you say.
I feel that early texts were good at this. Turtle Geometry is my personal favorite book in this vein. I seem to recall we spent a long time going over how to double buffer graphics so that you could be working on one buffer while letting the system draw the other. Not sure what texts we used for that, back in the day.
Later texts, though, jump through a lot of hoops to hide the fact that things are actively changing. The entire point is to change things.
I'm not entirely clear what you mean. I can easily see malloc/free in the realm of "incidental complexity." Many of the abstractions in process descriptions are absolutely not incidental, though?
Control-flow is an awkward choice there, as goto is not necessarily fundamental for how control-flow is run for a lot of code. And I have absolutely used labeled break/continue in Java before for a control loop that ran great until people tried to refactor to use more indirect control.
I also think it is interesting, as I greatly prefer code where you can do basic left-to-right and top-to-bottom reading to know what is intended by the code.
At any rate, my original intent was to discuss how code that is controlling something works really well if you embrace a metaphor for the code you are in.
> The second part is where functional programming shines -- you have a simple way to just recompute all the derived things. And since it's presumably the same way the derived things were computed in the first place, you don't have to worry about its logic getting out-of-sync.
Thinking about this further, it is also performance related. I think there is an interesting relationship between FP and DOD (data-oriented design):
A technique of data oriented design is to keep state minimal and lazily derive data when you actually need it, which may involve recomputing things. The rationale is that compressed, normalized data requires less fetching from memory and computation on it is faster.
In contrast, caching and buffering, both of which are heavily stateful and require a lot of additional memory, are often necessary because they minimize inherently slow operations that are out of your control. Those kinds of things are often best implemented as (computational) objects with encapsulated internal state and small, general interfaces, like OO has taught us.
But once the data is in your control, this mindset has to be flipped on its head. You want to model your in-memory data not that differently from how you'd model it for databases: neatly aligned, normalized data, with computed columns, views and queries to get richer answers.
Interestingly, if you follow this approach, code starts to look more like functional code and a lot less like independent objects that send messages to each other, because you potentially need the whole context to derive values from it.
That should read, “plus one mechanism”. Another place where people get into trouble is thinking “a” means >= 1 instead of exactly one.
Things get bad when the state has multiple entities trying, and failing, to manage it consistently. Functional programming tends to create more friction, which discourages keeping a lot of state; to some extent this controls the superlinear complexity of state management by making each piece dearer.
> Rust's shared XOR mutable references […] makes linearity nearly useless or inevitably creates a parallel, incomplete universe of functions that also work on linear values.
Yup. Rust can't abstract over mutability. For owned values, it defaults to exclusive ownership, and needs explicit Rc<T> and clone() to share them. For references, in practice it requires making separate `foo()` and `foo_mut()` functions for each type of loan.
Even though this sounds horribly clunky, it works okay in practice. Ability to temporarily borrow exclusively owned objects as either shared or exclusive-mutable adds enough flexibility.
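A small sketch of what that looks like (hypothetical Config type): one accessor per kind of loan, and the same owned value temporarily lent out as shared or as exclusive-mutable.

    struct Config {
        name: String,
    }

    impl Config {
        fn name(&self) -> &str {                // shared loan
            &self.name
        }
        fn name_mut(&mut self) -> &mut String { // exclusive loan
            &mut self.name
        }
    }

    fn main() {
        let mut cfg = Config { name: "dev".to_string() };
        println!("{}", cfg.name());    // borrow as shared...
        cfg.name_mut().push_str("-2"); // ...or as exclusive, one at a time
        println!("{}", cfg.name());
    }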
Rust is known for being difficult, but I think a lot of that is due to lack of GC. Rust can't make any reference live longer, and programmers have to manually get scopes of loans right, or manually use Rc<RefCell> to have a mini DIY GC. Perhaps a shared XOR mutable with a GC could be the best of both?
> Ability to temporarily borrow exclusively owned objects as either shared or exclusive-mutable adds enough flexibility.
Rust quietly has several other features that improve the quality-of-life of its ownership model. Two examples: if you have a mutable reference, Rust will coerce it to an immutable reference where one is required; and when you call a function that accepts a mutable reference, Rust will transparently re-borrow yours, allowing you to use it more than once even though mutable references do not implement Copy.
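A minimal illustration of both conveniences (hypothetical read/grow helpers):

    fn read(s: &String) { println!("{s}"); }
    fn grow(s: &mut String) { s.push('!'); }

    fn main() {
        let mut text = String::from("hi");
        let r: &mut String = &mut text;

        grow(r); // implicit re-borrow: `r` stays usable afterwards,
        grow(r); // even though `&mut` references do not implement Copy
        read(r); // `&mut String` coerced to `&String` where a shared ref is needed

        println!("{text}");
    }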
There are a couple of proof-of-concept libraries adding a `Gc<T> where T: Traceable`, so it is doable.
You can't change the existing Rc, because it's a concrete type. Making it use a tracing GC internally would require a lot of compiler magic.
However, I don't think a GC will catch on in Rust in its current form. Rust's userbase likes it as a no-GC language. Plus, a 3rd-party library GC wrapper type can't save you from having to learn the ownership and lifetimes used by literally everything else. Once you invest the time to learn the "zero-cost" references, a runtime Gc is less appealing, and won't be as ergonomic as the built-in references.
Swift, OCaml, and Mojo are trying to add some subset of Rust-like ownership and borrowing, but in a simpler form.
You can, but it turns out that, as one may intuitively expect, a GC is never needed unless you are implementing a VM for a GC-based language, or an API that requires GC, like fd passing on Unix ___domain sockets, and those generally want an ad-hoc GC tailored to whatever you are implementing.
Since it's not needed and it's massively worse than reference counting (assuming you only change reference counts when essential and use borrowing normally) due to the absurd behavior of scanning most of the heap at arbitrary times, there is no Rust GC crate in widespread use.
Almost all GCs used in practice today only scan the set of live objects, which in normal operation is much smaller than the entire heap. They also allow much more efficient allocation and de-allocation.
The problems with GC are threefold, and they explain why you might not want it in a systems language:
1. GC requires more memory than strictly necessary to be efficient (usually about 1.5x - 2x the amount you absolutely need). You're basically trading runtime efficiency for memory.
2. GC performance is harder to predict and reason about than certain other allocation strategies
3. GC languages tend to encourage excessive heap allocation for various reasons, ending up with much more junk than a typical Rust or C program that has a similar amount of entities
Note that item 2 is the one that's least understood. The best part about GCs is that they make heap allocation trivial, and they make de-allocation a no-op. In contrast, both malloc() and free() are extremely complex and costly operations. The GC does impose a cost on every pointer write, similar to (but typically less than) the overhead of Arc<T> over a T*, but that has a very uniform and predictable cost. The problem of unpredictability only comes in the collection phase, and is mostly related to (a) when the collection happens, (b) how much data actually has to be scanned (how many live objects are present on the heap and stack), and (c) what type of collection needs to happen (is it enough to collect from this thread's young generation, or do you need to collect all generations from all threads).
Note that many of these problems are in fact solvable, and there actually exist GCs with constant predictable collection times, suitable even for realtime applications (which malloc/free don't support). They are very sophisticated technology that no one is distributing for free though (e.g. you have to pay Azul for a realtime-compatible Java).
Assuming you use a set of slabs of fixed-size objects and keep free objects in a linked list, both malloc and free are trivial O(1) operations.
Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
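A toy sketch of that scheme (Rust, index-based rather than pointer-based, and very much not a real allocator): both operations just push or pop the head of the free list.

    struct Slab {
        slots: Vec<[u8; 64]>,  // fixed-size objects
        next_free: Vec<usize>, // next_free[i] chains the free slots together
        head: usize,           // first free slot, or usize::MAX if full
    }

    impl Slab {
        fn new(capacity: usize) -> Self {
            Slab {
                slots: vec![[0u8; 64]; capacity],
                // the free list initially threads through every slot in order
                next_free: (1..=capacity)
                    .map(|i| if i == capacity { usize::MAX } else { i })
                    .collect(),
                head: if capacity == 0 { usize::MAX } else { 0 },
            }
        }

        fn alloc(&mut self) -> Option<usize> { // O(1): pop the list head
            if self.head == usize::MAX {
                return None;
            }
            let slot = self.head;
            self.head = self.next_free[slot];
            Some(slot)
        }

        fn free(&mut self, slot: usize) {      // O(1): push onto the list head
            self.next_free[slot] = self.head;
            self.head = slot;
        }
    }

    fn main() {
        let mut slab = Slab::new(4);
        let a = slab.alloc().unwrap();
        slab.slots[a][0] = 7; // use the slot
        let b = slab.alloc().unwrap();
        slab.free(a);
        assert_eq!(slab.alloc(), Some(a)); // the freed slot is reused first
        slab.free(b);
    }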
You still need locking to make this work in a multithreaded environment, or at least atomics. And all malloc implementations used today are more complex than this, especially when allocating large objects, because you can't actually maintain a list for every possible size of allocation. That means they need to do extra work to handle fragmentation.
Plus, the free list has to occasionally be walked to actually free pages back to the OS. If you don't, then memory is never freed by free(), it is only marked for potential reuse.
There are several popular implementations of malloc (and their corresponding free), and they are all quite complex and have different tradeoffs. And, in fact, none of them is any more suitable for high performance or realtime code than a GC is. The golden rule for writing this type of code is to just never call malloc/free in critical sections.
And I will note that probably the most commonly used malloc, the one in GNU's glibc, actually uses Linux locks, not atomics. Which means that you are virtually guaranteed to deadlock if you try to use malloc() after fork() but before exec() in a multi-threaded process.
> Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
Which means calculating how long free is going to take for a given object is incredibly difficult. Which is my entire point. Of course it's not non-deterministic unless someone's destructor calls some random code, but it is not easy to calculate, and in the real world it is basically unpredictable.
Spinning up a separate thread and doing an unknown amount of work is no more predictable than doing it on the main thread.
Fundamentally destructors can do whatever the hell they want which means you never know how long destruction is going to take.
And of course, if you're freeing up an object that has a linked list inside of it and you need to clear out the linked list, now you're jumping all around memory, and that's never fun; the run time sure as heck is not constant, depending on what needs to be paged in and out and what exists in which cache lines.
I'm not saying these problems are unsolvable for a given use case, and there are good reasons why custom allocators abound throughout the industry. My main point is that manual memory management does not necessarily lead to easy-to-predict run times for freeing up memory.
This is especially true because a lot of developers think that malloc and free are just magic somehow.
In reality, garbage collectors often aren't that complicated, and when it comes to large, complicated user applications, you're going to be spending a lot of time in the allocator and deallocator no matter what memory management scheme you choose.
(Edit: and then there's fragmentation, which is the death of many manually managed memory systems! A naively written Java or C# application can run for far longer than a naively written C++ application!)
But having such a naive malloc implementation would put it well below what a state-of-the-art tracing GC can do in allocation speed, and then we didn’t even get to fragmentation, elegantly solved by moving GCs. Of course, all these are tradeoffs, I’m just saying that it isn’t as easy.
> Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
You can run your garbage collector on a separate thread, too. Or use a real time garbage collector.
> 2. GC performance is harder to predict and reason about than certain other allocation strategies
I'm not sure what you mean by this being the least understood. It seems like it's very well understood: GC introduces latency if you want good throughput, or it reduces throughput if you want excellent latency (sub-100us is possible).
Of course, that doesn't mean you can predict what specific throughput or latency properties your specific program will have, except for GCs that have maximum bounded latency guarantees.
The point is that it's very hard to predict how much of an impact the GC will have a priori. You can of course measure after the fact, and try to improve, but it's hard to architect your system in a way that you can more or less guarantee will get good GC performance (other than just not allocating any memory, of course). Malloc actually suffers from a similar problem: it's just hard to know a priori if your access patterns will work well with the internals of your allocator/GC.
IBM Metronome and similar approaches provide very strong guarantees (if you keep a certain bound on garbage creation, of course) at the expense of throughput.
What they lose is that they also introduce a certain guaranteed loss of available CPU time, due to optimizing for latency and real-time predictability.
Respecting the generational hypothesis is a good step towards engineering a GC-friendly design. Mind you, this brings back the necessity to reason about object lifetimes, something a GC-based language absolves you from, but it does help. It tends to teach you the lesson that sometimes a higher allocation count of objects that die in Gen 0 is preferable to fewer, larger allocations which have a higher chance of surviving until Gen 1. (.NET has three main generations for GC objects: 0 and 1, aka ephemeral, and 2, aka long-lived; there are also the LOH, POH and NonGC heaps.)
That's fast enough for real-time audio, so yes, that's excellent. And that includes compaction and only a small hit to throughput. You can get sub-2us latency at a serious hit to throughput.
No, the more memory you give them, the more efficient they are, and generally they are much more efficient than an equivalent program using malloc() + free() for every piece of memory that gets allocated.
Take the extreme case of a program where every allocation is permanent. The GC allocation will be much much faster than malloc() (GC allocation is normally just bumping a pointer to the end of the heap, while malloc() typically does a lot more work to segregate objects by size etc), and collection will never run. So, overall, the program will be significantly faster.
Edit: More realistically, a program that typically produces 1:1 live memory:junk, but occasionally spikes, will consume 0 GC time in normal operation (the GC will just let the junk sit there), while free() needs to run on all that junk as soon as it is created, otherwise it leaks forever.
Also, the fact that GCs can defer collection to later than the object going out of scope can often make them more efficient than a traditional free(). For example, if you have a large graph data structure that is now junk, free() needs to do work that's O(size of graph) to free all the nodes. A copying collector, the most popular design, never even looks at the graph, so it frees it in O(1). Edit: of course, the converse is also true: if the graph survives N collections, the copying GC has to do O(N * size) work, where malloc()/free() do nothing at all.
This is in fact heavily related to the "ancient" Unix approach, and partially explains why it optimized so hard for many small processes.
PDP-11 unix (without split instruction/data space) had 32kB of memory for userland process, total, plus another 24kB for kernel (8kB are reserved for I/O devices).
The memory management interface that gave us malloc() and free() ultimately boiled down to brk() and sbrk(), which simply set the pointer describing where stack and heap met in memory. Using many small programs, you'd have as little as possible of that 32kB used by code, and you'd prefer to simply allocate using a bump pointer and never free (better to reuse object pools, which is also a technique used with GCs). A program in a pipeline would mostly allocate some stuff at the start, plus buffers, and while some of the early bits might become garbage, it doesn't really matter because you can't share parts of your 32kB block (or at best you can use 8kB segments). So you do "missile-style GC": just use a bump-pointer allocator (brk() + sbrk() + sizeof combo), then exit to the OS when your work is done, and the entire memory space gets cleaned up.
EDIT: Some GCs (like in later versions of Symbolics Lisp Machines, or presently in SBCL behind some experimental options) provided multiple-arena support explicitly to be able to force-declare everything allocated in an arena as garbage.
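A toy version of that bump-pointer discipline (Rust, hypothetical Bump type): allocation just advances an offset, there is no per-object free, and the whole arena is reclaimed at once.

    struct Bump {
        buf: Vec<u8>,
        offset: usize,
    }

    impl Bump {
        fn new(size: usize) -> Self {
            Bump { buf: vec![0; size], offset: 0 }
        }

        // "Allocate" n bytes by bumping the offset; O(1), no free list.
        fn alloc(&mut self, n: usize) -> Option<&mut [u8]> {
            if self.offset + n > self.buf.len() {
                return None; // out of arena space
            }
            let start = self.offset;
            self.offset += n;
            Some(&mut self.buf[start..start + n])
        }
    }

    fn main() {
        let mut arena = Bump::new(1024);
        let a = arena.alloc(16).unwrap();
        a[0] = 42;
        let _b = arena.alloc(64).unwrap();
        // No individual frees: the "process exits" and the whole arena drops.
    }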
I'm guessing you're referring to the fact that the most popular GC languages allocate almost every object on the heap, even when used as a field of another object. This is by no means a constraint of the GC, it is a separate design choice.
And in fact C# has always supported "value objects", which do get allocated in-place, Java is adding the same support probably in the next release, and Go actually defaults to it.
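For illustration, the same distinction in Rust, which also defaults to in-place layout (hypothetical types): a struct-typed field costs no extra allocation, while heap placement is an explicit opt-in.

    struct Point { x: f64, y: f64 }

    struct Inline { p: Point }     // Point laid out in-place inside Inline
    struct Boxed { p: Box<Point> } // Point on the heap: extra allocation + pointer chase

    fn main() {
        let a = Inline { p: Point { x: 1.0, y: 2.0 } };
        let b = Boxed { p: Box::new(Point { x: 3.0, y: 4.0 }) };
        println!("{} {}", a.p.x, b.p.y);
    }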
Worst case, 2× is actually a steal compared to a non-moving allocator such as malloc/free or RC built on top of that, which cannot[1] do better than about 1 + ½log₂(largest allocation / smallest allocation). For example, if you have allocations from 1K to 16K bytes, any malloc/free implementation can require at least 3× the memory you actually use; if from 16 to 16K, 6×; etc.
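For concreteness, the arithmetic behind those two examples, from the bound above:

    $1 + \tfrac{1}{2}\log_2(16\text{K}/1\text{K}) = 1 + \tfrac{1}{2}\cdot 4 = 3$
    $1 + \tfrac{1}{2}\log_2(16\text{K}/16) = 1 + \tfrac{1}{2}\cdot 10 = 6$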
At this point I must mention that paged memory can get you just enough movability that catastrophic fragmentation of this kind is avoided[2] with high probability. Paging tricks can be seriously expensive on a modern multicore processor, though, so I’m not sure this is the way forward. (The paper reports only 1% perf overhead for Firefox; I don’t know if that means Firefox is less multithreaded than it could be or what.)
It depends what you mean by "efficient": a world that controls when someone can free memory can be surprisingly efficient at dealing with the fundamental fragmentation problem in terms of overall churn over time (throughput), at the cost of space and immediate time (latency). Both are different forms of efficiency.
Manual memory management as a solution to the fragmentation problem trades that off: not knowing anything about when free might be called, it has to lean towards optimising space and immediate time rather than throughput. But there's still a memory manager behind the scenes that has to deal with fragmentation as well; there's no get-out-of-jail-free card for that, and that complexity is still hidden.
(Helpful memory usage disciplines like arenas/pools have their desirable properties for the same reasons: it’s a discipline on when you free memory in order to avoid fragmentation.)
Faster than ref counting AND than manual memory management. Papers that have recorded the traces of memory allocation and liveness info and then replayed them using GC or by inserting manual new/free show GC has a throughput advantage here.
Many use GC languages for productivity in code where that's more important. As far as slowdowns go, there are concurrent and real-time (fixed-timing) GCs out there which reduce or avoid the problems you mention. JX OS also let you mix different GCs for different components.
So, there’s a performance hit of some kind with optional tuning that takes no expertise. Many people would go for those tradeoffs for some applications. Especially if they got to keep using safe, no-GC code in their libraries with their efficiency benefits.
> You can, but it turns out that, as one may intuitively expect, a GC is never needed unless [...]
Rust uses plenty of reference counting. You could replace most of that with a GC; depending on your GC implementation and your reference counting implementation, and your requirements in terms of latency and throughput and ability to handle cycles.
> Since it's not needed and it's massively worse than reference counting (assuming you only change reference counts when essential and use borrowing normally) due to the absurd behavior of scanning most of the heap at arbitrary times, there is no Rust GC crate in widespread use.
There are choices. You can have real time garbage collectors. And if you want to handle cycles, you need some kind of tracing anyway.
Also, if you only care about throughput and not about latency, garbage collection can easily be faster than reference counting.
If you don't allow cycles, you can also take advantage of that in your garbage collection. See how Erlang does it. (In Erlang, the garbage collector relocates your objects so that they are in topological order, so all references only point forwards in memory. You can combine that with generational gc, too, of course.)
> Since it's not needed and it's massively worse than reference counting
Lol, what? Maybe don’t go asserting stuff you clearly know little about. Reference counting is a fine tradeoff for manually memory-managed languages, but it is absolutely smoked out of the water by a tracing GC on most counts. It’s almost like the JVM, V8, etc. engineers know a thing or two about the topic and don’t have RC for a good reason.
Tracing GC doesn’t burden the mutator threads with additional work; almost everything can be done in parallel, resulting in vastly better throughput. Imagine dropping the last reference to a huge graph. One can actually observe this when exiting a C++ program: it might hang for a few seconds before returning control to you, as all the destructors are recursively called, serially, on the program thread, jumping literally pointer by pointer across the heap, the very thing you are so afraid of. And I didn’t even get to the atomic part: bumping a number up or down with synchronization between CPUs is one of the slowest operations you can do on a modern machine. Tracing GCs elegantly avoid all these problems at the price of some memory overhead. None of these GC algorithms (yes, RC is a GC) is a silver bullet, but let’s not kid ourselves.
That's because, unlike Rust, those languages with RC would have a lot of unnecessarily refcounted objects because they don't have value objects, do a whole lot of useless reference count updates because they don't have borrowing and always have to use atomics because they can't ensure that some objects are not shared between threads (and also would need a cycle collector in addition to the reference counting).
If you use reference counting properly in a well-designed language then it's obviously better than GC since it's rarely used, fast, simple, local and needs no arbitrary heuristics.
The destructor cascades are only a problem for latency and potential stack overflow, and can be solved by having custom destructors for recursive structures that queue nodes for destruction, or by using arena allocators if applicable.
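A minimal sketch of the first option in Rust (a singly linked list with a hand-written Drop; a work queue would generalize this to trees): the cascade becomes a flat loop, which also avoids blowing the stack.

    struct Node {
        value: u64,
        next: Option<Box<Node>>,
    }

    struct List {
        head: Option<Box<Node>>,
    }

    impl Drop for List {
        fn drop(&mut self) {
            // Detach and free nodes one at a time in a loop, instead of
            // letting a recursive chain of Drop calls walk the list.
            let mut cur = self.head.take();
            while let Some(mut node) = cur {
                cur = node.next.take();
                // `node` is freed here: one O(1) deallocation per iteration
            }
        }
    }

    fn main() {
        let mut list = List { head: None };
        for value in 0..1_000_000 {
            list.head = Some(Box::new(Node { value, next: list.head.take() }));
        }
        if let Some(n) = &list.head {
            println!("top = {}", n.value);
        }
        drop(list); // no stack overflow: destruction is a flat loop
    }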
So, RC is better than tracing GC when it's not used as memory management, and it is special-cased everywhere... got you!
Like, as I explicitly wrote, it is probably the correct choice for low-level languages close to the metal that want easy compatibility with other languages through FFI. But the method itself still has much lower throughput than a tracing GC when used in a similar manner. Anything else is a useless comparison, like asking whether a bicycle is better than a tank.
> But the method itself still has much lower throughput than a tracing GC when used in a similar manner
That is correct, but the issue is not with reference counting, but rather with having unnecessary extremely frequent RC/GC operations.
Once frequency is reduced to only necessary operations (which could be none at all for many programs), reference counting wins since its cost is proportional to the number of operations, while GC has fixed but large costs.
You can very well have a Rust-like language with Rust's model of 'compile-time memory management', but you replace all uses of reference counting with tracing garbage collection.
You probably already know this, but the gc crate is an implementation of a somewhat more sophisticated GC, but it uses `derive`s to implement tracing, so it's not exactly the Rc<T> interface.
The article utterly falls apart in its first paragraph where it itself acknowledges that the whole ML family including Ocaml has perfect support for mutation, rightly assumes most Ocaml programmers would choose not to use it most of the time, but then assumes incorrectly that it's because the language somehow makes it hard to use. It's not. It's just that mutation is very rarely optimal. Even the example given fails:
> For example, let's say you're iterating over some structure and collecting your results in a sequence. The most efficient data structure to use here would be a mutable dynamic array and in an imperative language that's what pretty much everyone would use.
Well, no, this is straight confusion between what's expressed by the program and what's compiled. The idiomatic code in Ocaml will end up generating machine code which is as performant as using a mutable array.
The fact that most programming languages don't give enough semantic information for their compiler to do a good job doesn't mean it necessarily has to be so. Functional programmers just trust that their compiler will properly optimize their code.
It gets fairly obvious when you realise that most Ocaml developers switch to using arrays when they want to benefit from unboxed floats.
The whole article is secretly about Haskell and fails to come to the obvious conclusion: Haskell's choice of segregating mutations in special types and using monads was an interesting and fruitful research topic but ultimately proved to be a terrible choice when it comes to language design (my opinion, obviously, not some absolute truth, but I think the many fairly convoluted tricks haskellers pull to somehow reintroduce mutations support it). The solution is simple: stop using Haskell.
> which is as performant as using a mutable array.
I get what you're trying to say, but that is provably false.
As great as the OCaml compiler is, it currently is not capable of the aggressive optimizations that GHC can do with lists.
More often than not, the compiler won't have enough static information to reliably generate machine code like that in a real-world application (unless explicit mutation is used, of course).
> Functional programmers just trust that their compiler will properly optimize their code.
Precisely.
This is why having safe local mutation as a language level feature can give more control to the programmer.
We no longer have to rely on the compiler to correctly guess whether a routine is better implemented with an array or a cons list.
> The whole article is secretly about Haskell.
and ML, Koka, Clean, Mercury.
The article is about allowing local mutation without breaking referential transparency at the language level.
"Stop using haskell" is a very shallow conclusion, IMO.
> Well, no, this is straight confusion between what's expressed by the program and what's compiled. The idiomatic code in Ocaml will end up generating machine code which is as performant as using a mutable array.
This cannot be true in general. There are machine code patterns for which arrays are faster than linked lists. The OCaml compiler, great as it is, won't turn linked list source code into array machine code. Therefore, there is idiomatic code in OCaml that will not be as performant as arrays.
> It gets fairly obvious when you realise that most Ocaml developers switch to using array when they want to benefit from unboxed floats.
This is one example why your statement above is not true.
> This is one example why your statement above is not true.
You are misreading my comment. I’m not intentionally contradicting myself in two paragraphs next to each other (I’m not always the brightest but still).
The point is that, contrary to what the article states, ML developers are not avoiding mutations because they are hard to use but because they trust their compiler when they know it will do a good job. Proof is that in other cases they will use mutations when it makes sense to do so, because the compiler does not do a good job.
The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.
> The point is that, contrary to what the article states, ML developers are not avoiding mutations because they are hard to use but because they trust their compiler when they know it will do a good job. Proof is that in other cases they will use mutations when it makes sense to do so, because the compiler does not do a good job.
It will do a good job, yes. Will it do the best possible job compared to some other algorithm or data structure? It can't. Not in general.
And maybe not in the specific case either:
> The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.
$ for len in 5_000_000 10_000_000 25_000_000 50_000_000 75_000_000; do echo "-- ${len} elements --"; ./a.out list $len; ./a.out dynarray $len; ./a.out my_dynarray $len; echo; done
-- 5_000_000 elements --
list: 0.191551 sec
list: 0.196947 sec
list: 0.192806 sec
dynarray: 0.301362 sec
dynarray: 0.268592 sec
dynarray: 0.266118 sec
my dynarray: 0.163004 sec
my dynarray: 0.142986 sec
my dynarray: 0.143634 sec
-- 10_000_000 elements --
list: 0.377447 sec
list: 0.367951 sec
list: 0.312575 sec
dynarray: 0.607158 sec
dynarray: 0.582378 sec
dynarray: 0.538621 sec
my dynarray: 0.319705 sec
my dynarray: 0.296607 sec
my dynarray: 0.286634 sec
-- 25_000_000 elements --
list: 0.971244 sec
list: 0.953493 sec
list: 0.922049 sec
dynarray: 1.515892 sec
dynarray: 1.319543 sec
dynarray: 1.328461 sec
my dynarray: 1.119322 sec
my dynarray: 0.971288 sec
my dynarray: 0.973556 sec
-- 50_000_000 elements --
list: 1.852812 sec
list: 1.848514 sec
list: 1.505391 sec
dynarray: 3.065143 sec
dynarray: 2.941400 sec
dynarray: 2.672760 sec
my dynarray: 2.115499 sec
my dynarray: 1.963535 sec
my dynarray: 1.995470 sec
-- 75_000_000 elements --
list: 2.942536 sec
list: 2.910063 sec
list: 2.354291 sec
dynarray: 4.567284 sec
dynarray: 4.342670 sec
dynarray: 3.979809 sec
my dynarray: 2.528073 sec
my dynarray: 2.225738 sec
my dynarray: 2.226844 sec
A simple dynamic array implementation (my_dynarray) beats a list over a wide range of lengths. But not at all lengths! OCaml's built-in Dynarray is not competitive, but that's because it wants to make certain strong guarantees.
To be clear, I agree with your general point that we can do just fine writing nice clean pure functional OCaml code for most of our code and can hand-optimize where needed. But your very specific claims rub me the wrong way.
if d.length = Array.length d.values then begin
d.values <- Array.(append d.values (make (length d.values) x))
end;
the array reallocation should actually be:
if d.length = Array.length d.values then begin
let new_array = Array.make (Array.length d.values * 2) x in
Array.blit d.values 0 new_array 0 (Array.length d.values);
d.values <- new_array
end;
otherwise we allocate about a third more memory than needed. It's telling that even with this performance bug the dynamic array was broadly better than lists.
New results for the previously slowest cases:
-- 25_000_000 elements --
list: 0.977002 sec
list: 0.963903 sec
list: 0.950473 sec
dynarray: 1.476165 sec
dynarray: 1.281724 sec
dynarray: 1.343222 sec
my dynarray: 0.872558 sec
my dynarray: 0.755902 sec
my dynarray: 0.753746 sec
-- 50_000_000 elements --
list: 1.914777 sec
list: 1.886989 sec
list: 1.542614 sec
dynarray: 2.922376 sec
dynarray: 2.783559 sec
dynarray: 2.537473 sec
my dynarray: 1.725873 sec
my dynarray: 1.545252 sec
my dynarray: 1.515591 sec
-- 75_000_000 elements --
list: 2.827154 sec
list: 2.835789 sec
list: 2.318733 sec
dynarray: 4.354404 sec
dynarray: 4.150271 sec
dynarray: 3.781488 sec
my dynarray: 1.887360 sec
my dynarray: 1.929286 sec
my dynarray: 1.814873 sec
This turns an uneasy head-to-head into a clear win for dynamic arrays. Honestly, how could it be otherwise?
It can't be otherwise if you're just assuming a straightforward compilation model where your written code roughly reflects the assembly code that's generated. This just isn't the case with better compilers, though.
For instance, Haskell can often perform rewrites or fusion passes that entirely eliminate the loop and all intermediate data structures, effectively giving a near-infinite speedup compared to the alternatives. You typically can't perform such optimizations when side effects are in the middle of the computation, for numerous reasons.
You could have backed this statement up with a benchmark, and you chose not to.
(You are right on the general hand-waving level that such optimizations are sometimes possible. But there are no fuseable intermediate data structures in this particular problem.)
But they are still very much in the same order of magnitude... pretty impressive that these solutions are all in the same ballpark; I would've expected much bigger differences.
The dynamic array spends most of its time copying old elements as it grows exponentially. If you pre-size it to the right size, you eliminate this copying, and the difference becomes 5x. In practice you often have an idea of the size, at least as a rough estimation, so you would win by a larger margin. But that was not part of the spec.
Also, other ways of organizing dynamic not-quite-an-array-but-not-quite-a-linked-list data structures exist. It could be a dynamic array (or linked list) of dynamic arrays to eliminate repeated copying of the oldest elements.
No, not this. You are doing a map.
A traversal of a data collection is not a map but a fold. The example in the article is specifically about collecting, which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail-call optimised to a loop). See my other comment.
Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call. You have to use rev_map to get good perf.
> A traversal of a data collection is not a map but a fold. The example in the article is specifically about collecting, which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail-call optimised to a loop).
The exact wording in the article is "let's say you're iterating over some structure and collecting your results in a sequence". You are interpreting a lot into this. Also, your description is of a map (reversed, and expressed as a fold). Anyway, where is your benchmark?
> Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call.
You are again contradicting yourself. Previously you were praising OCaml's optimization capabilities and now you are questioning them. Specifically in a case where there is a magic optimization that is explicitly motivated by List.map: https://ocaml.org/manual/5.2/tail_mod_cons.html . A magic optimization implemented using, guess what, mutation.
> The OCaml compiler, great as it is, won't turn linked list source code into array machine code.
Why not? If the compiler can see that you have a short-lived local linked list and are using it in a way for which an array would be faster, why would it not do the same thing that an array would do?
Well, it doesn't make the substance wrong. This paragraph rightly summarizes it:
"The fact that most programming languages don’t give enough semantic information for their compiler to do a good job doesn’t mean it necessary has to be so. Functional programmers just trust that their compiler will properly optimize their code."
The substance of the statements I quoted was wrong. As I wrote elsewhere, I do agree with the broad statement that the combination of functional and imperative features in OCaml works just fine. But if the semantic information you give to the OCaml compiler is "linked list", it will use a linked list rather than a data structure that might be better for the task at hand.
> Well, no, this is straight confusion between what’s expressed by the program and what’s compiled. The idiomatic code in Ocaml will end up generating machine code which is as performant as using a mutable array.
I disagree with this. There are different ways to get close to the performance of `Array.map` with lists (best-case scenario, you don't care about order and can use `List.rev_map`), but you will always have memory (and GC) overhead, and so lists are strictly inferior to arrays for the presented use case.
That’s not what the article is talking about. The proposed example is a traversal of a different data structure to collect results in an array. That’s a fold, and it will properly be TCO-ed to something equivalent to adding to an array if you use list cons in the aggregation; it might actually be better depending on how much resizing of the array you have to do while traversing.
Works if you are building one list, but what if you are building multiple? What's suggested on the OCaml site and what's taught in most of academia is to use a recursive function with accumulator arguments that are reversed before returning to make the function TCO-able. I doubt OCaml can optimize that pattern well, but idk.
You can benefit from TCO while building multiple lists.
let rec f evens odds = function
| [] -> (evens, odds)
| x :: xs ->
if x mod 2 = 0 then f (x :: evens) odds xs
else f evens (x :: odds) xs
OCaml optimizes this fairly well and will compile it down to a single loop with two variables. If you reverse the lists there are going to be additional loops (and corresponding allocations).
However, OCaml also provides the "tail mod cons" optimization that allows you to get some of the benefits of TCO without needing to reverse the list (this is implemented as a program transformation that uses mutability under the hood), but that one will only work if you are building a single list.
I think HN ate some characters because that code doesn't look valid. But yeah, that's how you do it. In my opinion it is not pretty (e.g., what if you have some mutable context?). I also don't see how OCaml could turn the cons operations into dynamic array appends.
I think `Array.map` is a perfectly reasonable reading of "you're iterating over some structure and collecting your results in a sequence".
But sure, in the `fold` scenario where you don't know the number of results in advance (you are more likely to know if you use imperative data structures, e.g. `Hashtbl.length` is constant-time whereas `Map.cardinal` is not), lists might be faster than growing arrays with copies. They are still going to use more memory, and they are unlikely to be faster than a rope-like structure that grows with no copies.
Yep, and moreover, the combination of most library design and the general culture around the language reinforces the dynamic of using mutability only when it's needed or the most straightforward way to implement something, and containing it behind immutable interfaces wherever possible.
One thing that many people miss is that Haskell's monadic style is a direct consequence of lazy evaluation. It all started because they thought laziness was nice, and wanted to make a language that brought that front and center. But then they found out that they had to come up with a new way to do side effects, because traditional side effects don't work when the order of evaluation is unpredictable.
> One thing that many people miss is that Haskell's monadic style is a direct consequence of lazy evaluation. It all started because they thought laziness was nice, and wanted to make a language that brought that front and center. But then they found out that they had to come up with a new way to do side effects, because traditional side effects don't work when the order of evaluation is unpredictable.
But this is not true, is it? I thought that Haskell's solution for side effects was just tagging the functions with side effects, and making the compiler handle those functions as regular programs: no laziness, no memoization, no reordering. 'Monad' is just a mathematical structure that regular programs obey, and 'do notation' is just a style that allows Haskell programmers to write imperative-style code.
Monads don't have anything to do with laziness but historically the need for them arose because of laziness. It's the first thing explained in the introduction of Tackling the Awkward Squad:
"Call-by-need (or lazy) languages, such as Haskell, wear a hair shirt because their evaluation order is deliberately unspecified. Suppose that we were to extend Haskell by adding side-effecting “functions” such as printChar. Now consider this list
xs = [printChar 'a', printChar 'b']
(The square brackets and commas denote a list in Haskell.) What on earth might this mean? In SML, evaluating this binding would print 'a' followed by 'b'. But in Haskell, the calls to printChar will only be executed if the elements of the list are evaluated. For example, if the only use of xs is in the call (length xs), then nothing at all will be printed, because length does not touch the elements of the list.
The bottom line is that laziness and side effects are, from a practical point of view, incompatible. If you want to use a lazy language, it pretty much has to be a purely functional language; if you want to use side effects, you had better use a strict language.
For a long time this situation was rather embarrassing for the lazy community: even the input/output story for purely-functional languages was weak and unconvincing, let alone error recovery, concurrency, etc. Over the last few years, a surprising solution has emerged: the monad. I say “surprising” because anything with as exotic a name as “monad” — derived from category theory, one of the most abstract branches of mathematics — is unlikely to be very useful to red-blooded programmers. But one of the joys of functional programming is the way in which apparently-exotic theory can have a direct and practical application, and the monadic story is a good example. Using monads we have found how to structure programs that perform input/output so that we can, in effect, do imperative programming where that is what we want, and only where we want. Indeed, the IO monad is the unifying theme of these notes."
Haskell showed that monadic bind is a nice solution for chaining together IO operations once you have boxed them inside their own type, and that boxing IO operations inside their own type was a nice solution to the issues arising from being lazy by default.
Haskell was actually wildly successful in showing that, no doubt about it.
The oldest mention of Haskell monads that I ever had in my hands starts the description with "this nice hack from the Haskell group at the University of Glasgow to make I/O nicer without breaking laziness".
Well, it depends on exactly what you mean by side-effects.
First, obviously you have unsafePerformIO, so that everything can have side-effects.
Second, you have side-effects like using memory or using the CPU. You are not supposed to worry about those. Though a more serious side effect you do have to worry about is non-termination. Haskell doesn't track that in its type system.
You are right that the way input/output is handled can be described not as _side-effects_ but as _effects_ of interpreting values of the IO datatype.
They did, but they also landed on monads explicitly to make I/O suck less in a lazily evaluated language, instead of a magic main function signature working to provide explicit ordering.
There's a reason why Haskell's Monads aren't exactly monadic and why IO was the original monad in GHC, as well as why non-lazy, non-super-pure languages never really go for Monads.
Actually, lots of those more pragmatic languages go for monads, they just don't use them for input/output.
The way JavaScript handles async is pretty close to monadic. Error handling via the local equivalent of Maybe / Either is monadic. Tuples are monadic. Sequences can be monadic. Etc. Some languages have a flexible enough type system to expose this (like Haskell), some don't. Some like Rust generally don't expose monads to the type system, but their users are aware enough of the shared monadic structure that you can see it reflected in the naming conventions for functions that do essentially the same thing (in a monadic sense) but for different structures.
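For instance, in Rust (hypothetical parse/halve helpers), `and_then` is the shared name for monadic bind across Option and Result:

    fn parse(s: &str) -> Option<i32> {
        s.parse().ok()
    }

    fn halve(n: i32) -> Option<i32> {
        if n % 2 == 0 { Some(n / 2) } else { None }
    }

    fn main() {
        // Option::and_then chains fallible steps, like Maybe's bind in Haskell.
        let a = parse("42").and_then(halve); // Some(21)
        let b = parse("7").and_then(halve);  // None: the chain short-circuits
        // Result::and_then does the same for Either-style errors.
        let c: Result<i32, String> = "10"
            .parse::<i32>()
            .map_err(|e| e.to_string())
            .and_then(|n| if n > 0 { Ok(n) } else { Err("not positive".into()) });
        println!("{a:?} {b:?} {c:?}");
    }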
It's literally impossible on a CPU. Some people claim FP languages can make optimisations that are based on the fact that values are immutable and which other languages can't, and that's definitely true. But the problem is that those optimisations are almost never actually made by real programming languages, and when they are, they're still slower than a low level imperative language like C or Rust in very nearly every case.
No one needs to prove you wrong because that's not where the goal post actually is.
You could craft hand-written assembly code which would be faster than optimised C code most of the time, yet you don't. Plenty of programmers are perfectly happy writing imperative code in Java that does a ton of boxing and unboxing. It's all a trade-off between performance and usability. The fact is that the Ocaml compiler does a good enough job with functional code that its performance is actually comparable to imperative solutions most of the time.
If you compile your immutable program with LLVM, literally one of the core steps is transforming it into a functional form that does not allow mutations.
This is called Static Single Assignment form, and its denial of mutation is crucial to optimization, from common subexpression elimination, to efficient register allocation, to all sorts of control-flow analysis.
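A hand-written illustration of what the SSA rewrite does (in Rust source for readability; real compilers do this on the IR): every re-assignment becomes a fresh, immutable name.

    // Mutating source, as a programmer writes it:
    fn before(a: i32) -> i32 {
        let mut x = a;
        x = x + 1; // second write to the same variable
        x = x * 2; // third write
        x
    }

    // The same computation after an SSA-style rewrite: each "assignment"
    // introduces a new immutable name, which is what makes analyses like
    // common subexpression elimination and register allocation tractable.
    fn after(a: i32) -> i32 {
        let x0 = a;
        let x1 = x0 + 1;
        let x2 = x1 * 2;
        x2
    }

    fn main() {
        assert_eq!(before(3), after(3)); // both compute (a + 1) * 2
    }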
> If you compile your immutable program with LLVM, literally one of the core steps is transforming it into a functional form that does not allow mutations.
You probably wanted to write something like 'If you compile your _mutating_ program with LLVM, [...]'?
> The article utterly falls apart in its first paragraph where it itself acknowledges that the whole ML family including Ocaml has perfect support for mutation
Was the article updated since you wrote this? I don’t see the text you’re referring to.
> Well, no, this is straight confusion between what’s expressed by the program and what’s compiled.
You’re getting at an important point here, but then you seem to fall into this same trap when you write:
> … the many fairly convoluted tricks haskellers pull to somehow reintroduce mutations
Monads started out as a way to represent the semantics of effects in a mathematical context, to support formal representations of the semantics of programming languages that were more tractable from the perspective of analysis and proofs. Even mainstream compilers ended up using related techniques, like static single assignment, for which an equivalence to continuation-passing style exists, and they did this for the same kinds of reasons: tractability of analysis and to support automated transformations.
The use of monads for writing ordinary code - as opposed to language semantics - in Haskell exploited these techniques, allowing effects to be expressed in a purely functional way. But at its root, this is a rigorous way of expressing scenarios that require effects, it’s not just some sort of “convoluted trick”. There are benefits to doing this that go beyond just a hack to implement effects in a pure language.
Which is why it’s unlikely that people who understand these issues will “stop using Haskell”, despite the learning curve barrier it seems to cause (arguably because people tend to learn to program in ad-hoc ways, which Dijkstra notoriously bemoaned.) But many of the most powerful languages have such a barrier, it just takes different forms depending on the nature of the language.
> Monads started out as a way to represent the semantics of effects in a mathematical context
I mean, that statement is untrue, but even then I wasn't talking about monads here (monads are not a convoluted trick as far as I'm concerned). I was thinking of lenses.
> Which is why it’s unlikely that people who understand these issues will “stop using Haskell”
Plenty of people who value being able to analyse program and do proof don't use Haskell. The heart of the debate is "Is being pure worth the cost?".
> The fact that most programming languages don’t give enough semantic information for their compiler to do a good job doesn’t mean it necessarily has to be so. Functional programmers just trust that their compiler will properly optimize their code.
While everyone has to trust their compiler to make reasonable optimizations to some extent, there is a certain point of complexity where it becomes difficult to intuitively know whether a "sufficiently smart compiler"[1] will properly optimize, which is problematic.
I realize you're arguing Haskell is worse than Ocaml in this regard, but I'd argue it's harder to reason about how functional code will be translated into machine code than comparable procedural code in general.
I second the conclusion (a brutal conclusion, but still) to stop using Haskell. Haskell allows imperative-like code, but the ergonomics for day-to-day big-tech engineering are far from good. The state monad or lenses are excellent tools to re-create a controlled imperative language in a vacuum, and it is frankly impressive how much mutation we can conjure up from purity, but the error messages and the required understanding of PLT-ish things make it non-scalable to "real" teams.
Haskell almost seems like it was intentionally designed to perform poorly on real computers, primarily because of space leaks and secondarily because the non-strict evaluation gets compiled into a lot of function pointer jumps, which branch predictors hate.
I think it's funny that they make you write linked list code as a metaphor for generators, but it seems like it should be the other way round.
(Also, it has exceptions which are a bad language feature, and typed throws which are a worse one.)
It seems that way because it kind of is. The early days of functional research were equally focused on designing alternative computer architectures that were more suited to functional paradigms.
Now, that hardware angle has not been very successful on the whole, and we are left with languages that end up feeling a bit out of place on the hardware we have today.
Another thing to note is that there is a lot of untapped potential in FP compilers. They're suffering from underinvestment.
By simulating exceptions, do you mean a `Result t e` type (which Haskell calls `Either l r`)? You can use these and the Functor/Applicative/Monad hierarchy to handle errors.
What is presumably being talked about is that Haskell also has actual exceptions, generated by calling e.g. `error` or `undefined`.
The semantics of these are.. interesting, mostly thanks to lazy evaluation. For example, `fst (5, error "second") ` is safe to evaluate because the second half of the tuple is a thunk and does not get evaluated. Additionally, there is, to my knowledge, no way to handle exceptions in pure code, presumably due to the undefined evaluation order.
That said, I'm not sure what the alternative would be, because a function like !! (list indexing) can fail and dealing with its fallibility would be a big burden on the programmer.
> By simulating exceptions, do you mean a `Result t e` type (which Haskell calls `Either l r`)?
Yes, exactly.
> The semantics of these are.. interesting, mostly thanks to lazy evaluation. For example, `fst (5, error "second") ` is safe to evaluate because the second half of the tuple is a thunk and does not get evaluated
Correct, and the semantics of loops is also.. interesting. For example `fst (5, last [1..])` is also safe to evaluate.
> What is presumably talked about is that Haskell also has actual exceptions, generated by calling e.g. `error` or `undefined`.
Well, I'm not sure, that's why I asked. I'm trying to understand what astrange meant by "it has exceptions which are a bad language feature, and typed throws which are a worse one". (Throwing exceptions from pure code should be left to such cases, that are impossible to recover from, in my opinion.)
> there is, to my knowledge, no way to handle exceptions in pure code, presumably due to the undefined evaluation order
Correct
> That said, I'm not sure what the alternative would be, because a function like !! (list indexing) can fail and dealing with its fallibility would be a big burden on the programmer.
Indeed. Even more so, what is one supposed to do when an invariant has been violated due to a programming error and there's no way to make progress? Haskell's exceptions are essential. It's even better when they're used in a well typed, well scoped manner, such as provided by my effect library, Bluefin.
Hm yeah, I'm not sure what astrange's point was, other than probably dissatisfaction with the combination of exceptions and Either (which I think is awfully named, as it implies a certain 'equality' between l and r).
I haven't used Bluefin, but don't Bluefin exceptions suffer from the same issue where they, essentially, 'infect' the program flow? For example, `fst (5, error "second")` seems like it would translate to `fst $ (5,) <$> throw e "second"`, where you would also need to pull an `e` from somewhere. More realistically, something like head would probably be quite awkward:
head e [] = throw e "head: empty list"
head e (x:xs) = pure x
You now need to pass in the exception handle, and the return value is now `Eff es a` instead of `a`. This means that you cannot just use the result, so you will likely need to fill your program with monadic stuff like do-notation, <$> and <*>, complicating the program flow and likely also reducing laziness.
Sure, exceptions are very different to Result-style error handling - even checked exceptions. Here are some differences:
* Errors must be explicitly listed as part of the function signature. Checked exceptions are the equivalent for exceptions, but they are rarely used in practice. I think Android uses them, but they were so unpopular in C++ that they were removed from the language!
* The syntax to catch and handle errors is very different and much more verbose for exceptions. It can also make flow control a real pain in some languages where you can't declare a variable outside the try body (e.g. references in C++).
* Result errors need to be explicitly handled whereas exceptions are silently propagated by default.
Even though they're similar enough that you could translate one to the other in most cases, they're different enough that saying one is "an emulation" of the other is just stupid.
In my experience Result-based handling is far superior with two exceptions:
1. In functional code like map & filter where it can become quite awkward to explicitly deal with returning errors.
2. It's hard to get a stack trace from where the Err was created rather than from where it was unwrapped. Less of a problem with exceptions which record a stack trace from where they were thrown (in most languages anyway - C++ is an annoying exception).
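To make the propagation difference (the third bullet above) concrete, here's a sketch of both styles side by side in Haskell; safeDiv is a made-up name:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception

safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "divide by zero"
safeDiv a b = Right (a `div` b)

main :: IO ()
main = do
  -- Result-style: the error must be matched at the call site
  case safeDiv 1 0 of
    Left err -> putStrLn err
    Right q  -> print q
  -- Exception-style: propagates silently until someone calls try/catch
  r <- try (evaluate (1 `div` 0))
  case r of
    Left (e :: ArithException) -> print e
    Right (q :: Int)           -> print q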
> saying one is "an emulation" of the other is just stupid.
I did say that indeed. Am I to conclude doing so was stupid? If so I would find that very rude.
(For what it's worth I was trying to understand what astrange meant by "it has exceptions which are a bad language feature, and typed throws which are a worse one" and offering that characterization as a way of trying to tease out exactly what he/she meant. Your response contains many interesting points and I would otherwise be interested in discussing with you further, but I'm not too inclined to now that you have suggested you might think I'm stupid.)
They don't get stack traces, for one. (That's arguably the biggest problem with Rust: .unwrap() gives you a stack trace, but has problems; whereas ? erases your stack trace.)
In principle, static analysis could identify unhandled exceptions, then trace the exception, then make that information available to the top-level "Err returned from main" handler. In practice, that's never going to happen in Rust.
Sum types can have stack traces by adding a stack trace on creation. `Result<(), &'static str>` does not have a stack trace, but `Result<(), std::backtrace::Backtrace>` sure does. YMMV as to whether it makes sense to do that in any particular circumstance.
Sure, if your definition of exceptions includes "must include a stack trace", then sum types can't simulate exceptions. But by that definition Haskell hasn't had exceptions until the last year or two. You might agree with that (I don't) but I'm trying to understand astrange, who said "[Haskell] has exceptions which are a bad language feature, and typed throws which are a worse one". It seems doubtful that "having a stack trace" is part of what he/she considers bad about exceptions, so that aspect is probably not relevant to my line of questioning. What exactly is bad about exceptions? That's the point of me forking off this thread. So far no one has offered an answer.
And not even always how the code is compiled. Canned runtime library routines can use destructive techniques to produce their outputs. The program doesn't see an aggregate object until the function returns it. (If we set aside lazy structures for a moment; but those are actually another example of something that can be built destructively under the hood. As you probe more deeply into the object, it is mutated to make more of it materialize.)
I'm not convinced about the dismissal of option 2. I agree ST is clunky but not for the reasons given. It's clunky because it's impossible to mix with other effects. What if I want ST and exceptions, for example, and I want the presence of both to be tracked in the type signature? ST can't do that. But my effect system, Bluefin, can. In fact it can mix not only state references and exceptions, but arbitrary other effects such as streams and IO.
Nice, first I'm hearing of bluefin - I'll be sure to check it out.
As an aside, I watched an Alexis King stream (which I can't now find) in which she did a deep dive into effect systems and said something along the lines of: algebraic effect systems should not change their behaviour depending on nesting order e.g. Either<State<>> vs State<Either<>>.
Does bluefin have a particular philosophy about how to approach this?
I agree with Alexis. Bluefin approaches this by not actually having a nesting order. The effects that can be performed are specified as function arguments, so they can be freely reordered without changing behaviour. effectful, which was one of the inspirations for Bluefin, is similar but uses constraints instead of function arguments, which are even more free to reorder.
Oh, I meant it's impossible to mix ST with actual exceptions as implemented in the RTS, rather than with ExceptT which simulates exceptions in pure code (like StateT simulates mutable state in pure code).
You're right, through a stroke of luck it's possible to use `ExceptT e ST r` and either handle the exception part first, to get `ST r`, or handle the ST part first, to get `Either e r`, so in that sense you can "mix" exceptions and ST. That doesn't work for all transformers though. If you have `Stream (Of a) ST r` then you must consume the stream first. You can't run the ST part and get a pure stream `Stream (Of a) Identity r`. So in that sense ST can't be mixed with other effects. Bluefin does allow you to do that, though.
Yes, but transformers have a few drawbacks: the order of stacking alters behaviour, and you need to write n^2 instances for n transformers.
Compare `ExceptT e (StateT s m) a` and `StateT s (ExceptT e m) a`: if you just want your computation to have state and exceptions, the difference shouldn't matter.
> if you just want your computation to have state and exceptions the difference shouldn’t matter
But... you don't just want that. You almost certainly care whether state changes are discarded when an exception is thrown. I don't claim that the types there are the most obvious or natural way to specify that, but there is a meaningful difference that shouldn't be handwaved away.
The fact that with transformer stacks you can wind up with state changes being discarded on exception if you get the stack arranged in a particular order isn't what I'd call a _feature_ of transformer stacks :-)
If effect A is invoked before effect B, effect A should happen and stay that way: if I update state before an exception is thrown, the update should persist. If I send a network message before an exception is thrown, the network message is still sent. If I launch the nukes before an exception is thrown, the missiles are still flying.
If I want to batch up state updates to only happen together as a unit, I'll use a transaction effect.
I might have this wrong, but I think if you want state and exceptions you probably want `StateT s (ExceptT e m) a`. The alternative would be to have state or exceptions, i.e. when you have an exception you no longer have state (which might be a legitimate type in some cases).
Remember that transformers nest "inside-out": `StateT s (ExceptT e m) a` unwraps to `s -> m (Either e (a, s))`, so an exception discards the state. If we want to keep the state when an exception occurs, we need `s -> m (Either e a, s)`, which is `ExceptT e (StateT s m) a`.
The way I remembered it, before I internalized it, was to think about applying the run functions one at a time. runSomethingT will take a `SomethingT ... m ... a` and give you some kind of `m (... a)`.
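For instance, a minimal mtl sketch of the difference (names made up):

import Control.Monad.State
import Control.Monad.Except

lossy :: StateT Int (Except String) ()   -- result is Either e (a, s): state inside the exception
lossy = put 42 >> throwError "boom"

keepy :: ExceptT String (State Int) ()   -- result is (Either e a, s): exception inside the state
keepy = put 42 >> throwError "boom"

main :: IO ()
main = do
  print (runExcept (runStateT lossy 0))  -- Left "boom": the state is gone
  print (runState (runExceptT keepy) 0)  -- (Left "boom",42): the state survives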
I don’t know how Swift and Koka handle things, but I’ve written a lot of Tcl that uses the same CoW reference-counting trick. (Tcl is an under-appreciated FP language: everything is a string, and strings are immutable, so it has had efficient purely declarative data structures for decades).
The downside in Tcl is that if you refactor some code suddenly you can add a new reference and drop into accidentally quadratic territory because now everything is being copied. This leads to some cute/ugly hacks that are only explainable in terms of interpreter implementation details, purely to reduce a refcount in the right spot.
In Swift you occasionally have to introduce a temporary local variable to avoid accidentally quadratic behavior, but I've never seen it require anything complicated or hard to explain.
class Foo {
    var foo = [Int]()
    var bar: [Int] {
        get { foo }
        set { foo = newValue }
    }
}
let obj = Foo()
Calling `obj.foo.append(i)` in a loop takes linear time, while `obj.bar.append(i)` is quadratic. `obj.foo.append()` does a borrow operation, so there is never more than one reference at a time, while `obj.bar.append()` does a get followed by a set, meaning there are always two references to the array and every append does a copy-on-write. `var bar = obj.bar; for i in 0..<max { bar.append(i) }; obj.bar = bar` would do just a single CoW.
Usually of course your computed properties actually do something so this difference feels less surprising. Swift does offer the undocumented `_modify` property accessor to let computed operations do borrows, but making it an official feature is waiting for noncopyable types to be finalized.
A variant of option 4 is to keep track of references you know cannot possibly be shared, and update those by mutation. Compared to reference counting, it misses some opportunities for mutation, but avoids the false sharing.
To what extent is this already being done by other functional languages that have CoW mutability?
This seems like a legal compiler optimization to make in most cases no?
Clean has been doing this for nearly as long as people have been using monads, but it never got the attention Haskell did, which I think is quite unfortunate. Rather than implicitly keeping track of references, uniqueness types are marked explicitly to indicate that their values cannot be aliased. They can also be used with monads to improve ergonomics a bit.
Granule has uniqueness types like Clean built onto a linear type system, which offers some additional advantages.
Disclosure: I work on Koka's FBIP optimization (Option 4).
> The most efficient data structure to use here would be a mutable dynamic array and in an imperative language that's what pretty much everyone would use. But if you asked an OCaml programmer, they would almost certainly use a linked list instead.
I agree with this sentiment. However, OCaml does have mutable arrays that are both efficient and convenient to use. Why would a programmer prefer a list over them? In my opinion, the main benefit of lists in this context is that they allow pattern matching and inductive reasoning. To make functional programming languages more suited for array programming, we would thus need something like View Patterns for arrays.
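GHC actually offers a taste of this already with its ViewPatterns extension; a minimal sketch, using Data.Sequence as a stand-in for the array-like structure:

{-# LANGUAGE ViewPatterns #-}
import Data.Sequence (Seq, ViewL(..), viewl, fromList)

-- Pattern match on a Seq as if it were a cons-list:
total :: Seq Int -> Int
total (viewl -> EmptyL)  = 0
total (viewl -> x :< xs) = x + total xs

main :: IO ()
main = print (total (fromList [1, 2, 3]))  -- 6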
A related issue is that mutation can actually be slower than fresh allocation in OCaml. The reason for this is that the garbage collector is optimized for immutable data structures: it has both a very fast minor heap that makes allocations cheap, and expensive tracking for references that do not go from younger to older elements. See: https://dev.realworldocaml.org/garbage-collector.html#scroll...
> Unfortunately, this makes it impossible to use any standard functions like map on linear values and either makes linearity nearly useless or inevitably creates a parallel, incomplete universe of functions that also work on linear values.
You can implement polymorphism over linearity: this is done in Frank Pfenning's SNAX language and planned for the uniqueness types in a branch of OCaml.
> This might sound a little dangerous since accidentally holding on to a reference could turn a linear time algorithm quadratic
No, the in-place reuse optimization does not affect the asymptotic time complexity. But it can indeed change the performance drastically if a value becomes shared, since copies are needed then.
> A tracing garbage collector just doesn't give you this sort of information.
> for now even these struggle to keep up with tracing garbage collectors even when factoring in automatic reuse analysis.
I investigated the linked benchmarks for a while. The gap between Koka and Haskell is smaller than described in that initial comment, but a tuned GHC is indeed a bit faster than Koka on that benchmark.
The author didn't write a good objection to option 2. Both the ST monad (real mutation) and the various State monads (simulated mutation) work fine in practice. What's even better is the STM monad, the software transactional memory monad, which is not only about mutation but also solves synchronization between threads in a way that's intuitive and easy to use. But let's stick to the ST monad. Has the author looked at how hash maps and hash sets are implemented in Haskell? It's arrays and mutation!
> And if you only need mutation locally inside a function, using ST makes your code fundamentally more imperative in a way that really forces you to change your programming style.
> This isn't great either and doesn't exactly help with readability, so the mental overhead is rarely worth it.
What?? You are explicitly opting into writing mutating code. Of course that's going to change your programming style. It is expected to be different. Readability is increased because it clearly delineates a different coding style. And it's not even that different from other monadic code, even compared to non-mutating monadic code.
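For reference, the shape being defended, as a minimal sketch: mutation on the inside, a pure value on the outside.

import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                       -- a mutable cell, local to this block
  mapM_ (\x -> modifySTRef' acc (+ x)) xs -- imperative loop over the list
  readSTRef acc                           -- the mutation never escapes runST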
Ben Lippmeier's Disciplined Disciple Compiler (DDC), for his language later called Discus, was interesting. It was/is an experimental language that managed mutation through an effect typing system. In the intro to his thesis he talks about it some.
The thesis is the more interesting of those two links IMHO. The intro is chapter 1 that starts at page 17 of the pdf. It has one of the better critiques of Haskell that I've seen, and explains why uncontrolled mutation is not the answer. Reference types ala ML aren't the answer either, in his view.
I’m not even a big OCaml fan (you can use Algolia on my comment history…), but this article is just factually wrong.
> For example, let's say you're iterating over some structure and collecting your results in a sequence. The most efficient data structure to use here would be a mutable dynamic array and in an imperative language that's what pretty much everyone would use.
> But if you asked an OCaml programmer, they would almost certainly use a linked list instead.
What? One of OCaml’s most notable features as a functional programming language is how it was designed to support mutation (see “the value restriction”, e.g. https://stackoverflow.com/questions/22507448/the-value-restr...) In my own OCaml programs I used mutation whenever appropriate (my only complaint would be that I wish there were a little more syntactic sugar around e.g. hash table access).
I wanted to like this post but it seems like low-effort clickbait.
How come the CoW method requires runtime reference counting? A lot of the same benefit (but not all) should be available based on static analysis right?
Especially if the approach isn't really Copy on Write, but Copy only when someone might want to use the old value. Default to trying to mutate in place, if you can prove that is safe.
For most locals, that should be rather doable, and it would be a pretty big gain. For function parameters it probably gets hairy though.
> How come the CoW method requires runtime reference counting?
Because it doesn’t do copy-on-read, you have to know whether there are references other than yours that can read the data. A single bit ("at some time there were at least two references to it") doesn’t suffice, as it would mean you can’t detect when the last reference goes away, so it would leak memory (lots of it).
> A lot of the same benefit (but not all) should be available based on static analysis right?
That’s a (very important) implementation detail that makes reference counting perform reasonably well. You don’t want increment-decrement cycles in tight loops, for example.
There is a way to do FBIP without reference counting: use immutable store semantics (always copy). At that point you are doing a lot of copying, but Haskell etc. are actually pretty good at managing this sort of profligate duplication. And of course it is possible to use RC-at-compile-time and other analyses to optimize away the copying; the difference is that runtime RC is not required like it is in Koka. There is even a static analysis algorithm from 1985, https://dl.acm.org/doi/abs/10.1145/318593.318660 (never implemented in Haskell because they went with the ST monad). There is a theorem in the paper that for "natural" translations of imperative programs into FP, their algorithm will optimize the FP back to the imperative program or better.
Say you have an array x = [1,2,3]. You do x[2] = 4. Now you have two arrays, x_1 = [1,2,3] and x_2 = [1,2,4]. x_2 is a "copy" with all elements the same except one. The "always copy" here means that (semantically at least) every array modification creates a copy of the array like this. Naturally this is slow, if you implemented it as-is, but then there are the optimizations I mentioned. And it is much easier to optimize a simple semantics consistently than to debug a more complex one. It is somewhat similar to Swift's value semantics.
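Haskell's boxed arrays expose exactly this always-copy semantics directly; a minimal sketch:

import Data.Array

main :: IO ()
main = do
  let x1 = listArray (0, 2) [1, 2, 3] :: Array Int Int
      x2 = x1 // [(2, 4)]  -- semantically a fresh copy; x1 is untouched
  print (elems x1)         -- [1,2,3]
  print (elems x2)         -- [1,2,4]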
I learned programming in the 1980s based on examples from the 1970s and I would see (and write JNI wrappers for) FORTRAN codes in the 1990s that were built around algorithms that could work on data in place with minimal duplication of data structures such as sorts, FFTs, even when it wasn't obvious that they could do so.
I enjoyed this article. As someone who has written too much haskell and ocaml, and now writes mostly Rust, I am biased but I think this problem is mostly solved by rust. (The author mentions rust in option 3, but I think underappreaciates it.)
The author mentions linear types. This is a bit of a pet peeve of mine because, while very useful, linear types are not the concept that many people think they are and they are not implemented in rust (and neither are affine types). What rust implements is referred to as a "uniqueness type".
The difference has to do with how they deal with what linear-types people call "exponentials". A linear type, as the article mentions, is a type whose value must be "consumed" exactly once (where consuming means passing it into a function, returning it, or sometimes destructuring it). Of course, in this mad world of ours we sometimes need to consume a value more than once, and indeed a language with only linear types would not be Turing-complete. The escape hatch is called an "exponential", I guess because exponential is kind of like the opposite of linear. A value with an exponential type can be used as often as you want. It is essentially most types in most programming languages.
If a function expects a value with a linear type, can you pass a value with an exponential type to it? The answer is that you can. Try this in Linear Haskell if you don't believe me. A function taking a value with a linear type just says "I consume this value exactly once, but you can pass me whatever you want". The restriction is that values with linear types can only be passed to functions that expect linear types. A value with a linear type must be consumed exactly once, so you certainly can't pass it to a function that expects a value with an exponential type, because it might use that value twice. In other words, a linear type is a restriction on the callee.
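A minimal Linear Haskell sketch of that claim (GHC with the LinearTypes extension; names made up):

{-# LANGUAGE LinearTypes #-}

linearId :: a %1 -> a  -- promises to consume its argument exactly once
linearId x = x

main :: IO ()
main = do
  let n = 5 :: Int      -- n is unrestricted (an "exponential")
  print (linearId n)    -- passing it to a linear function is fine...
  print (linearId n)    -- ...and we may still use n again afterwards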
Those familiar with rust will notice that this is not how rust works. If a function takes T, and you have &T, you just cannot call that function. (Ignore Clone for now.) However, in the world of linear types, this would be allowed. This makes linear types not useful for the article's purposes, although they are still very useful for other things.
What rust wants is a constraint not provided by linear types. Where linear types are a restriction on the callee, rust wants to be able to restrict the caller, in such a way that it can know there are no other references to a variable that's been passed into a function. People call this a "uniqueness type" because you can say "I want this type to be 'unique'" (where 'unique' means not-aliased). Honestly this name doesn't make a lot of sense, but it makes a little more sense when you think of it in terms of references: if a reference is unique, then no other reference points to the same object (which is the requirement rust imposes on mutable references). So while a linear type allows you to pass a non-linear variable to a function that expects a linear one, rust doesn't allow you to pass a non-unique variable to a function that expects a unique one.
And adding this requirement to mutations resolves 90% of the issues that make mutability annoying. Mutability becomes challenging to understand when:
1. You have multiple references pointing to the same data in memory.
2. You change the data using one of these references.
3. As a result, the data appears to have changed when accessed through any of the other references.
This simply cannot happen when mutability requires values to not be aliased.
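For illustration, here's that failure mode deliberately recreated in Haskell, where IORef aliasing is unrestricted (a sketch):

import Data.IORef

main :: IO ()
main = do
  a <- newIORef [1, 2, 3]
  let b = a               -- b aliases a: two names, one mutable cell
  modifyIORef b (0 :)     -- mutate through b...
  readIORef a >>= print   -- ...and "a" sees [0,1,2,3]: action at a distance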
> If a function expects a value with a linear type, can you pass a value with an exponential type to it? The answer is that you can. Try this in Linear Haskell if you don't believe me.
> Those familiar with rust will notice that this is not how rust works. If a function takes T, and you have &T, you just cannot call that function. (Ignore Clone for now.)
I think this is wrong. An exponential type in Rust is a type that implements `Copy`. The analogy in Rust is:
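// (The code was elided in the original; this is a hypothetical
// reconstruction based on the names used in the next sentence.)
fn linear_fun(x: i32) {
    let _ = x; // takes ownership of its parameter
}

fn main() {
    let foo = 42;    // i32 implements Copy: an "exponential" type
    linear_fun(foo); // foo is implicitly copied...
    linear_fun(foo); // ...so it can be used again
}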
And that compiles fine: `foo` is implicitly copied to maintain that `linear_fun` owns its parameter.
You can ignore `Clone`, but ignoring `Copy` destroys the premise, because without it Rust has no exponential types at all.
EDIT: I agree Rust solves the issue of mutability fairly well. Furthermore, I think practical linear types can be added to a Rust-like type system with Vale's (https://vale.dev/) Higher RAII, where a "linear type" is an affine type that can't be implicitly dropped outside of its declaring module.
I don't know if this is what Vale does, but to enforce "can't be implicitly dropped outside of its declaring module" in Rust I would add two changes:
- Whenever the compiler tries to insert implicit drop code for a linear type outside of its declaring module, it instead raises an error.
- Type parameters get an implicit `Affine` auto-trait, like `Sized`. If a type parameter is `?Affine`, the compiler will refuse to insert implicit drop code. Standard library generic parameters will be `?Affine` wherever possible, e.g. containers like `Vec` and `HashSet` will have `T: ?Affine`, but the methods that could implicitly destroy an element like `HashSet::insert` will have plain `T`.
> You can ignore `Clone`, but ignoring `Copy` destroys the premise, because without it Rust has no exponential types at all.
It somewhat strains the analogy, because rust is implemented in a very elegant way (where references can be used multiple times because they implement Copy), but the analogy to exponentials in rust would be references. Just imagine Clone and Copy aren't a thing, and that references have a special case that allows them to be used multiple times while owned values can be used at most once. The thing to note is that if you have an owned value, you can pass a reference to it to as many functions as you want (so long as those functions expect references). But if you have a reference, you can't pass it to a function that expects an owned value unless the type provides you a way to make a copy.
You can imagine starting with this and then building up to rust in a nice way. You first implement passing an owned value as a move. Then you add types that can be used multiple times because they are still valid after a move (the Copy trait). And then you make references Copy, since they meet that criterion.
I blame Haskell and the sudden fixation on absolute purity that manifested just as the VC-startup set decided it was the next secret sauce, to the point of literally redefining what "functional programming" meant in the first place.
I think that fixation has forced a lot of focus on solving for "how do we make Haskell do real work" instead of "how do we make programming in general more predictable and functional", and so the latter fight got lost to "well we bolted lambdas into Java somehow, so it's functional now".
I recently ran into this issue when trying to memoize a simple numerical sequence in Hoon (yes, that Hoon. I know, I know...).
Let's use the fibonacci sequence as an example. Let's write it the classic, elegant way: f(n) = f(n-1) + f(n-2). Gorgeous. It's the sum of the two previous. With the caveat that f(n=0|1) = n. In Python:
# fib for basic b's
def fib(n):
    ## Base case
    if n == 0 or n == 1:
        return n
    return fib(n-1) + fib(n-2)
Right off the bat, performance is O(2^n). Every call to f(n-1) will also need to compute f(n-2) anyways! It's a mess. But since Python passes arrays and dictionaries as pointers (cough, sorry! I meant to say references) it's super easy to memoize:
# optimize-pilled memoize-chad version
def fib(n, saved={}):  # the mutable default dict is shared across calls - that's the trick
    if n in saved:
        return saved[n]
    if n == 0 or n == 1:
        saved[n] = n
    else:
        saved[n] = fib(n-1) + fib(n-2)
    return saved[n]
Okay, now our version is nearly as fast as the iterative approach.
This is the normal pattern in most languages, memoizing otherwise "pure" functions is easy because you can reference a shared object using references, right? Even with multithreading, we're fine, since we have shared memory.
Okay, but in Hoon, there are no pointers! Well, there kinda are. The operating system lets you update the "subject" of your Urbit (the context in which your programs run), and you can do this via the filesystem (Clay) or daemons (Gall agents, which have their own state kind of).
But to do this within a simple function, not relying on fancy OS features? It's totally possible, but a huge pain in the Aslan.
First, here's our bog-standard fib in Hoon:
|= n=@ud
?: (lte n 1)
n
%+ add
$(n (dec n))
$(n (sub n 2))
Now, I memoize on the way down, by calculating just f(n-1) and memoizing those values, to acquire f(n-2):
:- %say
|= [* [n=@ud ~] [cache=(map @ud @ud) ~]]
:- %noun
^- [sum=@ud cache=(map @ud @ud)]
=/ has-n (~(get by cache) n)
?~ has-n
?: (lte n 1)
[n (~(put by cache) n n)]
=/ minus-1 $(n (dec n))
=/ minus-2
=/ search (~(get by cache.minus-1) (sub n 2))
?~ search 0
(need search)
:- (add sum.minus-1 minus-2)
(~(put by cache.minus-1) n (add sum.minus-1 minus-2))
[(need has-n) cache]
and that works in the Dojo:
> =fib-8 +fib 8
> sum.fib-8
21
but it sure is easier in Python! And I'm not picking on Hoon here, it's just pure functional programming that makes you think this way - which as a hacker is fun, but in practice is kinda inconvenient.
I even wonder how much faster I actually made things. Let's see:
> =old now
> =res +fib 18
> sum.res
2.584
> (sub now old)
1.688.849.860.263.936
:: now with the non-memoized code...
> =before now
> +fib 18
2.584
> (sub now before)
1.125.899.906.842.624
Ha! My super improved memoized code is actually slower! That's because computing the copies of the map costs more than just recurring a bunch. This math should change if I try to compute a bigger fib number...
Wait. Nevermind. My memoized version is faster. I tested it with the Unix time command. It's just that the Urbit Dojo has a weird way of handling time that doesn't match my intuition. Oh well, I guess I can learn how that works. But my point is, thinking is hard, and in Python or JS or C I only have to think in terms of values and pointers. And yes, that comes with subtle bugs where you think you have a value but you really have a pointer! But most of the time it's pretty easy.
Btw sorry for rambling on with this trivial nonsense - I'm a devops guy so this is probably super boring and basic for all you master hn swe's. But it's just a tiny example of the constant frustrations I've had trying to do things that would be super simple if I could just grab a reference and modify something in memory, which for better or worse, is how every imperative language implicitly does things.
> The standard way to express mutable state is to push it into function parameters and returned values.
This is precisely what I did in my Hoon solution :) However, I wasn't aware that this approach is the standard way, and I'm glad to have learned that! Thanks
++ fib
|= a=@
^- @
~+
?: (lte a 1) a
%+ add
$(a (sub a 2))
$(a (sub a 1))
Try e.g. (fib 100) (don't try it without the ~+)
The compiled code is itself memoizable at the VM execution level. This memo cache is transient within one system event (i.e., pressing enter after (fib 100) to get the result).
Although I'm curious why Hoon doesn't just detect and cache identical computations by default. I guess it's a tradeoff, since using ~+ is more memory intensive, and you don't always want that either. Especially as Urbit itself is already fairly memory intensive.
There's a lot of reductions that happen during execution, and not many are usefully going to be repeated identically; you'd end up with an extremely large and nearly entirely useless cache if you tried to cache everything. So you hint the VM when it turns out that something does repeat.
What has to repeat is an identical subject (which will have a shape like [argument-to-fib other-local-context standard-library]) and formula (the compiled code for fib). Pretty much the only time this will ever happen is something recursing into itself. Most tail recursion doesn't reevaluate the exact same arguments multiple times. It just so happens that naive fib's exponential self-recursion into itself twice does do that.
So it wouldn't be useful to stick a ~+ on, say, factorial. Except! If you're, say, computing every factorial from 1 to 100 in a loop, it comes in handy - because now you can reuse your computation of, say, 50! when computing 51!, so it's just one multiplication.
But most Nock reductions by volume are not anything that repeats usefully.
IMO, Functional Programming was a zero interest rate phenomenon. Some mathematicians suffering from professional deformation believed that programming should adhere to the purity of mathematical conventions... Meanwhile, there was no proof whatsoever to support the hypothesis that constraining programming to such conventions would be beneficial in a practical sense.
FP proponents spotted a small number of problems which arose in certain specific OOP implementations and stubbornly decided that OOP itself was to blame.
Some FP proponents have pointed out that passing around instance references to different parts of the code can lead to 'spooky action at a distance.' For example, if references to the same child instance are held by two different parent modules and they both invoke state-mutating methods on the child instance, it becomes difficult to know which of the two parent modules was responsible for the mutation of state within the child instance...
The mistake of FP proponents is that they failed to recognize the real culprit of the flaws that they've identified. In the case of 'spooky action at a distance', the culprit is pass-by-reference.
If you keep that in mind and consider what OOP pioneers such as Alan Kay have said about OOP "It's about messaging," it becomes an irrefutable fact that this flaw has nothing to do with OOP but is merely a flaw in specific implementations of OOP... Flawed implementations which neglected the messaging aspect of OOP.
To summarize it simply; with OOP, you're not supposed to pass around instances to each other, especially not references. What you're supposed to pass around are messages. The state should be fully encapsulated. Messages are not state, instances are state. Instances shouldn't be moved around across multiple modules, messages should.
If you approach OOP with these principles in mind, and make the effort to architect your systems in such a way, you will see it solves all the problems that FP claims to solve and it doesn't introduce any of its problems... Which are numerous and would require an entire article to enumerate.
Interesting: I came to a similar conclusion when I developed a state management system I coined "nation state." (A group of related variables with scope that is between local and global scope.)
Svelte has these stores that are globally reactive. The reactivity is convenient, but the stores can also be globally updated. This could result in chaos that I wished to corral.
So I tweaked stores so they remained globally reactive, but could only be directly updated internally (from "inside the nation").
To update the state from outside the nation, an event (message) must be sent to the nation state, which handles the direct update.
It's interesting how FP community took one issue which affects many OOP implementations and decided to throw the baby out with the bathwater... Completely ignoring the reason why OOP was invented in the first place.
I've worked on several FP projects. First of all, they're never pure FP projects because that's just not possible. Those which were close to pure FP were a tangled mess of curried functions. Many modules had weird technical names with unclear responsibilities, most features had to traverse many different files/functions with lots of unrelated data passing through various modules with detailed child state which had to be passed down from high up in the 'module' hierarchy and traverse many layers to finally get to the module which would actually use that state. Higher level modules exposed complex functions with complex input and output parameters (had to concern themselves with the highly specific state requirements of child and parent modules; including configurations!) which would have to be modified for every single high level AND low-level requirement change... I'm not surprised why big companies who use FP started using monorepos; FP makes monorepos necessary because every small change requires updating a large part of the code hierarchy because everything is intertwined with everything else (tight coupling). The higher level components have to know exactly what state the child components expect because the data comes from a central place and must be passed down.
The alternative approach would be to have each child sync itself with a global store; but then, it implies that you no longer have a single owner for each piece of state; thus, you might end up with different children which have overlapping responsibilities which update the same part of the global state, creating 'spooky action at a distance' which makes it hard to trace where the change originated... This takes us back to the very reason why FP community invented FP in the first place. In other words, FP doesn't solve the 'spooky action at a distance' problem which was its entire raison d'etre.
Frameworks like React had to invent a whole host of technical concepts and tools to help manage and track complex state changes... E.g. Redux, Saga, etc... But then we still end up with a situation where all our components are tightly intertwined with the specific implementation of that specific framework since the number of variations of how people use Redux/Sagas, etc... across different projects/companies is substantial. This means that components built for one project are generally not immediately compatible with components built for a different project. If the two projects use a different global store mechanism or they use the same mechanism but have different names for the same actions (to dispatch state changes), the components will not be compatible across different projects without a lot of refactoring to bring the component into alignment with both projects. This isn't modular! This isn't easy to read! It's not maintainable!
[1]: https://www.manning.com/books/grokking-simplicity
[2]: https://hw.leftium.com/#/item/18043058