A data structure cannot be functional. I understand what he's trying to say, and agree with most of it, but the word "functional" is purely wrong. What he wants to say is "good". But not all "functional programming" is good, nor is all good programming functional by necessity, despite what your local Lambda The Ultimate nerd tries to tell you.
The Best, when it comes to data structures is a Directed, Acyclic Graph. For instance your typical linux filesystem is a DAG. But there's one problem with DAGs: When they reach a certain complexity human brains are not fit enough to parse them anymore. (programs still can though)
So in many circumstances at least a human programmer needs to take a look at the state of your program and make assumptions about its correctness, which is called debugging. And that's why in Good programs we often use Good data structures instead of The Best.
Good data structures are key->value stores (which you may know as "hash tables" or "dictionaries"), trees, and trees in a simplified special form: lists, each of them being somewhat able to represent the other two, if one can accept a performance hit and/or increased complexity in source code. Dictionaries, trees, lists. That's it. And you do that in every programming language that is at least a little bit interested in being Good.
So there's nothing special or functional about git's data structures, it's just normal Good programming, and a few programmers who are so good at programming that they don't even need to mention it anymore, they breath good programms.
Then of course to the normal bread-earning coder good programs are a rare sight. But the reason is not that they are really rare, the reason is that successful business doesn't really require Good programs to succeed. Mediocre programs are good enough to earn their rent, and most of us spend most of our coding hours to earn our rent.
All that being said, if you don't just want to make money, go and spend some time studying git internals. It will teach you a lot more than most of your teachers/professors taught you combined. Sadly the source code is written by Linux gurus, who like to encrypt their source code with a very special key that only people from their tribe can understand. But the Git Book is actually good enough that you can study quite a lot of the internals from that book. I also suggest writing your own git in your favorite programming language once, to really understand it.
> A data structure cannot be functional. I understand what he's trying to say, and agree with most of it, but the word "functional" is purely wrong. What he wants to say is "good". But not all "functional programming" is good, nor is all good programming functional by necessity, despite what your local Lambda The Ultimate nerd tries to tell you.
Why is it that whenever functional programming comes up, "real programmers" come out of the woodwork to bloviate, only to expose just how little they actually know about the topic?
"Purely functional" in this case is a jargon term that relates to Okasaki's 1996 PhD thesis, available online (https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf). It's a classification for data structures much in the way "regular" or "context-free" or "turing-complete" serve as classifications for grammars.
For a data structure to be "purely functional" (and no, article author does not mean "good" here) simply means that it's implementable in a pure functional language without mutation. Examples of non-functional data structures would be ones where mutation is intrinsic to how its algorithms work: traditional linear probing hash tables, union-find for graphs, etc.
By this technical criterion, the git object graph is clearly a purely functional data structure, sorry.
Since there is even another branch now, I paste it:
I feel like I already said "I understand but I disagree".
A round wheel is more useful for a car than a triangular wheel. That doesn't mean it's a "car wheel". It's just as good on a horse wagon or a bike.
As others have said, "purely functional data structure" is a jargon term with a precise meaning--that fact that this is true indicates that data structures can, in fact, be functional in this sense. To claim that "the word 'functional' is purely wrong" is, well, purely wrong; the data structure in Git being described is purely functional.
Beyond that, you seem to have shown misunderstanding of what that means (despite your claims of "I understand but I disagree"), as the other examples you give DAGs, dictionaries, trees, lists, and this comment thread, do not have the interesting properties of purely functional data structures that make them worth discussing in the context of Git. Namely, the immutability of existing entries.
The thesis of the article isn't that purely functional data structures are Good, or that functional programming is Good. Those are quality judgments that are not present in the article. The thesis is: Explaining Git in terms of the purely functional data structure it uses might be helpful to learning Git.
Yes, I didn't accept that part of the logic. Neither do I believe that immutability is something good, nor do I believe that git is very immutable, nor that git gains from immutability.
I believe mutability is a trade off. You can make objects more immutable by being less efficient in terms of storage size, search time, source code simplicity. In exchange for these negative attributes you get source code you can proof more easily. It's good when that attribute is needed, but not most of the time.
Git is only immutable on that meta level where we talk about the object key-value store. The implementation uses mutable files though, and even steps away from that immutable key-value store when the size gets too big and needs to store more efficiently.
And on the user level you can amend, edit, squash and delete commits easily, which is actually a strength of git not a weakness. People love it for being more mutable than SVN.
> Git is only immutable on that meta level where we talk about the object key-value store. The implementation uses mutable files though, and even steps away from that immutable key-value store when the size gets too big and needs to store more efficiently.
Functional languages are only immutable on that meta level where we talk about objects in memory. The implementation uses mutable RAM though, and even garbage collects when the size gets too big and needs to store more efficiently.
Unless you want to narrow the term 'functional' so much it describes nothing at all, Git is a functional data structure.
It seems like you say that with the goal of proofing me wrong with an example. But what you say is
(a) an argument helping my thesis by showing that you can program functional languages quite well by using mutable recourses
(b) the biggest users of what you write there are also the biggest enemies of the status quo you describe. Functional programmers really hate that computers are so unlike what they love.
> (a) an argument helping my thesis by showing that you can program functional languages quite well by using mutable recourses
You can implement a functional language with mutable resources, and you can implement a functional data structure like git with mutable resources. How does it help your thesis that git isn't a functional data structure?
> (b) the biggest users of what you write there are also the biggest enemies of the status quo you describe. Functional programmers really hate that computers are so unlike what they love.
Whatever, something that is purely functional all the way down to the wires is impossible.
Any definition of "functional" the rejects every single functional language is an objectively failed definition. A usable definition will go ahead and permit the git data structure to be called functional.
> (a) an argument helping my thesis by showing that you can program functional languages quite well by using mutable recourses
Let me preface this by revealing that I'm neither a functional programmer, nor am I convinced that functional programming or immutability is any sort of silver bullet.
Your original thesis was that "a data structure cannot be functional," which you later defended by saying "git is only immutable on the meta level, its implementation uses mutable files", which, while being 100% true, is also true of the category of programming languages we call "functional" as they rely on mutable RAM. That is what the GP has been trying to explain to you, but your obvious hate for anything labelled "functional" makes you blind to all this.
Sorry mate, you just have to concede the point here. This is not a discussion about whether functional programming is any good, which is what you seem to be treating it as.
I think the author is using the term "purely functional data structure" to mean a data structure that lends itself to an implementation in a purely functional language [1,2].
You can disagree if you want. I just don't want other readers to be misled into thinking that "purely functional data structure" isn't a term of art. Given the number of references to Okasaki's book in this thread, I feel like the interested reader is sufficiently armed to learn more that I don't feel the need to continue this discussion.
You may not find the definition useful, and that's alright. I won't try to convince you.
The Best, when it comes to data structures is a Directed, Acyclic Graph. For instance your typical linux filesystem is a DAG. But there's one problem with DAGs: When they reach a certain complexity human brains are not fit enough to parse them anymore. (programs still can though)
So in many circumstances at least a human programmer needs to take a look at the state of your program and make assumptions about its correctness, which is called debugging. And that's why in Good programs we often use Good data structures instead of The Best.
Good data structures are key->value stores (which you may know as "hash tables" or "dictionaries"), trees, and trees in a simplified special form: lists, each of them being somewhat able to represent the other two, if one can accept a performance hit and/or increased complexity in source code. Dictionaries, trees, lists. That's it. And you do that in every programming language that is at least a little bit interested in being Good.
So there's nothing special or functional about git's data structures, it's just normal Good programming, and a few programmers who are so good at programming that they don't even need to mention it anymore, they breath good programms.
Then of course to the normal bread-earning coder good programs are a rare sight. But the reason is not that they are really rare, the reason is that successful business doesn't really require Good programs to succeed. Mediocre programs are good enough to earn their rent, and most of us spend most of our coding hours to earn our rent.
All that being said, if you don't just want to make money, go and spend some time studying git internals. It will teach you a lot more than most of your teachers/professors taught you combined. Sadly the source code is written by Linux gurus, who like to encrypt their source code with a very special key that only people from their tribe can understand. But the Git Book is actually good enough that you can study quite a lot of the internals from that book. I also suggest writing your own git in your favorite programming language once, to really understand it.