Principles of TLA+ (pron.github.io)
121 points by pramodbiligiri on May 28, 2017 | 23 comments



Leslie Lamport has a video course[0] on TLA+ that's slowly being released. I created an RSS feed[1] that updates as new episodes are released.

[0] https://lamport.azurewebsites.net/video/videos.html

[1] http://fetchrss.com/rss/58d198538a93f8ef3b8b4567741573560.at...


First of all, as always, huge respect for your work, and I'm looking forward to reading the rest of the posts. There is something I'd like to start a discussion on, though:

> nor will reading these posts teach you how to write good specifications; so this series is neither necessary nor sufficient for putting TLA+ to good use, but I think it is both necessary and sufficient for understanding it.

Emphasis on the 'good specifications' part: I think this is a _huge_ oversight in the TLA+ community and the biggest barrier to its current adoption. The problem isn't that people don't understand (or can't understand) the theory behind it, yet it's the thing everybody focuses on teaching. The problem is specification: _nobody_ is talking about how to actually use TLA+. In fact, some of the advice given is actively harmful!

One example: in _Specifying Systems_, Lamport defines a memory cache with the operator `NoVal == CHOOSE v : v \notin Val`. He explicitly calls this best practice, because adding a `NOVAL` constant would make the model more complex. Except unbounded CHOOSE is not checkable by TLC! While `NoVal` is nice for abstract specifications, it breaks things in the 99% of cases where you're actually using TLA+.
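For readers who haven't seen the book, the definition in question looks roughly like this (a sketch, not the full memory-interface spec):

    CONSTANT Val                       \* the set of memory values
    NoVal == CHOOSE v : v \notin Val   \* unbounded CHOOSE: TLC can't evaluate this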

I find it telling that the majority of example TLA+ specs "in the wild" are either abstract algorithms or optimization problems. The things you don't find? How to write good invariants. When you should use raw TLA+ versus the PlusCal syntax. Example models. Optimizing your model checking. The stuff that might get people to start using TLA+ and _keep_ using it.

Again, none of these are criticisms of your work or your writing. I just wanted to openly say that if your goal is to get people interested in the theory of TLA+, it's great, but if it is to get people interested in use of it, a lack of good theoretical introductions isn't the main barrier.


> The problem is specification: _nobody_ is talking about how to actually use TLA+.

I think that is Lamport's focus; he talks about this more than he talks about the theory. Plus, you have the Dr. TLA+ series.

> Lamport defines a memory cache with the operator `NoVal == CHOOSE v : v \notin Val`. He explicitly calls this best practice, because adding a `NOVAL` constant would make the model more complex. Except unbounded CHOOSE is not checkable by TLC! While `NoVal` is nice for abstract specifications, it breaks things in the 99% of cases where you're actually using TLA+.

You definitely should write `NoVal == CHOOSE v : v \notin Val` in your spec (as that clarifies the intent) and override it as a "model value" (i.e., a constant) in TLC. This is best practice. Even `∃ x ∈ Nat` is not checkable by TLC, yet best practice is to leave it like that (as that is the intent), and override `Nat` in TLC. In either case, the more general specification is verifiable in TLAPS.
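Concretely, the pattern looks something like this (a sketch from memory; the module and file names are illustrative, and the Toolbox's "definition override" UI hides most of the plumbing):

    \* In the spec -- states the intent:
    NoVal == CHOOSE v : v \notin Val

    \* MCMemory.tla -- a small model-checking module:
    EXTENDS Memory
    CONSTANT MCNoVal            \* declared, so TLC can bind a model value to it

    \* MCMemory.cfg -- the TLC configuration:
    CONSTANT MCNoVal = MCNoVal  \* "= itself" makes MCNoVal a TLC model value
    CONSTANT NoVal <- MCNoVal   \* substitute it for the CHOOSE definition

A model value is an opaque atom that TLC treats as unequal to every other value, so `MCNoVal \notin Val` holds trivially. `Nat` gets the same treatment: override it with a finite set like `0..5`.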

> Example models. Optimizing your model checking. The stuff that might get people to start using TLA+ and _keep_ using it.

This kind of discussion is very common on the mailing list (there are a lot of hidden gems there), but I would be very happy to see more people write up "pro tips" and publish them.

> Again, none of these are criticisms of your work or your writing. I just wanted to openly say that if your goal is to get people interested in the theory of TLA+, it's great, but if it is to get people interested in use of it, a lack of good theoretical introductions isn't the main barrier.

My personal goal in this particular series is just to get people interested in the theory. But TLA+ is still extremely young when it comes to "use in the wild". It was virtually nonexistent before 2012, and started slowly gaining traction only around 2014, when Amazon published their report. As you can see, writing short and to-the-point tips, tricks and guides isn't exactly my forte, but they are extremely important. As this tiny community grows, there will hopefully be those who will fill in the missing gaps. Your own tutorial is a great start.


This is precisely the kind of discussion I want to see on the mailing list or /r/tlaplus. HN is not very conducive to internal community discussions.


Very nice. I really appreciate you placing TLA+ in context with alternate approaches. I look forward to seeing the followup posts.

PS: How are you generating these pages? Jekyll? Something else?

EDIT: This probably needs editing -- "nor will reading the posts will not teach you how to write good specifications;"

I think you probably meant to write "nor will reading the posts teach you how to write good specifications;"


Yep, Jekyll. And I've fixed the mistake. Thank you!


@ pron

This is a great write-up that I learned quite a bit from, but it's way, way too long. There's a lot of redundancy in here, much like my first sentence. You should try to condense it, since the redundancy might make a chunk of the modern crowd think the rest of what they see in the scrollbar will be the same and move on to the next article. The last write-up you sent me didn't have that problem.

I'm glad I saw this, too, as I recently watched your video citing all the mathematical evidence that correctness may not decompose. I've got one or two counterpoints to email you later, but overall I want your data peer reviewed by people highly experienced in abstractions in large verification efforts and in making VCGs, given that their output will have to compose, too. Let all the pros duke it out with arguments, proofs, and attempts at actual projects. I'll reserve judgment until the dust settles. :)


You are absolutely right about the repetition (and part 1 is my least favorite installment in the series), but reducing it in size took too much time. If you have any editing suggestions, please email me.

As to the "mathematical evidence": it is a published, peer reviewed, 2005 proof, and a rather simple one. Practical experience has little to do with it, because we need to separate two things: the computational complexity of the worst case -- which is provably exponential not only in the number of components but also in the size of the state space of each component -- and cases that arise in practice. This is just like the halting theorem, which states worst-case undecidability but does not claim that every instance is undecidable. However, the interesting question is how far commonly encountered cases are from the worst case.

When it comes to symbolic model checking (i.e., taking advantage of syntactic structure), even though we would have thought the problem is essentially easier (in the worst case), it is provably just as hard, and moreover, we have years of experience (cited in my post) that common cases are close to the worst case. When it comes to "decomposability of correctness", I cited no empirical evidence. Nevertheless, it should be noted that no large or even medium-sized software has ever been fully verified, so I don't think anyone believes the problem is easy.


" it is a published, peer reviewed, 2005 proof, and a rather simple one."

Proofs can be wrong or irrelevant in practice. An example is not being able to prove the termination of executing arbitrary programs. Sure I can: it's called a watch-dog timer. So, I'm wanting it peer reviewed. Plus, it's critical to know for sure about correctness composition to determine what to invest in for software assurance R&D.

"Nevertheless, it should be noted that no large or even medium-sized software has ever been fully verified, so I don't think anyone believes the problem is easy."

Never easy. :) However, I still think we should consider counting the deep verifications. You keep saying they haven't gone upward to verify a program of a certain size as if it means something. I know only one that tried (Verisoft for Hyper-V). Most went downward toward the gate level, including Verisoft, where Microsoft's was just a side item. Verisoft did app correctness to some degree, down to C code, its compiler, the OS, the microkernel, and the CPU tying it all together. DeepSpec just tied piles of high-level and low-level code together. So, such methods should allow us to achieve the same amount of complexity going upward using the same abstraction and shortcut techniques. We haven't done that much upward because they've just been working the other way.

Now, what they'll ultimately achieve may or may not be that good. The best approach might be to throw DeepSpec's DSLs and tooling at a large, complex application whose code is FOSS and accessible. I suggest an application server with compression, encryption, caching, and error handling as a good way to stress that project from multiple angles.


> An example is not being able to prove the termination of executing arbitrary programs

The theorem states that you cannot prove a program's termination any faster than running it. You're not violating any theorems.

> So, Im wanting it peer reviewed.

But it has been peer reviewed. Here is the publication http://www.sciencedirect.com/science/article/pii/S0022000005... in this journal https://en.wikipedia.org/wiki/Journal_of_Computer_and_System...

> So, such methods should allow us to achieve the same amount of complexity going upward using same abstraction and shortcut techniques.

First, I think you're misinterpreting (and exaggerating) what has actually been achieved. Looking at Hyper-V[1], 30KLOC of (preexisting and tested) C code were verified in a year (out of a total of just 100KLOC), but only function-by-function, rather than global correctness properties. They wrote some global invariants, but didn't verify them. As I understand it, similar verification of various layers has been done, but no global end-to-end work. Whether low-level or high-level, there has been no composition across abstraction layers.

Second, such methods would not allow us to go "upward", because there's a reason why people concentrate on the lower level. The state space is smaller there.

Finally, I think you're also misinterpreting DeepSpec's work. It is at the forefront of verification of a certain kind, but if you look at their papers, when they talk about end-to-end verification, they're discussing programs that are 3KLOC of C. These programs are smaller than many JavaScript libraries. Also, Coq is not really a tool engineers can put to everyday use. This is what Simon Peyton Jones, one of Haskell's designers, has to say about Coq and Isabelle: https://youtu.be/re96UgMk6GQ?t=50m56s

Formal methods are starting to work, but we are roughly in the same stage as aviation in WWI, while you are talking about "using the same methods" to get a manned craft to Mars. We may well get there, but it will take a while, and the methods will not be exactly the same.

[1]: http://d3s.mff.cuni.cz/research/seminar/download/2010-02-23-... and https://link.springer.com/chapter/10.1007/978-3-642-05089-3_...


"The theorem states that you cannot prove a program's termination any faster than running it. You're not violating any theorems."

Someone explained it poorly to me then. Sounds like I'm still countering it, though, since my program that terminates it can run faster than the program it's terminating. I can have an infinite loop load and begin, with a downward counter interrupting it one or more cycles after it begins. I'm still using two programs and a context change to mitigate the artificial problem the theorem poses by having just one program look at another.

"But it has been peer reviewed. "

How many people looked at it? For how long? And what diversity of backgrounds? Peer review of something critical, to me, doesn't mean people who look at theorems look at it and say "Good one. Done now." It's a social process that should involve the mathematically inclined, people who do applied math in that field, and non-mathematical people in the same ___domain with a track record of breaking proofs in practice. On the latter, there have been a number of breaks in cryptography because the oversimplified, abstract, elegant proof missed key details. Peter Gutmann also had a nice rebuttal of the reliability of formal methods, showing all the times formal specs and proofs were incorrect, with it taking years or so for people to spot the errors or disprove them by executing a program.

So, I have no doubt it looks good to mathematicians. I just want more effort on it from aforementioned kinds of people before I rely on it. Preferably a few man-years of people trying to solidify or counter it.

"rather than global correctness properties. They wrote some global invariants, but didn't verify them."

See, this is where being a specialist can help. I'm a bit unclear here. Can you get implicit, global correctness if you decompose your app into pieces that are each locally correct? And the interface conditions of function calls are correct? Or is that inadequate to you? The literature was unclear to me on this point.

Note: Three functions per day? Maybe we need to apply these tools to deeply-embedded 8-bitters with bytes of RAM or ROM. Then, a programmer doing full-verified work will feel really productive after "filling up the hardware" in a month or two. First time in verification history for someone to feel that productive. ;)

"It is at the forefront of verification of a certain kind, but if you look at their papers, when they talk about end-to-end verification, they're discussing programs that are 3KLOC C. "

Oh yeah, they're small. The reason I bring them up is that they're at the forefront of composing pieces. I figured you might look at what they're doing on abstraction and integration of proof to get a better idea of the composing-correctness aspect of your research. Then others and I might learn from your observations.

"This is what Simon Pyton-Jones, one of Haskell's designers, has to say about Coq and Isabelle"

You're just not the guy to have at a verification party, Ron. You stay trying to depress us. :P I appreciate the link, though, since that guy is a hilarious speaker. He says they start learning Coq, disappear for two years, get all their pain receptors destroyed, and emerge (paraphrased) as zealots bringing the Gospel of Coq to all programs. Lmao. I've seen this disappearance or pain process. It's why I'm still ignorant of the details, due to not digging into the full learning process. Life is already rough enough right now haha.

"but we are roughly in the same stage as aviation in WWI, while you are talking about "using the same methods" to get a manned craft to Mars."

Nah, I'm talking about doing a few key components of the craft with formal methods, with the rest built with lightweight formal specifications, model-checking, test generation, and esp. human review. Recovery procedures kept simple and verified formally if possible. That's my assurance recommendation for your Mars mission. I do like your WWI aviation comparison. It also fits with my style of figuring out just how much I can cheat around the limitations to kick ass. Within your metaphor, the "Biafran Babies" is something that could've been done with WWI aviation.

http://www.cracked.com/article_19472_the-5-craziest-soldiers...


> Sounds like I'm still countering it, though

If you think you could so easily circumvent mathematical theorems, why are you such a believer in formal methods? I don't understand exactly what your method is, but I can assure you you're not circumventing the halting theorem. There's a very precise definition of what it means to be faster. The halting theorem poses no artificial constraints on the problem; it's an extremely general theorem, and it certainly doesn't say how the decision must be made. You're perfectly allowed to serve as an OS and simulate the input program if you like, but that's not going to help.

> How many people looked at it? For how long? And what diversity of backgrounds?

The proof is very simple, and very similar to the proof that a system of cellular automata is Turing-complete: it points out that any nondeterministic Turing machine with a tape of length k cells can be simulated by k finite state machines. If it were possible to verify each component separately and combine the result efficiently, then any such TM could be verified, yet it is well known that it cannot be. The paper was written by well-respected academics, published in a respected journal, and has over twenty citations.

Now, the reason our response should be, "so verification is as hard as we thought", rather than "this is the end of the line", is because, like all results in computational complexity, it talks about the general case. It is possible that a large enough class of programs that are of interest are special cases that are easier to solve.

> Peter Gutmann also had a nice rebuttal of the reliability of formal methods, showing all the times formal specs and proofs were incorrect, with it taking years or so for people to spot the errors or disprove them by executing a program.

I don't know what you're referring to, but I'm not aware of any cases where mechanically checked proofs or model checker results have ever been found wrong. The spec can very well be wrong, but you're the one arguing the infallibility of formal methods.

> Preferably a few man-years of people trying to solidify or counter it.

No one is going to try and counter a mathematical theorem with a very simple, well-understood proof.

> Can you get implicit, global correctness if you decompose your app into pieces that are each locally correct?

I'm not sure what you mean by "implicit". A formal spec requires a formal proposition to check. Things can be true without us proving them, but formal verification is about verification, not about truth. Theoretically, programs could be correct without verifying or testing them in any way. As to building up a global correctness property end-to-end (from very low-level code and up): it is possible, but very hard, and no one has ever been able to do it for even medium-sized programs. Obviously, working in a modular way is how people do it; that's how both math and software work. Yet, it's very, very hard.

> Then others and I might learn from your observations.

I think you overestimate my abilities. I'm not an academic. Just a programmer with some experience using formal methods.

> You stay trying to depress us.

This is where you're wrong. Formal methods -- like aviation -- are awesome! But let's enjoy them for what they are, not what we imagine them to be. If you see someone speaking of formal verification as if it were a solved problem, and writing fully correct programs is just a matter of learning Idris, you can know one thing for certain: they've never actually used formal methods on non-trivial real-world programs. The FM experts are those lowering expectations the most. Watch some talks by Xavier Leroy.

> formal specifications, model-checking, test generation

All of these are formal methods (the last only if there's a formal contract).

> http://www.cracked.com/...

:)


> I recently watched your video citing all the mathematical evidence that correctness may not decompose

For myself and others, are you referring to the video here: https://pron.github.io/posts/correctness-and-complexity ?


Yeah, that was it.


Great article, and I look forward to the next! I've been meaning to learn about TLA+ for a while.

One thought I didn't see expressed, in all the comparisons of tools and approaches, is that sometimes you don't care about time and state. In purely "transformational" programs, like compilers, for example, all state is incidental complexity, so we can analyze them as pure functions. Meanwhile, all the questions such as whether we can just execute the spec and whether we can prove properties of real programs remain. New web frameworks are trying to keep the parts that deal with time and state as small and self-contained as possible, leaving the rest to be described using functions.

Couldn't we say that the factor of "do you care about time" is behind observations such as that TLA+ is mostly popular with concurrent and distributed systems, or that approaches that only look at function contracts often apply to pieces of a program but not the whole? As a more specific statement than, for example, that distributed systems are particularly hard so they necessitate particularly powerful tools.


> all state is incidental complexity

This is an example of the "Whorfian syndrome" of functional programming. Program state is not the same as the problem of "global mutable state" often cited as a cause of programming bugs. When it comes to program analysis, I see no reason why reasoning about state would be any different, let alone more complicated, than reasoning "transformationally". It's exactly like the difference between:

    h(g(f(x)))

and

    let a = f(x)
        b = g(a)
        c = h(b)

Reasoning about the two is exactly the same, and the program state in both cases is also identical: you have three states.

I also disagree that there is no interest in time or space in purely transformative systems. First, a transformative algorithm that is, say, doubly exponential in time or space is just as wrong as a database that fails to give an answer. Second -- and more importantly -- reasoning about a recursive algorithm is exactly like reasoning about state. There is no way to show that Quicksort works without reasoning about state. You may not mention it explicitly in your chosen formalism, or you may call it by a different name, but it's there, and you're taking it into consideration.

Also, TLA+ is a universal formalism, and, as you'll see in part 2, there's nothing stopping you from representing everything as functions without any passage of time. The reasoning is identical -- there is no added complexity at all, incidental or otherwise -- but may be a matter of personal aesthetic preference. Because TLA+ is a mathematical notation, substitution and "equational reasoning" works just as well even though time can be expressed.

I would say that the main difference between TLA+ and the functional approaches, is that in TLA+ you reason purely mathematically, while in the functional approaches you reason within a (complex) programming language.

It's very hard to talk about popularity, because, as far as I know, there are no "functional" tools that are used in the industry at all (certainly not without help from academics). Part of the reason may be due to the fact that so far, software companies that require verification usually write reactive realtime software rather than compilers, so they use tools like B-Method, SCADE, SPIN and TLA+. Another reason is that with the exception of the very good, but rather limited Why3, there are simply no industrial-strength verification tools out there aimed at industrial use rather than research.


I mean "state" in the sense that we can distinguish time-dependent systems with internal state or memory from time-independent systems, whose outputs are a pure function of their inputs. Combining pure functions to create pure functions does not introduce any state, in this sense.

An entire program may have no observable state, like a command-line tool that reads stdin and writes stdout with no particular timing. It does have state inside it, of course, but only because it's running on a giant state machine. We can fully describe its behavior without making reference to any internal states, using functions or relations to talk about the relationship between its input and its output.

> I would say that the main difference between TLA+ and the functional approaches, is that in TLA+ you reason purely mathematically, while in the functional approaches you reason within a (complex) programming language.

This statement puzzles me, because we have great math for reasoning about pure functions (mappings). We have lambda calculus and category theory. You don't need a state machine to prove QuickSort works in a purely mathematical way, either, just statements about the equivalence of functions. The recursion can be handled using the fixed point operator and the algebraic laws it obeys.

Edit: In fairness, existing "functional approaches" to specification do seem complex and look like programming languages.

I came across an old comment of yours (https://news.ycombinator.com/item?id=11835178) that seemed to support the idea that FP and TLA+ have strengths for different kinds of programs, depending on whether they are more transformative or more reactive. I'm just as happy if learning more about TLA+ turns my idea of this on its head, though!


> Combining pure functions to create pure functions does not introduce any state, in this sense.

Yes, but that's not the same as state in the TLA+ sense. You cannot have computation without state in the TLA+ sense, whether you call your functions pure or not. In TLA+, an algorithm that "purely" computes quicksort and one that outputs an interim result after each recursive step is the same algorithm. TLA+ doesn't talk about visible state, but state in the most fundamental way, without which there is no computation whatsoever.

In part 3 I ask whether Euclid's algorithm or Quicksort are imperative or pure-functional. There is no way to answer that question, as their descriptions are at an abstraction level above where this distinction can be made. TLA+ can work at that level. You just can't look at Euclid's algorithm (or quicksort) in TLA+, and tell whether it's imperative or functional. It's an abstract mathematical description.
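To make that concrete, here is roughly what Euclid's algorithm looks like as a TLA+ spec (a minimal sketch, assuming constants M and N for the inputs):

    EXTENDS Integers
    CONSTANTS M, N
    VARIABLES x, y

    Init == /\ x = M
            /\ y = N

    Next == \/ /\ x > y
               /\ x' = x - y
               /\ y' = y
            \/ /\ y > x
               /\ y' = y - x
               /\ x' = x

    Spec == Init /\ [][Next]_<<x, y>>

When x = y, no step is enabled and both variables hold the GCD. Is that imperative assignment or functional relation-building? The notation simply doesn't say.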

> We have lambda calculus

Lambda calculus is a state machine (or, more precisely, it is naturally described as a state machine).

> You don't need a state machine to prove QuickSort works in a purely mathematical way, either, just statements about the equivalence of functions.

What you're calling state and what TLA+ calls state is not the same thing. Whichever way you prove Quicksort works is the way you'd do it in TLA+ with state. You just don't call it state. I would, however, be very much interested to see how you prove Quicksort works using category theory. AFAIK, all approaches to verify QS work the same way, using an inductive invariant, namely an invariant on state.
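For anyone unfamiliar with the term: an inductive invariant is a state predicate that holds in every initial state and is preserved by every step. A toy sketch, nothing to do with quicksort:

    EXTENDS Integers
    VARIABLE x
    Init == x = 0
    Next == x' = x + 2
    Inv  == x \in Nat /\ x % 2 = 0   \* Init => Inv, and Inv /\ Next => Inv'

Verifying quicksort amounts to finding such an Inv strong enough to imply sortedness when the algorithm terminates.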

> that seemed to support the idea that FP and TLA+ have strengths for different kinds of programs

I was talking about synchronous programming languages inspired by TLA+, and I agree. I would rather write a compiler in a pure-functional language than in a synchronous one, and an interactive program in a synchronous language than a pure-functional one. But that is a separate discussion from that about mathematical reasoning. I would agree that if one day there would be a FP-based verification tool that is aimed at engineers rather than researchers, then for ergonomics/syntax reasons, it's possible that it would be more convenient for reasoning about compilers. But there's nothing in the theory of TLA+ that makes it less convenient.

I do, however, admit that there are tradeoffs between working in "pure math" as in TLA+, or within a programming language. There are often patterns that are specific to certain programming styles, and reasoning within a similar formalism may reduce effort. In other words, sometimes you may want to reason about a specific expression of an algorithm rather than about the algorithm in its pure abstract sense, and a formalism that shares the same expression style may make it easier. But I don't know how much of this is purely hypothetical. Most algorithms that are interesting or subtle are essentially interesting or subtle, and verifying them in all formalisms is very similar (and usually involves the same proof technique: inductive invariants).


Thanks for bearing with me, and responding, even if it might be more efficient for me to wait and read the next parts.

It seems like if the quicksort of a list L is defined as equal to the concatenation of 1) the quicksort of the subsequence of L consisting of items less than the pivot item, 2) (the list containing) the pivot item, and 3) the quicksort of the subsequence of L consisting of items greater than the pivot, then we can prove properties like "any two items of the resulting list are in sorted order" by case analysis on whether each item is part of (1), (2), or (3).
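Written as a recursive TLA+ operator, that definition would look something like this (a sketch: I'm picking the head as the pivot, SelectSeq comes from the standard Sequences module, and >= keeps duplicates of the pivot):

    EXTENDS Sequences, Integers
    RECURSIVE QSort(_)
    QSort(s) ==
      IF s = <<>> THEN <<>>
      ELSE LET p  == Head(s)
               lo == SelectSeq(Tail(s), LAMBDA e : e < p)
               hi == SelectSeq(Tail(s), LAMBDA e : e >= p)
           IN  QSort(lo) \o <<p>> \o QSort(hi)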

Maybe you would interpret "the quicksort of L is defined as..." to mean "if you're in state quicksort(L), you can proceed to state the-concatenation-of"? If we're working in the realm of mathematics, don't we get definitions, equations, and inductive proofs for free, without making a state machine do it?

Edit: Reading more, I guess I'm just doing the sorting inside the "next-state relation" of a state machine, so it's done in one state transition, and saying you could look at quicksort as one way to factor the relation, which by math is equivalent to other ways to factor the relation that resemble other sorting algorithms.


> Reading more, I guess I'm just doing the sorting inside the "next-state relation" of a state machine, so it's done in one state transition

While you could describe all of quicksort as a function in TLA+ (taking place instantaneously), that's not what you've done. You described a single state transition, but one that doesn't capture all of quicksort, just one partitioning step. To show that the result is sorted, you'd need to do an induction over such concatenations; the thing you're doing the induction over is called state.


Writeup aside, this is a really nice template... similar to Tufte margins. I'd love to see the source/style for the site...


It's a combination of the Tufte and monochrome Jekyll templates.


There are interesting pieces of information, but it's very long.





