Make formal verification and provably correct software practical and mainstream (github.com/magmide)
204 points by peanutcrisis on May 28, 2022 | 191 comments



I really want to like this, but it comes across as more of a wishful-thinking project without a lot of experience or intuition about how to solve the very real problems that formal methods run into in this domain. Like, the design goals literally include "verify any program" [1], which is almost certainly impossible.

Important questions like how you implement the design pillars without running smack into the issue of decidability seem entirely ignored. They have a whole section on how "this idea exists in an incentive no man's land" without seemingly being aware of the rich history of formal methods in low level programming, from Ada through Java through formal C through Rust itself. The common issues these efforts encountered, like decidability, holes in the formal model (which contributed to the downfall of the Java sandbox as a security boundary), and the combinatorial explosion inherent in verification tools, are all huge looming questions that should at least be mentioned.

Maybe I'm being overly critical here and partial improvements are still improvements, but it'd be nice to see more moderate claims from authors tackling such an ambitious project.

[1] https://github.com/magmide/magmide/blob/main/posts/design-of...


I agree it sounds like pretty naive enthusiasm. Not just about how hard the formal verification problem is, but about how hard it is to get ANY kind of programming system up to the point where it's actually usable.

However, every time somebody starts going negative on formal verification by talking about decidability, I get itchy. Sure, we know, all interesting properties of programs in all interesting languages are undecidable in general. Rice's theorem, blah, blah, blah. For any given property and any given language, I can show you a program in that language such that you cannot determine whether that program has that property.

... but it turns out that most actually INTERESTING PROGRAMS aren't like that. When you really write code to get something done, you generally have a reason to believe that code is going to work. That means that the code is at least approximately correct by construction, and correct in a way that's comprehensible and locatable. Often the machine will be able to find a proof that it's correct. If not, you'll often be able to provide hints. And if neither you nor your computer can find a proof, that probably means you don't understand the code enough to want to use it to begin with.

Oh, and on computational cost of verification, you can get some relief by passing around precomputed proofs along with the source code, so they only have to be checked on installation, or incrementally recomputed when the code is modified, rather than being rebuilt from scratch every time.


A lot of people take undecidability to mean “no program can be proven to terminate” when in reality it means “there exist programs whose termination is impossible to prove,” and like you said most of the useful programs we write can be shown to terminate just fine.


We have general methods for constructing programs that have some desired property, such as termination.

We have no general methods for finding whether or not some arbitrary program has some desired property, such as termination.


I agree in general and also take issue with the "we can never prove interesting things about our programs" statements that are sometimes uttered, but it is surprisingly easy to write certain programs that nobody to date knows whether they terminate, such as: "find the least even number > 2 that cannot be written as the sum of two primes". :)
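
For the curious, that program is easy to write down; only the termination argument is out of reach. A rough Lean 4 sketch (the helper names are mine):

  def isPrime (n : Nat) : Bool :=
    decide (2 ≤ n) && (List.range n).all (fun d => decide (d < 2) || n % d != 0)
  def hasGoldbachSplit (n : Nat) : Bool :=
    (List.range n).any (fun p => isPrime p && isPrime (n - p))
  -- whether this search terminates is exactly the Goldbach conjecture,
  -- so Lean only accepts it when marked `partial`
  partial def firstCounterexample (n : Nat) : Nat :=
    if n % 2 == 0 && !hasGoldbachSplit n then n
    else firstCounterexample (n + 2)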


You forget about Rice’s theorem. Termination is not that exciting, but the existence of race conditions and a million other properties are - and those are not possible to prove true in general.


No, docandrew is correct. You are incorrectly applying Rice's theorem.

Rice's theorem states that those properties can't be automatically decided in general. But that's irrelevant to this discussion, because this "magmide" tool doesn't claim to do that. It merely checks proofs that have already been found (e.g., by a human), which is trivially decidable.


I disagree that most interesting programs don't need things that are Hard to Verify. For instance, approximately every C program would benefit from correct pointer analysis, yet this remains an open research question.

For a personal example, I had a recent lunch interaction with our security team about some work I wanted to do formally verifying bits of a bootloader we have. Halfway through the conversation, one of them pulls up a report they made about how susceptible our specific CPU is to power-glitching. This is something the system design ought to be resistant to, so now my "straightforward idea" has a bunch of thesis-sized holes in it unless I either cut that out of the proof (what will probably happen) or I can make it someone else's problem.

Formal methods are just hard, and open research questions lurk in every slightly dark corner.


> approximately every C program would benefit from correct pointer analysis, yet this remains an open research question.

To me this falls under "you don't understand the code enough to want to use it". Which in the case of unsafe pointers is a well-established conclusion. In C-land, even well-vetted security-related code keeps delivering pointer bugs.


Pointer analysis is not just about lifetimes. It asks questions like “can these two names alias?”, “must these two names alias?”, and “what objects can this name point to?”

This is considerably harder than ownership analysis.


Ok, which of those programs can _only_ be implemented via aliasable pointers?

I don't think anyone is arguing that C's loose memory model and rampant undefined behavior are amenable to verification. The point is that neither of those are necessary in many of the places they've been used.


Passing an object by reference to a function creates an alias. Storing a pointer as a field creates an alias. It would be very odd to have a program with zero aliasing relationships. To do this you would need to never assign from a pointer type.

This has nothing to do with C (mostly - you havoc more frequently when analyzing C if you want to be sound). Pointer Analysis papers usually target Java and in that space it is still stunningly difficult.


Sure, but how many of those C programs are inherently hard to verify vs how many of them are hard because they happen to be built with tools that make them hard to verify?


I don't think those are fully separable concepts in legacy, spaghetti code.

Regardless, I agree that if you're willing to architect your system around the need to formally verify it, the problem is somewhat tractable and getting more so every year as tools improve. Non-experts still might need the occasional "justifiable shortcut" instead of proving everything, but it's doable.


Sure, I don't think anyone is realistically expecting to verify all existing legacy spaghetti code -- _that's_ an interesting research direction, at best.

The point upthread (at least the point I took) is not that existing code doesn't do things to make it difficult to verify, but rather there's no reason inherent to the desired behavior of those programs that it _needs_ to be difficult to verify them. Nor is there good reason going forward to not lean towards tools that make verification more tractable for anything new we write.


> However, every time somebody starts going negative on formal verification by talking about decidability, I get itchy. Sure, we know, all interesting properties of programs in all interesting languages are undecidable in general. Rice's theorem, blah, blah, blah. For any given property and any given language, I can show you a program in that language such that you cannot determine whether that program has that property.

Decidability has nothing to do with this kind of formal verification. The tool doesn't have to decide the correctness of a program. The tool merely needs to check the validity of a proof of the program's correctness, which is rather trivial. Coming up with the proof is the hard part, and that responsibility still falls mostly on humans.
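
In Lean terms, just as an illustration (the theorem name is mine; `Nat.add_comm` is the library lemma): the human supplies the proof term, and the kernel only checks it, which is a terminating, mechanical process.

  theorem comm_example (a b : Nat) : a + b = b + a :=
    Nat.add_comm a b  -- the proof object; checking it is decidable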


It is wishful thinking. Iris is cool and all but extracting useful programs from proofs is still an open research area. I don’t want to discourage someone from trying but it would help to be pragmatic.

At least the author seems somewhat aware of this when they describe the repo as a collection of learnings and ravings.

I’ve been interested in this since 2015 or so and even then the cost of doing formal verification had come down a lot. It has continued the downward trend. But we are still a long way from a language that industry programmers can use to write proofs and extract programs from.

One of the major challenges will be teaching industry programmers how to write proofs and convincing them that they want to do this.


> One of the major challenges will be teaching industry programmers how to write proofs and convincing them that they want to do this.

"Industry programmers" are pretty far from being a monolith. There are a myriad of different approaches to testing, many of which are ___domain specific, for all kinds of reasons.

Usually the way that a new testing tool gets adopted is that individual programmers or companies that adopt the tool appear to have a competitive advantage over those that don't. The results generally speak for themselves, at least after a time.

Practically everybody that might use formal verification has already chosen to not go to certain lengths to increase reliability. They're already making an economic trade-off -- even with safety critical systems, or in domains where the cost of bugs is extraordinarily high. Why shouldn't formal verification be assessed in the same way?


Because many of the costs of their irresponsible practices are externalized.


> Because many of the costs of their irresponsible practices are externalized.

I'm sure that that's true in at least some important cases. I don't see what it has to do with formal verification.


To be honest, extracting Rust programs from Iris would be the easy part here. The hard part is, well... using Iris. If you've used it much you know what I mean.


Isn't this what matlab and other tools like it try to deliver?

Most programs I've seen in the industry use matlab and simulink for formal verification (of course, still ignoring the hardware design part) and I'm asking myself how many dependencies the verification should have.

I mean, even a simple Hello World program could escalate really quickly once you have to deal with TTY quirks or say, %X as a printf parameter.

The issue I see with matlab's approach is that it has a huge set of assumptions in its own compiler chain, which I guess is the core problem of all ABI compatible outputs.


Not at all. See CompCert, seL4, F* (F-star) and Coq for examples of how it is done.


It gives me the same feeling as crypto-contracts. Someone with a very narrow range of low-level experience arm-chairs their way into a solution for a problem. That problem isn't even the hard problem for the domain, but they've thought a lot about it in a vacuum.

What is the "verifiability" of an e-commerce funnel? Of a recommendation engine? Of a national identity card system?

It doesn't take a lot of real-world experience to recognize that the specification process is generally more error-prone and imprecise than the coding process, but it does take some.


To follow on this sentiment at a slight tangent, I am happy for the enthusiastic attempts from all quarters, but folks seem to misunderstand that incremental progress in academia is often due to the problems being very hard. Formal verification in the presence of weak memory models is an ongoing problem that only recently has seemed tractable, thanks to the hard work of the folks at MPI etc. over more than a decade of publications and iterating on logics like Iris.


Could you elaborate / share resources on "downfall of the Java sandbox as a security boundary"? Sounds interesting.


This all dates from the old days of Java 1.x and applets. The original idea was that the JVM could perfectly sandbox applications from hardware resources, so the smart people at Sun came up with a very forward-thinking architecture: Untrusted code could be cryptographically signed by a root of trust. Then, bytecode would be formally verified to prove the absence of certain kinds of bad behavior. Then the JVM would take care of various low level details like memory management. Finally, for any high level behavior there was something called the Security Manager that allowed 'fine grained' permissions for so-called dangerous APIs. All of this was the sandbox.

What happened in practice:

* The bytecode verifier ended up being accidentally quadratic, leading to straightforward DoS attacks

* The verifier kept having bugs, some of which stood for years

* It was impossible to enforce resource usage policies like "don't use all of my CPU". Since cryptocurrency is now a thing, this problem has become somewhat more severe than it was 20 years ago.

* The security model was useless in practice, because unverified or native code broke the entire thing. This remains an open research question under most conditions.

* Oh god the bugs. They were endless and the practical result was that the sandbox was never secure. [1] recapped 20 years of exploits, if you're curious about examples.

By the tail end of the system's life, even the public was completely fed up with it and mainstream media was publishing articles like [2]. Eventually even Oracle gave up and just deprecated both Applets and the SecurityManager system entirely [3].

[1] http://phrack.org/issues/70/7.html#article

[2] https://www.reuters.com/article/oracle-security/oracle-fixes...

[3] https://openjdk.java.net/jeps/411


Not the GP, but the Java Sandbox is now deprecated.

There is an already closed deprecation ticket at openjdk.java.net.[1] For a blog post at inside.java see here.[2] Both links have some explanatory information.

[1] https://openjdk.java.net/jeps/411

[2] https://inside.java/2021/04/23/security-and-sandboxing-post-...


"And all existing proof languages are hopelessly mired in the obtuse and unapproachable fog of research debt created by the culture of academia."

Yes. As I wrote 40 years ago:

"There has been a certain mystique associated with verification. Verification is often viewed as either an academic curiosity or as a subject incomprehensible by mere programmers. It is neither. Verification is not easy, but then, neither is writing reliable computer programs. More than anything else, verifying a program requires that one have a very clear understanding of the program’s desired behavior. It is not verification that is hard to understand; verification is fundamentally simple. It is really understanding programs that can be hard."[1]

What typically goes wrong is one or more of the following:

1) The verification statements are in a totally different syntax than the code.

2) The verification statements are in different files than the code.

3) The basic syntax and semantics of the verification statements are not checked by the regular compiler. In too many systems, it's a comment to the compiler.

4) The verification toolchain and compile toolchain are completely separate.

None of this is theory. It's all tooling and ergonomics.

Then, there's

5) Too much gratuitous abstraction. The old version was "everything is predicate calculus". The new version is "everything is a type" and "everything is functional".

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf


I am sure we can do better than predicate logic or types, but the code needs to change also. Code should be more abstract and reusable. If possible, it should be a subset of the logic. There is no point in verifying the same stuff in JavaScript, Java, Rust, Swift, ...

Furthermore, something that looks simple might need a lot of abstraction to become provable. Something might be simple to write down in code, but coming up with a notion of correctness for it, and actually proving it, might require something significantly more complicated.

Infatuation with Hoare-Logic is part of the problem (that's what the paper you link uses, I think?). Thinking that verifying programs is easier than doing proper math on the computer is a dead end. First get mathematicians to actually like doing proofs with the help of a computer. THEN you might have a chance of tackling more practical applications like program verification without using an insane amount of resources.


Verification is largely a fool's errand[1]. The juicy opportunity is tooling that aids in constructing programs such that they necessarily have the desired properties. Predicate transformer semantics are up to the task. It's basically just applied lattice theory, which is a very well understood field of math and within the learning capacity of any competent programmer.

Edit: Automated verification does however attract a lot of research money as the latest in a long list of fads promising the bean counters that they can hire cheap idiot programmers instead of expensive smart ones. I don't mean to be dismissive though, the automatic verification research is genuinely interesting and they have accomplished impressive things, for example[1].

[1] https://www.semanticscholar.org/paper/Learning-Loop-Invarian...


Automated verification is not the latest in a long list of fads. First, it is not a fad, and second, it has been around for a long time (check for example [0], which dates from 1961).

I agree with you that constructing programs such that they necessarily have the desired properties is the way to go. But we will disagree in how to go about that. Predicate transformer semantics is just another name for Hoare-Logic, and is too narrowly focused on the program code itself. It is basically the opposite of correct by construction! Rather, I would like to see programs as natural outflows / consequences of verified mathematical theories.

[0] https://en.wikipedia.org/wiki/DPLL_algorithm


> Predicate transformer semantics is just another name for Hoare-Logic,

Sir Tony is certainly highly influential, but Hoare triples are considerably more basic. Not to understate their importance, having a solid base to build on is essential.

> and is too narrowly focused on the program code itself. It is basically the opposite of correct by construction!

Please provide some explanation so that this doesn't read as an absurdity.


PTS and Hoare-logic are really the same, just differently formulated. At least according to Wikipedia: https://en.wikipedia.org/wiki/Predicate_transformer_semantic...

What I mean by my comment is that typically with Hoare-style verifications, you add invariants to the program AFTER it has been constructed. So in this sense it is not correct by construction, but rather verified after construction. Instead, under correct by construction I would understand an upfront proof about some abstract function, and then the concrete code generated for this function will be automatically correct.


> What I mean by my comment is that typically with Hoare-style verifications, you add invariants to the program AFTER it has been constructed.

Ah, perhaps some people misuse predicate transformers that way, I don’t know. I do know however that that’s not how their inventors used them or meant for them to be used. The basic idea is that you define the postcondition you’d like to establish before writing any code. Then you derive a chain of predicate transformers such that the final precondition is suitable. A common choice is the predicate that is true for all states, but for partial correctness another might be used. Since any given predicate transformer has a syntactic representation, the result is the program text. As you can see the result is correct by construction.
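
To make that concrete, here is a toy version of the assignment rule wp(x := e, Q) = Q[x := e], sketched in Lean since that's what's handy here, modelling the state as a single integer (the names are mine, not Dijkstra's):

  def wp_assign (e : Int → Int) (Q : Int → Prop) : Int → Prop :=
    fun x => Q (e x)
  -- to establish the postcondition x > 0, choosing e := (· + 1) leaves the
  -- precondition x + 1 > 0, i.e. x ≥ 0, so the derived statement is x := x + 1
  example (x : Int) (h : x ≥ 0) : wp_assign (· + 1) (fun y => y > 0) x := by
    simp only [wp_assign]
    omega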

I recommend reading Dijkstra’s A Discipline of Programming for pragmatic examples of using predicate transformers to derive correct programs. If you want a more rigorously formal treatment of the same subject there’s Dijkstra and Scholten’s Predicate Transformers and Program Semantics.

Edit: Here[1] is a nontrivial example, although it assumes some familiarity and doesn’t spell everything out. When you get to the acknowledgments section it’s clear why.

[1] https://www.cs.utexas.edu/users/EWD/transcriptions/EWD07xx/E...


Yes, I will concede that of course PTS can be used in a "first the invariants, then the code" way, and so my distinction doesn't make much sense here.

Nevertheless, PTS is for verifying imperative programs, and whenever possible, I prefer reasoning about more elegant and simple mathematical objects instead.


I find a lattice of functions from a predicate to another predicate to be about as simple as possible for describing the problem space. I’ve still got a lot to learn, but given that state is a reality for the computing automata we can actually build I like an abstraction that captures it directly.

I have found this to be an interesting exchange of ideas and I appreciate your taking the time to answer my queries. Thanks!


> First get mathematicians to actually like doing proofs with the help of a computer.

Many are using type theoretic and HOL theorem provers already. What threshold do we need to reach? Isn't Lean HoTT? Isn't that pretty much good enough (modulo UI and tooling)? I ask the latter question because that was the original promise of HoTT but I haven't kept up to date on Lean, so I'm asking.


> Isn't Lean HoTT?

No. Lean is good old fashioned Martin-Löf type theory. HoTT is that type theory + univalence + higher inductive types. Lean actually has proof irrelevance, which is incompatible with HoTT.
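
Proof irrelevance in one line, if anyone is curious: any two proofs of the same Prop are definitionally equal in Lean, so `rfl` closes this (and that's the feature that clashes with HoTT).

  example (p : Prop) (h₁ h₂ : p) : h₁ = h₂ := rfl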

But the good news is you don't need HoTT to verify software. Type theory is already quite capable of it, despite what others in this thread would like you to believe.


> Type theory is already quite capable of it, despite what others in this thread would like you to believe.

I know but mathematicians wanted HoTT.


The threshold is mathematicians opting to use these tools themselves for their work. There are not many mathematicians using them so far.

A nice example is the recent formalisation in Lean of some ideas by Peter Scholze: https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-...

That's great stuff, which shows what can in principle be done with this technology. But why is a team necessary to formalise Scholze's ideas? Note that Scholze himself doesn't touch Lean. In my opinion, the threshold is reached when Scholze himself sits down DURING the development of his ideas to interact with the proof assistant and develop+verify his ideas.


Most mathematicians are doing work that is, frankly, formally unsound. There's a huge culture of hidden assumptions in most mathematical fields.

Not to mention that the syntax is literally unparseable. For example what does this mean?

  sin(x) + cos(x)
Most mathematicians would say it's the sum of the sine of x and the cosine of x. But it parses fine as the sum of the product of s, i, and n(x), and the product of c, o, and s(x). Obviously that kind of ambiguity is unacceptable in a formal procedure syntax like that used by programming languages. That's just a simple example, traditional math is full of this kind of thing.


> But it parses fine as the sum of the product of s, i, and n(x), and the product of c, o, and s(x).

No it doesn't, because math lexes greedily (and is also context-sensitive anyway). 'sin' is a symbol just like 'x'. There's no ambiguity in your example.


Pretty much any mathematician, on seeing `xy`, would think it's `x*y`, and not a new symbol called "xy".


Thank you for illustrating the culture of assumptions I was referring to.


It's a weird example, but you definitely got a point. Of course, an ambiguity like in your example never causes a problem, because this text is parsed by humans, not by machines!


> The threshold is mathematicians opting to use these tools themselves for their work. There are not many mathematicians using them so far.

The kinds of theorems that we need to prove in software are much more elementary than what mathematicians are proving. I think software engineering can benefit from it long before mathematicians start doing cutting-edge math in it.


Not if you do software the right way. Try to prove correctness of a CAD program, for example. Or of a graphics card implementation. Or ...

Furthermore, I don't think cutting-edge math needs a much different approach from cutting-edge software. You need to be able to express your thoughts succinctly, and have the tools to reason about them.

It is often said that software verification is different because there is much more to verify, but on a more shallow level. I instead think software is just not done at the right level of abstraction.

Software is at the same time more and less than math. More, because in addition to understanding a topic, you also need an implementation, which has additional issues like speed and memory usage, battery life, etc. Less, because if you do a nice implementation, nobody is asking about its correctness, or how well you understood the topic in the first place. For software today, a nice implementation is much more important than a correctness proof.


> Try to prove correctness of a CAD program, for example. Or of a graphics card implementation. Or ...

You don't need to verify the entire program for formal verification to be useful. You can adopt it incrementally.

The most common bogus argument I hear against formal verification is that it's impractical to come up with a spec or proof for the entire program, so we might as well not even bother with formal verification at all.


Formal verification is just very costly and has diminishing returns. Let's take a CAD program. Which aspects of it would you formally verify? If you are going for the easy parts, those can already be dealt with nicely with static typing and testing, essentially push-button automated verification. If you are going for the interesting parts, you will be doing math, essentially.


> Let's take a CAD program. Which aspects of it would you formally verify?

Any large program will contain some smaller components with relatively well-defined behavior. CAD is not my specialty, so I can't really comment on what algorithms are used in that domain. Forgetting about fancy algorithms for a moment, just having a more expressive type system will allow you to express invariants in your code like the fact that array indices are within the relevant bounds, that you never try to pop an empty stack, etc.—everyday programming issues.
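
As a toy Lean illustration of the bounds point (the function name is mine): the caller has to supply a proof that the index is in range, so an out-of-bounds access can't even be written down.

  def safeGet (a : Array Nat) (i : Nat) (h : i < a.size) : Nat :=
    a[i]'h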

For a more concrete example, lately I've been using Coq to formally verify critical properties about a certain type of graph-like data structure I'm using in a system I'm building.

> If you are going for the easy parts, those can already be dealt with nicely with static typing and testing, essentially push-button automated verification.

Most engineers are already writing tests and using static types. Yet, we still have buggy programs.

And just to be clear, the kind of formal verification we're talking about is based on static typing. It's just a more expressive type system than what most programmers are used to.

> If you are going for the interesting parts, you will be doing math, essentially.

You are doing some form of math, but not the kind of cutting edge math that mathematicians do—which was my original point. You are not going to run into the kinds of tricky problems that mathematicians run into with theorem proving software, like universes being too small etc. Most data in software engineering is finite and reasoning about it involves little more than arithmetic and induction (which is just out of reach for mainstream type systems, but not for the kind of type systems used in proof assistants).
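
For a taste of what that looks like, here is the kind of structural induction over finite data that dominates these proofs (a Lean sketch; the definitions are mine):

  def rev : List Nat → List Nat
    | [] => []
    | x :: xs => rev xs ++ [x]
  theorem rev_length (xs : List Nat) : (rev xs).length = xs.length := by
    induction xs with
    | nil => simp [rev]
    | cons x xs ih => simp [rev, ih]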


First, theorem proving is NOT the same as an advanced form of static typing. This is a misunderstanding mostly pushed by computer scientists. Instead of propositions as types, I advocate a more practical form of types, based on Abstraction Logic [0, 1].

Second, yes of course, you can carve out components and concentrate on those. If you can find opportunities for this, great! You will still have buggy programs in which you use those components, to copy your argument.

Third, data may be finite, but reasoning about it is often done better in an infinitary context. After all, x^2 + x - 3 is also a finite expression, and much easier to understand than most software. So what? You will find a lot of interesting mathematics done with polynomials, some of it cutting-edge. Saying your software doesn't need cutting-edge math is just limiting yourself and your software. Chances are you will be doing some new (=cutting-edge) math if you try to verify new things. And yes, I run into problems with universes all the time actually, because this is relevant for modular formalisations. It's best to just have a single mathematical universe!

[0] https://obua.com/publications/philosophy-of-abstraction-logi...

[1] https://obua.com/publications/practical-types/1/


> First, theorem proving is NOT the same as an advanced form of static typing.

This Hacker News post is about a theorem prover based on dependent types. That's the context for our discussion.

> You will still have buggy programs in which you use those components

No one is disagreeing with this claim. But eliminating some bugs is better than nothing, even if you don't eliminate all bugs. You and the other commenters repeating this strawman are doing a lot of harm to people trying to socialize their research.

> Chances are you will be doing some new (=cutting-edge) math if you try to verify new things.

Citation needed. Most software is not doing anything interesting at all from a mathematical perspective, just shuffling data around. But either way the point is moot—Martin-Löf type theory (which is what this "magmide" thing seems to be based on) can do arbitrarily fancy math if needed (which is rarely).

I've been verifying bits of software for about 10 years, and I've never needed to invent new math to do it (though I would be happy if I ever did!).


Well, if you created a new data structure not known before, and proved theorems about it, that's new math. If you copied a well-known data structure, and prove theorems about it, that's not new math.

What do you think mathematicians do? They just examine certain things rigorously and with utmost scrutiny. These things are simpler than things appearing in real-life. Software interfaces with real-life, so cutting-edge math is really a subset of what's needed for software. That is obvious to me, but I don't have a citation. You can cite me, if you want to.

Finally, dependent types as it is done today in Coq and Lean etc. is not nice enough a logic to attract mathematicians. The reason for that is that it is not nice enough a logic, full stop. So why would it be nice enough for computer scientists? Oh, because your problems are simpler, so you don't need a nice logic?

Saying that Martin-Löf type theory can do arbitrarily fancy math is both false and right. Just as saying that anything can be programmed in assembler is both false and right. Yeah, with enough focus and discipline you probably can, but who would want to?


> The reason for that is that it is not nice enough a logic, full stop. So why would it be nice enough for computer scientists?

Type theory has many attractive properties over traditional foundations like set theory. See, for example: https://golem.ph.utexas.edu/category/2013/01/from_set_theory...


Citing your link:

> At this point, however, you may be feeling that type theory sounds very complicated. Lots of different ways to form types, each with their own rules for forming elements? Where is the simplicity and intuitiveness that we expect of a foundational theory?

It's just not there, Mike. Type theory is neither simple nor intuitive, and it doesn't make a good foundation.

Type theory certainly has advantages, but it gets lost in esoteric texts like Mike's. There are two simple advantages over set theory within first-order logic: general variable binding, and that things are kept separate, for example that a boolean is not a number. Now, how to do general variable binding without static types I show in [0]. How to still keep things separate, in [1]. All without having to divide the mathematical universe into static types.


Professional mathematicians are today using dependent types in LEAN to prove leading edge mathematics. Google LEAN and mathlib.


I am aware of Lean and mathlib. It's the group around Kevin Buzzard which is doing mathlib, and it is for sure a great effort. Being practical, they latched onto the theorem proving facilities that are currently available, Lean certainly being one of the best at this moment. But just because they can make type theory work for them, with much effort, doesn't mean that it is the right way to do formal math. They are still a drop in the ocean compared to all the mathematicians who have never touched a proof assistant. I find it telling that in the Liquid Tensor experiment Peter Scholze didn't touch Lean himself, but worked through this group as intermediaries, who mediated the type theory details for him.

The state of proof assistants is such that currently almost all of them are based on some form of type theory. There are exceptions like Mizar and Metamath (Zero), which are based on first-order logic. Type theory seemed to be the only way to have proper general variable binding, via lambdas, and that is the main reason for its popularity today. Type theory is a particular mathematical theory, though, and not particularly well suited to most things mathematicians like to do, like forming completions or subsets of types/sets. Type theory also cannot deal with undefinedness properly. Of course, there are methods of working around these issues like coercions and option types, but they are cumbersome.

Until recently I also thought that a minimum of type theory is necessary, to get general variable binding. For example, I tried to embed set theory within simply-typed higher-order logic ([2]).

But since last year I know that you don't need type theory for variable binding (see [0])! Of course, (dependent) types are still useful, but now you can work with them in a much more flexible way, without having to divide the mathematical universe into separate STATIC and A PRIORI chunks labelled by types (see [1], but that needs to be updated with the understanding gained from [0]).

If type theory works for you, fine. But I know there is a better way.

[2] https://link.springer.com/chapter/10.1007/978-3-319-20615-8_...


You might also want to check out nuPRL. It uses a completely different approach (computational types). It might be a better match for what you are talking about.


I've heard of nuPRL, it seems to be also based on type theory, with special emphasis on constructivity. It's basically the same as Coq, foundationally. At a summer school Bob Constable once said that he would refuse to fly in a plane with software which had been verified formally using classical logic, instead of constructive logic... Well, I wouldn't mind an intuitionistic verification, but I'd definitely take even "just" a classical one.


No it is actually quite different. nuPRL starts with an untyped programming language and you then prove that an untyped expression has a certain behaviour. The behaviour is called a Type but it is fundamentally a very different idea from Martin-Löf type theory (IMHO). They do say that it is MLTT, and in principle they are right, but MLTT is as powerful as set theory so that is true of anything mathematically. LEAN for example supports non-constructive mathematics. But it is still based on a type theory. Anyways … YouTube has a great talk about the ideas: https://www.youtube.com/watch?v=LE0SSLizYUI


Thanks for a great reply. I would like to check out your references. However you mention more than one but I can only see [2]?


No worries, happy to talk about this stuff all day long! The links are in a higher up post, they are:

[0] https://obua.com/publications/philosophy-of-abstraction-logi...

[1] https://obua.com/publications/practical-types/1/


Same with me. It is really interesting stuff!


It literally is. Martin-Löf type theory is the mathematical foundation used by most proof assistants today. The type checker is the prover. If your code type checks then you have proven a theorem. I highly recommend learning more about the Curry-Howard correspondence and Dependent Types. It might just blow your mind.


See my other answers. Curry-Howard is interesting, but making it the foundation of theorem proving is a choice (in my opinion, not a very good one), not a necessity, as most Curry-Howard fans seem to think.


> First, theorem proving is NOT the same as an advanced form of static typing.

Yes it very much is in a dependent type theory.


Yes, and because of that limitation dependent type theory is an inferior logic for theorem proving.


That's silly. Whether it's a good or bad metatheory for theorem proving depends entirely on one's goals and preferences. Most mathematicians use a type theory over something else, so it's hard to even take your view seriously.

On top of that, you seem very opinionated for someone who was just confused about the topic of this thread. I have no idea why you're waging a holy war over this but maybe take a break.


Most mathematicians don't even use formal logic. For sure they don't use type theory! You seem to be the one who is silly/confused here. If you want to lift your confusion, read the [0] link I gave above.


All of those programs can be proven correct today using Coq/HOL/LEAN etc. See CompCert, seL4 etc. The math needed by Computer Scientists to prove code correct is very different from what Mathematicians care about. Look at the mathlib project done by mathematicians in LEAN. It is pretty much useless if you want to prove code correct.


The topic of mathlib might be different, but the methods are the same. That's why you can use Lean for both in the first place!


> The threshold is mathematicians opting to use these tools themselves for their work. There are not many mathematicians using them so far.

No, you didn't answer the question at all. How many is many? How many will be enough for you? It is being used by mathematicians and for some pretty important things.


Interesting point. Let's say 1% of all mathematicians. That would be many.


There was a version of Lean that supported HoTT, and I think it helped in making Lean popular. But that support has been dropped in Lean 3, and Lean 4 does not support it either. Lean 4 itself seems to be a radical rewrite, and libraries written for Lean 3 do not work in Lean 4.


Mathematicians are using LEAN today to prove leading edge mathematics correct. Google LEAN and mathlib.


See my reply to you deeper below.


I was with you until:

> 5) Too much gratuitous abstraction. The old version was "everything is predicate calculus". The new version is "everything is a type" and "everything is functional".

Total functional programming with types is literally the same as writing proofs in intuitionistic logic, in a technical sense (the Curry Howard correspondence). This isn't a new fad. It's a deep result that was known more than half a century ago.


As the article mentions, formal verification techniques are primarily used today for two things:

- Creating secure "core" code -- library functions and kernels and stuff, where the things they're supposed to do are very well-defined.

- Verifying specific, narrowly defined properties, like how Rust's borrow checker guarantees that your program doesn't try to write to the same value from two different threads at once.

I'm not sure formal techniques will be as useful when expanded to other areas. Most of the bugs I encounter day-to-day happen because the programmer had the wrong goal in mind -- if you asked them to create a formal proof that their code worked, they would be able to do that, but it would be a proof that their function did a thing which was not actually the thing we wanted. (Similarly to, e.g., unit tests that do not actually test anything because they're just line-by-line reproductions of the original code but with every function call mocked out.)

Has anyone successfully applied proof techniques to reduce defects in UI development, "business logic", or similarly fuzzy disciplines?


I think this sharply discounts the value of the step before the proof which is writing the specification in a formal language/logic. This often surfaces these misunderstandings before a proof is even necessary.

That doesn’t guarantee that the spec will be the right one (as you say) but it’s astounding how often just writing it down formally improves the implementation through rigorous thought.


I've only done a bit of formal verification but I'd estimate that writing that spec was 7-10x harder than writing the actual program and was more complicated than the code.

In the end I had lower confidence that the spec lacked bugs than the program. This was after expending a huge amount of effort on a pretty tiny program.

I don't think this was a tooling thing. I think it spoke to the fundamental limits of formal verification. I think it'll always remain kinda niche.


I've used formal verification in anger, as well as working with a vendor to formally verify some critical code for us. This rings very true. It is extraordinarily difficult to write correct rules. It ends up being the same problem as wishing against an evil genie.

Most rules that you come up with at first end up having a class of obvious exceptions in the real world, which the verifier finds, and then even more unobvious exceptions, and soon your logic around the exceptions to the rules becomes at least as complicated as the code you are attempting to verify. And through all of this, any wrong assumptions that falsely allow bad behavior are not caught or flagged, because they pass.

Even given perfect proving software, it's still a far harder challenge to write rules than to write code. And current software is still far from perfect - you are likely to spend a lot of your rules time fighting with your prover.


I think this depends on the spec language and the target system. I’ve never encountered a spec more complicated than the program as the goal is always abstraction but I don’t mean to discount your experiences and complexity is affected by tooling familiarity and quality.

Separately the spec can often have its own properties which can be verified as a means to interrogate its correctness. For example state machines as spec, temporal logic properties and model checking where the state machine is the abstraction for a concrete system. Worth noting that proving state machines are an abstraction of a concrete system is a going research concern though.


> I think this sharply discounts the value of the step before the proof which is writing the specification in a formal language/logic.

If that was at all achievable, we'd have a compiler that took the "specification in a formal language/logic" and emitted native code/applications/programs. We'd then call the "specification in a formal language/logic" a programming language.

Sure, there are a lot of formal languages for specifying logic with checkers that ensure no bugs in the input specification exist, but AFAIK none of them are useful enough to emit programs.

Needing a human to translate one formal language (the formal spec) into another formal language is pointless and useless, because then the human may as well just translate human language specification into formal language.


If you're writing a sorting algorithm or a hash table implementation or something, then the spec is meaningfully different from the code. The spec says "the output array is sorted", the program describes some particular strategy for sorting it, and then you use the proof tools to make sure that the strategy actually works to sort the array in all cases.
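
Concretely, the spec might look something like this Lean sketch (the names are mine); note that it constrains only the output and says nothing about the sorting strategy:

  def Sorted (l : List Nat) : Prop :=
    ∀ i j (h₁ : i < l.length) (h₂ : j < l.length), i ≤ j → l[i]'h₁ ≤ l[j]'h₂
  -- an implementation `mySort` would then be verified against something like
  -- `Sorted (mySort l)` plus a permutation property relating output to input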

But for things like UI code, I too am having trouble imagining a spec that is concrete enough to be useful for formal verification and does not have some trivial 1:1 correspondence to the implementation. (If anyone knows of an example, I'd really be interested in seeing it!)


> If you're writing a sorting algorithm or a hash table implementation or something, then the spec is meaningfully different from the code. The spec says "the output array is sorted", the program describes some particular strategy for sorting it, and then you use the proof tools to make sure that the strategy actually works to sort the array in all cases.

I dunno. Thinking more deeply about the specification for a sorting algorithm, it makes sense that the specification includes the asymptotic runtime (or memory usage, or both), or else it's an informal specification.

If the spec is really nailed down formally then the specification language really would be the implementation language too.


> That doesn’t guarantee that the spec will be the right one (as you say) but it’s astounding how often just writing it down formally improves the implementation through rigorous thought.

But that doesn't have much to do with formal methods. You can achieve the same effect grabbing a colleague and explaining your spec to them, it will trigger the same rigorous thought because you want them to understand you.


I would love to see a fuzzer applied to business logic. It should take design requests from PMs and execucritters and ask pointed questions about edge cases.


I love this idea. I bet you could get surprisingly useful results just using a language model like GPT-3.


I am strongly inclined toward verifying my software to the extent possible but there are many practical challenges. I think academic formal verification methods look elegant, which appeals to me, but are extremely human intensive when what I really want to do is throw machines at the problem to the extent possible. There are also some important types of software correctness that are still difficult to capture with these methods, though the state-of-the-art has improved with time.

I've toyed with many methods, tools, techniques, and approaches to get a sense of where the ROI maximum is for my own purposes. In practice, I've found that sophisticated and comprehensive application of less elegant methods amenable to throwing hardware at them, like exhaustive functional testing, thorough fuzzing infrastructure, systematic fault injection coverage, various types of longevity testing, etc., when done well often found all the same design flaws as a tractable level of more academic formal verification. Also easier to maintain as code evolves. Furthermore, these less elegant approaches also found the occasional compiler and hardware bug that more elegant formal verification methods typically do not.

I have wondered if developing and standardizing this less elegant tooling to a high level, so that it is easier to be lazy and throw hardware at the problem, would have at least as much impact on software quality as trying to get everyone to apply very academic formal verification methods, with their current limitations and theoretical constraints. As much as I like the concept of very pure formal verification, I lean toward whatever makes maximizing software quality practical and economic.


Outside of mission critical applications, if the cost involved to make software "provably correct" (time, salaries) is greater than the cost of the bugs, it will never be adopted.

Believe me, I see the appeal, but it's kind of like demanding your house have all perfect right angles and completely level surfaces. Living with manageable imperfection is far more realistic.


Why does it always have to be all or nothing? Is your codebase 100% covered by unit tests?

You don't have to make sure all the angles of your house are perfect right angles, but there's probably a couple that really have to be. Formally verify those and live with manageable imperfection for the rest.


"All or nothing" is not the point I'm making, it's "any or nothing." Unittests frequently have justifiable value for their effort, even when only providing partial coverage. I'm arguing that the effort involved in creating formal proofs for any part of your software is almost never worth the benefits, outside of mission critical applications, when a dozen simple unittests could provide 99.99% confidence. And certainly not worth it enough to become "mainstream", which is the title of the post.


Exactly. In many settings it is quite a reasonable business decision to regard a bug as WONTFIX.

Here's one example:

developer: adding proposed feature A to API B may result in displaying inconsistent data to the customer, in this kind of situation <details>

team owning service of API B, some weeks later: thank you for flagging that. we have thought about this, and propose a slightly different API design B'. let's meet to discuss.

product owner: the scenario that would trigger that situation requires a combination of unusual events: it has to coincide with a scheduled batch process, and the customer would need to have unusual characteristics that are rather different from the distribution we observe. When the defect happens, the impact to the customer and our business is not so bad.

developer: That's fair. given how the feature works there are many other ways it can show inaccurate data to the customer as well.

engineering manager: how about we don't fix it, does anyone disagree?

everyone agrees not to fix it

A more interesting question is to ask "in which business situations does it make sense for formal verification to be used?". If you want to get paid to play with formal verification in the day job, you'd best find a business context where it is business critical to identify and remove design and implementation defects early, and it is worth spending a lot of money trying complementary techniques for doing so.

For my above example:

- the defect is not related to a core capability of the product. if it occurs, the defect does not cause a huge impact to the customer or the business. it may mildly annoy the customer.

- how the feature works is fairly ill-defined anyway, there are many things that influence it, and its job isn't to preserve customer data, so data integrity isn't a huge priority

- if the defect did start causing substantial problems in future, it would be possible to change this decision and decide to fix it. the cost of fixing it later might be 5x fixing it now, but it wouldn't e.g. require a product recall or risk bankrupting the company.


Defects, and potential defects, tend to cause a background of mental overhead, and communication overhead, throughout the organization. People have to remember what kinds of assumptions are broken under what circumstances, and relate that to a business impact.

There's a mental unburdening when you can just say "situation X won't happen".


great point, the more of these kinds of shortcuts accumulate, the more complicated it becomes to think about related changes in future, making maintenance, training new team members more time consuming and expensive. maybe not quite the same as "technical debt", but some kind of debt.

tangentially, re: organising things to reduce cognitive load, there is an interesting discussion with the authors of the book "team topologies" who characterise the purpose of platform teams as existing to _reduce the cognitive load of stream-aligned teams_ (who have the goal of providing a user-facing service or product, say). https://www.infoq.com/podcasts/software-architecture-team-to...


We use static type systems all the time, as well as specialized checkers and linters, and none of those showed themselves to have "costs greater than the cost of the bugs". And none of them are even nearly similar to "demanding your house have all perfect right angles and completely level surfaces".

Do you have any reason to believe that all the rest of the verification theory is completely impractical when every piece that was packaged in a usable context became a hit?


This is kind of like AI. When something succeeds in general practice, it is no longer "formal verification". Now it's "robust type systems" and "static analysis tools". Those provide formal verification of some aspects of the code. And that's great! It's progress!

Full formal verification is (probably?) still a long way off, depending on your definition of "full". (My definition would be one where Knuth's observation that "it's amazing how many bugs there can be in a formally verified program" is no longer true.)


> This is kind of like AI. When something succeeds in general practice, it is no longer "formal verification". Now it's "robust type systems" and "static analysis tools".

Yes indeed. That is insightful.

Is there a name for this process?


Well, in AI, the process is called "AI can never succeed". So maybe the general thing is "X can never succeed", where X is some hugely ambitious (and ambiguous) thing, like "AI" or "formal software verification" or "curing cancer".

We're probably never going to cure cancer - that is, have some treatment that conquers all cancers. Instead, we get "for this specific type of cancer, for these specific conditions, this treatment has a higher survival rate than the ones we had before". Over time, that adds up to a lot of people living out their days rather than dying early.

And maybe software verification is the same. Enough ways of verifying specific aspects of software, and bugs have fewer places to hide. It won't find all bugs, but we'll still get better software.

But "X can never succeed" isn't a very catchy phrase. Can anyone coin a better one? (Or, is there already a better one that I don't know about?)


Well, the article has quite a list of "some aspects" of your code the author is working on.

I see no reason why any of them could not be successful. I don't expect all of them to be, but any one that gets there is already a huge advance.


> Do you have any reason to believe that all the rest of the verification theory is completely impractical when every piece that was packaged in a usable context became a hit?

While I by no means have enough knowledge to claim such, let’s just add that ordinary types are a so-called “trivial property”. Many actually interesting properties can’t be proved, as per Rice’s theorem, in every case.


That's not what Rice's theorem states. Rice's theorem states that interesting properties are undecidable, not that they can't be proven. Undecidability is not relevant when you are providing the proofs to the computer.


If you had a proof, you could make an algorithm that verifies it, making it decidable, couldn’t you?


Checking proofs is decidable. Coming up with proofs is undecidable.

This tool does the former, leaving the latter up to humans.


Using a language which provides static types != using static types; quite a few programs bypass the static types with 'everything is an int' or 'everything is a string'.


I think the cost of provably correct software is actually much lower, but you pay more of it up front. The perceived incentives of being first to market are higher than that of quality software.

I suspect eventually there will be a big lawsuit where the blame can be laid on negligence in the software development and the incentives might change somewhat.


> The perceived incentives of being first to market are higher than that of quality software.

"Perceived"?

It's very rare that shipping later because of $quality-concerns is profitable. In most cases, that "later" never arrives anyway.


The first such successful lawsuit is going to change the landscape. Not just "somewhat" - it's going to be a massive change.


> The first such successful lawsuit is going to change the landscape. Not just "somewhat" - it's going to be a massive change.

I'd argue that, if there was going to be a needle-mover lawsuit, it would have happened by now. Until there is evidence that it will happen, we can continue assuming that it won't.


This sounds like motivated reasoning to me. Have you heard of Therac-25?

You might be inclined to suggest that that could have been prevented if only they'd used formal methods. Perhaps that's true. It's something that could have been prevented in many different ways, though. Yet it still happened.


Hmm. If I recall correctly, Therac-25 destroyed the company. It (and incidents like it) led the FDA to gradually be more stringent on scrutinizing software in medical devices. And yet, to the best of my knowledge, even the FDA has not mandated formal verification of the embedded software in medical devices.

And yet, there almost certainly were lawsuits. So maybe you found evidence that perfectly counters my theory. It changed software, but only a little bit. Very little, given the magnitude of what happened.


> And yet, there almost certainly were lawsuits. So maybe you found evidence that perfectly counters my theory. It changed software, but only a little bit.

There seems to have been something akin to an "accident chain", where a large number of things went wrong. Had any one of these things not happened, there might have been much less harm caused, or even no harm at all.

I will admit to being peevish about stuff like this. Some of the failures with Therac-25 were systems failures that had nothing to do with software per se (I'm not counting "software hubris" as a software problem). They were failures of process, problems with hardware interlocks, and even UI bugs that made the software confusing to operators.

I have nothing against formal methods, but they're no substitute for a deep and abiding paranoia.


The company is still around and still makes hundreds of millions in revenue. https://en.wikipedia.org/wiki/Atomic_Energy_of_Canada_Limite...


Granted it was 1982, before software played such a big role in everyone’s life. Maybe it would be a bigger deal now.


> if the cost involved to make software "provably correct" (time, salaries) is greater than the cost of the bugs, it will never be adopted.

It may already be. Where's the research? It may not be happening just because of quarterly cycles, other misaligned incentives, culture or all kinds of other reasons.

> Believe me, I see the appeal, but it's kind of like demanding your house have all perfect right angles and completely level surfaces. Living with manageable imperfection is far more realistic.

The difference is that those things are all within tolerances and serve their purposes and function correctly. Software often doesn't.


>Software often doesn't.

And yet the world keeps turning, tech companies keep profiting, and customers are generally happy with the value provided, all without formally provable code bases. How does "provably correct" improve on this without extending timelines and costing more?


> customers are generally happy with the value provided

Don't confuse resignation and ignorance with happiness. People are used to computer systems just breaking and being the root of various problems (e.g., identity theft, privacy leaks, systems that just don't work some days for no apparent reason, and so on). The fact that they accept this flaky and unreliable state as the status quo doesn't mean they're happy with it - they just don't understand that better is actually possible.

I work in the security and assurance world. The biggest obstacle we face isn't technical - it's social. Developers want the route of least effort and least time to get products to market, and end users are largely ignorant of the fact that the world doesn't have to be full of garbage software. At this point, I'm rooting for a massive change in the legal landscape to start treating software defects the way we do engineering defects in physical systems. Developers and businesses aren't going to do the right thing by choice, so a giant hammer in the form of the legal system is likely to be the only thing to force change. I am fully aware of the consequences of that (e.g., it will likely severely chill open source, and will likely slow many business sectors down) - and I accept this. I'd take those consequences for the safety/security/assurance outcomes, even if they cause havoc on the revenue/business side and make "10x" python hackers grumpy. People will likely take formal assurance methods more seriously when there are actual consequences to deploying unsafe/insecure systems.


> tech companies keep profiting, and customers are generally happy with the value provided

Obviously the answer is that they could profit more and be happier.

Ever heard of this thing called ransomware, for example? Identity theft? And you must know, this stuff is only the beginning... Just wait until the day everyone's private Facebook chats are available on torrent.


>Ever heard of this thing called ransomware, for example? Identity theft?

Software can still be provably correct and have security holes resulting from an insecure definition of "correct." Formal verification does not solve security.

>Obviously the answer is that they could profit more and be happier.

Please explain how a company could profit more if formal verification does not bring more revenue than it costs? You seem to be assuming that revenue will appear that is greater than the costs. Where is this revenue coming from, exactly?


> Software can still be provably correct and have security holes resulting from an insecure definition of "correct." Formal verification does not solve security.

Nirvana fallacy. The point is that it can be much better and eliminate ALL non-design bugs.

> Please explain how a company could profit more if formal verification does not bring more revenue than it costs? You seem to be assuming that revenue will appear that is greater than the costs. Where is this revenue coming from, exactly?

The revenue could come from savings on fixing bugs, paying for ransomed assets and all other costs that come from bugs. You're just assuming that doesn't add up and that there's no other reason that we don't do formal verification. That's just stupid. Show me the studies. Your claim is just as strong as the claim you think I'm making, but you missed my point entirely.


> Nirvana fallacy. The point is that it can be much better and eliminate ALL non-design bugs.

Serious question: do you really believe that?

> That's just stupid. Show me the studies.

There is a kind of HAL-9000 quality to many of these arguments. Formal verification is perfect by definition. The fact that it hasn't had very much impact in the real world is all the more evidence of the world being full of wicked people.


> Serious question: do you really believe that?

I mean it's a fact, so yes. You can prove programs are correct. The only possible flaw they can have is the specification is wrong.

> The fact that it hasn't had very much impact in the real world is all the more evidence of the world being full of wicked people.

That's very much not what I said. There may be many reasons. Assuming some conclusion without actual research is braindead. Cost-benefit analysis is not the only reason things do/don't happen in businesses and we have no idea whether that's the reason here. It's an empirical question that requires actual research, not a priori jacking off.


> You can prove programs are correct. The only possible flaw they can have is the specification is wrong.

So specifications are kind of like programs?

Have you heard of logical positivism?

> That's very much not what I said. There may be many reasons.

My point was that it always seems to be some external factor. That strikes me as being very convenient.

> Assuming some conclusion without actual research is braindead.

I didn't think I assumed anything. Like anybody else, I have many things that I need to assess in my day to day life, and often deal with considerable uncertainty.


Well, seL4 has verifications of its mixed-criticality hard-real-time guarantees (sufficiently tight bounds on scheduling latency (and such) to be useful) and data diode functionalities, and its isolation properties have been verified not just at a fine-grained specification level but at a high-level, human-readable level of invariant description. It doesn't cover timing and maybe some similar kinds of side channels, but it's still extremely useful.

Formal verification shines in two situations: complicated optimized algorithms with a naive reference implementation you want to confirm equivalent, and high-level behavioral invariants of complex systems (like seL4's capability system, or a cluster database's consistency during literally all possible failover scenarios).


> with a naive reference implementation you want to confirm equivalent

I'm guessing you know this but in type theories like Agda you can just specify that the input and output to an algorithm have the desired properties, rather than needing to specify any reference algorithms. For example, you can just state that an implementation takes a list of X and that it outputs a sorted list of X. Nothing more is necessary in cases like that in such a system, no code, just the single type.
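
For instance, here is a minimal sketch of that idea in Lean rather than Agda (the names Sorted, Perm and VerifiedSort are made up for illustration): the type of the sort function itself states that the output is a sorted permutation of the input, with no reference implementation involved.

    -- "Sorted ys": adjacent elements are in order.
    inductive Sorted : List Nat → Prop
      | nil : Sorted []
      | single (x : Nat) : Sorted [x]
      | cons (x y : Nat) (rest : List Nat) :
          x ≤ y → Sorted (y :: rest) → Sorted (x :: y :: rest)

    -- "Perm xs ys": ys is a permutation of xs (defined from scratch to stay self-contained).
    inductive Perm : List Nat → List Nat → Prop
      | nil : Perm [] []
      | cons (x : Nat) {l1 l2 : List Nat} : Perm l1 l2 → Perm (x :: l1) (x :: l2)
      | swap (x y : Nat) (l : List Nat) : Perm (x :: y :: l) (y :: x :: l)
      | trans {l1 l2 l3 : List Nat} : Perm l1 l2 → Perm l2 l3 → Perm l1 l3

    -- The type of a verified sort: callers receive the output list together
    -- with evidence that it is sorted and a permutation of the input.
    def VerifiedSort : Type :=
      (xs : List Nat) → { ys : List Nat // Sorted ys ∧ Perm xs ys }

A concrete implementation then still has to construct those proofs, which is where the real work goes.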


Well, yes, that falls under the second case: behavioral invariants of complex systems.

And the reference for "sorting" could likely be a deterministic bogosort and certainly a primitive bubblesort.

Even if you're just looking at sorting stability, you're past what your simple "sorted" type would cover.

Most things are far less trivial than "sorted list", including (almost?) all interesting practical applications of formal verification in the life of a "normal" software engineer.


It's a tradeoff though - you could spend the time looking for bugs between the design and implementation, or you could get the implementation out sooner and get feedback and iterations on the design


Even regular static type-checking is seen as a burden by many programmers.


Microsoft developed the excellent Code Contracts[1] project. From the user's perspective, it was a simple static class called Contract with a few methods such as Requires(), Ensures() and Invariant().

Under the hood it used the Z3 Solver[2], which is intuitive, flexible and fast. It validated the contracts while coding and highlighted broken contracts in the Visual Studio IDE.

You could write something like: Contract.Requires(x > 5 && y > x);

This would get translated to a compile-time check as well. Unfortunately, Code Contracts has been dead for years now; it was even removed entirely from .NET[3] due to being hard to maintain, and the verifier stopped working in newer versions of VS.

Luckily, C# developers now have a small taste of contracts due to nullability analysis[4], but even more exciting is that contracts are making their way into C# as a first-class language feature[5].

[1] https://www.microsoft.com/en-us/research/project/code-contra...

[2] https://github.com/Z3Prover/z3

[3] https://github.com/dotnet/runtime/issues/20006

[4] https://docs.microsoft.com/en-us/dotnet/csharp/nullable-refe...

[5] https://github.com/dotnet/csharplang/issues/105


While I laud the goals, I am skeptical of the ability to meet them.

I very much believe that there is an industry-wide crisis of terrible software, but I don't believe that it's practical to go directly from "garbage to gold." The path is long, and far from straight.

Best Practices are how engineering disciplines, throughout history, have achieved progress in Quality.

Currently, Best Practices aren't really a "thing," in software development, and it shows. People like Steve McConnell are not really respected, and a general culture of "move fast and break things" is still pervasive. Engineers flit around companies like mayflies, techniques and libraries come and go, and there's an enormous reliance on dependencies with very little vetting. We spend so much time, trying to perfect our tools, without trying to perfect ourselves.

Academics and theorists have been proposing languages, libraries, infrastructure, and management practices that are designed to change lead into gold for decades, yet it never seems to happen.

I have always been a fan of self-Discipline, and the apprenticeship model. That requires a lot of social infrastructure that does not currently exist. It's as old as human history, and absolutely proven to achieve results.

"In theory, there is no difference between theory and practice; while in practice, there is." -Benjamin Brewster

"It is not enough to do your best; you must know what to do, and THEN do your best." -W. Edwards Deming

"The significant problems we face cannot be solved by the same level of thinking that created them." -Albert Einstein

"Everyone thinks of changing the world, but not one thinks of changing himself." -Tolstoy

https://xkcd.com/2030/


> While I laud the goals, I am skeptical of the ability to meet them.

Let's start with the easiest part, and rewrite everything in Rust (or a comparably safe language). We'll need something like this anyway to give our software a workable semantics that's free of UB (at least wrt. the "safe" checked subset). Then we can work on the harder problem of using proof to establish that the unchecked parts do not invoke UB, and that intended specifications are not violated even in the "safe" parts.


Tons of, if not most, "move fast and break things" software is written in safe languages, not C or C++.


The tools shouldn't matter. A good engineer can use whatever tools are at hand, to achieve their ends, as long as they take a Disciplined approach, informed by industry Best Practices.

Tools can help us to work faster, and abstract some of the "day to day" trivia, but, at the end of the day, we are still left with ourselves.

If you want a lesson in limited tools, try working on embedded software.

Embedded development systems often have languages that are incredibly dangerous, and extremely limited (It's not uncommon to be working in a subset of ANSI C). Good embedded engineers are usually trained in hardware practices, as opposed to CS ones. They understand the core fundamentals of what they are doing, so their work is not just rote.

I know that it's an incredibly unpopular stance, but I don't see any alternate path to becoming a better engineer, other than through patience, practice, persistence, and Discipline.


Are you saying C# and Java are safe languages or which ones did you have in mind?


Formal verification needs machine-readable formal specifications, but any kind of written specification, informal or not, was pretty hard to find in my career at internet giants. Maybe you can get a formal spec in aerospace or for FDA-regulated implanted devices, but the cost to write the spec, let alone to follow the spec, is way too high when the spec needs to change at the drop of a hat.


In the SPARK subset of Ada, the specifications and contracts live alongside your code in the same language, then you can prove that the specs are satisfied.

You can also leave out the contracts and just prove absence of behaviors like divide by zero, out-of-bounds array access and integer overflow. Proving that code meets the specification can be really difficult but proving absence of bad behavior is usually a straightforward endeavor.


One thing that will help drive adoption is the ability to run SMT solvers more quickly so the proof stage of your design/build has a faster feedback loop.

I ran some experiments with the Z3 and alt-ergo solvers (verifying SPARK/Ada code using GNATprove) on a base M1 Mini and it absolutely screamed, I’m not normally a Mac fan-boy but new chips like the M1 Ultra might have the possibility of driving a mini-renaissance in FV.

I’d like to see more attention being given to GPU accelerated SMT solvers too but haven’t seen much movement outside of a handful of research papers.


I've started watching Lamport's TLA+ course on YouTube and it totally blew my mind.

What are other good resources in formal verification?


I found TLA+ syntax unapproachable as a beginner. I did more digging and discovered Alloy[0], a "lightweight" formal verification system, which is much simpler, and has industry adoption as well. It's working nicely for me so far.

-----------

[0] http://alloytools.org/


The Bible of formal software logic, free of charge: https://softwarefoundations.cis.upenn.edu/


If you are more practical than religious, try this: https://functional-algorithms-verified.org


State Machines are the natural paradigm in TLA+. OOP is just poorly organized state machines. Try rewriting an OO program with state machines and you’ll have something that directly maps to TLA+, and it should demystify much of formal verification.
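
As a concrete (if toy) illustration of what that rewrite looks like, here is a minimal Java sketch (the turnstile example and all names are made up): the entire behavior is a single transition function over an explicit state enum, which is essentially the shape of a Next relation in a TLA+ spec.

    // Toy example: a coin-operated turnstile as an explicit state machine.
    enum TurnstileState { LOCKED, UNLOCKED }
    enum TurnstileEvent { COIN, PUSH }

    final class Turnstile {
        private TurnstileState state = TurnstileState.LOCKED;

        // All behavior lives in one transition function, so the reachable
        // states and transitions are easy to enumerate and model-check.
        TurnstileState step(TurnstileEvent event) {
            switch (state) {
                case LOCKED:
                    if (event == TurnstileEvent.COIN) state = TurnstileState.UNLOCKED;
                    break;
                case UNLOCKED:
                    if (event == TurnstileEvent.PUSH) state = TurnstileState.LOCKED;
                    break;
            }
            return state;
        }
    }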



Looks interesting, perhaps because it seems to be a bit more down to earth than some of the other proposals where you have to derive Whitehead&Russell before you're allowed to use the + operator. But it also seems quite dead. The latest link is from 2012.


It is still alive, it has just moved to github! It is a big language and it can prove useful programs. Apparently, part of the Ethereum 2 specification was verified using it. https://github.com/dafny-lang/dafny

I have been learning it and the syntax is close to most C style programming languages. As a software developer this makes it much more approachable than Coq. The proof statements also feel more like the math I learned in college rather than the weird magic keywords of Coq.


Thanks. Pretty cool.


“Complexity is the business we are in, and complexity is what limits us.”

Yet, just as Turski recognized, some people seem to be philosophically offended by this notion, and try to fight the proven limitations. They are under the impression that the world — and I'm not just talking about computing now — is essentially simple, and that it is the stupidity of people and institutions that needlessly complicates it (or, in the case of software, "stupid" programming languages). If we apply careful mathematical reasoning, we could find solutions to anything.

Computer science is the very discipline that proved that essential complexity can arise even in the smallest of systems. Yet sometimes it is computer scientists and software developers who attempt to challenge the very foundation of their own discipline. Complexity is essential. It cannot be tamed, and there is no one big answer.

The quote is from the excellent https://pron.github.io/posts/correctness-and-complexity and it would be really good if the authors read it carefully first.


I think there is a project quite similar to this one called the Verified Software Toolchain (VST), in which you can write a C program, convert it into a massive Coq expression, and then write theorems about that expression in Coq. The Software Foundations series has a volume about it [1], which I found to be an order of magnitude harder to understand than the other volumes.

It feels like the Magmide project aims at the same goal as VST. It's unclear how it will improve on what VST has done. It may just be that formal verification of real-world languages is inherently complex.

[1] https://softwarefoundations.cis.upenn.edu/vc-current/index.h...


While this is a very ambitious goal, Nim implements formal proofs of invariants using a theorem prover:

https://nim-lang.org/docs/drnim.html


Do we have formal verification for formal verification yet? I want to make sure my verification does not have bugs.


Yes, it's an active area of research. One very recent example, focusing on Dafny: https://www.amazon.science/publications/testing-dafny-experi...


> ...existing uses of Iris perform the proofs "on the side" using transcribed syntax tree versions of the target code rather than directly reading the original source.

I'm a formal verification dummy, so can someone please confirm if this means these uses of Iris are creating an Abstract Syntax Tree (AST) of the source, then operating upon that AST?

If so, can I please get an ELI5 of why there is a salient difference in formal verification outcomes between using the AST and "directly reading the original source"?


So this actually looks really neat if it works. However, hopefully in the spirit of constructive criticism, I would be very nervous about sticking this in big letters at the top of the introduction:

> Software can literally be perfect

because that is a wonderful way to get people to invest in really robust, excellent, high-quality software and then trust it blindly. Even if everything goes well and the software itself is perfect, the verification has no bugs, and the model it perfectly implements actually maps the problem space correctly, it will still run on fallible hardware, interface with other software that is imperfect, and take direction and data from humans who can make mistakes.

Now, to the author's credit: further down, under "Do you think this language will make all software perfectly secure?" and "Is logically verifying code even useful if that code relies on possibly faulty software/hardware?", this is discussed, and I think the writer actually does appreciate the limits of what this can do. I very much appreciate them explaining that in what I'd call clear terms. Just... maybe don't headline with a claim like that when it has caveats and people are liable to read the claim and ignore the caveats?


Honestly, to me this project is a means seeking an end, the same way JS devs love to play around with frontend frameworks, the author saw a bunch of shiny powerful (highly complex) tools and decided that combining them all was the solution to our problems.

I don't want to discourage them from learning Iris, or designing a dependently typed language, but I really think that's missing the difficulty in formal verification.

I think the two areas that need focus are ease of specification and automation. In short, we need to lower the cost of verifying a line of code by at least an order of magnitude. These two objectives are also directly opposed to the direction Magmide sets as its goal. Ease of specification means we want to use the least amount of separation logic possible, and hide it from the user if possible; doing proofs and writing specs in separation logic sucks, and not for interesting reasons. Automation means favoring simpler logics; specifically, we want to stick as much as possible to FOL, since that's where we have good automation. Doing everything in a rich dependently typed language from the start also makes it harder to do incremental verification. I think there is a lot of value in having a 'pyramid of trust' with more and more powerful tools which take you up a level of trust and verification, potentially requiring more input from engineers as they go up.

Finally, I think there's a lot of potential to explore in the interfaces we use to write, read, and debug proofs. I don't think tactic languages (as exist today) are the last word, and I think we should be doing a lot more interesting things to interface with and explore the proofs.



Try the proof assistant Agda without installation directly in your browser:

https://agdapad.quasicoherent.io/

Or Lean: https://www.ma.imperial.ac.uk/~buzzard/xena/natural_number_g...


One of the approaches for formal program verification is to convert an unrestricted grammar G_1 into a context-sensitive grammar G_2 subject to some constraints C. We then derive a linear bounded automaton A_2 that accepts the language L(G_2). We then transform the input program, i.e., a string S_1 in L(G_1), into a modified program, i.e., a string S_2 in L(G_2). If A_2 halts on S_2 then A_1 halts on S_1. By definition, A_2 accepts S_2. Therefore, A_1 accepts S_1.

Of course, L(G_2) is a subset of L(G_1) which means that many programs written in G_1 that do not meet the constraints C cannot be verified. But the benefit is that programs that do meet the constraints C are provably verified.

The tension lies in keeping C small and maximizing utility of the approach for a wide class of programs/libraries/programming paradigms etc.,


This builds on the success of Rust, but Rust has not been a success when it comes to [number of engineers writing professional code in the language]. By that measure it's still incredibly niche compared to interpreted languages.

The main reason why formal verification has not had even the success of Rust is that most developers (myself included) don't know enough about the area to take an interest, and certainly don't know enough about the area to persuade skeptical managers.

Unless a big company comes forward with a bunch of case studies about how they used formal verification successfully I can't see the developer mindset changing.


Formal verification predates Rust by decades. The addition of lifetimes and the borrow checker are worthwhile contributions to the field of provably-safe software, but there are a lot of other soundness properties that Rust doesn’t check (though people are working on it).


I meant to say "this hopes to build on the success of Rust" — Rust is explicitly called out in the readme.


For functional stuff, sure, but I don't think this is achievable within the UI ___domain. CSS rules have implementation details that change how you write it (some problems have workarounds), for example there's a documented set of issues in flex implementations maintained here: https://github.com/philipwalton/flexbugs

It might be practical and possible to become mainstream for some domains, but it's doubtful for others. The most practical solution for UI is visual regression testing across browsers.



Is it even realistic to make provably secure/stable applications on an OS like Windows or Linux? This (provable correctness of programs, at the expense of performance) is one of the explicit design goals of Urbit. I know HN hates Urbit and I don't want to rehash that, but it seems like a good goal for some use cases and I'm not sure it's possible to achieve without building the OS around it.


Urbit has not put much effort into security, for some reason. To be fair, they don't claim their runtime is secure (yet)[1]. The process downloading and executing code from the network is not sandboxed with seccomp or anything similar, and its "jets" are unverified C libraries which any of this code is allowed to call into. They could sandbox it pretty easily (the worker process which runs third-party code only talks to a host process and not the rest of the world, so it could probably be run under seccomp, not even seccomp-bpf) which makes it all the more surprising that they haven't.

Urbit has also had (and almost certainly still has) bugs where jets give different results than the code they're supposed to accelerate (a "jet mismatch") [2]. I agree that its "axiomatic" bytecode would lend itself well to verification theoretically, but Urbit as she is spoke is not anywhere close. They also at least historically seemed somewhat hostile towards academic CS research (including formal methods) probably for weird Moldbug reasons.

[1]: https://urbit.org/faq#:~:text=The%20security%20of%20the%20ru....

[2]: https://urbit.org/blog/common-objections-to-urbit#:~:text=Ye...



Let's say that Urbit claims to be formally, provably secure. But approximately nobody actually understands Urbit, which means that nobody knows whether the proof is solid. So as an outsider, I have to either take it on faith that it's secure, or I have to spend a fair amount of time immersing myself in this hard-to-learn system to see if the claimed benefits are really there.

But it's not just Urbit. Rust has essentially the same problem.

In fact, perhaps all of formal verification has this kind of problem. How do you prove the benefits to someone who doesn't know the tools?



You don't have to understand the Linux kernel to buffer overflow a Linux application; if they do something like "Here's the IP of a running Urbit with 100 BTC in it, good luck!" and it's still up in a few years, that'd be compelling.

But more generally, if it's true that the only way to make a provably secure app is to design the OS and language around that purpose, then the problem you describe is general too - it will always be a challenge to find auditors.


I applaud the research. Of course, those organizations creating and suffering from the most bugs will be the least able to utilize such a language.


I suspect that verifying software is a lot like the termination problem for Turing machines: the more useful properties you want to verify, the closer it is to NP-completeness. So a practical verifier should limit its scope to a modest subset of software and settle on verifying something with a sufficient degree of confidence, which is lower than 100%.


It turns out that you can write proven correct C compilers (CompCert), proven correct Operating Systems (seL4), and proven correct secure network protocols using proof assistants like Coq and F*. All of those proof assistants are using languages that are not Turing complete. You have to prove termination if the proof assistants can’t automatically prove it. So being Turing complete is not really necessary for writing complex software.



Both termination and verification go beyond NP-completeness, in that they are undecidable. Also, "verifying something with... confidence... lower than 100%" is known as testing.


Termination is undecidable in general but that is not a problem in practice. Proof assistants like Coq, Lean, etc. can automatically prove termination for most practical code. And if not, the developer has to prove termination by hand, which is not hard in most cases I have encountered. And if you still can't do it, you simply add a "fuel" natural number counter that guarantees termination (exit when out of fuel).
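
For what it's worth, here is a minimal sketch of that fuel pattern in Lean (the function `loopFuel` and its body are made up for illustration): the recursion is structural on the fuel counter, so the definition is accepted as terminating regardless of whether the underlying iteration would terminate on its own.

    -- Returns `some n'` if the iteration finishes within `fuel` steps,
    -- `none` if it runs out of fuel. Termination is by structural recursion
    -- on `fuel`, so no termination proof for the 3n+1 process is needed.
    def loopFuel (fuel : Nat) (n : Nat) : Option Nat :=
      match fuel with
      | 0 => none
      | fuel + 1 =>
        if n ≤ 1 then some n
        else if n % 2 == 0 then loopFuel fuel (n / 2)
        else loopFuel fuel (3 * n + 1)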


I want to understand: what is the verification process? How would programming take a new step forward if this is achieved? I have been a programmer for a while but I don't understand the context and discussion around verification. Please point me to any useful resources that can give me a deep understanding of what's being discussed here.


I recently came across Ur/Web. It makes some promises that are quite attractive. One of the benefits of having a DSL rather than something general-purpose is that it can make these promises in a more comprehensive and focused manner.

But despite being around for a while now it didn’t get adopted very widely.


Many engineers and teams are aware that they write bad code, and they love it. You can get very far as a clumsy code vendor. Even if formal verification was practical, it would be pretty difficult to make it mainstream.


You can prove that an algorithm is correct most of the time (yes, halting and decidability but for practical purposes you mostly can).

How do you prove an event driven application is correct?


Modelling your application as a state machine will allow you to use model checking to prove various properties about your model. Proving that your program is equivalent (for some chosen definition of equivalence) to the state machine is left as an exercise for the reader.


> Proving that your program is equivalent (for some chosen definition of equivalence) to the state machine is left as an exercise for the reader.

"It's obvious" :)


Wow, the language here is even more optimistic than the rosiest descriptions you see from young researchers, which prompted me to check if the author has had much experience deductively verifying interesting "deep" functional properties of non-trivial programs. The answer seems to be no.

Like a newcomer to the field, he focuses on "first-day" problems such as language convenience, but the answer to his question of why this hasn't been done before is because that's not the hard problem, something he'll know once he obtains more experience.

One of the biggest issues — indeed, the problem separation logic tries to address, but does so successfully only for relatively simple properties — is that "correctness" does not feasibly (affordably) compose. Proving the correctness of a program made out of components, each of which has already been proven correct for any desired property, is no easier than proving the correctness of the program from scratch, without concern for its decomposition. I.e., proving the correctness of a program made of ten provably-correct 500-line components is no easier than proving the correctness of all 5000 lines at once. This has been shown to be the case not only in the theoretical worst case, but also in practice.

Here's an example to make things concrete. Suppose we have the following (Java) method, calling some unknown, pure, `bar` method:

    long foo(long x) {
        if (x <= 2 || (x & 1) != 0)
            return 0;
        for (long i = x; i > 0; i--)
            if (bar(i) && bar(x - i))
                return i;
        throw new Error();
    }
It is very easy to describe exactly under what conditions an error would be thrown. Similarly, it is easy to describe the operation of the following method, and prove that it performs its function correctly:

    boolean bar(long x) {
        for (long i = x - 1; i >= 2; i--)
            for (long s = x; s >= 0; s -= i)
                if (s == 0)
                    return false;
        return true;
    }
However, it is not easy to determine which inputs, if any, would cause foo to throw an error using that particular bar. In fact, we only happen to know that this particular question is extremely hard because it is one that has interested mathematicians for 300 years and remains unanswered.

While most verification tasks don't neatly correspond to well-known mathematical problems, and most require far less than 300 years to figure out, this kind of difficulty is encountered by anyone who tries to deductively verify non-trivial programs for anything but very specific properties (such as "there's no memory corruption", which separation logic does help with). Various kinds of non-determinism, such as the result of concurrency or any kind of interaction, only makes the possible compositions more complex.

In short, the effort it takes to verify a program does not scale nicely with its size, even when it is neatly decomposed, and it is this practical affordability — which is not a result of the elegance of the tools used — that makes this subject so challenging (and interesting), and requires some humility and lowering of expectations even when it is useful (and it can be certainly useful when yielded properly and at the right scope).

Another problem is an incorrect model of how programs are constructed. One might think that if a programmer has written a program, then they must have some informal (but deductive) model of it in their mind, and all that's missing is "just" formally specifying it. But that is not how programs are constructed over time when many people are involved. In practice, programmers often depend on inductive properties in their assumptions, such as "if the software survived for many years, then local changes are unlikely to have global effects that aren't caught by existing tests." Those assumptions are good enough for providing the (often sufficient) level of correctness we already reach, but insufficient for constructing software that can be formally specified, let alone deductively proven correct.

That is why much of the contemporary research focuses on less sound approaches, that aren't fully deductive, such as concolic testing (e.g. Klee), that allow better scaling for both specification and verification at the cost of "perfection".

The reason why both research and industry don't all do what is proposed here is that they know that's not where the real problems are. There are bigger issues to tackle before making the languages more beginner-friendly.


> Another problem is an incorrect model of how programs are constructed. One might think that if a programmer has written a program, then they must have some informal (but deductive) model of it in their mind, and all that's missing is "just" formally specifying it. But that is not how programs are constructed over time when many people are involved.

And that's just looking at how things work within the engineering teams. What is the perfect software for "we need an identity card system for physical and logical authorization of two million military personnel"?


Can we formally verify the software cannot be used for evil?


Sure, given a formal definition of evil.


gonna have to formally define evil


Evil: the privation of a good that should be present. It is the lack of a good that essentially belongs to a nature; the absence of a good that is natural and due to a being. Evil is therefore the absence of what ought to be there.

https://www.catholicculture.org/culture/library/dictionary/i...

https://en.m.wikisource.org/wiki/Catholic_Encyclopedia_(1913...


[flagged]


> Like much of Christian thought, this is lazy to the extreme.

> But it nonetheless fits this idiotic definition.

Your personal bias is showing, to the point of going against the site guidelines; both 'When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."' and, y'know, literal religious flame warring.

Also, you're wrong, or at least not entirely correct;

> Is ice cream that has thawed evil? No, rather obviously.

You are of course welcome to argue that it's not, but yes, there are Christians who would/do make this exact argument; they might not claim that ice cream has the capacity to be willfully evil, but they would absolutely argue that ice cream melting is Wrong and a symptom of a Fallen world (in the sense of "fallen creation"), and that in a Good world ice cream would not melt.


The constructivist in me thinks we could define it inductively :)


How about: skeptical of formal verification?


I've been looking at writing a code generator, which spits out vectorized code for multi-way joins (kind of like "dynamic programming", so as not to be locked into a specific order of binary joins), with the additional complications of being for the incremental/difference-based "Differential Dataflow"[0], the multi-temporal aspect DDFlow needs to handle its iterative/fixpoint operator, and dynamically adjusting IO concurrency (the asymptotics of part of the "dynamic programming" index traversal suffer when squeezing concurrency out of it, as it's pretty much cache prefetching).

It's been about a year since I realized the impact of a JIT-like query compiler for these kinds of join-project queries, but needing to balance IOPS, vectorization, and likely even applying (vectorized) B-tree tactics within a 4k page, on top of the weirdness from multi-temporal (not just bi-temporal) delta-based records (~change-data-capture stream; needs to be integrated/materialized to get a point-in-time view)... sounds like a recipe for logic bugs/off-by-one errors.

They are already hard to impossible to notice if this were used in production; it being a code generator allows situations where only edge-case queries have data-dependent edge-case bugs, and the addition of dynamic/adaptive IO concurrency makes reproducing/debugging a detected error next to impossible.

I wouldn't dare to trust the code generator I'd write, if not formally verified. Not because I don't have faith in my skills, but because it's extremely complex code that has to be fast and fall out of a JIT, and its nature makes off-by-one errors in index access unusually likely. And debugging a known error might well be harder than formally verifying to find the bug.

[0]: https://news.ycombinator.com/item?id=25867693 https://news.ycombinator.com/item?id=27512224

TL;DR: There is code that would be useful if written, but is so complex and nigh-impossible to debug that formally verifying it might be easier than debugging it to "production-grade". I hope this project makes that kind of approach practical for those engineers who could write the code and make it pass integration tests.



