" it is a published, peer reviewed, 2005 proof, and a rather simple one."
Proofs can be wrong or irrelevant in practice. An example is not being able to prove the termination of executing arbitrary programs. Sure I can: it's called a watch-dog timer. So, I'm wanting it peer reviewed. Plus, it's critical to know that for sure, and whether correctness composes, to determine what to invest in for software assurance R&D.
"Nevertheless, it should be noted that no large or even medium-sized software has ever been fully verified, so I don't think anyone believes the problem is easy."
Never easy. :) However, I still think we should consider counting the deep verifications. You keep saying they haven't gone upward to verify a program of a certain size as if it means something. I know of only one that tried (Verisoft for Hyper-V). Most went downward toward the gate level, including Verisoft, where Microsoft's was just a side item. Verisoft did app correctness to some degree, down to the C code, its compiler, the OS, the microkernel, and the CPU tying it all together. DeepSpec just tied piles of high-level and low-level code together. So, such methods should allow us to achieve the same amount of complexity going upward using the same abstraction and shortcut techniques. We haven't done that much upward because they've just been working the other way.
Now, what they'll ultimately achieve may or may not be that good. The best approach might be to throw DeepSpec's DSLs and tooling at a large, complex application whose code is FOSS and accessible. I suggest an application server with compression, encryption, caching, and error handling as a good way to stress that project from multiple angles.
> So, such methods should allow us to achieve the same amount of complexity going upward using same abstraction and shortcut techniques.
First, I think you're misinterpreting (and exaggerating) what has actually been achieved. Looking at Hyper-V[1], 30KLOC of (preexisting and tested) C code were verified in a year (out of a total of just 100KLOC), but only function by function, rather than for global correctness properties. They wrote some global invariants, but didn't verify them. As I understand it, similar verification of various layers has been done, but no global end-to-end work. Whether low-level or high-level, there has been no composition across abstraction layers.
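To make that distinction concrete, here's a toy sketch using ACSL-style contracts (Frama-C's annotation language; the Hyper-V work actually used Microsoft's VCC, and these names are invented). Function-by-function verification proves each local contract in isolation; a global invariant has to be shown preserved by every function and interrupt path, which is the part that wasn't done:

    #define POOL_SIZE 64

    int pool[POOL_SIZE];
    int allocated;            /* count of live entries */

    /*@ requires 0 <= allocated < POOL_SIZE;
        ensures  allocated == \old(allocated) + 1;
        assigns  allocated;
    */
    void alloc_one(void) {    /* local contract: provable in isolation */
        allocated++;
    }

    /* A global invariant, by contrast, e.g.
         0 <= allocated <= POOL_SIZE
       must be shown preserved by *every* function in the system.
       Writing it down is easy; proving it everywhere is the
       end-to-end work that was stated but not completed. */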
Second, such methods would not allow us to go "upward", because there's a reason why people concentrate on the lower level. The state space is smaller there.
Finally, I think you're also misinterpreting DeepSpec's work. It is at the forefront of verification of a certain kind, but if you look at their papers, when they talk about end-to-end verification, they're discussing programs that are 3KLOC of C. These programs are smaller than many JavaScript libraries. Also, Coq is not really a tool engineers can put to everyday use. This is what Simon Peyton Jones, one of Haskell's designers, has to say about Coq and Isabelle: https://youtu.be/re96UgMk6GQ?t=50m56s
Formal methods are starting to work, but we are roughly in the same stage as aviation in WWI, while you are talking about "using the same methods" to get a manned craft to Mars. We may well get there, but it will take a while, and the methods will not be exactly the same.
"The theorem states that you cannot prove a program's termination any faster than running it. You're not violating any theorems."
Someone explained it poorly to me, then. Sounds like I'm still countering it, though, since the program doing the terminating can move faster than the program it's terminating. I can have an infinite loop load and begin, with a downward counter interrupting it one or more cycles after it begins. I'm still using two programs and a context switch to mitigate the artificial problem posed by the theorem, which uses just one program looking at another.
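For what it's worth, here's roughly the setup I mean as a POSIX sketch, with alarm()/SIGALRM standing in for the hardware watchdog (the 5-second budget is arbitrary):

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void watchdog(int sig) {
        (void)sig;
        _exit(1);                    /* counter hit zero: cut it off */
    }

    int main(void) {
        signal(SIGALRM, watchdog);   /* the second "program" */
        alarm(5);                    /* downward counter, 5 seconds */

        for (;;) { }                 /* possibly-nonterminating work */

        /* reached only if the loop above exits on its own */
        alarm(0);
        puts("terminated by itself");
        return 0;
    }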
"But it has been peer reviewed. "
How many people looked at it? For how long? And with what diversity of backgrounds? Peer review of something critical, to me, doesn't mean people who look at theorems look at it and say "Good one. Done now." It's a social process that should involve the mathematically inclined, people who do applied math in that field, and non-mathematical people in the same ___domain with a track record of breaking proofs in practice. On the latter, there have been a number of breaks in cryptography because the oversimplified, abstract, elegant proof missed key details. Peter Gutmann also had a nice rebuttal of the reliability of formal methods, showing all the times formal specs and proofs were incorrect, with it taking years for people to spot the errors or disprove them by executing a program.
So, I have no doubt it looks good to mathematicians. I just want more effort on it from the aforementioned kinds of people before I rely on it. Preferably a few man-years of people trying to solidify or counter it.
"rather than global correctness properties. They wrote some global invariants, but didn't verify them."
See, this is where being a specialist can help. I'm a bit unclear here. Can you get implicit, global correctness if you decompose your app into pieces that are each locally correct, and the interface conditions of the function calls are correct? Or is that inadequate to you? The literature was unclear to me on this point.
Note: Three functions per day? Maybe we need to apply these tools to deeply embedded 8-bitters with bytes of RAM or ROM. Then, a programmer doing fully-verified work will feel really productive after "filling up the hardware" in a month or two. First time in verification history for someone to feel that productive. ;)
"It is at the forefront of verification of a certain kind, but if you look at their papers, when they talk about end-to-end verification, they're discussing programs that are 3KLOC C. "
Oh yeah, they're small. The reason I bring them up is that they're at the forefront of composing pieces. I figured you might look at what they're doing on abstraction and integration of proofs to get a better idea of the correctness-composition aspect of your research. Then others and I might learn from your observations.
"This is what Simon Pyton-Jones, one of Haskell's designers, has to say about Coq and Isabelle"
You're just not the guy to have at a verification party, Ron. You stay trying to depress us. :P I appreciate the link, though, since that guy is a hilarious speaker. He says they start learning Coq, disappear for two years, get all their pain receptors destroyed, and emerge (paraphrased) as zealots bringing the Gospel of Coq to all programs. Lmao. I've seen this disappearance-and-pain process. It's why I'm still ignorant of the details, not having dug into the full learning process. Life is already rough enough right now haha.
"but we are roughly in the same stage as aviation in WWI, while you are talking about "using the same methods" to get a manned craft to Mars."
Nah, I'm talking about doing a few key components of the craft with formal methods, with the rest built with lightweight formal specifications, model checking, test generation, and especially human review. Recovery procedures kept simple and verified formally if possible. That's my assurance recommendation for your Mars mission. I do like your WWI aviation comparison. It also fits with my style of figuring out just how much I can cheat around the limitations to kick ass. Within your metaphor, the "Biafran Babies" is something that could've been done with WWI aviation.
If you think you could so easily circumvent mathematical theorems, why are you such a believer in formal methods? I don't understand exactly what your method is, but I can assure you you're not circumventing the halting theorem. There's a very precise definition of what it means to be faster. The halting theorem poses no artificial constraints on the problem; it's an extremely general theorem, and it certainly doesn't say how the decision must be made. You're perfectly allowed to serve as an OS and simulate the input program if you like, but that's not going to help.
> How many people looked at it? For how long? And what diversity of backgrounds?
The proof is very simple, and very similar to the proof that a system of cellular automata is Turing-complete: it points out that any nondeterministic Turing machine with a tape of length k cells can be simulated by k finite state machines. If it were possible to verify each component separately and combine the results efficiently, then any such TM could be verified, yet it is well known that it cannot be. The paper was written by well-respected academics, published in a respected journal, and has over twenty citations.
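In symbols, the shape of the argument (my paraphrase of the reduction, not the paper's exact statement):

    % One FSM A_i per tape cell i of a nondeterministic TM M
    % whose tape is bounded by k cells:
    \[
      M_k \;\approx\; A_1 \parallel A_2 \parallel \cdots \parallel A_k
    \]
    % Suppose each A_i could be verified locally and the local results
    % combined in time polynomial in k. Then acceptance of any
    % space-k-bounded TM would be decidable in polynomial time. But
    % that problem is PSPACE-complete, so no such efficient composition
    % can exist (unless P = PSPACE).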
Now, the reason our response should be "so verification is as hard as we thought" rather than "this is the end of the line" is that, like all results in computational complexity, it talks about the general case. It is possible that a large enough class of programs of interest consists of special cases that are easier to solve.
> Peter Gutmann also had a nice rebuttal of the reliability of formal methods, showing all the times formal specs and proofs were incorrect, with it taking years for people to spot the errors or disprove them by executing a program.
I don't know what you're referring to, but I'm not aware of any cases where mechanically checked proofs or model checker results have ever been found wrong. The spec can very well be wrong, but you're the one arguing for the infallibility of formal methods.
> Preferably a few man-years of people trying to solidify or counter it.
No one is going to try to counter a mathematical theorem with a very simple, well-understood proof.
> Can you get implicit, global correctness if you decomposed your app into pieces who are each locally correct?
I'm not sure what you mean by "implicit". A formal spec requires a formal proposition to check. Things can be true without us proving them, but formal verification is about verification, not about truth. Theoretically, programs could be correct without our verifying or testing them in any way. As for building up a global correctness property end-to-end (from very low-level code on up): it is possible but very hard, and no one has ever been able to do it for even a medium-sized program. Obviously, working in a modular way is how people do it; that's how both math and software work. Yet it's very, very hard.
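To illustrate what modular verification does and doesn't give you, a toy sketch with ACSL-style contracts (the functions are invented): the caller's proof uses only the callee's contract, so the only "global" properties you get are the ones someone explicitly stated.

    /*@ requires x < 1000;
        ensures  \result == x + 1;
    */
    int inc(int x) { return x + 1; }

    /*@ requires x < 999;
        ensures  \result == x + 2;
    */
    int twice(int x) {
        return inc(inc(x));   /* verified against inc's contract alone */
    }

    /* twice's spec composes the two local proofs into a "global" fact,
       but only because it was written down. Properties nobody stated
       are not established implicitly, however correct each piece is. */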
> Then others and I might learn from your observations.
I think you overestimate my abilities. I'm not an academic. Just a programmer with some experience using formal methods.
> You stay trying to depress us.
This is where you're wrong. Formal methods -- like aviation -- are awesome! But let's enjoy them for what they are, not what we imagine them to be. If you see someone speaking of formal verification as if it were a solved problem, and writing fully correct programs is just a matter of learning Idris, you can know one thing for certain: they've never actually used formal methods on non-trivial real-world programs. The FM experts are those lowering expectations the most. Watch some talks by Xavier Leroy.
> formal specifications, model-checking, test generation
All of these are formal methods (the last only if there's a formal contract).