In practice, a medium- or high-assurance evaluation would also require testing of about everything, covert-channel analysis to spot leaks, verification that the object code matches the source, and an independent team re-running all of that with a pentest on top. INTEGRITY-178B and VxWorks MILS are separation kernels that have finished, or are in, evaluation with all that data. seL4 has the strongest formal methods on code and assembly, but the other stuff is still being done at an R&D lab with no evaluation. Both it and Muen were integrated into the Genode OS Framework, with all three needing more review, pentesting, and improvements.
Regarding the arxiv link on verification methods, I'd be careful reading too much into the summary tables. The authors don't appear to have done their homework there (they overstate the C&A accomplishments of the company I work at and our competitors).
On an unrelated note, I'm pleasantly surprised to see a tech report I contributed to as an undergrad cited. How on earth did they find it?
It's worth noting. However, I can't be too harsh about it, since a number of vendors were constantly bullshitting me when I was trying to get to the bottom of what they did. I had to dig deep into protection profiles, BS marketing, comments online, interviews, and so on. I also called out Green Hills through proxies about a fake, "independent" blog they set up basically to bash Lynx while talking as if INTEGRITY (not INTEGRITY-178B) were EAL6+. I ended up telling people to watch out for politics with the evaluators of the proprietary companies, then straight up read the research reports or source + docs for the rest. It ain't quite what it was under TCSEC, which had more technical stuff and less politics; technical requirements were the main form of political argument back then.
Far as your paper, I'd guess they did what I often do: typing common keywords into search engines. "Formal," "security," and "policy" were three that were in about any certified project. On the policy side, the words "policy" and "model[ing]" were used interchangeably here and there. I got a lot of results with these in a few quick Googles, although not yours. I imagine your paper was in such a search or a citation of something with similar terms. Just speculating, though. Cool that you got to see your work surface on the HN front page. :)
Yeah, dealing with vendors can be frustrating, for sure.
I'm still a little sad about the I-178 eval, since a lot of the work I saw done by RC was quite good, if limited in scope. Seeing the vendor (imo) misrepresent that effort to sell other software was rather disappointing to someone who had (at the time) just spent a couple years jumping on the formal methods bandwagon.
I'm also not particularly happy with the trend towards process-based, rather than results-based, evals (DO-178 and CSFC, I'm looking at you), but I can see how we got here after the EAL7 well was poisoned.
"I'm still a little sad about the I-178 eval, since a lot of the work I saw done by RC was quite good, if limited in scope."
I still give them credit for doing what they could to get a robust, usable piece of software through some kind of evaluation. Hopefully they fixed whatever problems were noted. Their page on it is also something I've shared plenty of times, because some of the design elements are really good. I especially like how each process donates its own CPU and memory to accomplish kernel calls. Maybe that's bypassable, but it seems like a good default. The list of certification artifacts at bottom-right also [mostly] details the assurance activities required in secure and certified systems. I poke at FOSS projects that claim security, asking, "Which of these did you not do, and why?" Usually they say "a covert-what analysis?" ;)
The evaluations have become too political. Under Walker's initiative, they didn't play around or make toothless threats. IBM themselves were the first to be shown the door when they made a C1-class product instead of C2. STOP OS (in XTS-400), GEMSOS, SNS Server, and KeyKOS were some of the most secure OS's the market ever produced. That was partly due to features (the current focus) but mostly due to the assurance activities that the predecessors of EAL6/7 mandated to systematically add quality or security to every aspect of the lifecycle. Although the modern stuff is political, the assurance methods in both EAL6/7 and DO-178B measurably improve safety/security if applied for real. I wrote an essay on how to improve security certification by learning from prior regimes, with the new one focusing on assurance activities for the various components (eg TOE's), a compositional rating added, and no politics allowed. Here it is:
Far as formal methods, there are recent developments that make them a lot easier for people to do on the low end. For one, TLA+ is making waves after the Amazon report on internal use. hwayne is making a nice tutorial using the PlusCal subset here:
One problem with full verification is that teams often try to do everything in one tool. I've long encouraged splitting it up among tools, picking the best one for each job. DeepSpec w/ CertiKOS is doing that with what looks like a DSL style of formal methods. COGENT let one team take an ext2 filesystem certified down to low-level code. One I found interesting, given all the work with the B method, was E-SPARK, where they did formal verification of specs in Event-B whose verification conditions were fed into SPARK to prove them at the code level. We need more of that. Far as SPARK itself, Carlisle built IRONSIDES DNS in SPARK.
My concept is using a macro-supporting, imperative language to write the application w/ formal specs for everything. Then, this is automatically translated to Rust, C, Ada2012/SPARK2014, and SPIN/TLA+ to let the code get hit by all the tooling I can find for each of them: static/dynamic analysis, SPARK's prover, and test generation. Any problems found are reinterpreted in terms of the original code for the developer to check whether they're false alarms or actual problems. The concept being: high-assurance labor is expensive, but push-button tools that just point at severe flaws might go a long way. Developers can even build plenty of features during the day, using minor tests, with background processes or a build cluster running the verification incrementally and a tracker notifying them when results are ready to look at.
I call it Brute-Force Assurance, since it basically throws cheap labor and CPU's at the problem. What do you think of this concept for medium-assurance security leaning toward high, at least for stopping code injections?
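To make the feedback loop concrete, here's a toy sketch of the orchestration side. Everything in it is hypothetical: the stand-in "prover" is a regex, and the line map standing in for whatever the translator would emit, but it shows the shape of "run every tool over every translation, then reinterpret findings against the original source":

```python
import re

def fake_spark_prover(src):
    # Stand-in analyzer (hypothetical): flag any line doing raw indexing,
    # the way a real prover might flag an unproven bounds check.
    return [(i, "possible out-of-bounds index")
            for i, line in enumerate(src.splitlines(), 1)
            if re.search(r"\[\s*\w+\s*\]", line)]

def run_pipeline(translations, analyzers):
    """Run every analyzer over every translated source, then map each
    finding back to the original line via the translator's line map."""
    findings = []
    for target, (src, line_map) in translations.items():
        for tool in analyzers:
            for line, msg in tool(src):
                # Reinterpret in terms of the original code so the
                # developer can triage false alarm vs. actual problem.
                findings.append((line_map.get(line, "?"), target, msg))
    return findings

# One original line, one (assumed) SPARK-flavored translation of it.
translations = {"spark": ("y := xs[i];", {1: 1})}
report = run_pipeline(translations, [fake_spark_prover])
assert report == [(1, "spark", "possible out-of-bounds index")]
```

The point of the line map is that the expensive part (triage) stays in the developer's own source, no matter which backend tool complained.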
Thanks for the references - I'll look into them. I'm definitely interested in multi-tool workflows given my experience (systems programmers aren't necessarily mathematicians). If only the systems programmer could be pointed at tools that don't have horrible false positive rates... ;)
On that note, have you or anyone else you've known gotten to use Astree Analyzer? I've been wanting to hear from field users whether it's as good as they claim on bug elimination and false positives.
At the last conference held by the Chaos Computer Club in Germany, there was a talk about a covert side channel[0] using - if I understood and remember that correctly - cache timings.
The researchers that gave the talk claimed that they could establish a very slow but practically undetectable communication channel between two VMs running on the same host.[1]
Since the introduction to Muen explicitly mentions defense against covert channels, could somebody explain to me whether this system would be able to prevent such communication? And if so, is it possible to explain the mechanism to somebody like me who knows basically nothing about these things?
[1] Again, if I understood that talk more or less correctly (probably less), undetectable refers both to the OS running in the VM and to the hypervisor.
I can't speak for Muen, but I can speak a bit to this problem area. This attack has been known for a long time. It's why physical separation and CPU's with separate caches were used in high-assurance security. Anything shared is a potential covert channel. Timing channels are the hardest to mitigate, so share nothing if you're worried about that, right? Well, some sharing is needed, so what to do?
1. Since the security kernels of the old days, any shared stuff is supposed to be cleared before the next thing uses it. That principle here would mean register overwriting and a cache flush when a separation kernel switches between partitions. It's called "periods processing" in the literature.
2. If there's no shared cache, then we can keep Security Domain A running on Core A, Security Domain B on Core B, etc. There can be multiple, concurrent VM's on each core so long as they're in the same security classification (trust each other). Take the penalty of separation when it matters.
3. Partitioning and/or real-time caches. In real-time caches, you lock the cache for determinism. That determinism might be used to aid covert-channel mitigation. The other mechanism is stronger: you design the cache itself to know that mutually-distrusting processes are using it. A combo of the CPU, cache, and/or separation kernel might wall off parts of it, erase parts of it, replace parts of it, and so on. There are quite a few prototypes of this in CompSci. Maybe some commercial implementations, but my brain is too tired to remember.
4. Asynchronous, randomized execution to throw off timing. I came up with this one independently while studying the problem, probably with inspiration from others who randomized timing in an asynchronous way. More importantly, another person implemented the same concept on a prototype CPU.
5. Run with no cache. Very painful. Barely a solution. It's been done, though.
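A minimal sketch of #1 above, periods processing. This is a toy model (all names hypothetical, a list standing in for registers/cache state); on real hardware the scrub would be register clearing plus a cache flush in the context-switch path, not a Python loop:

```python
class SeparationKernel:
    """Toy partition switcher: any shared resource is scrubbed before a
    *different* partition gets to run, so nothing leaks across the switch."""
    def __init__(self, shared_words=16):
        self.scratch = [0] * shared_words  # stand-in for registers/cache
        self.current = None

    def switch_to(self, partition):
        if self.current is not None and partition != self.current:
            # Periods processing: clear everything the outgoing
            # partition may have touched before the next one runs.
            self.scratch = [0] * len(self.scratch)
        self.current = partition

    def run(self, partition, workload):
        self.switch_to(partition)
        return workload(self.scratch)

k = SeparationKernel()
k.run("A", lambda s: s.__setitem__(0, 0xDEAD))  # A leaves residue behind
leak = k.run("B", lambda s: s[0])               # B only sees scrubbed state
assert leak == 0
```

The cost is exactly the one mentioned above: every cross-partition switch throws away warm state, which is why real systems try to batch same-domain work (#2) instead of scrubbing constantly.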
I'm sure more of these can be combined in interesting ways: randomized storage in caches w/ randomized timing, and/or that controlled by separation kernels. It's mostly a hardware problem, though, like software safety against code injection. Most in high-assurance have moved on from separation kernels to other tech, esp modified CPU's, to solve the problem bottom-up like the Burroughs B5000 did for some things. Last efforts I looked into were the Cambridge CHERI processor w/ CheriBSD, the SAFE processor at crash-safe.org, and Watchdog[Lite]. Gotta add cache protections, IO/MMU, RAM cryptosystem, TRNG, and so on.
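For anyone wanting intuition for the attack itself, here's a toy prime-and-probe style simulation. It models the shared cache as a handful of sets with an owner tag rather than measuring real timings (so it's an illustration of the mechanism, not an exploit): the sender evicts the receiver's lines to signal a 1 and stays idle to signal a 0, and the receiver reads the bit back purely from its own hit/miss pattern:

```python
CACHE_SETS = 8  # toy shared cache: one bit signaled per scheduling round

class ToyCache:
    """Direct-mapped toy cache shared by two 'partitions' A and B."""
    def __init__(self, sets=CACHE_SETS):
        self.lines = [None] * sets  # owner tag per set

    def access(self, owner, set_index):
        """True on hit, False on miss; the line is filled either way."""
        hit = self.lines[set_index] == owner
        self.lines[set_index] = owner
        return hit

def send_bit(cache, bit):
    # Sender (VM A): to send 1, touch every set and evict the receiver's
    # lines; to send 0, stay idle and leave the cache alone.
    if bit:
        for s in range(CACHE_SETS):
            cache.access("A", s)

def receive_bit(cache):
    # Receiver (VM B) probes the sets it primed earlier. Misses mean the
    # sender ran in between, i.e. the bit was 1. In a real attack this
    # hit/miss distinction comes from access latency, not an owner tag.
    misses = sum(0 if cache.access("B", s) else 1 for s in range(CACHE_SETS))
    return 1 if misses > CACHE_SETS // 2 else 0

def transmit(message_bits):
    cache = ToyCache()
    received = []
    for bit in message_bits:
        for s in range(CACHE_SETS):   # prime: receiver fills every set
            cache.access("B", s)
        send_bit(cache, bit)          # sender's scheduling slot
        received.append(receive_bit(cache))  # probe
    return received

message = [1, 0, 1, 1, 0, 0, 1]
assert transmit(message) == message
```

Note how mitigation #1 breaks this: if the kernel scrubbed (or here, re-tagged) every set on each partition switch, the receiver would see all-misses every round regardless of what the sender did, and the channel carries nothing.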
Wrt "run with no cache": is a shared DRAM that much better than a shared cache? The DRAM state is affected by whatever previously accessed it; in fact, DRAM effectively has an SRAM on-chip "cache" keeping the current active row per bank. ("Cache" in quotes because it's a bit crummy if you think of it as a cache - there's just one way, the number of lines is 4-8, and their size is measured in kilobytes - but technically it's a cache.)
Far as DRAM state, the separation kernel keeps the state of one process separate from another's. That's harder to do with a cache without defeating the performance-enhancing purpose of a cache.
"in fact DRAM effectively has an SRAM on-chip "cache" keeping the current active row per bank."
Had no idea. That could be a problem. See why we systems people defaulted to physical separation during most of the Moore's Law advances? You never know what hardware issue will pop up.
> PCI device passthrough using Intel VT-d (DMAR and IR)
I wonder how they formally verify that. DMAR is incredibly complicated. It's not even clear to me that malicious firmware on a PCIe device can be contained by DMAR, even assuming a perfect chipset and kernel.
They might be using a different definition of "formally verify" than you think. SPARK is one of the "lightweight" methods that tries to automate a lot of stuff, with the rest hand-proven. They might just mean it's formally verified to not have all the common code injections, since SPARK can do that with a good deal automated vs manual proof. Full correctness of that operation would be something different; they might need a proof assistant, or considerable effort, for that.
An example of one doing full, formal verification of all kinds of components is CertiKOS at DeepSpec:
They have 32-bit support through the Genode OS Framework and VirtualBox. That might mean they got it through what the Genode project was doing with full virtualization, as indicated here:
Genode/Muen might just not be able to handle 64-bit yet. The Genode versions are also a bit behind, since it's a small number of people doing the upgrades on components ranging from custom to common.
I suspect they use virtio instead of the Hyper-V interfaces, though that would still allow you to use VirtualBox, I think. I'm interested to know as well. It could just be a subtlety/assumption of their diagrams.
http://os.inf.tu-dresden.de/papers_ps/nizza.pdf
Survey in 2017 of most separation kernels with verification methods:
https://arxiv.org/pdf/1701.01535.pdf
This one is a medium-assurance kernel that tries to knock out most errors or injections using SPARK Ada.
https://en.m.wikipedia.org/wiki/SPARK_(programming_language)