> Importantly, we designed Styrolite with full awareness that Linux namespaces w...

klysm · 2025-03-27T11:16:16 1743074176

Say “it’s probably fine” and hope that the people building the foundational systems are protecting us

Joker_vD · 2025-03-27T11:46:46 1743076006

No, I mean, what do the Edera developers do differently, in order to provide more robust foundation with this new container runtime called Styrolite? They still use Linux namespaces, as far as I can tell from TFA.

denhamparry · 2025-03-27T12:25:04 1743078304

Edera developer here, we use Styrolite to run containers with Edera Protect. Edera Protect creates Zones to isolate processes from other Zones so that if someone were to break out of a container, they'd only see the zone processes. Not the host operating system or the hardware on the machine. The key difference here between us and other isolation implementations is that there is no performance degradation, you don't have to rebuild your container images, and that we don't require specific hardware (e.g. you can run Edera Protect on bare metal or on public cloud instances and everything else in-between).

xmodem · 2025-03-27T13:54:07 1743083647

What underlying primitives are you relying on to provide isolation, if not linux namespaces?

How does your approach compare to Google's gVisor?

asmor · 2025-03-27T18:08:59 1743098939

It's Xen, and they even explain why it's not KVM here: https://github.com/edera-dev/krata/blob/main/FAQ.md

sys_call · 2025-03-27T14:34:39 1743086079

gVisor emulates a kernel in userspace, providing some isolation but still relying on a shared host kernel. The recent Nvidia GPU container toolkit vulnerability was able to privilege escalate and container escape to the host because of a shared inode.

Styrolite runs containers in a fully isolated virtual machine guest with its own, non-shared kernel, isolated from the host kernel. Styrolite doesn't run a userspace kernel that traps syscalls; it runs a type 1 hypervisor for better performance and security. You can read more in our whitepaper: http://arxiv.org/abs/2501.04580

xmodem · 2025-03-27T15:36:37 1743089797

Thanks for the explanation. So you are using virtualisation-based techniques. I had incorrectly inferred from other comments that you were not.

I skimmed the paper and it suggests your hypervisor can work without CPU-based virtualisation support - that's pretty neat.

Many cloud environments do not have support for nested virtualisation extensions available (and also it tends to suck, so you shouldn't use it for production even if it is available). So there aren't many good options for running containers from different security domains on the same cloud instance. gVisor has been my go-to for that up until now. I will be sure to give this a shot!

0x1ceb00da · 2025-03-27T16:18:01 1743092281

So it's a lightweight way of running docker images inside a virtual machine?

sys_call · 2025-03-27T20:18:28 1743106708

Yes, precisely. This also provides container operators with the benefits of a hypervisor, like memory ballooning, and dynamically allocating CPU and memory to workloads, improving resource utilization and the current node overprovisioning patterns.

klysm · 2025-03-27T18:21:20 1743099680

So it’s a VM?

znpy · 2025-03-27T13:10:22 1743081022

> Edera Protect creates Zones to isolate processes from other Zones

What do you mean by "zone" exactly?

sys_call · 2025-03-27T14:43:33 1743086613

A zone is jargon for a virtual machine guest environment (an homage to Solaris Zones). Styrolite and Edera runs containers inside virtual machine guests for improved isolation and resource management.

znpy · 2025-03-27T22:18:04 1743113884

> an homage to Solaris Zones

i asked specifically because the word "zones" reminded me of solaris zones :)

> Styrolite and Edera runs containers inside virtual machine guests for improved isolation and resource management.

do your have your own vmm or is it firecracker with make up and a wig?

klysm · 2025-03-27T18:21:45 1743099705

How exactly is this an improvement over VMs?

sys_call · 2025-03-27T20:19:31 1743106771

We run unmodified containers in a VM guest environment, so you get the developer ergonomics of containers with the security and hardware controls of a VMM.

flkenosad · 2025-03-27T14:00:32 1743084032

Anyone know if it's possible to update the Linux kernel so that namespaces are hard security boundaries? I wonder what that would entail.

eyberg · 2025-03-27T15:12:04 1743088324

When we speak of 'hard security boundaries' most people, in this space, are comparing to existing hardware backed isolation such as virtual machines. There are many container escapes each year because the chunk of api that they are required to cover is so large but more importantly it doesn't have isolation at the cpu level (eg: intel vt-x such as VMREAD, VMWRITE, VMLAUNCH, VMXOFF, VMXON).

This is what the entire public cloud is built on. You don't really read articles that often where someone is talking about breaking vm isolation on AWS and spying on the other tenants on the server.

vaylian · 2025-03-27T19:03:15 1743102195

> There are many container escapes each year because the chunk of api that they are required to cover is so large

What API? The kernel syscall API?

If we assume for a moment, that there are no bugs in the Linux namespace implementation, would containers be as safe as virtual machines?

eyberg · 2025-03-27T19:08:33 1743102513

No. As I'm responding to this Qualys just announced three new bypasses as of today: https://seclists.org/oss-sec/2025/q1/253 .

vaylian · 2025-03-27T19:39:11 1743104351

Sorry, can you elaborate? Your answer is not really clear. Why is it not possible for Linux namespaces to be secure?

flaminHotSpeedo · 2025-03-27T15:48:22 1743090502

> This is what the entire public cloud is built on.

Well... The entire public cloud except Azure. They've been caught multiple times for vulnerabilities stemming from the lack of hardware backed isolation between tenants.

richardwhiuk · 2025-03-27T16:50:35 1743094235

Azure has the same level of isolation for VMs at a hardware level as AWS.

flaminHotSpeedo · 2025-03-27T18:43:59 1743101039

How Azure isolates VM's is completely unrelated, because containers are not VM's. And if you meant to assert that Azure uses hardware assisted isolation between tenants in general, that was not the case for azurescape [1] or chaosDB [2].

[1] https://unit42.paloaltonetworks.com/azure-container-instance...

[2] https://www.wiz.io/blog/chaosdb-explained-azures-cosmos-db-v...

richardwhiuk · 2025-03-28T13:18:19 1743167899

It is the case for VMs that customers create.

It hasn't always been the case for manged services, but I don't think that's true for AWS either.

flaminHotSpeedo · 2025-03-28T14:57:35 1743173855

Unmanaged VM's created directly by customers still aren't relevant to this discussion. The whole point here is that everyone else uses some form of hardware assisted isolation between tenants, even in managed services that vend containers or other higher order compute primitives (i.e. Lambda, Cloud Functions, and hosted notebooks/shells).

Between first and second hand experience I can confidently say that, at a bare minimum, the majority of managed services at AWS, GCP, and even OCI use VM's to isolate tenant workloads. Not sure about OCI, but at least in GCP and AWS, security teams that review your service will assume that customers will break out of containers no matter how the container capabilities/permissions/configs are locked down.

GardenLetter27 · 2025-03-27T14:44:37 1743086677

A lot of use cases don't want that though. It's nice having lightweight network namespaces for example, just to separate the network stack for tunneling but still have X and Wayland working fine with the applications running there.

fulafel · 2025-03-28T06:16:28 1743142588

Have a look at gVisor for one approach.

z3t4 · 2025-03-27T11:47:50 1743076070

Once you have set up the namespaces you drop all capabilities so if the program gets hacked while it's running it can do very little.

denhamparry · 2025-03-27T13:00:40 1743080440

Edera developer here. I agree! But there are instances we need to run with additional capabilities, and we’re also dependent on people knowing how to do the right thing. We’re trying to improve this by setting this by default, but also improving the overall performance and efficiency of running containers

znpy · 2025-03-27T13:08:50 1743080930

honest question: how is this any better than running non-root containers?

They can do very little anyway, that way.

sys_call · 2025-03-27T14:30:12 1743085812

Non-root containers still operate under a shared kernel. Non-root containers that run under a vulnerable kernel can lead to privilege escalation and container escapes.

Styrolite is a container runtime engine that runs containers in a virtual machine guest environment with no shared kernel state. It uses a type 1 hypervisor to fully isolate a running container from the node and other containers. It's similar to Firecracker or Kata containers, but doesn't require bare metal instances (runs on standard EC2, etc) and utilizes paravirtualization.