The first Oxide rack being prepared for customer shipment (hachyderm.io)
296 points by jclulow on July 1, 2023 | 195 comments



Oxide has been discussed on HN a bunch over the last 3+ years (e.g., [0][1][2][3][4][5][6][7]), and while nothing is without its detractors, we have found on balance this community to be extraordinarily supportive of our outlandishly ambitious project -- thank you!

[0] When we started: https://news.ycombinator.com/item?id=21682360

[1] On the Changelog podcast: https://news.ycombinator.com/item?id=32037207

[2] On our embedded Rust OS, Hubris: https://news.ycombinator.com/item?id=29468969

[3] On running Hubris on the PineTime: https://news.ycombinator.com/item?id=30828884

[4] On compliance: https://news.ycombinator.com/item?id=34730337

[5] On our approach to rack-scale networking: https://news.ycombinator.com/item?id=34976444

[6] On our de novo hypervisor, Propolis: https://news.ycombinator.com/item?id=30671447

[7] On our boot model (and the elimination of the BIOS): https://news.ycombinator.com/item?id=33145411


I highly recommend folks check out the Oxide and Friends calls on Discord, usually Mondays at 5pm Pacific. More info: https://oxide.computer/podcasts/oxide-and-friends

Disclaimer: I am a fan, not affiliated.


Wishing you the best of luck.

Really curious to see whether, in 2023, engineered systems still have a market in this world of commodity cloud hardware.


Can you share how software engineers can bring hardware to market? Fabrication, logistics, manufacturing. What should be outsourced? Who did you partner with for what?

Also, congrats.


They have discussed that on their podcast: https://www.youtube.com/@oxidecomputercompany4540


Congrats to the entire team. Being a wayward hardware guy somewhere in telecom-land who has followed your progress throughout the years (and having listened to pretty much every Oxide podcast), I am genuinely happy for you all.


Hubris, Humility and Propolis are open source. Is anyone else using them?


Could be interesting to port Hubris to x86_64 and use it on bare metal commodity hardware -- specifically to run Rust server code, with perhaps a perf advantage over larger OSes?

Though the biggest advantage to Rust-on-bare-metal would have been running all the code in kernel space, which it seems Tasks don't allow for (in order to provide robustness at the OS level).


Hi, I saw the Oxide Twitter account mention you'd be answering some questions from this thread on this evening's podcast, so I hope this gets seen in time. Though firstly I should say congratulations!

Q: Are you shipping Milan racks, and if so, what does the move to Genoa entail for Oxide?


Yes, the rack is Milan-based (7713P) -- and good question on Genoa; we'll get into that tomorrow!


Be the outlandish you wish to see in the world. — Not Gandhi


The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

-George Bernard Shaw


"The world" is an impersonal euphemism for human culture. All things apart from physics and human nature are dentable.


I'm not even so sure about that last one.


Do you have any liquid cooling solutions collaborators?


It's not liquid cooled.


[flagged]


Feedback is helpful, but this comment seems snide just because OP's comment wasn't perfectly relevant to you. There's no need for that; it's relevant to others.


Please don't encourage lazyweb. Plus, this is already being discussed downthread.


Please learn to use google. It will take as much time as this snide comment took to write.


I am extremely proud of everyone at Oxide. It’s been a fantastic place to work, and finally getting to this point feels awesome. Of course, there is so much more to do…

For funsies, it’s neat to look back at the original announcement and discussion from four years ago: https://news.ycombinator.com/item?id=21682360


As someone who's worked on on-prem infra automation pretty much my entire career, I'm rooting for you.


Congrats to the team, but after hearing about Oxide for literal years since the beginning of the company and repeatedly reading different iterations of their landing page, I still don't know what their product actually is. It's a hypervisor host? Maybe? So I can host VMs on it? And a network switch? So I can....switch stuff?


It's AWS Outposts Rack without the AWS connection. That is, you get a turnkey rack with the servers, networking, hypervisor, and support software preconfigured. You plug it into power and network and it's ready to run your VMs.

Outposts, too, started with a full-rack configuration only, but they eventually introduced an individual server configuration as well. It'll be interesting to see if Oxide eventually decides to serve the smaller market that doesn't have the scale for whole-rack-at-a-time.


So what was the precursor solution to this before AWS?


The traditional enterprise option is to call up Dell and order a pre-validated VMware cluster. This will be a specific configuration of Dell hardware and VMware software that has been tested together, although it was not developed together.


A mainframe, usually acquired by calling up IBM/Unisys/etc. and writing a big cheque.


dHCI.


It is a rack of servers, but every aspect of it is engineered to include the full list of things needed to make a rack of servers a useful VM hosting setup. So it includes the networking, connection to the service processors which allow you to remotely access each server, other management things, etc.

Once installed, you plug in the network connection(s) and add power, then boot up. Then you can add your VMs and start running them.


Except that you cut out all the fat of traditional servers. These aren't COTS servers. Imagine if somebody re-engineered an entire rack in the same way they once introduced blade computers. Any part that can be moved from standalone to rackwide should be cut out of individual shelves. So I would expect things like serial buses, video, certain power aspects, etc., to all be removed from the individual shelves.

When you cut out all that fat, you can fit a lot more muscle in its place. Then you can arrange everything on a rack scale for airflow, power, redundancy, etc.

The obvious downside is that you are locked into a single supplier for everything in the rack. Somebody please add a link here for Oxide's strategy to minimize this risk for their customers. (And feel free to correct any mistaken over-simplification.) Overall, they have seemed incredibly open to sharing in a way that allows others to play. That might just be because they would die if they didn't, but I think a lot of people are willing to accept that they are pretty all-around awesome folks trying to make the world better. And making money in the process is not a bad gig. We'll see which other vendors jump into the pool.

I wish them great success. But I'm wondering why they haven't coordinated my shipping address for their first shipment. First one's free, right?


> When you cut out all that fat, you can fit a lot more muscle in its place. Then you can arrange everything on a rack scale for airflow, power, redundancy, etc.

This is not really true. Data center servers are highly optimized for density already. Like tens of thousands of hours on airflow CFD, tweaking cases and fans and internal layout, heatsink design, etc. chasing the 0.1%s. There are many significant tradeoffs to be made, but there is not a large factor improvement just sitting on the shelf due to a bunch of unused legacy leftovers.


So, we have found that that really IS correct, though perhaps not in the ways you might think. For example, a major problem with servers today is the accepted geometry: 1U or 2U x 19". In order to drive density, you are either doing dual socket in 2U or single socket in 1U -- but to make that geometry work you have small fans that really have to crank to ram air through. (And have we mentioned that the AC power supplies also need fans?) These little screamers are acoustically distasteful, but it's worse than that: they have to do much more work (and draw more power!) to move a fraction of the air because they are so small (air movement is ~cubic with respect to diameter). By changing the enclosure geometry (our sleds are 100mm high) we have much larger fans (80mm). The upshot? Our fans move SO much more air that we can run them much more slowly (we worked with our fan provider to drop the RPM at 0% PWM from 5000 RPM to 2000 RPM) -- which means our rack is not only silent by comparison, but also that the power in the rack is going to compute rather than fans.

So, this isn't really chasing the 0.1% -- it's chasing much bigger wins, and it's doing it by changing the power distribution (a shelf of rectifiers feeding a 54V DC busbar vs. redundant AC power supplies in every 1U/2U), changing the cooling (the larger fans), changing the networking (we have a blindmated cabled backplane, obviating the need for operator cabling), etc. These are not just multipliers on efficiency, but also on manageability: once physically installed, our rack is designed to be provisioning VMs orders of magnitude faster than the traditional manual rack/stack/cable/software install.
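
A rough way to see why the larger fans matter so much is the standard fan affinity laws: airflow scales roughly with rpm * d^3 and fan power with rpm^3 * d^5. Here's an idealized sketch with made-up RPM figures (not Oxide's actual fan data):

    # Idealized fan-affinity-law sketch (illustrative numbers only):
    #   airflow Q ~ rpm * d^3
    #   power   P ~ rpm^3 * d^5
    def rpm_for_same_flow(rpm_small, d_small, d_large):
        # Hold airflow constant while growing the fan diameter.
        return rpm_small * (d_small / d_large) ** 3

    def relative_power(rpm_ratio, d_ratio):
        return rpm_ratio ** 3 * d_ratio ** 5

    rpm_small, d_small, d_large = 15000, 40, 80   # hypothetical 1U fan vs. 80mm fan
    rpm_large = rpm_for_same_flow(rpm_small, d_small, d_large)
    p = relative_power(rpm_large / rpm_small, d_large / d_small)
    print(f"{rpm_large:.0f} RPM at 80mm, ~{p:.2f}x the fan power for the same airflow")
    # -> 1875 RPM, ~0.06x the power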


> So, we have found that that really IS correct, though perhaps not in the ways you might think.

I was replying more to the idea that there's all this costly legacy I/O.

I guess you aren't the first to try different geometries or power delivery or cabling either. There have been lots of little OpenCompute-type efforts and startups that have come and gone. I'm skeptical there's a lot in it for a significant niche that doesn't already do these things, but you don't need to convince me. Although if you did want to, you could show some comparative density numbers. What can you fit -- 2048 cores, 32TB -- and push 15kW through a rack?


Yeah: 32 AMD 7713P (64 cores/128 threads), so 2048 cores/4096 threads, 32 TB of DRAM, 1PB of NVMe, with 2x100GbE to dual, redundant 6.4Tb/s switches -- all in 15 kW. In terms of other efforts, there have certainly been some industry initiatives (OCP, Open19), but no startups that I'm aware of (the smaller companies in the space have historically been integrators rather than doing their own de novo designs -- and they don't do/haven't done their own software at all); is there one in particular you're thinking of?
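
Spelled out per sled, those totals work out roughly as follows (back-of-envelope arithmetic; the per-sled split is inferred from the totals above, not an official spec breakdown):

    # Rough per-sled split of the quoted rack totals across 32 sleds:
    sleds = 32
    cores = sleds * 64                    # AMD EPYC 7713P: 64 cores / 128 threads
    threads = cores * 2
    dram_per_sled_tb = 32 / sleds
    nvme_per_sled_tb = 1024 / sleds
    power_per_sled_w = 15_000 / sleds     # includes shared switches, fans, rectifiers

    print(cores, threads)                          # 2048 4096
    print(dram_per_sled_tb, nvme_per_sled_tb)      # 1.0 TB DRAM, 32.0 TB NVMe per sled
    print(round(power_per_sled_w), "W per sled")   # ~469 W of the rack budget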


Well the first OCP specification more than 10 years ago specced 13kW per rack. ORv3 is up to 30kW now, some vendors are pushing 40 and more. So maybe I'm missing something, didn't really seem like density was a major point of difference there.

And not one in particular, there's just a bunch that have sprung up around OCP over the past decade. None that I'm aware of that are doing everything that Oxide does, but we were talking more about the mechanical, electrical, and cooling side of it there -- they do seem to do okay with power density.


To be clear, the problem is in how the power budget is being spent (most enterprise DCs don't even have 15 kW going to the rack). The question on density is: how can you get the most compute elements into the space/power that you've got, and cramming towards highest possible density (i.e., 1U) actually decreases density because you spend so much of that power budget cranking fans and ramming air. And the challenge with OCP is: those systems aren't for sale (we tried to buy one!) and even if you got one, it has no software.


> To be clear, the problem is in how the power budget is being spent (most enterprise DCs don't even have 15 kW going to the rack).

I thought you were going for cloud DCs rather than enterprise. Seems like a big uphill battle to get software certified to run on your platform. Are any of the major Linux distros or Windows certified to run on your hypervisor platform? Any ISVs?

> The question on density is: how can you get the most compute elements into the space/power that you've got, and cramming towards highest possible density (i.e., 1U) actually decreases density because you spend so much of that power budget cranking fans and ramming air.

The question really is how much compute power, and electrical/thermal power ~= compute power. Sure you could fit more CPUs and run them at a lower frequency or duty cycle.

> And the challenge with OCP is: those systems aren't for sale (we tried to buy one!) and even if you got one, it has no software.

OCP is a set of standards. There certainly are systems sold. I guess the nature of the beast is that buyers of one probably don't get taken very seriously, particularly not a competitor.


> electrical/thermal/support power ~= compute power.

Yes, and using larger fans decreases that overhead. Centralized rectifiers reduce it further. Google can make it all the way down to 10% overhead power. That's the point they are making.


Yes, and the question I am asking is how much better they are doing than the competition there. For compute density, the real cloud racks are pushing 30-40kW per rack, so even if their overhead were 20% and Oxide's 0%, Oxide doesn't seem to have a compute density advantage there.


Let's imagine a 42U rack full of Dell or HPE servers. Each one has a couple of power supplies, console ports, a video port and a GPU to drive it, USB ports for keyboard and mouse, etc.

More to the point with Oxide, there's layer upon layer of software cruft as well: various levels of proprietary firmware blobs driving the boot process, and then the management stack that underlies a commodity PC platform OS.

All of that is gone, replaced with a bottom-up redesign of both the hardware and the software, with the specific purpose of running large-scale modern application loads.

This goes well beyond a blade server, and is a completely different beast than a rack of DL360s.


Exactly, and the amount of cruft in there is indeed stunning! And broadly, not needed -- though there was a moment when we really wondered about KBRST_L, the ancient input to the CPU to indicate a keyboard reset! (We recounted this story on our "Tales from the Bringup Lab"[0], which is a must-listen if you haven't heard it.)

[0] https://www.youtube.com/watch?v=lhji-kP3Lhk#t=1h12m56s


I know it might seem like it, but there's just not that much in it. All video, USB, and "console ports" can be driven from a small chip using a couple of watts. Some do all that on their BMC SoC, even. And if your requirements include redundant power supplies, then that's what you get, and if they don't you get something without them. And the boot and runtime firmware might be complicated and have its own issues, but there aren't large runtime performance overheads there either, such that you could just "rewrite it all" and get a big speedup.

Proof of the pudding will be in the eating I guess. Does Oxide talk about performance advantages at all, or have numbers?


A couple of watts here, and a couple of watts there, and pretty soon you're talking about real money!

If it didn't matter, then there'd be no interest in blades, or OCP, and all the cloud vendors would use standard rackmount whitebox servers. At the scale of one rack, the difference is fairly small. But once you're building out cages full of racks, maybe it matters more?

And it's not just power and space (although they both matter), it's the attack surface, and the ability to actually have your hardware do what you want, and the benefits of a stack that's built for purpose, not cobbled together out of off-the-shelf parts.

It's not for everyone, to be sure, but I suspect that for its target market, it'll be a very successful product/family. As you say, we will see. I wish them well.


> A couple of watts here, and a couple of watts there, and pretty soon you're talking about real money!

Not with the current state of the art.

> If it didn't matter, then there'd be no interest in blades, or OCP, and all the cloud vendors would use standard rackmount whitebox servers.

I'm talking about it mattering from the starting point of blades, OCP, "cloud" systems!

The thread is about where the Oxide niche is and what advantages it has over the competition. The idea that there is a huge amount to be won on southbridge chipsets and IO ports in large-scale systems is simply not true. Quite amazing that people who don't understand this are posting in this thread as though they are experts in the matter.


>> A couple of watts here, and a couple of watts there, and pretty soon you're talking about real money!

> Not with the current state of the art.

Perhaps you could educate me: what's the state of the art that makes watts free?

>> If it didn't matter, then there'd be no interest in blades, or OCP, and all the cloud vendors would use standard rackmount whitebox servers.

> I'm talking about it mattering from the starting point of blades, OCP, "cloud" systems!

> The thread is about where the Oxide niche is and what advantages it has over the competition. The idea that there is a huge amount to be won on southbridge chipsets and IO ports in large-scale systems is simply not true.

I don't think anyone has asserted that there are "huge amounts" to be won from the simplified hardware? It's my view that there is a small amount to be won (more if you're coming from DL380s, and a bit less if you're replacing blades, for instance). But those small amounts do add up, and contribute to the value proposition. I think there are other parts of the Oxide product that are more compelling, but that's the point: it's the total package that you're buying, not just the missing VGA socket on the front panel.

Imagine that I'm an enterprise computing user. I currently have a few small (wrt cloud providers) datacenters, stuffed with racks of any tier 1 vendor's rack mount, blade or even OCP systems. And they're running some combination of plain VMs, perhaps some orchestration platform (k8s, or Mesos, or whatever). And it's time for my 3/5/8-yearly refresh of a bunch of those systems.

A unified, coherent, targeted product (such as Oxide) might be a compelling offer. It promises to deliver cloud-type hardware and software efficiencies at a smaller scale. It gives me the savings of not renting from AWS/Azure/etc, while not dealing with the hodge-podge of the standard PC hardware and software stack.

> Quite amazing that people who don't understand this are posting in this thread as though they are experts in the matter.

Welcome to the Internet?


> Perhaps you could educate me

Doesn't seem to be taking. The issue is that there are not a couple of watts there. There are a couple here, and that's all.

> I don't think anyone has asserted that there's "huge amount"s to be won from the simplified hardware?

They did. In this thread you are participating in even. Perhaps you didn't really read the post of mine that you first replied to, or its context?


People have this idea that legacy code or hardware is some huge burden, but due to continuous progress in scale, the relative size of legacy code or hardware shrinks over time. There might be some exceptions (e.g., physical connectors), but otherwise you can pretty much include the entirety of a 1995 PC (chips and firmware, OS, etc.) inside a tiny, tiny package.

The attack surface it leaves open is a different story.


So, I don't disagree in the abstract -- but in this case, the presence of these abstractions is functioning as a real obstacle with respect to platform enablement, and we found it to be a tremendous win to eliminate the abstractions entirely.[0]

[0] https://www.osfc.io/2022/talks/i-have-come-to-bury-the-bios-...


I wrote this a while back, does that help? Happy to elaborate. https://news.ycombinator.com/item?id=30678324


Seems like Oxide is aiming to be the Apple of enterprise hardware (which isn't too surprising given the background of the people involved - Sun used to be something like that, as were other fully-integrated providers, though granted that Sun didn't write Unix from scratch). Almost like coming full circle from the days when the hardware and the software were all done in an integrated fashion, before Linux turned up and started to run on your toaster.

From your referenced comment:

> The rack isn't built in such a way that you can just pull out a sled and shove it into another rack; the whole thing is built in a cohesive way.

> other vendors will sell you a rack, but it's made up of 1U or 2U servers, not designed as a cohesive whole, but as a collection of individual parts

What I'm curious about is how analogous or different is this cohesiveness to the days where vendors built the complete system? Is that the main selling point or there are nuances to it?


The holistic design is certainly a big piece of it. While we certainly admire aspects of both Apple and Sun (and we count formerly-Apple and formerly-Sun among our employees), we would also differentiate ourselves from both companies in critical dimensions:

- Oxide is entirely open where Apple is extraordinarily secretive: all of our software (including the software at the lowest layers of the stack!) is open source, and we encourage Oxide employees to talk publicly about their work.

- Oxide is taking a systems approach where Sun sold silo'd components: I have written about this before[0], but Sun struggled to really build whole systems. For Oxide, we have made an entire rack-level system that includes both hardware and software: the abstraction to the operator is as turn-key elastic infrastructure rather than as a kit car.

We have tried to take the best of both companies (and for that matter, the best of all the companies we have worked for) to deliver what customers want: a holistic system to deploy and operate on-premises infrastructure.

[0] https://news.ycombinator.com/item?id=2287033


Oxide is the only exciting and refreshing technology product company I know of today. I've been rooting from the sidelines for years now, I want Oxide to succeed wildly so I can hopefully be a customer at some point.


Thanks for the perspective. That prompts a request (and this despite your point about everything being open source): you guys should share the differentiating highlights of the monitoring gains you have achieved from this level of integration -- it'll help those who are not able to dive into the source code and extract/summarize this aspect. I think it'll be a good showcase to everyone else of the additional benefits that come out of this approach.


Yes, good idea! In terms of monitoring, we have made component selections that allow for monitoring (e.g., PMBus for power rails) -- and then built the software into our SP that pulls that information and makes it available upstack via our management network.[0]

[0] https://www.youtube.com/watch?v=abE_9zsAadE
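
For a sense of what "PMBus for power rails" means at the lowest level, here is a generic, hypothetical sketch of reading a rail's output power and current over Linux SMBus and decoding the PMBus LINEAR11 format -- illustrative Python against an assumed bus and device address, not Oxide's SP firmware (which is Rust on Hubris):

    # Hypothetical PMBus telemetry read (bus 1 and address 0x40 are assumptions).
    from smbus2 import SMBus

    READ_IOUT = 0x8C   # standard PMBus command codes
    READ_POUT = 0x96

    def linear11(raw):
        # PMBus LINEAR11: bits 15:11 = signed exponent, bits 10:0 = signed mantissa.
        exp = (raw >> 11) & 0x1F
        if exp > 0x0F:
            exp -= 0x20
        mant = raw & 0x7FF
        if mant > 0x3FF:
            mant -= 0x800
        return mant * 2.0 ** exp

    with SMBus(1) as bus:
        pout = linear11(bus.read_word_data(0x40, READ_POUT))
        iout = linear11(bus.read_word_data(0x40, READ_IOUT))
        print(f"rail: {pout:.1f} W, {iout:.1f} A")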


Apple or Sun are common comparisons, yes :)

> What I'm curious about is how analogous or different is this cohesiveness to the days where vendors built the complete system?

To be honest, that was before my personal time. I was a kid in that era, using ultra hand-me-down hardware. I'd speculate that one large difference is that hardware was much, much simpler back then.


Perfect - now get that 2nd paragraph on the landing page somehow!


I don't really understand how having a larger minimum purchase unit (entire rack vs. rack unit) is a USP. Your comments explain the emphasis on tighter integration across the stack, but they don't clearly show why this is a benefit.

What are the problems people are having with existing systems (like Vxrail), and how does Oxide fix those? What stories are you hearing?


> I don't really understand how having a larger minimum purchase unit (entire rack vs rack unit) is a USP.

For some organizations cattle-like pizza boxes or chassis with blade systems are still not cattle-like enough. By making the management unit the entire rack you can reduce overhead (at least compared to a rack of individual servers, even if they are treated like cattle).

There are vendors that will drop ship entire (multiple) racks pre-wired and pre-configured for various scenarios (especially HPC): just provide power, (water) cooling, and a network uplink.


I wouldn't say that a larger purchase unit is a USP; it is an underlying reason why other USPs are able to be delivered. This is an engineering focused place, so I tended to focus on the engineering aspects.

My sibling commenter just left a great answer to your second question, so I'll leave that one there :)


I don't get it. What's hard to understand? They are selling servers, specifically in a rack. This has been a discussion in every thread and it's pretty clear on their landing page.


It's like an on prem AWS for devs. I don't understand the use case but the hardware is cool.


Congrats to the Oxide team, this is a massive achievement.

Now that racks are shipping it'd be awesome to see a top-to-bottom look at the hardware and software. They've given a lot of behind the scenes peeks at what they're doing via the Oxide and Friends podcast, but as far as I'm aware there is no public information on what it all looks like together.


Somebody help me understand the business value. All the tech is cool, but I don't get the business model; it seems deeply impractical.

* You buy your own servers instead of renting, which is what most people are doing now. They argue there's a case for this, but it seems like a shrinking market. Everything has gone cloud.

* Even if there are lots of people who want to leave the cloud, all their data is there. That's how they get you -- it costs nothing to bring data in and a lot to transfer it out. So high cost to switch.

* AWS and others provide tons of other services in their clouds, which if you depend on you'll have to build out on top of Oxide. So even higher cost to switch.

* Even though you bought your own servers, you still have to run everything inside VMs, which introduce the sort of issues you would hope to avoid by buying your own servers! Why is this? Because they're building everything on Illumos (Solaris), which for all practical purposes is dead outside Oxide and delivering questionable value here.

* Based on blogs/twitter/mastodon they have put a lot of effort into perfecting these weird EE side quests, but they're not making real new hardware (no new CPU, no new fabric, etc). I am skeptical any customers will notice or care, and they would not have noticed had Oxide used off-the-shelf hardware/power setups.

So you have to be this ultra-bizarre customer: somebody who wants their own servers, but doesn't mind VMs, doesn't need to migrate out of the cloud but wants this instead of whatever hardware they manage themselves now, who will buy a rack at a time, who doesn't need any custom hardware, and is willing to put up with whatever off-the-beaten-path difficulties are going to occur because of the custom stuff they've done that AFAICT is very low value for the customer. Who is this? Even the poster child for needing on-prem, the CIA, is on AWS now.

I don't get it, it just seems like a bunch of geeks playing with VC money?


Part of the appeal of the cloud over on-premise is that on-premise is not just expensive, but hard: oxide’s product isn’t just on-premise hardware, it’s on-premise hardware _that is easy_*. If on-premise was just expensive, it would be so much more appealing — because the cloud is expensive too!

Most every software engineer has worked in an org where spending six figures per month on AWS or GCP is totally normal and acceptable because the alternative, buying hardware, is this awful scary unknown that could be cheaper but could also blow the entire company up. If oxide can solve that, suddenly on-premise becomes much more attractive.

Yes, people are hooked on the cloud, but not because of data transfer… because it’s easy.

* well, the first few deployments might not be easy but that’s true of anything new.


Not OP, but what differentiates Oxide from VMware or Dell? I can get pretty competitive quotes from both if I wanted to build my own DC, or if I had much more niche HPC requirements I could go with NVIDIA.


top-to-bottom integration and fully open-source, basically. That's a pretty good value proposition, though it does depend on how much premium you pay for it.


I hate to link this comment again in this thread, but I wrote a comment about almost exactly this: https://news.ycombinator.com/item?id=30678324


The enterprises running VMware that I know run completely shit hardware and are getting outrageously fleeced on any updates they want to do.


100% agree with you

Also, what this person said :) https://news.ycombinator.com/item?id=36552245


It is simply false that everything has gone cloud. The whole argument falls down on the first premise. Also nearly everyone who owns their own servers still runs VMs on them.


~Bandcamp~ Basecamp is doing just the opposite and leaving the cloud, in fact: https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47...


Basecamp (DHH) not Bandcamp (Derek Sivers)


Isn’t Derek Sivers CD Baby?


This thread is just a litany of errors


well heck. yes, sivers@ founded CD Baby, not Bandcamp.


Thanks!


> Also nearly everyone who owns their own servers still runs VMs on them.

This strikes me as at least as much of a leap as "everything has gone cloud."

Containers are... kind of a thing. And while there certainly are use cases for VMs over containers, they're comparatively niche.

This product seems as if it'd be a better fit for every real use case I've ever seen if it were a prebuilt kubernetes cluster rather than a prebuilt VM hypervisor.


That’s a Silicon Valley bubble perspective. Everything from your kid’s school to your car dealer to your automaker is VMWare.


This is true, but I wouldn't underestimate the penetration of Cloud and Container First deployments.

A number of customers I've worked with in Biotech, Telecom, Retail, Defense, and Financial sectors have either completed their cloud transformation or have ongoing 3-5 year timelines to make such a move.

On the mid-market side, I've been seeing organizations increasingly offload the entire IT operation to MSPs instead of managing their own ESXi server. The MSP might often be using VMWare, but a number are also transitioning away from that.

I think it's premature to say container and cloud first deployments are the norm, but the change and transition is rapidly occurring.


I would expect most of those cloud deployments to be VMs not containers.


It's 60-40 VMs-Containers in my experience.

A lot of this is highly dependent on the kind of app you are migrating as well as the scale needs.


> Everything from your kid’s school to your car dealer to your automaker is VMWare.

While true, they also aren't buying $500k+ rack. They're running vmware on a much smaller scale.


In 15 years of professional experience I've never worked at a place that uses VMs on the servers they own. They're just going to run Linux off an image so what's the point? You might want to look outside your niche.


Unless you're running at 100% load most of the time, it's atypical to run a server on bare metal these days. Much more flexible, maintainable, and cost-saving to stick servers in a VM.

We sell a product that used to be delivered as a rack mountable server, but now I think nearly all our customers want a VM instead.


I worked for a SaaS that ran a hybrid infra, partially on metal. Each rack had N blade enclosures, which together made up a logical isolated unit, with each blade running something. New blades would get provisioned automatically, then Terraform and Puppet handled their configuration.

I don't remember utilization exactly, but I think most of the blades were reasonably committed. That said, when I left they were moving from a hybrid model to entirely cloud, so apparently the cost:flexibility ratio wasn't to people's liking.


Is this true? I have honestly never worked in any company >50 people that didn't use VMs on owned hardware in some capacity, though usually not the main product.

IT departments typically love VMs (and VMware) -- AD machines are most often hosted on VMs on VMware.


Two of the companies I've worked for have run gigantic clusters and had >100 employees and didn't really use VMs. One of them used one to build some third-party libs, but it was basically an implementation detail of one script. Everything on the servers was still bare metal. You already have user separation, permissions/privileges, etc. It's not clear what a VM adds other than as a hack to allow installed packages to diverge, which doesn't matter if you vendor them.


I'm not sure what "weird EE side quests" you're referring to, but anyone interested in learning what we've done in terms of hardware should listen to the team in its own voice, e.g. in their description of our board bringup.[0]

[0] https://oxide-and-friends.transistor.fm/episodes/tales-from-...


Addressing #2 and #3, a "hybrid cloud" architecture can include a site-to-site VPN or direct fiber connection to a cloud. In AWS, Direct Connect data transfer pricing effectively makes your on-prem DC or colocation facility into an availability zone in your AWS Region. Direct Connect is $0.02/GB egress (out of AWS) and free ingress (into AWS), which is a better deal than cross-AZ data transfer within the AWS Region. Cross-AZ within an AWS Region is effectively $0.02/GB in both directions.

This way, you can run your big static workloads on-prem to save money, and run your fiddly dynamic workloads in AWS and continue to use S3.

That said, if a hybrid cloud architecture is your plan and you desire a managed rack-at-a-time experience, AWS Outposts would seem to be the safer pick. They've been shipping for years and they have public pricing that you can look at. I'm not sure that Oxide specifically has an opening for customers who want to keep their cloud. I wish them luck.

https://aws.amazon.com/outposts/rack/pricing/
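
To put those per-GB rates in perspective, here is a toy monthly comparison at an assumed transfer volume (rates as quoted in this thread; check current AWS pricing):

    # Toy hybrid-cloud egress cost comparison (the 50 TB/month volume is an assumption):
    monthly_egress_gb = 50 * 1024
    direct_connect = 0.02                  # $/GB out via Direct Connect
    internet_egress = 0.09                 # $/GB out via public internet egress

    print(f"Direct Connect:  ${monthly_egress_gb * direct_connect:,.0f}/mo")   # ~$1,024
    print(f"Internet egress: ${monthly_egress_gb * internet_egress:,.0f}/mo")  # ~$4,608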


My own experience here was with a rack of servers in a DC running a fairly heavy workload for a number of customers.

Long story short, despite pressure to move from our parent company, there was literally no dimension - services, finance, performance, support - on which AWS was better for our workload. Our DC capex+opex was less than a quarter of the parent company’s opex spend on AWS, despite our workload being much greater - and we needed fewer staff to manage it.

So I for one am really happy to see Oxide producing an on premise private cloud solution like this.


This is for people who want a turnkey cloud that they own, not rent. Renting is really expensive: you could probably buy a few of these racks a year for the cost of renting the same amount of compute on AWS. You also get a software stack which gives you the tools to manage the hardware without growing your own, with an extremely high level of trust. It's extremely well aimed at the companies which can afford the racks, because operating at that scale you are being fleeced by the hyperscalers.


> they're building everything on Illumos (Solaris)

This is an amazing plus in my eyes. Solaris systems are amazing.


A lot of corporations have a very hard time moving workloads to the cloud due to regulations alone. In some places it's a compliance nightmare; a landing page with "trust me, we're secure" from AWS is far from enough for legal counsel, auditors, and the CISO.


Those corporations are dumb. Way more resources are spent making AWS secure than could ever be spent on bespoke security implementations.


Cloud was a low-interest-rate phenomenon. I predict a return to metal servers and managed data centers.


>* You buy your own servers instead of renting, which is what most people are doing now. They argue there's a case for this, but it seems like a shrinking market. Everything has gone cloud.

This is very much not true and seems to be a result of people in the valley thinking the rest of the world operates like the valley. In the rest of the world I've found mature businesses that bought into the "cloud is the best" quickly started doing the math on their billing rate and realized there is a VERY small subset of their business that has any reason to be in the cloud. Actually one of the very best use-cases of public cloud I've seen is a finance firm that sticks new products into the cloud until they hit maturity so they can properly right-size the on-prem permanent home for them. And if those products never take off, they just move on to the next one. They're willing to pay a premium for 12-18 months because they can justify it financially.

>* Even if there are lots of people who want to leave the cloud, all their data is there. That's how they get you -- it costs nothing to bring data in and a lot to transfer it out. So high cost to switch.

And yet companies do it all the time. I think you'll again find mature Fortune 500s can do the math on the exit cost vs. staying cost and quickly justify leaving in a reasonable time window.

>* AWS and others provide tons of other services in their clouds, which if you depend on you'll have to build out on top of Oxide. So even higher cost to switch.

And as you've seen plenty of people here point out: most of those services tend to be overrated. OK, so you've got database as a service: except now you can't actually tune it to your specific workload. And $/query, even ignoring performance, is astronomically higher than building your own and paying a DBA to manage it unless you're a 2-man startup.

>* Even though you bought your own servers, you still have to run everything inside VMs, which introduce the sort of issues you would hope to avoid by buying your own servers! Why is this? Because they're building everything on Illumos (Solaris) which is for all practical purposes is dead outside Oxide and delivering questionable value here.

I don't know of a single enterprise that has run anything BUT VMs for the last decade. Other than mainframes (which you can argue are actually just VMs under a different name), and some HFT-type applications that need the lowest possible latency at all costs, it's all virtualized. As for Illumos: why do you care? Oxide is supporting and maintaining it as an appliance. Tape has been "dead" for 2 decades. FreeBSD has been "dead" since the early 2000s. It's only dead for people that don't deal with enterprise IT.

>* Based on blogs/twitter/mastodon they have put a lot of effort into perfecting these weird EE side quests, but they're not making real new hardware (no new CPU, no new fabric, etc). I am skeptical any customers will notice or care and would have not noticed had they used off the shelf hardware/power setups.

I have no doubt they've done their research, and I can tell you from my industry experience there is a large cross-section of people who want an easy button. There's a reason why companies like Nutanix exist and have the market cap they do - but they could never actually get the whole way there, because they got wrapped up in the "everything is software defined!!!" mindset. Which works really well until you realize that you're left to your own devices on networking.

>So you have to be this ultra-bizarre customer, somebody who wants their own servers, but doesn't mind VMs, doesn't need to migrate out of the cloud but wants this instead of whatever hardware they manage themselves now, who will buy a rack at a time, who doesn't need any custom hardware, and is willing to put up with whatever off-the-beaten path difficulties are going to occur because of the custom stuff they've done that's AFAICT is very low value for the customer. Who is this? Even the poster child for needing on prem, the CIA is on AWS now.

I mean no disrespect, but I get the impression you haven't ever worked with a Fortune 500 that's outside of the valley. This is EXACTLY what they all want. They aren't going to run their entire datacenter on this, but when the datacenter is measured in hundreds to thousands of servers, they've got plenty of workloads that it's a perfect fit for.


I've never worked in the valley lol


Many companies are leaving cloud hosting due to spiraling costs. Even well-knowns like 37signals:

https://world.hey.com/dhh/we-have-left-the-cloud-251760fb

It's nice to have options. Cloud good. Self-hosting good. Middle options good.


My mind also jumped to this post. But can you name another company that did so recently?


most companies will not be so loud about it.

Double that for companies that were loud about their cloud usage; usually that came with some usage discounts on the side -- on-prem does not -- and it also looks bad to pivot anyway, so why be loud?


Why does it look bad? It is perfectly reasonable that at the time of the switch to the cloud, that it made good business sense. It is also perfectly reasonable that after some time that the landscapes have changed, and it is now good business sense to leave the cloud. If at the time of the switch to the cloud the pricing was cheap, then sure, but we all know that prices are not written in stone and will pretty much always increase. At some point, those increases become untenable. It's the nature of tech. We used to use thin clients connected to a server, then went to dedicated workstations, and currently we're back to connections to servers.


Anyone willing to disclose minimum pricing? Are we talking tens of thousands? Hundreds? I hate when people say, 'If you have to ask, you can't afford it.' Please don't be that guy.


32x EPYC servers, figure $10K/server on a base config = $320,000.

Add in what appears to be 2x 100Gb Ethernet switches, the 100Gbps NICs, cabling, other rack hardware, service processors that allow you to control the servers, whatever amount of NVMe drives, etc. Plus assembly, burn-in, test, etc.

My guess is that a base config would be somewhere between $400K to $500K but could very definitely go up from there for a completely "loaded" config.


If you compare an Oxide rack with a standard combo of DL380s, Nexus switches, NetApp filers and VMware licences, and look at the specs page - "1024TB of raw storage in NVMe" - there's no way this is tens of thousands, and I'd be a bit surprised if it was in the hundreds either.


You're saying this rack might cost a million dollars?


I’d say there’s certainly a chance based on the specs being quoted. If we do some very rough back-of-napkin math and figure enterprise grade NVMe storage is something like $100 per TB, that’s plausibly $100k on storage drives alone


A half-rack NetApp filer can be a half-million dollars by itself.


Based on the components I'm guessing a half million.


Based on the components I'd say that's the build cost, not the price!


Servers with high-quality software integration. These provide the same value to businesses as Apple devices provide to consumers. Hopefully they "just work" and eliminate a bunch of DevOps distractions.

Most hardware companies have terrible software. If Oxide can handle manufacturing and logistics, then they'll be huge in about 10 years.


For those not in the know, this is what is being talked about[1]. Congrats to the Oxide team!

Would love to know what kind of uses this is being put to. In this age, everyone only talks about the cloud, with its roach motel model and all.

[1] https://oxide.computer


The demise of on premise and private data centers is greatly exaggerated. Few people here do that because the cloud is great for startups and rapid prototyping. Most on prem beyond small scale is big established companies.

There is a minor trend of established companies reconsidering cloud because it turns out it doesn't really save money if your workload is well understood and not highly elastic. In fact cloud is often far more expensive.


FYI for language pedants like me: It’s “on-premises” or “on-prem”. A “premise” is something assumed to be true.


I didn’t understand the business opportunity of Oxide at all. Didn’t make sense to me.

However if they’re aiming at the companies parachuting out of the cloud back to data centers and on prem then it makes a lot of sense.

It’s possible that the price comparison is not with comparable computing devices, but simply with the 9 cents per gigabyte egress fee from the major clouds.

If I were marketing director at Oxide I'd focus all messaging on "9 cents".
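
For a sense of scale, here's the break-even arithmetic against the ~$500K rack price guessed elsewhere in this thread (both figures are speculative):

    # How much egress at $0.09/GB adds up to a speculative ~$500K rack price:
    rack_price = 500_000
    egress_per_gb = 0.09
    breakeven_gb = rack_price / egress_per_gb
    print(f"{breakeven_gb:,.0f} GB (~{breakeven_gb / 1024:,.0f} TB) of egress")
    # -> about 5.6 million GB, roughly 5,400 TB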


They sell servers; what doesn't make sense about it? You can argue about the specific niche (big enough to run their own hardware, too small to design their own), but companies need somewhere to do compute. If nothing else, I love their approach to rethinking all of the individual software components in the stack and tossing those things which do not make sense in the modern era.


The question isn't whether anyone fits into that niche, but why anyone who does would buy this over a plain old off-the-shelf system.


That off-the-shelf stuff is some Dell hardware (and firmware from 7 different vendors) with some VMware stuff on top; I can guess why somebody would go for Oxide.


They seem to sell one set of servers, that's the part that doesn't make sense.

Where is this magical company that needs exactly one rack of exactly one type of server? The vast majority of companies needing this much compute will also be interested in storage servers, servers filled with GPUs, special high-RAM nodes, etc. And at that point you'll also be using some kind of router for proper connectivity.

Why bother going for a proprietary solution from an unproven company for the regular compute nodes, and forcing yourself to overcommit by buying it per rack? Why not just get them from proven vendors?


I work for a manufacturing company that needs exactly two types of boxes: generic compute without storage that connects to a SAN, and GPU-based servers.


Everybody has to start somewhere. I remember when EC2 only had one kind of VM.


My take: Oxide is for companies who want to buy compute, not computers.

They take the idea of "hyperconvergence" - building a software platform that's tightly integrated and abstracts away the Hard Parts of building a big virtualized compute infrastructure, and "hyperscaling" - building a hardware platform that's more than the sum of its parts, thanks to the idea of being able to design cooling, power, etc. at a rack scale rather than a computer scale. Then they combine these into a compute-as-a-unit product.

I, too, am a bit skeptical. I think that they will absolutely have the "hyperconvergence" side nailed given their background and goals, but selling an entire rack-at-a-time solution at the same time will be hard. But I have high hopes for them as it seems like a very interesting idea.


Because the "proven" vendors suck?


I'm convinced that if anything is going to reverse the migration to the cloud, it's Oxide.

Is there anything in the works for using some of this in a homelab? Mainframes (unofficially) have Hercules; it'd be good to see something similar for folks who want to experiment.


Maybe I'm misunderstanding the product, but wouldn't a hypervisor do the same thing for homelab-related stuff? You'll have to provide your own hardware, but it shouldn't be that difficult.


In what manner? I don't think the vast majority of companies need or want an entire rack.

So this has to be targeted more at F500, or maybe even to colos?


This is super cool. I realize a lot of HN folks might not see the point of this, but it literally saves an entire team of people for companies.


Oxide is such an ambitious project. I am such a fan of the aesthetic and design, and of course the transparency of all of the cool people that work there.

I'd love to have a rack or two some day!


I believe that good hardware and software unified into one neat package can steal customers back from the cloud. Especially in the current economic conditions where everyone's looking to save on their server bills. I hope to some day work with Oxide stuff.


The On the Metal podcast had been one of my absolute favorites. It's clear these folks are exceptional, and they deserve congratulations!


Is there a public price list anywhere?


What really put me off of them (and this as a fan of BCantrill and others there) is that this information is highly obfuscated, and I never heard back the two times I contacted Oxide to find out more (both times on behalf of an org that could more than pay).

Still think they’ll succeed big but I don’t think they’ve fully dialed in what is important to people who may be able to pull the trigger on a decision like this.


Second this. I was (and am) in a position to pay substantially for such a system, and the few times I reached out I was met with radio silence.

Possibly because I am in Europe and they want to focus on the NA market, not sure.


It is true that we are focusing on the North American market, but we are also not trying to treat the European market with radio silence; please accept our apologies! We can't find anything under your HN username or the information in your HN profile; would you mind mailing either me (my first name at oxidecomputer.com) or [email protected]? With our thanks -- and apologies again!


I will definitely reach out. Likely you have me under [email protected], which was my corporate email address at the time. Alas, I have moved on from that job. But just so you know I am not lying.


Thank you so much! (And definitely viewed this as being on us not on you, so no worries!)


With our apologies, we can't seem to find anything based on either your HN username or the e-mail address in your profile. So sorry that it was somehow dropped or otherwise went into the ether! Would you mind mailing either me (my first name at oxidecomputer.com) or [email protected]? Thanks for your interest (and fandom!) and apologies again!


I have a very strong aversion to being put in any kind of sales system. Do you have anything public you can share at this time, or is it all "contact sales for pricing"-level stuff?


I LOVE how these guys unit test hardware subsystems before putting them onto whole motherboards.


How do they do that?


A great question! You can find more details on our approach in our Oxide and Friends episode on the power of proto boards.[0]

[0] https://oxide-and-friends.transistor.fm/episodes/the-power-o...


Thanks much!


From a commodity hardware perspective I'm not sure there is much to be excited about, but if it's a meaningfully better and cost-competitive IaaS, maybe that is exciting. Also, you are probably going to want GPU support, which may be hard with their super-customized offering.

If I were to do a bare metal deployment I'd look at Kubernetes + KubeVirt + a software-defined storage provider. Then you get a common orchestration layer for VMs, containers, or other distributed workloads (ML training/inference), but don't need to pay the VMware tax, and you'd be using a common enough primitive that you can move workloads around to 'burst' to the cheapest cloud as needed.


You can just follow one of the "Kubernetes on EC2" guides and adapt it to their APIs.


As long as it encourages on-premise self-hosting, this can only be a good thing.


If I were rich, I would try to buy some big clusters like that now, but only if they were full of MI300Xs (and some MI300As). Maybe ten or twenty deadly cabinets. Then (ideally) train a record-breaking multimodal model and (hopefully) take over the planet before they all become obsolete in 2-3 years after tinygrad puts out their ASIC design and Intel and SiFive adapt it for 2nm.

I guess to be truly turn-key it would need to come with a special model or two adept at training and commanding an army of LLM agents.


The listed items are for last-gen CPUs and 2-gen-old NICs (100 Gb/s). Also ZFS rather than Ceph. It's hard to consider under those constraints.

Any plans for upgrades beyond single-rack deployments and newer-gen hardware? And what is the sales case over Kubernetes, Mesos, or VMware, especially since the hardware and software appear tightly coupled?


It's Crucible, not just ZFS, BTW.


Thanks for the info.


Congratulations on your release! Great product. This will totally fill our MANY on-premise CI, security, analytics and video needs that CAN'T be moved to the cloud. Oxide will definitely be on our short list for our next hardware/software stack refresh.


Am I understanding this correctly? This is an on-premise drop-in replacement for your cloud service, like AWS?


Not unless you have strictly constrained yourself to using vanilla VMs and nothing else.


Super excited to see more companies owning hardware vs renting (aka cloud).

Somewhat related: if you are interested in companies which heavily use on-prem, check out

https://github.com/viggy28/awesome-onprem


Well done and congratulations to the Oxide team! Very excited to see where this company goes.


What operating system do they use?


A modern rack contains many computers. We use both our own lil embedded OS, Hubris, as well as our own illumos distribution named Helios.

None of this is exposed directly to the customer, though. You run VMs with whatever you want.


> None of this is exposed directly to the customer, though

What about security updates and bug fixes? No platform is perfect, after all...

Or what about if (god forbid), you go out of business 5 years down the line... would the hardware be repurposeable? Or would it just become a large paperweight?


> What about security updates and bug fixes?

Security is an important part of the product. You'll get updates.

> Or what about if (god forbid), you go out of business 5 years down the line

All of the software, to the degree that we are able to, is open source. This is important precisely because you own the hardware; you as a customer deserve to know what we put on it.


People who buy systems like this lifecycle them out after 5 years. Somewhere around year three they are working on specs for the replacement.


You would think so, but we have several SANs at dayjob that are coming up on 10 years old...


Why is this better than a normal k8s distribution, or just buying VMs from Amazon, for someone who doesn't need high security or other boutique features?


It may be, it may not be! The main differential in those cases is that you're not owning the hardware, you're renting it. That is very important for some people, and not so much for others. It depends on what you're doing.


Pardon my ignorance, but isn't this the same as what is being done on the IBM Z mainframe?


In some sense, but not in others. This is an x86 box.


Hmm. So why would someone go with Oxide over IBM?


This is a cluster of commodity machines, a mainframe is one machine.

The mainframe is optimized for reliability and compatibility foremost and does things that are quite unlike most other classes of machines even "high end servers", including RAIM (RAID for memory), and the ability to stop a failed processor and move its checkpointed state to another processor and resume it transparently from software point of view, and generally employing quite hardened circuits and strong error detection and correction throughout the system.

There are some guesses at $500k for this Oxide rack, not sure if that even gets your phone call returned for a low end mainframe with much less compute power and memory installed. High end configurations rumored to be many millions.

This thing is more a competitor to "scale-out" / "cloud" / "webscale" / etc., at least on the hardware front (they seem to do a lot more on software/firmware side than typical such hardware vendors).


I explained that (I hope) over here https://news.ycombinator.com/item?id=30678324


Why do you use Illumos?


A sizable portion of their team and leadership is former Sun / Solaris / Illumos dev folks. It's their 'native' OS/platform.


Many people at Oxide have been working with it and its predecessors for an extremely long time. It is a platform that is very well known to us.


I truly and honestly hope you succeed. I know for certain that the market for on-prem will remain large for certain sectors for the foreseeable future.

However. The kind of customer who spends this type of money can be conservative. They already have to go with an unknown vendor, and rely on unknown hardware. Then they end up with a hypervisor virtually no one else in the same market segment uses.

Would you say that KVM or ESXi would be an easier or harder sell here?

Innovation budget can be a useful concept. And I'm afraid it's being stretched a lot.


Your point is well-made: the status quo is safe! Introducing any new product into an organization takes both a recognition of pain and some courage. In the case of Oxide, the pain is apparent: extant systems are expensive, vendors gaslight and squabble when issues arise, these heterogeneous solutions are surprisingly brittle. This is true even when dealing with what is ostensibly the same company! Try getting Dell, VMware, and EMC to agree whether a problem is due to the server, hypervisor or SAN!

In addition, app teams (i.e. the folks who deliver revenue to the business) are more productive on public cloud. They don't file tickets, they make VMs. Oxide is intended to bring the hardware, software, and interface innovations of the cloud into enterprise data centers with greater reliability and lower TCO.


> I truly and honestly hope you succeed.

Thanks!

> However.

Yes. These things are challenges, but that doesn't mean they're insurmountable. I am confident that we can overcome them.

> Would you say that KVM or ESXi would be an easier or harder sell here?

I am not in enterprise sales, so I couldn't tell you :) Oxide has those people; I'm just not one of them.


I'd not mind frying some meats on a grill and celebrating, if anyone in the Bay Area is interested. Having some Oxide folks would be awesome.


Does Oxide provide some sort of management software, like OpenStack, on top of the hardware?


You interact with the rack via an API, yes. I hesitate to say "like OpenStack" because these sorts of systems are huge and complicated, and what "like" means depends on, you know, what you use. But you do get management software, yes.


That's everything they do, I do believe.


It looks cool; what primary problem does it solve vs. AWS?


Ownership and control vs rent and managed setup


Additionally, compared to AWS Outposts which use inefficient commodity hardware and closed source software, the Oxide platform uses integrated hardware and software, and open source software.


Who's the lucky customer?


Anyone want to explain what the heck Oxide is? Based on the comments it sounds like a biodegradable plastic wrap?


I wrote this a while back, does that help? Happy to elaborate. https://news.ycombinator.com/item?id=30678324


It's a cloud in a rack.


So they reinvented the mainframe with a prettier exterior?

Their own OS: Check
Big tall box: Check
Expensive: Check (well, no price list I have seen so far)
Proprietary hardware, at least interfaces: (?)
Upgrades must be bought from the vendor: (?)

Right to repair?


Not quite. A traditional mainframe is highly integrated, to the point of allowing hotswapping of CPU and memory. Oxide seems to be a fairly standard collection of x64 hardware, with a proprietary management software sauce.


Just to expand slightly: "proprietary" in the sense of "built by us for this computer," but not in the way that free or open source software developers speak of the word "proprietary" -- the vast majority of what we do is MPL licensed, to the degree that it is actually possible.


Out of curiosity, is there a list of those things that cannot or could not be licensed thusly, and why?


In some cases, it's a derived work of open source but non-MPL'd code -- but the ones that Steve is referring to are where we have third-party proprietary firmware. There aren't a ton of these (and in almost all cases, we ourselves don't have source code). We have (naturally) tried to minimize these, but as a concrete example: every AMD CPU has a PSP that runs proprietary software; we deliver that as a binary blob given to us by AMD (it's delivered over SPI to the host CPU).


Can you expand on what is not MPL licensed? I recall (foggily) reading about struggles with incomplete data sheets for Nuvoton controllers & I see the discussion about CVEs for NXP microcontrollers with secret patch ROMs.


See Brian’s excellent reply to your sibling.


One of our core product goals is to actually allow relatively hot swapping of individual compute sleds. It's true that each sled is an x86 server, but there's control software managing the whole rack as a unit, with live migration of workloads and replicated storage. The architecture of the whole thing leans towards treating sleds as essentially like CPU and memory (and storage) boards in previous generations of large scale computers like mainframes.


This is the case for some.

Like the Tandem NonStop series (now part of Hewlett Packard Enterprise). I had a contract with a 911 call handler for a while. They had one of them (or possibly more), on display of sorts through a window of bulletproof glass.

The IBM Z series does not allow hot swapping the CPU, and I don't believe it ever did. The parts on the Z series are super easy to replace, all in nice components sitting on rails to yank in or out.


It's more like reinventing Sun with an uglier exterior but yes.


Hey now, the Oxide rack is pretty good looking[0] -- and not to take anything away from the likes of the E10K, I personally think the Oxide rack is better looking than the Sun machines of yore. ;)

[0] https://oxide.computer/


... but with more inner beauty.


Finally, a true dark mode server.





