The API seems bad to me: ioctl() is ridiculously overloaded (disconnect, connect, read, and write all go through ioctl(), for example). As an API, this doesn't pass Linus' "good taste" test. As a solution, it's more complicated than necessary.
On messaging in general: this is yet another IPC api from RedHat. They really want to get an IPC api into the kernel over there. However, their proposals seem 'ignorant,' in the sense that plenty of IPC research and work has been done over the last half century (or more?), and their proposals seem focused on their own small use cases. They seem unaware of how others have tried to solve the IPC problem.
On why Linus doesn't want to integrate it: no kernel team is going to want yet another poorly designed IPC stack to maintain forever, even if hardly anyone ends up using it. That's been done, and IPC in Unix is a mess as a result. Any sane kernel dev would be resistant to this, unless the new proposal is absolutely beautiful, and you can look at it and say, "yeah, that is really great."
RedHat should go out, investigate the research that has been done, become experts on the topic. Learn everyone's IPC problems, not just the problems they have in their own insular community. Only then create an API that actually is in good taste.
Then begin the political work of getting people to adopt it. Start with the BSD teams. Start with the hobbyist OS devs at osdev.org. When the whole world agrees that it is a good thing, then Linus will put it in the kernel too.
I don't have much of an opinion on how good this particular proposal is, but it is important to have a sensible IPC system for Linux. I see a lot of resistance to this, with claims that Unix sockets and POSIX are good enough. As the maintainer of Rust's ipc-channel, which wraps all this stuff, I strongly disagree. The complexity of the Linux implementation of ipc-channel [1] has been absurd compared to the Mac implementation, which uses Mach [2]. Don't let the similar lines-of-code count fool you—90% of the bugs that I've seen go by have been in the POSIX backend, necessitating things like manual fragmentation to get around size limits in the kernel, weird structure alignment rules in the cmsg API, file descriptor limits, ENOBUFS stuff, etc. etc. By contrast, the Mach implementation has more or less "just worked".
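To give a flavour of that complexity, here is a rough sketch of what passing a single file descriptor over a Unix ___domain socket looks like with the cmsg API (my illustration, not ipc-channel's actual code); the CMSG_SPACE/CMSG_LEN alignment dance is exactly where those subtle bugs tend to live:

```c
/* Sketch: passing one fd over a Unix ___domain socket with SCM_RIGHTS.
 * Not ipc-channel's actual code, just an illustration of the cmsg API. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int sock, int fd_to_send)
{
    char dummy = 'x';                      /* must send at least one byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    /* The control buffer must be sized and aligned via the CMSG_* macros. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```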
At this point I don't really care what the Linux kernel decides on, as long as it decides on something. kdbus or Binder or anything would have been fine; the people who are truly hurt by kernel politics are us developers.
The reason people think that IPC is not important (and I agree with you, btw) is because so often bad designs are built around IPC.
The reason Linux is opposed to integrating IPC into the kernel is because it's been done more than once, poorly, resulting in a lousy API that needs to be supported forever by kernel devs.
My impression is that bus1 appears to have taken the (copious) feedback from the kdbus debacle and actually applied it and looked at other platforms' IPC to build a novel IPC system for Linux worth using. There's a talk [1] about the design of bus1, and comparison against IPC on other platforms where IPC is saner, and how the capability model is the right design for IPC - composable, understandable, and secure-by-default. It strikes me that the bus1 devs arrived at their design after doing the things you suggested! :)
Is there something I'm missing? What might an ideal IPC API look like to you?
Erlang's internal IPC has been doing this pretty well since before there was a Linux kernel. I don't know whether that approach can be ported into the kernel, or how complex it is behind the scenes, but spawn / send / receive are apparently simple concepts.
Authentication, authorization, and resource quotas for agents are not really addressed in the Erlang model, but would be expected for IPC on a Unix-like system.
Do we really want to put all of that into the kernel rather than implementing it in userland? I get the feeling that it's too application-dependent, with not enough general principles.
Maybe I just don't understand the problem they are out to solve.
The reason for a bus-style IPC implemented in the kernel is the same reason sendfile(2) exists. I doubt anyone thinks it's the pinnacle of great design, but it reduces copies and context switches for real application workloads: sometimes the more 'proper' design is sacrificed for practicality.
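For comparison, a minimal sendfile(2) sketch (just an illustration, nothing to do with bus1): one syscall moves the data in-kernel, where a portable fallback would need a read()/write() loop and an extra copy through a userspace buffer:

```c
/* Sketch: copy a whole file to a socket with sendfile(2),
 * avoiding a read()/write() loop through a userspace buffer. */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int send_file(int out_sock, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_sock, in_fd, &offset, st.st_size - offset);
        if (n <= 0)
            return -1;          /* error or unexpected end of file */
    }
    return 0;
}
```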
I can't answer that because I don't have a PhD related to IPC, and I haven't done the necessary research to fully understand the field. I have looked at how some other systems are doing it, but I know that is not a strong enough knowledge-base to build a good IPC system.
I do have enough understanding to know that when I look at a good IPC api, I will look at it and say, "wow, that's really nice."
"Tom Gundersen, one of the authors, acknowledged that having a dedicated system call to create a peer, and to perform the various other operations that currently use ioctl(), might make more sense for eventual upstream submission. Devices and ioctl() have been used so far because they make out-of-tree development easier."
So I think using ioctl is a prototyping concession.
I'm not an expert on this matter, but I think it is the Android devs that want it, not Red Hat. It is similar to the Android-specific Binder IPC mechanism, but with multicast support.
I said 'RedHat' because the people associated with the repository are from RedHat: https://github.com/orgs/bus1/people Tom Gundersen works at RedHat, too.
There are certainly many people who want IPC, and a good implementation would be very welcome.
Other commenters have made good points, I'll mention one I haven't seen yet.
Linus has mentioned multiple times in the past that he likes to merge code that is getting used, even if it's not ideal. That way, it's in the tree and can be improved, instead of continuing to change outside of the kernel without the core developers' input.
That's what Android's Binder is. We also have kdbus, so we know that something like this was seen as useful before. Obviously Android is used on a HELL of a lot of devices. Binder is not considered suitable for merge into mainline for various reasons, but much like other Android-specific technology, something very similar to it does make sense.
So as long as they do a decent job at this, there's a good chance that it will get merged into the kernel. Over time Google will help improve it, as well as move things out of Binder or turn Binder into a layer on top of it.
Every piece of widely used external (kernel-level) code that ends up in the kernel is a win. The only other option would be to wait for the external repository to get up to standards and be developed the way the kernel maintainers would like, which probably won't happen without their constant input. But this is good for Google, because it means they get more eyes on their patches and don't have to carry as much of a load when updating versions and making new releases.
One very important thing to mention is that Binder is an RPC mechanism while Bus1 aims to be an IPC mechanism. The crucial consequence is that Binder is synchronous - you make an RPC call and you wait for the answer - while Bus1 is asynchronous - you send some data to an address and later on some data is returned to you on another address. That also necessitates Binder's design, in which the called process steals work from the calling process and temporarily executes with the calling process's priority until it's done. Ideally I'd like to see Bus1 also cover Binder RPC calls, or at least some merge of the two technologies.
This seems needlessly negative, and you seem to be inferring a lot about what Linus wants without him having said anything. The one thing he's asked about so far on this patch series is how it handles resource exhaustion/denial-of-service issues, and they have a reasonable reply, though there may need to be some iteration there: http://www.mail-archive.com/[email protected]/msg...
The last attempt, kdbus, was definitely way too complicated and funky. It went through a few rounds of review and was never merged for a good reason.
This redesign from the ground up does a lot of what you are asking for. It derives inspiration from other, well respected, capability based systems, as well as other IPC systems for the Linux kernel, like binder, which is used on one of the most popular Linux based consumer platforms, Android.
Linux already supports the IPC mechanisms supported by most of the BSDs: pipes, Unix ___domain sockets, POSIX IPC. But these actually have a number of shortcomings for building reliable, efficient IPC between processes.
I would say that they have done most of what you ask. They have investigated the existing solutions, and found them wanting. They have already gone through one design that they threw away due to it being too complicated. They have picked a model that is widely respected and implemented in one form or another on most systems, a capability based model. There's a little bit of new invention here due to the ability to multicast messages and their message ordering guarantees, and probably some room for iterating, but overall, this looks like a pretty promising IPC system compared to the fairly overwrought kdbus.
And yeah, an ioctl interface may not be the prettiest, but they've said that they're using that over syscalls just because it's easier to implement and iterate out of tree that way before getting it merged, but are willing to switch to a syscall-based interface if that's preferred: http://www.mail-archive.com/[email protected]/msg...
Besides the ioctl vs. syscall issue, which is pretty much just an implementation detail that can be solved with a wrapper API or done away with before merging, what do you find not in good taste about this proposal?
What makes you think we are not? What use case would you like to be taken into account in bus1 that is not? Open to suggestions (that is the point of an RFC after all ;) ).
That's a very good suggestion. But the reality is that it will never, ever happen. Linus and the Linux community as a whole exist in an echo chamber.
They never resort to looking outside their little bubble world to see how others solved problems they are trying to fix. They simply are not capable of that. They constantly reinvent the wheel and reinvent it poorly. They should have simply adopted kqueue, ZFS, dtrace, jail(), etc. but didn't. They choose to refuse to see how others solved problems and use the good solutions of others.
It's one of the most damaging things about that community.
Linux was GPL licensed long before these projects were open-sourced under CDDL, and there are obvious reasons for why Sun at the time would not want to have their technology advantages incorporated into Linux (like it being their main competitor to whom they were losing in the market place).
Crafting a new GPL incompatible license for ZFS and DTrace resulted in Linux not being able to incorporate them.
Danese Cooper, who was responsible for drafting the actual license while at Sun, says otherwise.
Obvious business sense stands firmly with Danese: it would have been very stupid of Sun to hand over the best technological advantages of Solaris to its main competitor, Linux, while struggling against it in the market.
Open-sourcing Solaris was a last-ditch effort from Sun to attract mindshare (and eventually market share) back from Linux; that plan would have been doomed from the start if Linux could just pick the best parts of Solaris for inclusion.
In terms of Linux design, ioctls are natural candidates as an underlying mechanism for multi-process communication.
Quoting Steven Doyle:
ioctl can be guaranteed by the kernel to be atomic. If a driver grabs a spinlock during the ioctl processing then the driver will enter atomic context and will remain in atomic context until it releases the spinlock.
http://lwn.net/Articles/274695/
Cooperative multitasking in critical operations requires thread safety and atomicity.
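To make the quoted point concrete, here is a toy fragment of a hypothetical driver's ioctl handler (the device, lock, and command numbers are made up; this is not bus1 code): everything inside the spinlock executes atomically with respect to other callers on the same device:

```c
/* Sketch of a hypothetical driver's ioctl handler (not bus1 code).
 * Everything between spin_lock() and spin_unlock() is atomic with
 * respect to other callers hitting the same device. */
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/spinlock.h>

struct mydev {
    spinlock_t lock;
    int connected;                          /* toy state protected by the lock */
};

static long mydev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct mydev *dev = file->private_data;
    long ret = 0;

    spin_lock(&dev->lock);                  /* enter atomic context */
    switch (cmd) {
    case 0x1001:                            /* hypothetical "connect" command */
        dev->connected = 1;
        break;
    case 0x1002:                            /* hypothetical "disconnect" command */
        dev->connected = 0;
        break;
    default:
        ret = -ENOTTY;
    }
    spin_unlock(&dev->lock);                /* leave atomic context */

    return ret;
}
```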
What bothers me is: what the fuck is an iovec?! That is the most important part of the spec, and yet it is not defined. My fear is that to accommodate «industry grade level of developers» they will use dynamically allocated structures vs fixed-size structures. And we all know that malloc in user space is already the door to hell, but in kernel space it is a direct nightmare.
What also makes me cringe is that it is a distributed system (and I have played quite a lot with those), and they say they have tackled the problem of global ordering of messages. Well, be it on the network or in silicon, I have never seen anyone achieve that.
I fear they are overpromising and will under-deliver.
> What also makes me cringe is that it is a distributed system
I still don't understand why they don't use (or extend, if necessary) TIPC, which is already a distributed IPC protocol that is already in the kernel. Why build a library around an existing feature that has already had a lot of testing when you can say NIH and design something new with an entirely new set of bugs?
> ioctl can be guaranteed by the kernel to be atomic. If a driver grabs a spinlock during the ioctl processing then the driver will enter atomic context and will remain in atomic context until it releases the spinlock.
That's a red herring though. A new syscall implementation can also provide the same guarantees.
> What bothers me is: what the fuck is an iovec?!
Pretty sure they're referring to Berkeley-style UIO. Any discussion about those will soon devolve into other types of IPC, in my experience...
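Concretely, struct iovec from <sys/uio.h> is just a fixed-size (pointer, length) pair used for scatter/gather I/O, nothing dynamically allocated. A minimal example:

```c
/* struct iovec is a plain fixed-size pair: a base pointer and a length.
 * writev(2) gathers several of them into one write. */
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char a[] = "hello, ";
    char b[] = "iovec\n";
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) - 1 },
        { .iov_base = b, .iov_len = sizeof(b) - 1 },
    };
    return writev(STDOUT_FILENO, iov, 2) < 0 ? 1 : 0;
}
```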
> What also makes me cringe is that it is a distributed system (and I have played quite a lot with those), and they say they have tackled the problem of global ordering of messages. Well, be it on the network or in silicon, I have never seen anyone achieve that.
They are claiming there's no global synchronization and yet a global order. That is textbook impossible, and leads me to believe what's written on the site is either waaay misunderstood or waaay manipulative.
I've read both, thanks. I clearly remember the fact that the total ordering is "somewhat arbitrary" in Lamport's own words, which is what I pointed out here [https://news.ycombinator.com/item?id=12803907], too.
I admit I haven't read the implementation to see what kind of bounds you derive, and I couldn't find them in the wiki either. So, I think I'll go with "accidentally exaggerated" instead of "manipulative".
"[S]omewhat arbitrary" is a correct description. We take something that is fundamentally partially ordered (real-world events that may happen exactly at the same time), respect the partial order and extend it to a total order. The extension is arbitrary, but I fail to see the problem with that, or how it contradict anything we wrote?
Could you explain what bounds you are interested in and in what way you think anything is exaggerated? I would like to update the docs if necessary.
It is higher level, though; it is not something you would integrate into the kernel (as bus1 intends to be). OSX has decent IPC, though, since it is based on the Mach research kernel from CMU, which was kind of based around the concept of messaging to begin with.
Although I have brought a lot of high level RPC/IPC systems into production I'm really having a hard time imagining where I could use this.
As far as I understand, it gives me a few more high-level features (security, multicast) compared to other IPC primitives (sockets, pipes, ...). However, once I use this, my application (or high-level IPC framework) will be locked to Linux and no longer be portable to other platforms. So for any application that should be at least halfway portable, I would prefer something that works on the more common primitives (most likely sockets) and builds something more powerful on top of them in a cross-platform way (like HTTP, gRPC, Thrift, ...).
A full-featured low-level framework makes sense if I have a whole set of applications on top of it which is not intended to be portable and uses it exclusively. Something like the Android/iOS/... platform. But the current ones have already settled on their infrastructure (e.g. on Binder), and even if new ones come up there is a high possibility that they wouldn't like at least something in Bus1 and instead come up with their own solution.
... that's pretty much the point of ioctl - a syscall multiplexer in the context of a fd. 9 is pretty low - check out /dev/cdrom or /dev/dri/* if you want to see a lot.
ioctl is pretty much a generic syscall interface (other OSes have similar things). An awful lot of Linux subsystems export an awful lot of syscalls through it; probably easily a four or five figure number (literature typically claims that Linux has only a couple hundred syscalls).
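A trivial userspace illustration of that multiplexing, using two standard request codes (nothing bus1-specific): the same ioctl(2) entry point behaves like two unrelated syscalls depending on the request:

```c
/* Two unrelated operations, both multiplexed through ioctl(2). */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* "Syscall" 1: ask the tty driver for the terminal size. */
    struct winsize ws;
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0)
        printf("terminal: %d rows x %d cols\n", ws.ws_row, ws.ws_col);

    /* "Syscall" 2: ask how many bytes are waiting to be read on stdin. */
    int pending = 0;
    if (ioctl(STDIN_FILENO, FIONREAD, &pending) == 0)
        printf("%d bytes pending on stdin\n", pending);

    return 0;
}
```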
With their "synchronize-local-clocks" approach, it doesn't.
They are using Lamport's algorithm to synchronize the clocks. However, Lamport's approach creates a _partial_ ordering, and to make that a _total_ ordering you need some mechanism to break "ties". For instance, the PID, or whatever.
The catch is that the relationship derived from this arbitrary tie-breaking mechanism has nothing to do with causality, and therefore the total order it imposes is only an artifact of the mechanism chosen.
Finding a tie-breaking mechanism that corresponds to the sending events is, in Lamport's own words, "not trivial".
Indeed that is how we break ties (not exactly the PID, but you get the idea).
The reason this works is that the only time we can have a tie is if there can be no causality between the events. I.e., the two sending events happen concurrently: the two ioctl calls overlap in time, so there would be no way for one to have caused the other.
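For readers following along, here is a tiny sketch of the mechanism being described (the textbook algorithm, not bus1's actual code): Lamport counters give the causal partial order, and a unique per-peer id breaks ties, which is harmless precisely because equal timestamps can only occur for causally unrelated sends:

```c
/* Sketch: Lamport timestamps with an arbitrary-but-deterministic tie-break.
 * Illustrates the textbook mechanism under discussion, not bus1 code. */
#include <stdint.h>
#include <stdio.h>

struct peer {
    uint64_t clock;     /* Lamport counter */
    uint64_t id;        /* unique peer id, used only to break ties */
};

struct stamp {
    uint64_t clock;
    uint64_t sender_id;
};

/* Stamp an outgoing message: tick the local clock first. */
static struct stamp peer_send(struct peer *p)
{
    p->clock++;
    return (struct stamp){ .clock = p->clock, .sender_id = p->id };
}

/* On receive, advance the local clock past the message's timestamp. */
static void peer_receive(struct peer *p, struct stamp s)
{
    if (s.clock > p->clock)
        p->clock = s.clock;
    p->clock++;
}

/* Total order: compare clocks, break ties with the sender id.
 * Equal clocks can only happen for causally unrelated (concurrent) sends. */
static int stamp_before(struct stamp a, struct stamp b)
{
    if (a.clock != b.clock)
        return a.clock < b.clock;
    return a.sender_id < b.sender_id;
}

int main(void)
{
    struct peer a = { .clock = 0, .id = 1 };
    struct peer b = { .clock = 0, .id = 2 };

    /* Two concurrent sends: same Lamport clock, tie broken by peer id. */
    struct stamp sa = peer_send(&a);
    struct stamp sb = peer_send(&b);
    peer_receive(&a, sb);
    peer_receive(&b, sa);

    printf("a's message ordered first: %s\n",
           stamp_before(sa, sb) ? "yes" : "no");
    return 0;
}
```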
I foresee a problem where people confuse the wording of "total order on all messages" in the wiki to mean there is a "global total order" - in other words, that bus1 solves distributed systems and we can all go home - and then build buggy systems on that assumption. I'm not saying the concept is flawed or the implementation buggy, or anything like that.
PS. Neil Brown in the LWN article already conflates "global" and "total" order.
I have - I'm actually working with it on a multi-version IPC provider (totally unrelated to bus1 & friends). Is it relevant here? I know they're the latest and greatest, but they're not without problems either.