The API seems bad to me: ioctl() is ridiculously overloaded (disconnect, connect, read, and write all go through ioctl(), for example). As an API, this doesn't pass Linus' "good taste" test. As a solution, it's more complicated than necessary.
On messaging in general: this is yet another IPC api from RedHat. They really want to get an IPC api into the kernel over there. However, their proposals seem 'ignorant,' in the sense that plenty of IPC research and work has been done over the last half century (or more?), and their proposals seem focused on their own small use cases. They seem unaware of how others have tried to solve the IPC problem.
On why Linus doesn't want to integrate it: no kernel team is going to want yet another poorly designed IPC stack to maintain forever, even if hardly anyone ends up using it. That's been done, and IPC in Unix is a mess as a result. Any sane kernel dev would be resistant to this, unless the new proposal is absolutely beautiful, and you can look at it and say, "yeah, that is really great."
RedHat should go out, investigate the research that has been done, become experts on the topic. Learn everyone's IPC problems, not just the problems they have in their own insular community. Only then create an API that actually is in good taste.
Then begin the political work of getting people to adopt it. Start with the BSD teams. Start with the hobbyist OS devs at osdev.org. When the whole world agrees that it is a good thing, then Linus will put it in the kernel too.
I don't have much of an opinion on how good this particular proposal is, but it is important to have a sensible IPC system for Linux. I see a lot of resistance to this, with claims that Unix sockets and POSIX are good enough. As the maintainer of Rust's ipc-channel, which wraps all this stuff, I strongly disagree. The complexity of the Linux implementation of ipc-channel [1] has been absurd compared to the Mac implementation, which uses Mach [2]. Don't let the similar lines-of-code count fool you—90% of the bugs that I've seen go by have been in the POSIX backend, necessitating things like manual fragmentation to get around size limits in the kernel, weird structure alignment rules in the cmsg API, file descriptor limits, ENOBUFS stuff, etc. etc. By contrast, the Mach implementation has more or less "just worked".
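To give a flavour of that complexity, here is a rough sketch of what passing a single file descriptor over a Unix ___domain socket looks like with the cmsg API (my illustration, not ipc-channel's actual code); the CMSG_SPACE/CMSG_LEN alignment dance is exactly where those subtle bugs tend to live:

```c
/* Sketch: passing one fd over a Unix ___domain socket with SCM_RIGHTS.
 * Not ipc-channel's actual code, just an illustration of the cmsg API. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int sock, int fd_to_send)
{
    char dummy = 'x';                      /* must send at least one byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    /* The control buffer must be sized and aligned via the CMSG_* macros. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```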
At this point I don't really care what the Linux kernel decides on, as long as it decides on something. kdbus or Binder or anything would have been fine; the people who are truly hurt by kernel politics are us developers.
The reason people think that IPC is not important (and I agree with you, btw) is because so often bad designs are built around IPC.
The reason Linux is opposed to integrating IPC into the kernel is because it's been done more than once, poorly, resulting in a lousy API that needs to be supported forever by kernel devs.
My impression is that bus1 appears to have taken the (copious) feedback from the kdbus debacle and actually applied it and looked at other platforms' IPC to build a novel IPC system for Linux worth using. There's a talk [1] about the design of bus1, and comparison against IPC on other platforms where IPC is saner, and how the capability model is the right design for IPC - composable, understandable, and secure-by-default. It strikes me that the bus1 devs arrived at their design after doing the things you suggested! :)
Is there something I'm missing? What might an ideal IPC API look like to you?
Erlang's internal IPC has been doing this pretty well since before there was a Linux kernel. I don't know whether that approach can be ported into the kernel, or how complex it is behind the scenes, but spawn / send / receive are apparently simple concepts.
Authentication, authorization, and resource quotas for agents are not really addressed in the Erlang model, but would be expected for IPC on a Unix-like system.
Do we really want to put all of that into the kernel rather than implementing it in userland? I get the feeling that it's too application-dependent, with not enough general principles.
Maybe I just don't understand the problem they are out to solve.
The reason for a bus-style IPC implemented in the kernel is the same reason sendfile(2) exists. I doubt anyone thinks it's the pinnacle of great design, but it reduces copies and context switches for real application workloads: sometimes the more 'proper' design is sacrificed for practicality.
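For comparison, a minimal sendfile(2) sketch (just an illustration, nothing to do with bus1): one syscall moves the data in-kernel, where a portable fallback would need a read()/write() loop and an extra copy through a userspace buffer:

```c
/* Sketch: copy a whole file to a socket with sendfile(2),
 * avoiding a read()/write() loop through a userspace buffer. */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int send_file(int out_sock, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_sock, in_fd, &offset, st.st_size - offset);
        if (n <= 0)
            return -1;          /* error or unexpected end of file */
    }
    return 0;
}
```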
I can't answer that because I don't have a PhD related to IPC, and I haven't done the necessary research to fully understand the field. I have looked at how some other systems are doing it, but I know that is not a strong enough knowledge-base to build a good IPC system.
I do have enough understanding to know that when I look at a good IPC api, I will look at it and say, "wow, that's really nice."
"Tom Gundersen, one of the authors, acknowledged that having a dedicated system call to create a peer, and to perform the various other operations that currently use ioctl(), might make more sense for eventual upstream submission. Devices and ioctl() have been used so far because they make out-of-tree development easier."
So I think using ioctl is a prototyping concession.
I'm not an expert on this matter, but I think it is the Android devs that want it, not Red Hat. It is similar to the Android-specific Binder IPC mechanism, but with multicast support.
I said 'RedHat' because the people associated with the repository are from RedHat: https://github.com/orgs/bus1/people Tom Gundersen works at RedHat, too.
There are certainly many people who want IPC, and a good implementation would be very welcome.
Other commenters have made good points, I'll mention one I haven't seen yet.
Linus has mentioned multiple times in the past that he likes to merge code that is getting used, even if it's not ideal. That way, it's in the tree and can be improved, instead of continuing to change outside of the kernel without the core developers' input.
That's what Android's Binder is. We also have kdbus, so we know that something like this was seen as useful before. Obviously Android is used on a HELL of a lot of devices. Binder is not considered suitable for merge into mainline for various reasons, but much like other Android-specific technology, something very similar to it does make sense.
So as long as they do a decent job at this, there's a good chance that it will get merged into the kernel. Over time Google will help improve it, as well as move things out of Binder or turn Binder into a layer on top of it.
Every piece of widely used external (kernel-level) code that ends up in the kernel is a win. The only other option would be to wait for the external repository to get up to standards and be developed the way the kernel maintainers would like, which probably won't happen without their constant input. But this is good for Google, because it means they get more eyes on their patches and don't have to carry as much of a load when updating versions and making new releases.
One very important thing to mention is that Binder is an RPC mechanism while Bus1 aims to be an IPC mechanism. The crucial consequence is that Binder is synchronous - you make an RPC call and you wait for the answer - while Bus1 is asynchronous - you send some data to an address and later on some data is returned to you on another address. That also necessitates Binder's design, in which the called process steals work from the calling process and temporarily executes with the calling process's priority until it's done. Ideally I'd like to see Bus1 also cover Binder RPC calls, or at least some merge of the two technologies.
This seems needlessly negative, and you seem to be inferring a lot about what Linus wants without him having said anything. The one thing he's asked about so far on this patch series is how it handles resource exhaustion/denial-of-service issues, and they have a reasonable reply, though there may need to be some iteration there: http://www.mail-archive.com/[email protected]/msg...
The last attempt, kdbus, was definitely way too complicated and funky. It went through a few rounds of review and was never merged for a good reason.
This redesign from the ground up does a lot of what you are asking for. It derives inspiration from other, well respected, capability based systems, as well as other IPC systems for the Linux kernel, like binder, which is used on one of the most popular Linux based consumer platforms, Android.
Linux already supports the IPC mechanisms supported by most of the BSDs: pipes, Unix ___domain sockets, POSIX IPC. But these actually have a number of shortcomings for building reliable, efficient IPC between processes.
I would say that they have done most of what you ask. They have investigated the existing solutions, and found them wanting. They have already gone through one design that they threw away due to it being too complicated. They have picked a model that is widely respected and implemented in one form or another on most systems, a capability based model. There's a little bit of new invention here due to the ability to multicast messages and their message ordering guarantees, and probably some room for iterating, but overall, this looks like a pretty promising IPC system compared to the fairly overwrought kdbus.
And yeah, an ioctl interface may not be the prettiest, but they've said that they're using that over syscalls just because it's easier to implement and iterate out of tree that way before getting it merged, but are willing to switch to a syscall-based interface if that's preferred: http://www.mail-archive.com/[email protected]/msg...
Besides the ioctl vs. syscall issue, which is pretty much just an implementation detail that can be solved with a wrapper API or done away with before merging, what do you find not in good taste about this proposal?
What makes you think we are not? What use case would you like to be taken into account in bus1 that is not? Open to suggestions (that is the point of an RFC after all ;) ).
That's a very good suggestion. But the reality is that it will never, ever happen. Linus and the Linux community as a whole exist in an echo chamber.
They never resort to looking outside their little bubble world to see how others solved problems they are trying to fix. They simply are not capable of that. They constantly reinvent the wheel and reinvent it poorly. They should have simply adopted kqueue, ZFS, dtrace, jail(), etc. but didn't. They choose to refuse to see how others solved problems and use the good solutions of others.
It's one of the most damaging things about that community.
Linux was GPL licensed long before these projects were open-sourced under CDDL, and there are obvious reasons for why Sun at the time would not want to have their technology advantages incorporated into Linux (like it being their main competitor to whom they were losing in the market place).
Crafting a new GPL incompatible license for ZFS and DTrace resulted in Linux not being able to incorporate them.
Danese Cooper, who was responsible for drafting the actual license while at Sun, says otherwise.
Obvious business sense stands firmly with Danese: it would have been very stupid of Sun to hand over the best technological advantages of Solaris to its main competitor, Linux, while struggling against it in the market.
Open-sourcing Solaris was a last-ditch effort from Sun to attract mindshare (and eventually market share) back from Linux; that plan would have been doomed from the start if Linux could just pick the best parts of Solaris for inclusion.
In terms of Linux design, ioctls are natural candidates as an underlying mechanism for multi-process communication.
Quoting Steven Doyle:
ioctl can be guaranteed by the kernel to be atomic. If a driver grabs a spinlock during the ioctl processing then the driver will enter atomic context and will remain in atomic context until it releases the spinlock.
http://lwn.net/Articles/274695/
Cooperative multitasking in critical operations requires thread safety and atomicity.
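To make the quoted point concrete, here is a toy fragment of a hypothetical driver's ioctl handler (the device, lock, and command numbers are made up; this is not bus1 code): everything inside the spinlock executes atomically with respect to other callers on the same device:

```c
/* Sketch of a hypothetical driver's ioctl handler (not bus1 code).
 * Everything between spin_lock() and spin_unlock() is atomic with
 * respect to other callers hitting the same device. */
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/spinlock.h>

struct mydev {
    spinlock_t lock;
    int connected;                          /* toy state protected by the lock */
};

static long mydev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct mydev *dev = file->private_data;
    long ret = 0;

    spin_lock(&dev->lock);                  /* enter atomic context */
    switch (cmd) {
    case 0x1001:                            /* hypothetical "connect" command */
        dev->connected = 1;
        break;
    case 0x1002:                            /* hypothetical "disconnect" command */
        dev->connected = 0;
        break;
    default:
        ret = -ENOTTY;
    }
    spin_unlock(&dev->lock);                /* leave atomic context */

    return ret;
}
```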
What bothers me is: what the fuck is an iovec?! That is the most important part of the spec, and yet it is not defined. My fear is that to accommodate «industry grade level of developers» they will use dynamically allocated structures vs fixed-size structures. And we all know that malloc in user space is already the door to hell, but in kernel space it is a direct nightmare.
What also makes me cringe is that it is a distributed system (and I have played quite a lot with those), and they say they have tackled the problem of global ordering of messages. Well, be it on the network or in silicon, I have never seen anyone achieve that.
I fear they are overpromising and will under-deliver.
> What also makes me cringe is that it is a distributed system
I still don't understand why they don't use (or extend, if necessary) TIPC, which is already a distributed IPC protocol that is already in the kernel. Why build a library around an existing feature that has already had a lot of testing when you can say NIH and design something new with an entirely new set of bugs?
> ioctl can be guaranteed by the kernel to be atomic. If a driver grabs a spinlock during the ioctl processing then the driver will enter atomic context and will remain in atomic context until it releases the spinlock.
That's a red herring though. A new syscall implementation can also provide the same guarantees.
> What bothers me is: what the fuck is an iovec?!
Pretty sure they're referring to Berkeley-style UIO. Any discussion about those will soon devolve into other types of IPC, in my experience...
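Concretely, struct iovec from <sys/uio.h> is just a fixed-size (pointer, length) pair used for scatter/gather I/O, nothing dynamically allocated. A minimal example:

```c
/* struct iovec is a plain fixed-size pair: a base pointer and a length.
 * writev(2) gathers several of them into one write. */
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char a[] = "hello, ";
    char b[] = "iovec\n";
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) - 1 },
        { .iov_base = b, .iov_len = sizeof(b) - 1 },
    };
    return writev(STDOUT_FILENO, iov, 2) < 0 ? 1 : 0;
}
```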
> What also makes me cringe is that it is a distributed system (and I have played quite a lot with those), and they say they have tackled the problem of global ordering of messages. Well, be it on the network or in silicon, I have never seen anyone achieve that.
They are claiming there's no global synchronization and yet a global order. That is textbook impossible, and leads me to believe what's written on the site is either waaay misunderstood or waaay manipulative.
I've read both, thanks. I clearly remember the fact that the total ordering is "somewhat arbitrary" in Lamport's own words, which is what I pointed out here [https://news.ycombinator.com/item?id=12803907], too.
I admit I haven't read the implementation to see what kind of bounds you derive, and I couldn't find them in the wiki either. So, I think I'll go with "accidentally exaggerated" instead of "manipulative".
"[S]omewhat arbitrary" is a correct description. We take something that is fundamentally partially ordered (real-world events that may happen exactly at the same time), respect the partial order and extend it to a total order. The extension is arbitrary, but I fail to see the problem with that, or how it contradict anything we wrote?
Could you explain what bounds you are interested in and in what way you think anything is exaggerated? I would like to update the docs if necessary.
It is higher level, though; it is not something you would integrate into the kernel (as bus1 intends to be). OSX has decent IPC, though, since it is based on the Mach research kernel from CMU, which was kind of based around the concept of messaging to begin with.
Although I have brought a lot of high level RPC/IPC systems into production I'm really having a hard time imagining where I could use this.
As far as I understand, it gives me a few more high-level features (security, multicast) compared to other IPC primitives (sockets, pipes, ...). However, once I use this, my application (or high-level IPC framework) will be locked to Linux and no longer be portable to other platforms. So for any application that should be at least halfway portable, I would prefer something that works on the more common primitives (most likely sockets) and builds something more powerful on top of them in a cross-platform way (like HTTP, gRPC, Thrift, ...).
A full-featured low-level framework makes sense if I have a whole set of applications on top of it which is not intended to be portable and uses it exclusively. Something like the Android/iOS/... platform. But the current ones have already settled on their infrastructure (e.g. on Binder), and even if new ones come up there is a high possibility that they wouldn't like at least something in Bus1 and instead come up with their own solution.
... that's pretty much the point of ioctl - a syscall multiplexer in the context of a fd. 9 is pretty low - check out /dev/cdrom or /dev/dri/* if you want to see a lot.
ioctl is pretty much a generic syscall interface (other OSes have similar things). An awful lot of Linux subsystems export an awful lot of syscalls through it; probably easily a four or five figure number (literature typically claims that Linux has only a couple hundred syscalls).
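A trivial userspace illustration of that multiplexing, using two standard request codes (nothing bus1-specific): the same ioctl(2) entry point behaves like two unrelated syscalls depending on the request:

```c
/* Two unrelated operations, both multiplexed through ioctl(2). */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* "Syscall" 1: ask the tty driver for the terminal size. */
    struct winsize ws;
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0)
        printf("terminal: %d rows x %d cols\n", ws.ws_row, ws.ws_col);

    /* "Syscall" 2: ask how many bytes are waiting to be read on stdin. */
    int pending = 0;
    if (ioctl(STDIN_FILENO, FIONREAD, &pending) == 0)
        printf("%d bytes pending on stdin\n", pending);

    return 0;
}
```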
With their "synchronize-local-clocks" approach, it doesn't.
They are using Lamport's algorithm to synchronize the clocks. However, Lamport's approach creates a _partial_ ordering, and to make that a _total_ ordering you need some mechanism to break "ties". For instance, the PID, or whatever.
The catch is that the relationship derived from this arbitrary tie-breaking mechanism has nothing to do with causality, and therefore the total order it imposes is only an artifact of the mechanism chosen.
Finding a tie-breaking mechanism that corresponds to the sending events is, in Lamport's own words, "not trivial".
Indeed that is how we break ties (not exactly the PID, but you get the idea).
The reason this works is that the only time we can have a tie is if there can be no causality between the events. I.e., the two sending events happen concurrently: the two ioctl calls overlap in time, so there would be no way for one to have caused the other.
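For readers following along, here is a tiny sketch of the mechanism being described (the textbook algorithm, not bus1's actual code): Lamport counters give the causal partial order, and a unique per-peer id breaks ties, which is harmless precisely because equal timestamps can only occur for causally unrelated sends:

```c
/* Sketch: Lamport timestamps with an arbitrary-but-deterministic tie-break.
 * Illustrates the textbook mechanism under discussion, not bus1 code. */
#include <stdint.h>
#include <stdio.h>

struct peer {
    uint64_t clock;     /* Lamport counter */
    uint64_t id;        /* unique peer id, used only to break ties */
};

struct stamp {
    uint64_t clock;
    uint64_t sender_id;
};

/* Stamp an outgoing message: tick the local clock first. */
static struct stamp peer_send(struct peer *p)
{
    p->clock++;
    return (struct stamp){ .clock = p->clock, .sender_id = p->id };
}

/* On receive, advance the local clock past the message's timestamp. */
static void peer_receive(struct peer *p, struct stamp s)
{
    if (s.clock > p->clock)
        p->clock = s.clock;
    p->clock++;
}

/* Total order: compare clocks, break ties with the sender id.
 * Equal clocks can only happen for causally unrelated (concurrent) sends. */
static int stamp_before(struct stamp a, struct stamp b)
{
    if (a.clock != b.clock)
        return a.clock < b.clock;
    return a.sender_id < b.sender_id;
}

int main(void)
{
    struct peer a = { .clock = 0, .id = 1 };
    struct peer b = { .clock = 0, .id = 2 };

    /* Two concurrent sends: same Lamport clock, tie broken by peer id. */
    struct stamp sa = peer_send(&a);
    struct stamp sb = peer_send(&b);
    peer_receive(&a, sb);
    peer_receive(&b, sa);

    printf("a's message ordered first: %s\n",
           stamp_before(sa, sb) ? "yes" : "no");
    return 0;
}
```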
I foresee a problem where people confuse the wording of "total order on all messages" in the wiki to mean there is a "global total order" - in other words, that bus1 solves distributed systems and we can all go home - and then build buggy systems on that assumption. I'm not saying the concept is flawed or the implementation buggy, or anything like that.
PS. Neil Brown in the LWN article already conflates "global" and "total" order.
I have - I'm actually working with it on a multi-version IPC provider (totally unrelated to bus1 & friends). Is it relevant here? I know they're the latest and greatest, but they're not without problems either.