People making this complaint don't seem to have any idea what is in init normally or why you might want to add more stuff there (for example, where are you going to manage cgroup trees for system processes from?)
Yes, not charting out your module boundaries and just bundling the system and service state along with parsing and cgroup management in the same process is relatively unwise. Even Solaris SMF got it right by keeping init(8) small amidst the otherwise highly impressive feature set of its main service manager, using contracts (the Solaris equivalent of cgroups, which they actually predate) outside PID 1.
So the same can be done on Linux. At least one system, OpenRC, explicitly supports such an arrangement.
(I take it you haven't read the architectural critique? Most pro-systemd arguments fall flat on their face in any event.)
systemd is NOT making the change to manage cgroups from PID1. Kernel is - systemd is just the first (and currently the only) one to comply with this change.
Legacy cgroup API is going away.
It is the same case for the /usr merge [1]. systemd is not forcing the change, but it is complying with the changes required and is getting blamed for it.
You are right, but the cgroup manager should still be something that starts very very early, so you can start components that make use of it. If literally everything your init system starts should automatically use cgroups, you have to start it with the init system or as the first thing the init system will start.
Remember that vezzy-fnord was responding to a comment that made the following erroneous statement:
> systemd is NOT making the change to manage cgroups from PID1. Kernel is...
I've heard systemd proponents assert that both udev and the cgroup manager must live in either kernel space or in PID 1, because to do otherwise would expose systemd to races or something while PID 1 started udev and/or the cgroup manager.
These are also erroneous statements. It's rather important to correct such statements, as we're dealing with a (sadly) highly politicized technical topic.
you are correct - I had conflated the PID1 argument with the cgroup daemon, because as far as I stand, it doesn't make a difference to me.
There have been alternative managers like cgmanager [1] that lxc is bringing - which (surprisingly) work well with systemd as well [2]. Probably another reason not to be scared of systemd ;)
> There have been alternative managers like cgmanager [1] that lxc is bringing - which (surprisingly) work well with systemd as well...
Given that vezzy-fnord told you: [0]
> The kernel mandates a single [cgroup] writer.
you must have come to the understanding that the way cgmanager gets its work done on a systemd system is to pass control group management requests through to systemd's control group manager. Because the ultimate plan is to have a single CG manager, either cgmanager or systemd's control group manager handles the CG management requests; there's no other way it can work. And because the systemd project is heavily vertically integrated, the odds that cgmanager gets to use its own CG manager on such a system are near-zero.
> you are correct - I had conflated the PID1 argument ... because as far as I stand, it doesn't make a difference to me.
Yep. Your arguments and assertions have been almost exclusively soft and non-technical. Here's some advice: when people are making comments about technical topics, don't join the conversation unless your level of understanding of the topic is just about as deep as that of those who are speaking. [1]
> Probably another reason not to be scared of systemd ;)
Given the timestamp on this comment, it seems unlikely that you haven't had the opportunity to read my reply [2] to one of your much earlier comments. Given that I lay out five solid, non-fear-based arguments for why one might be worried about the systemd project, your assertion that I shouldn't be "scared of systemd" is extremely dismissive.
[1] Unless -of course- you're joining the conversation to learn more about the topic. In that case, refrain from making uninformed assertions, ask clarifying questions about things you are unsure about, and make the limits of your knowledge clear up front.
This question has always appeared to me as academic, with little or no real-world relevance.
If your service manager process were to crash, what are you going to do about it?
If you restart the service manager, it won't know what state the system is in, which services are running and which are not, which services were running at the time when it crashed but then stopped just before it restarted, etc.
How are you going to do that in a race-free and reliable way that is actually better in practice than the alternative (reboot)?
And if your service manager is a single point of failure, it doesn't matter much which PID it's running as; it has to be perfectly reliable anyway (just like the kernel).
There's been a lot of research into fault recovery through message logging and checkpoint-based methods that could be applied here, e.g. [1]. Of course, you use "academic" as a snarl word, so I don't think anything will convince you.
The idea that the service manager would not be able to know the system and service states is completely false. Solaris SMF is one design that does, via its use of the configuration repository. Simpler designs can deduce enough metadata from the persistent configuration in the supervisor tree. There are many possible approaches.
The idea that such fault recovery is implausible is a naive one that only someone unfamiliar with the research literature could espouse.
If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail. Component modularization and communication boundary enforcement are the first step to fault isolation, which is the first step to fault tolerance.
* State File and Restartability
* Premature exit by init(1M) is handled as a special case by the kernel:
* init(1M) will be immediately re-executed, retaining its original PID. (PID
* 1 in the global zone.) To track the processes it has previously spawned,
* as well as other mutable state, init(1M) regularly updates a state file
* such that its subsequent invocations have knowledge of its various
* dependent processes and duties.
Then init(1M) and SMF's svc.startd(1M) seem to have a bit of a relationship:
* Process Contracts
* We start svc.startd(1M) in a contract and transfer inherited contracts when
* restarting it. Everything else is started using the legacy contract
* template, and the created contracts are abandoned when they become empty.
So init(1M) creates the initial contract for svc.startd(1M), then the latter creates nested contracts below that. (Aside: doing the equivalent cgroup manipulation on Linux would run afoul of the notorious single-writer rule.)
If svc.startd(1M) crashes, init(1M) will restart it inside the existing contract of the crashed instance, so it can find its spawned services (in nested contracts), as well as its companion svc.configd(1M).
Now during startup, svc.startd(1M) calls ct_event_reset(3), and this is really the interesting bit here:
The ct_event_reset() function resets the location of the listener to the beginning of the queue. This function can be used to re-read events, or read events that were sent before the event endpoint was opened. Informative and acknowledged critical events, however, might have been removed from the queue.
I'm willing to entertain the idea that with this feature, SMF can properly track the state of the services that its previous incarnation launched, even if it crashed in the middle of handling an event.
With any luck it will also handle the situation where a supervised process exits after the service manager crashes and before it is restarted, as the contract should buffer the event in the kernel until it is read.
Notably, this is a Solaris-specific kernel feature of the contract(4) file system; does Linux have anything equivalent, in cgroups or elsewhere?
The other SMF process, svc.configd, uses an SQLite database (actually two: a persistent one, and a tmpfs one for runtime state), so it's plausible that it's properly transactional.
> If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail.
That is an entirely erroneous extrapolation, as I never claimed any other single point of failure [in user-space] than the service manager.
> I never claimed any other single point of failure [in user-space] than the service manager.
If all of one's system and service management relies upon a system-wide software "bus", then another similar problem is what to do when one has restarted the "bus" broker service and it has lost track of all active clients and servers.
Related problems arise when one cannot shut down one's log daemon because the only way to reach its control interface is via a "bus" broker service, while the broker in turn relies upon logging being available until it is shut down. Again, this is an example of engineering tradeoffs. Choose one big centralized logger daemon for logging everything, and this complexity and interdependence is a consequence. A different design is to have multiple log daemons, independent of one another: with the cyclog@dbus service logging to /var/log, and the log daemon's own and the service manager's output logged by a different daemon to /run/system-manager/log/, one can shut down the separate logging services at separate points in the shutdown procedure.
It's literally named SRC_kex.ext? So... would it be fair to say that part of SRC is implemented in kernel-space? The manual page gives me this impression.
That could very well be a solution to the problem, but perhaps not one that vezzy-fnord was hoping for.
I actually wanted to link the second of your linked comments but couldn't find it unfortunately.
The unix way is also a different, incompatible implementation of regex in every utility, and a thousand interesting and dangerous modes of failure in the event of whitespace.
Systemd has issues, I'm sure, and I don't trust Poettering's software further than I can throw him, but not being 'unix'-y isn't a strike against it.
> The unix way is also a different, incompatible implementation of regex in every utility, and a thousand interesting and dangerous modes of failure in the event of whitespace.
In order to see the quality difference, you have to compare the docs to those of another project that you make extensive use of.
I know that PostgreSQL's and Erlang's documentation is really rather good. So, go use PostgreSQL or Erlang for a slightly non-trivial project, then -now that you know about the topic that the docs cover- compare the quality of the systemd documentation to the documentation of either other project.
Pay special attention to the documentation provided to folks who want to understand the internals of systemd, Postgres, or Erlang. AIUI, [0] systemd's internals documentation is woefully lacking.
[0] And as has been repeated by everyone I've ever seen try to use said documentation.
Remember that this thread was sparked by otterly's comment: [0]
> The documentation for systemd and its utilities is second to none.
What little I've seen of systemd's user/sysadmin documentation leads me to believe that it is okay. I also understand that documentation is often the least interesting part of any project, and often sorely neglected.
However. Everyone I've heard of that tests out the Systemd Cabal's claim that
"Systemd is not monolithic! Systemd is fully documented and modular, so any sufficiently skilled programmer can replace any and all parts of it with their own implementation."
by attempting to make a compatible reimplementation has failed at their task [1] and reported that the internals documentation is woefully insufficient.
When you're writing software for general consumption, good user documentation is a requirement. After all, if no one can figure out how to use your system, "no one" will use it.
When you also claim that you go out of your way to provide enough documentation to allow others to understand the relevant parts of your internals, and be able to write compatible, independent implementations of your software, the quality of the documentation about your internals is now in scope for evaluation and criticism.
Perhaps. I observed a lovely exchange a while back where a database was being publicly shamed for producing a poor unit file (I think they actually had the unit file launch a shell script that fired up the database).
Their response was that it was the only way for them to avoid tying their database to the systemd signaling lib.
This was countered by one of the systemd devs claiming they could just use a socket that systemd provides.
But when I poked at the documentation, the only place such an "option" was mentioned was at the bottom of the man page for said lib. And it was presented as a note on the internal workings of systemd.
And you will find warning after warning against using systemd internals, as the devs reserve the right to change the behavior of those internals at any time.
Right, you're supposed to use the published interfaces. There's nothing particularly novel about that -- neither Microsoft nor Apple will support you if you don't use their public APIs, and in fact Apple will refuse to publish your software in their app store if you don't.
> Right, you're supposed to use the published interfaces.
You missed the point. I'll isolate each component for you:
"[A] database was being publicly shamed for producing a ... unit file ... [that used] a shell script [to start] the database[.]"
"[The database devs mentioned] that it was the only way for them to avoid tying their database to the systemd signaling lib."
"[O]ne of the systemd devs [mentioned] they could just use a socket that systemd provides."
"[But this] ... 'option' ... was presented [at the bottom of the man page for the systemd signalling lib that the database authors were trying to not use] as a note on the internal workings of systemd."
"[You] will find warning after warning about not using systemd internals, as the devs reserve the right to change the behavior of those internals at any time."
So, this "option" -as documented- is something that you cannot rely on, as it is subject to change at any time, without warning.
> With respect to socket activation, a pretty useful tutorial...
Tutorials are no substitute for documentation. Documentation describes the contracts that the software commits to. Tutorials can exploit edge cases and undocumented behaviors without warning. Moreover, if the docs say that the tutorial is demonstrating a feature that's subject to change at any time, you'd have to be a madman to rely on it.
> The DBus API can be found here...
If the database devs don't want to depend on the systemd signaling lib, I bet they really don't want to depend on DBus. This might come as a surprise to some, but many servers don't run a DBus daemon.
Socket activation doesn't involve any systemd-specific interface. You just get a file descriptor passed in the normal Unix way. The systemd library functions related to socket activation are utility functions for examining the inherited socket, but they are just wrappers over what you could do any other way.
You can configure daemons like nginx or PHP-FPM to use sockets inherited from systemd instead of their own, and it works fine. They don't have any specific support for systemd socket activation, nor do they need to. They can't even tell the difference between the systemd sockets and ones they'd get on a configuration reload.
The closest I could find in the docs to what digi_owl said is the following:
> Internally, these functions send a single datagram with the state string as payload to the AF_UNIX socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "@", the string is understood as Linux abstract namespace socket. The datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS.
I can see how someone would be reluctant to rely on that, even given the interface promise and the nudging of the systemd developers. To be more consistent with what's a stable, public interface and the admonition to avoid internals, I would probably drop the word "internally."
However, even with your change, I still read that section as describing implementation detail that's not guaranteed to be stable. If that note describes a stable, documented protocol, a link back to the documentation of that protocol would be helpful and reassuring.
For those who want context and specifics: The whole argument from some of the people who didn't want to rely upon something that is explicitly described as "internal" is set out at length in places like https://news.ycombinator.com/item?id=7809174 .
It only replaces sysvrc, [0] but I find the OpenRC-powered systems I admin to be quite sane and easy to manage.
[0] But that's okay. I strongly suspect that when most folks say "sysvinit", they really mean "sysvrc". Hell, I used to be one of those folks until a while back.
I am worried even more that people may well have used upstart for a number of years and still think they are using sysv, because upstart could grok the scripts without change.
But then I am sitting here using a distro that boots by way of a couple of flat files. Frankly I kinda like it, but then I grew up fiddling with autoexec.bat...
> because upstart could grok the scripts without change
I don't believe that's the case. The "service" command (which is part of the sysvinit-utils package, not Upstart) invokes either Upstart or SysV init as necessary, but Upstart itself has no awareness of the SysV init world. You couldn't have an Upstart service depend on a SysV init one, list SysV service status with Upstart, or enable a SysV service through Upstart.
In case you're curious, the wrapping on the systemd side is more comprehensive. SysV scripts appear as units, and systemd parses the semi-standard metadata at the top of most SysV init scripts to determine when the service should start if enabled (translating from the traditional runlevels). As units, the SysV init scripts can be enabled/disabled, started/stopped, and listed using the standard systemd commands. They can also participate in dependency graphs alongside native systemd units.
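To make "semi-standard metadata" concrete: it's the LSB comment header at the top of the init script. A rough Python sketch of the kind of extraction involved (not systemd's actual parser, and with a made-up service name) looks like this:

```python
import re

# Hypothetical init script; the field names follow the LSB header convention.
SCRIPT = """\
#!/bin/sh
### BEGIN INIT INFO
# Provides:          mydb
# Required-Start:    $network $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
### END INIT INFO
"""

def lsb_header(text):
    """Parse the '### BEGIN INIT INFO' comment block into a dict of fields."""
    m = re.search(r"### BEGIN INIT INFO\n(.*?)### END INIT INFO", text, re.S)
    fields = {}
    if m:
        for line in m.group(1).splitlines():
            key, _, value = line.lstrip("# ").partition(":")
            if value:
                fields[key.strip()] = value.strip()
    return fields
```

In the translation described above, fields like Default-Start map to runlevel targets and Required-Start to ordering dependencies; this sketch shows only the header-extraction step.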
I've come around to being a systemd skeptic too, after initially supporting it. It's just so over-engineered, confusing, and hard to use. Today, in the age of user experience, anything that wants to replace the (also ugly) old init should be a huge step forward and a breath of fresh air. Instead, we're replacing bash nastiness with over-engineered "enterprisey" nastiness.
A software project this large, complex, controversial and coupled that wants to be PID 1? Absolutely no way.