People making this complaint don't seem to have any idea what is in init normally or why you might want to add more stuff there (for example, where are you going to manage cgroup trees for system processes from?)
Yes, not charting out your module boundaries and just bundling the system and service state along with parsing and cgroup management in the same process is relatively unwise. Even Solaris SMF got it right by keeping init(8) small amidst the otherwise highly impressive feature set of its main service manager, using contracts (the Solaris equivalent of cgroups, which they actually predate) outside PID 1.
So the same can be done on Linux. At least one system, OpenRC, explicitly supports such an arrangement.
(I take it you haven't read the architectural critique? Most pro-systemd arguments fall flat on their face in any event.)
systemd is NOT making the change to manage cgroups from PID1. Kernel is - systemd is just the first (and currently the only) one to comply with this change.
Legacy cgroup API is going away.
It is the same case for the /usr merge [1]. systemd is not forcing the change, but it is complying with the changes required and is getting blamed for it.
You are right, but the cgroup manager should still be something that starts very very early, so you can start components that make use of it. If literally everything your init system starts should automatically use cgroups, you have to start it with the init system or as the first thing the init system will start.
Remember that vezzy-fnord was responding to a comment that made the following erroneous statement:
> systemd is NOT making the change to manage cgroups from PID1. Kernel is...
I've heard systemd proponents assert that both udev and the cgroup manager must live in either kernel space or in PID 1, because to do otherwise would expose systemd to races or something while PID 1 started udev and/or the cgroup manager.
These are also erroneous statements. It's rather important to correct such statements, as we're dealing with a (sadly) highly politicized technical topic.
you are correct - I had conflated the PID1 argument with the cgroup daemon, because as far as I stand, it doesn't make a difference to me.
There have been alternative managers like cgmanager [1] that lxc is bringing - which (surprisingly) work well with systemd as well [2]. Probably another reason not to be scared of systemd ;)
> There have been alternative managers like cgmanager [1] that lxc is bringing - which (surprisingly) work well with systemd as well...
Given that vezzy-fnord told you: [0]
> The kernel mandates a single [cgroup] writer.
you must have come to the understanding that the way cgmanager gets its work done on a systemd system is to pass control group management requests through to systemd's control group manager. Because the ultimate plan is to have a single CG manager, either cgmanager or systemd's control group manager handles the CG management requests; there's no other way it can work. And because the systemd project is heavily vertically integrated, the odds that cgmanager gets to use its own CG manager on such a system are near-zero.
> you are correct - I had conflated the PID1 argument ... because as far as I stand, it doesn't make a difference to me.
Yep. Your arguments and assertions have been almost exclusively soft and non-technical. Here's some advice: when people are making comments about technical topics, don't join the conversation unless your level of understanding of the topic is just about as deep as that of those who are speaking. [1]
> Probably another reason not to be scared of systemd ;)
Given the timestamp on this comment, it seems unlikely that you haven't had the opportunity to read my reply [2] to one of your much earlier comments. Given that I lay out five solid, non-fear-based arguments for why one might be worried about the systemd project, your assertion that I shouldn't be "scared of systemd" is extremely dismissive.
[1] Unless -of course- you're joining the conversation to learn more about the topic. In that case, refrain from making uninformed assertions, ask clarifying questions about things you are unsure about, and make the limits of your knowledge clear up front.
This question has always appeared to me as academic, with little or no real-world relevance.
If your service manager process were to crash, what are you going to do about it?
If you restart the service manager, it won't know what state the system is in, which services are running and which are not, which services were running at the time when it crashed but then stopped just before it restarted, etc.
How are you going to do that in a race-free and reliable way that is actually better in practice than the alternative (reboot)?
And if your service manager is a single point of failure, it doesn't matter much which PID it's running as; it has to be perfectly reliable anyway (just like the kernel).
There's been a lot of research into fault recovery through message logging and checkpoint-based methods that could be applied here, e.g. [1]. Of course, you use "academic" as a snarl word, so I don't think anything will convince you.
The idea that the service manager would not be able to know the system and service states is completely false. Solaris SMF is one design that does, via its use of the configuration repository. Simpler designs can deduce enough metadata from the persistent configuration in the supervisor tree. There are many possible approaches.
The idea that such fault recovery is implausible is a naive one that only someone unfamiliar with the research literature could espouse.
If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail. Component modularization and communication boundary enforcement are the first step to fault isolation, which is the first step to fault tolerance.
* State File and Restartability
* Premature exit by init(1M) is handled as a special case by the kernel:
* init(1M) will be immediately re-executed, retaining its original PID. (PID
* 1 in the global zone.) To track the processes it has previously spawned,
* as well as other mutable state, init(1M) regularly updates a state file
* such that its subsequent invocations have knowledge of its various
* dependent processes and duties.
Then init(1M) and SMF's svc.startd(1M) seem to have a bit of a relationship:
* Process Contracts
* We start svc.startd(1M) in a contract and transfer inherited contracts when
* restarting it. Everything else is started using the legacy contract
* template, and the created contracts are abandoned when they become empty.
So init(1M) creates the initial contract for svc.startd(1M), then the latter creates nested contracts below that. (Aside: doing the equivalent cgroup manipulation on Linux would run afoul of the notorious single-writer rule.)
If svc.startd(1M) crashes, init(1M) will restart it inside the existing contract of the crashed instance, so it can find its spawned services (in nested contracts), as well as its companion svc.configd(1M).
Now during startup, svc.startd(1M) calls ct_event_reset(3), and this is really the interesting bit here:
The ct_event_reset() function resets the location of the listener to the beginning of the queue. This function can be used to re-read events, or read events that were sent before the event endpoint was opened. Informative and acknowledged critical events, however, might have been removed from the queue.
I'm willing to entertain the idea that with this feature, SMF can properly track the state of the services that its previous incarnation launched, even if it crashed in the middle of handling an event.
With any luck it will also handle the situation where a supervised process exits after the service manager crashes and before it is restarted, as the contract should buffer the event in the kernel until it is read.
Notably, this is a Solaris-specific kernel feature of the contract(4) file system; does Linux have anything equivalent, in cgroups or elsewhere?
The other SMF process, svc.configd, uses an SQLite database (actually two: a persistent one, and a tmpfs one for runtime state), so it's plausible that it's properly transactional.
> If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail.
That is an entirely erroneous extrapolation, as I never claimed any other single point of failure [in user-space] than the service manager.
> I never claimed any other single point of failure [in user-space] than the service manager.
If all of one's system and service management relies upon a system-wide software "bus", then another similar problem is what to do when one has restarted the "bus" broker service and it has lost track of all active clients and servers.
Related problems arise when one cannot shut down one's log daemon because the only way to reach its control interface is via a "bus" broker service, while the broker in turn relies upon logging being available until it is shut down. Again, this is an example of engineering tradeoffs. Choose one big centralized logger daemon for logging everything, and this complexity and interdependence is a consequence. A different design is to have multiple log daemons, independent of one another: with the cyclog@dbus service logging to /var/log, and the log daemon's own and the service manager's output logged by a different daemon to /run/system-manager/log/, one can shut down the separate logging services at separate points in the shutdown procedure.
It's literally named SRC_kex.ext? So... would it be fair to say that part of SRC is implemented in kernel-space? The manual page gives me this impression.
That could very well be a solution to the problem, but perhaps not one that vezzy-fnord was hoping for.
I actually wanted to link the second of your linked comments but couldn't find it unfortunately.
The unix way is also a different, incompatible implementation of regex in every utility, and a thousand interesting and dangerous modes of failure in the event of whitespace.
Systemd has issues, I'm sure, and I don't trust Poettering's software further than I can throw him, but not being 'unix'-y isn't a strike against it.
> The unix way is also a different, incompatible implementation of regex in every utility, and a thousand interesting and dangerous modes of failure in the event of whitespace.
In order to see the quality difference, you have to compare the docs to those of another project that you make extensive use of.
I know that PostgreSQL's and Erlang's documentation is really rather good. So, go use PostgreSQL or Erlang for a slightly non-trivial project, then -now that you know about the topic that the docs cover- compare the quality of the systemd documentation to the documentation of either other project.
Pay special attention to the documentation provided to folks who want to understand the internals of systemd, Postgres, or Erlang. AIUI, [0] systemd's internals documentation is woefully lacking.
[0] And as has been repeated by everyone I've ever seen try to use said documentation.
Remember that this thread was sparked by otterly's comment: [0]
> The documentation for systemd and its utilities is second to none.
What little I've seen of systemd's user/sysadmin documentation leads me to believe that it is okay. I also understand that documentation is often the least interesting part of any project, and often sorely neglected.
However. Everyone I've heard of that tests out the Systemd Cabal's claim that
"Systemd is not monolithic! Systemd is fully documented and modular, so any sufficiently skilled programmer can replace any and all parts of it with their own implementation."
by attempting to make a compatible reimplementation has failed at their task [1] and reported that the internals documentation is woefully insufficient.
When you're writing software for general consumption, good user documentation is a requirement. After all, if no one can figure out how to use your system, "no one" will use it.
When you also claim that you go out of your way to provide enough documentation to allow others to understand the relevant parts of your internals, and be able to write compatible, independent implementations of your software, the quality of the documentation about your internals is now in scope for evaluation and criticism.
Perhaps. I observed a lovely exchange a while back where a database was being publicly shamed for producing a poor unit file (I think they actually had the unit file launch a shell script that fired up the database).
Their response was that it was the only way for them to avoid tying their database to the systemd signaling lib.
This was countered by one of the systemd devs claiming they could just use a socket that systemd provides.
But when I poked at the documentation, the only place such an "option" was mentioned was at the bottom of the man page for said lib. And it was presented as a note on the internal workings of systemd.
And you will find warning after warning against using systemd internals, as the devs reserve the right to change the behavior of those internals at any time.
Right, you're supposed to use the published interfaces. There's nothing particularly novel about that -- neither Microsoft nor Apple will support you if you don't use their public APIs, and in fact Apple will refuse to publish your software in their app store if you don't.
> Right, you're supposed to use the published interfaces.
You missed the point. I'll isolate each component for you:
"[A] database was being publicly shamed for producing a ... unit file ... [that used] a shell script [to start] the database[.]"
"[The database devs mentioned] that it was the only way for them to avoid tying their database to the systemd signaling lib."
"[O]ne of the systemd devs [mentioned] they could just use a socket that systemd provides."
"[But this] ... 'option' ... was presented [at the bottom of the man page for the systemd signalling lib that the database authors were trying to not use] as a note on the internal workings of systemd."
"[You] will find warning after warning about not using systemd internals, as the devs reserve the right to change the behavior of those internals at any time."
So, this "option" -as documented- is something that you cannot rely on, as it is subject to change at any time, without warning.
> With respect to socket activation, a pretty useful tutorial...
Tutorials are no substitute for documentation. Documentation describes the contracts that the software commits to. Tutorials can exploit edge cases and undocumented behaviors without warning. Moreover, if the docs say that the tutorial is demonstrating a feature that's subject to change at any time, you'd have to be a madman to rely on it.
> The DBus API can be found here...
If the database devs don't want to depend on the systemd signaling lib, I bet they really don't want to depend on DBus. This might come as a surprise to some, but many servers don't run a DBus daemon.
Socket activation doesn't involve any systemd-specific interface. You just get a file descriptor passed in the normal Unix way. The systemd library functions related to socket activation are utility functions for examining the inherited socket, but they are just wrappers over what you could do any other way.
You can configure daemons like nginx or PHP-FPM to use sockets inherited from systemd instead of their own, and it works fine. They don't have any specific support for systemd socket activation, nor do they need to. They can't even tell the difference between the systemd sockets and ones they'd get on a configuration reload.
The closest I could find in the docs to what digi_owl said is the following:
> Internally, these functions send a single datagram with the state string as payload to the AF_UNIX socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "@", the string is understood as Linux abstract namespace socket. The datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS.
I can see how someone would be reluctant to rely on that, even given the interface promise and the nudging of the systemd developers. To be more consistent with what's a stable, public interface and the admonition to avoid internals, I would probably drop the word "internally."
However, even with your change, I still read that section as describing implementation detail that's not guaranteed to be stable. If that note describes a stable, documented protocol, a link back to the documentation of that protocol would be helpful and reassuring.
For those who want context and specifics: The whole argument from some of the people who didn't want to rely upon something that is explicitly described as "internal" is set out at length in places like https://news.ycombinator.com/item?id=7809174 .
It only replaces sysvrc, [0] but I find the OpenRC-powered systems I admin to be quite sane and easy to manage.
[0] But that's okay. I strongly suspect that when most folks say "sysvinit", they really mean "sysvrc". Hell, I used to be one of those folks until a while back.
I am worried even more that people may well have used upstart for a number of years and still think they are using sysv, because upstart could grok the scripts without change.
But then I am sitting here using a distro that boots by way of a couple of flat files. Frankly I kinda like it, but then I grew up fiddling with autoexec.bat...
> because upstart could grok the scripts without change
I don't believe that's the case. The "service" command (which is part of the sysvinit-utils package, not Upstart) invokes either Upstart or SysV init as necessary, but Upstart itself has no awareness of the SysV init world. You couldn't have an Upstart service depend on a SysV init one, list SysV service status with Upstart, or enable a SysV service through Upstart.
In case you're curious, the wrapping on the systemd side is more comprehensive. SysV scripts appear as units, and systemd parses the semi-standard metadata at the top of most SysV init scripts to determine when the service should start if enabled (translating from the traditional runlevels). As units, the SysV init scripts can be enabled/disabled, started/stopped, and listed using the standard systemd commands. They can also participate in dependency graphs alongside native systemd units.
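To make "semi-standard metadata" concrete: it's the LSB comment header at the top of the init script. A rough Python sketch of the kind of extraction involved (not systemd's actual parser, and with a made-up service name) looks like this:

```python
import re

# Hypothetical init script; the field names follow the LSB header convention.
SCRIPT = """\
#!/bin/sh
### BEGIN INIT INFO
# Provides:          mydb
# Required-Start:    $network $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
### END INIT INFO
"""

def lsb_header(text):
    """Parse the '### BEGIN INIT INFO' comment block into a dict of fields."""
    m = re.search(r"### BEGIN INIT INFO\n(.*?)### END INIT INFO", text, re.S)
    fields = {}
    if m:
        for line in m.group(1).splitlines():
            key, _, value = line.lstrip("# ").partition(":")
            if value:
                fields[key.strip()] = value.strip()
    return fields
```

In the translation described above, fields like Default-Start map to runlevel targets and Required-Start to ordering dependencies; this sketch shows only the header-extraction step.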
I've come around to being a systemd skeptic too, after initially supporting it. It's just so over-engineered, confusing, and hard to use. Today, in the age of user experience, anything that wants to replace the (also ugly) old init should be a huge step forward and a breath of fresh air. Instead, we're replacing bash nastiness with over-engineered "enterprisey" nastiness.
A software project this large, complex, controversial and coupled that wants to be PID 1? Absolutely no way.