This question has always appeared to me as academic, with little or no real-world relevance.
If your service manager process were to crash, what are you going to do about it?
If you restart the service manager, it won't know what state the system is in, which services are running and which are not, which services were running at the time when it crashed but then stopped just before it restarted, etc.
How are you going to do that in a race-free and reliable way that is actually better in practice than the alternative (reboot)?
And if your service manager is a single point of failure, it doesn't matter much which PID it's running as; it has to be perfectly reliable anyway (just like the kernel).
There's been a lot of research in fault recovery through message logging and checkpoint-based methods that could be applied here, e.g. [1]. Of course, you use "academic" as a snarl word, so I don't think anything will convince you.
The idea that the service manager would not be able to know the system and service states is completely false. Solaris SMF is one design that does know, via its use of the configuration repository. Simpler designs can deduce enough metadata from the persistent configuration in the supervisor tree. There are many possible approaches.
The idea that such fault recovery is implausible is a naive one that only someone unfamiliar with the research literature could espouse.
If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail. Component modularization and communication boundary enforcement is the first step to fault isolation, which is the first step to fault tolerance.
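To make the "deduce state from persistent configuration" point concrete, here is a minimal sketch (in C, with a made-up directory layout and file names, not any real system's) of a restarted manager rescanning its configured services and probing which ones still have a live process:

    /* Hypothetical sketch: a restarted service manager re-deriving runtime
     * state from a persistent on-disk layout (one directory per configured
     * service, each with a "pid" file written by the previous incarnation).
     * The layout and file names are illustrative, not from any real system. */
    #include <dirent.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>

    static void rescan(const char *svcdir)
    {
        DIR *d = opendir(svcdir);
        if (d == NULL)
            return;
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')
                continue;
            char path[1024];
            snprintf(path, sizeof path, "%s/%s/pid", svcdir, e->d_name);
            FILE *f = fopen(path, "r");
            long pid = 0;
            if (f != NULL) {
                if (fscanf(f, "%ld", &pid) != 1)
                    pid = 0;
                fclose(f);
            }
            /* kill(pid, 0) only checks for existence; a PID can be recycled,
             * which is exactly the gap that contracts/cgroups close. */
            if (pid > 0 && kill((pid_t)pid, 0) == 0)
                printf("%s: apparently still running (pid %ld)\n", e->d_name, pid);
            else
                printf("%s: not running, needs restart\n", e->d_name);
        }
        closedir(d);
    }

    int main(void)
    {
        rescan("/run/hypothetical-manager/services");
        return 0;
    }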
* State File and Restartability
* Premature exit by init(1M) is handled as a special case by the kernel:
* init(1M) will be immediately re-executed, retaining its original PID. (PID
* 1 in the global zone.) To track the processes it has previously spawned,
* as well as other mutable state, init(1M) regularly updates a state file
* such that its subsequent invocations have knowledge of its various
* dependent processes and duties.
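The comment doesn't show the state file's path or format, and I won't guess at them; the general technique is just atomic replacement, so a re-executed init always reads either the old state or the new one, never a torn write. A minimal sketch, with an illustrative path and record layout rather than anything init(1M) actually does:

    /* Write the new state to a temporary file, fsync it, then rename() it
     * over the old file. The struct and the ".tmp" naming are illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct managed_proc {
        pid_t pid;
        char  tag[32];   /* e.g. an inittab-style id */
    };

    static int save_state(const char *path, const struct managed_proc *p, size_t n)
    {
        char tmp[1024];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;
        ssize_t want = (ssize_t)(n * sizeof *p);
        if (write(fd, p, (size_t)want) != want || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        return rename(tmp, path);   /* atomic replacement on the same filesystem */
    }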
Then init(1) and SMF's svc.startd(1) seem to have a bit of a relationship:
* Process Contracts
* We start svc.startd(1M) in a contract and transfer inherited contracts when
* restarting it. Everything else is started using the legacy contract
* template, and the created contracts are abandoned when they become empty.
So init(1) creates the initial contract for svc.startd(1), then the latter creates nested contracts below that. (Aside: doing the equivalent cgroup manipulation on Linux would run afoul of the notorious one-writer rule.)
If svc.startd(1) crashes, init(1) will restart it inside the existing contract of the crashed instance, so it can find its spawned services (in nested contracts), as well as its companion svc.configd(1).
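For reference, putting a child into its own process contract is only a few libcontract(3LIB) calls. A rough sketch of the mechanism (not init's actual source, with error handling trimmed):

    /* Activate the process-contract template, fork, and the child becomes
     * the initial member of a new contract; see contract(4). */
    #include <fcntl.h>
    #include <libcontract.h>
    #include <sys/types.h>
    #include <unistd.h>

    static pid_t spawn_in_contract(const char *path, char *const argv[])
    {
        int tmpl = open("/system/contract/process/template", O_RDWR);
        if (tmpl < 0)
            return -1;

        /* The next fork() by this thread creates a new contract. */
        if (ct_tmpl_activate(tmpl) != 0) {
            close(tmpl);
            return -1;
        }

        pid_t pid = fork();
        if (pid == 0) {
            execv(path, argv);
            _exit(127);
        }

        /* Parent: stop using the template for subsequent forks. */
        ct_tmpl_clear(tmpl);
        close(tmpl);
        return pid;
    }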
Now during startup, svc.startd(1) calls ct_event_reset(3), and this is really the interesting bit here:
The ct_event_reset() function resets the ___location of the
listener to the beginning of the queue. This function can be
used to re-read events, or read events that were sent before
the event endpoint was opened. Informative and acknowledged
critical events, however, might have been removed from the
queue.
I'm willing to entertain the idea that with this feature, SMF can properly track the state of the services that its previous incarnation launched, even if it crashed in the middle of handling an event.
With any luck it will also handle the situation where a supervised process exits after the service manager crashes and before it is restarted, since the contract should buffer the event in the kernel until it is read.
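Going only by my reading of ct_event_read(3CONTRACT) and ct_event_reset(3CONTRACT), the replay on restart could look roughly like this sketch (not svc.startd's actual code; the pbundle endpoint delivers events from all process contracts held by the caller):

    /* Rewind the event endpoint and replay whatever the kernel still holds. */
    #include <fcntl.h>
    #include <libcontract.h>
    #include <stdio.h>
    #include <sys/contract/process.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void replay_events(void)
    {
        /* O_NONBLOCK so ct_event_read() returns an error (e.g. EAGAIN)
         * once the queue is drained, instead of blocking. */
        int fd = open("/system/contract/process/pbundle", O_RDONLY | O_NONBLOCK);
        if (fd < 0)
            return;

        /* Rewind to the oldest event still queued in the kernel. */
        (void) ct_event_reset(fd);

        ct_evthdl_t ev;
        while (ct_event_read(fd, &ev) == 0) {
            ctid_t ctid = ct_event_get_ctid(ev);
            uint_t type = ct_event_get_type(ev);
            if (type == CT_PR_EV_EMPTY)
                printf("contract %ld became empty while we were away\n", (long)ctid);
            ct_event_free(ev);
        }
        close(fd);
    }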
Notably, this is a Solaris-specific kernel feature of the contract(4) filesystem; does Linux have anything equivalent in cgroups or somewhere?
The other SMF process, svc.configd, uses an SQLite database (actually two: a persistent one and a tmpfs one for runtime state), so it's plausible that it's properly transactional.
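A toy illustration of why that matters (made-up schema, not svc.configd's actual repository layout): wrap the state change in a transaction, and a crash mid-update leaves either the old row or the new one, never a torn write.

    /* Minimal SQLite transaction sketch; build with -lsqlite3. */
    #include <sqlite3.h>
    #include <stdio.h>

    int main(void)
    {
        sqlite3 *db;
        if (sqlite3_open("/tmp/example-repo.db", &db) != SQLITE_OK)
            return 1;

        const char *sql =
            "CREATE TABLE IF NOT EXISTS instance_state (name TEXT PRIMARY KEY, state TEXT);"
            "BEGIN;"
            "INSERT OR REPLACE INTO instance_state VALUES ('svc:/network/ssh:default', 'online');"
            "COMMIT;";

        char *err = NULL;
        if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "sqlite error: %s\n", err);
            sqlite3_free(err);
        }
        sqlite3_close(db);
        return 0;
    }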
> If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail.
That is an entirely erroneous extrapolation, as I never claimed any other single point of failure [in user-space] than the service manager.
> I never claimed any other single point of failure [in user-space] than the service manager.
If all of one's system and service management relies upon a system-wide software "bus", then another similar problem is what to do when one has restarted the "bus" broker service and it has lost track of all active clients and servers.
Related problems are what to do when one cannot shut down one's log daemon because the only way to reach its control interface is via a "bus" broker service, and the broker in turn relies upon logging being available until it is shut down. Again, this is an example of engineering tradeoffs. Choose one big centralized logger daemon for logging everything, and this complexity and interdependence are the consequence.

A different design is to have multiple log daemons, independent of one another. With the cyclog@dbus service logging to /var/log and that log daemon's own and the service manager's log output being logged by a different daemon to /run/system-manager/log/, one can shut down the separate logging services at separate points in the shutdown procedure.
It's literally named SRC_kex.ext? So... would it be fair to say that part of SRC is implemented in kernel-space? The manual page gives me this impression.
That could very well be a solution to the problem, but perhaps not one that vezzy-fnord was hoping for.
I actually wanted to link the second of your linked comments but couldn't find it unfortunately.