
There's been a lot of research into fault recovery through message logging and checkpoint-based methods that could be applied here, e.g. [1]. Of course, you use "academic" as a snarl word, so I don't think anything will convince you.

The idea that the service manager would not be able to know the system and service states is completely false. Solaris SMF is one design that does, via its configuration repository. Simpler designs can deduce enough metadata from the persistent configuration in the supervisor tree. There are many possible approaches.
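
To make one such approach concrete: a manager that keeps one persistent configuration directory per service can, on startup, simply re-scan that tree and probe any recorded PID for liveness. A rough sketch of that idea (the directory layout and file names are invented for illustration, not any particular system's):

  /*
   * Sketch: after a restart, rediscover supervised services by walking a
   * persistent configuration tree (one directory per service) and probing
   * any recorded PID for liveness. Layout and names are invented; PID
   * reuse is ignored for brevity.
   */
  #include <dirent.h>
  #include <signal.h>
  #include <stdio.h>
  #include <sys/types.h>

  static void
  rediscover(const char *root)
  {
      DIR *d = opendir(root);
      struct dirent *de;
      char pidfile[512];
      FILE *fp;
      long pid;

      if (d == NULL)
          return;
      while ((de = readdir(d)) != NULL) {
          if (de->d_name[0] == '.')
              continue;
          snprintf(pidfile, sizeof pidfile, "%s/%s/pid", root, de->d_name);
          if ((fp = fopen(pidfile, "r")) == NULL)
              continue;               /* never started, or nothing recorded */
          if (fscanf(fp, "%ld", &pid) == 1 && kill((pid_t)pid, 0) == 0) {
              /* still running: resume supervision of this service */
          } else {
              /* gone: apply the service's restart policy */
          }
          fclose(fp);
      }
      closedir(d);
  }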

The idea that such fault recovery is implausible is a naive one that only someone unfamiliar with the research literature could espouse.

If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail. Component modularization and communication-boundary enforcement are the first step to fault isolation, which is the first step to fault tolerance.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52....




That's very interesting; I didn't know SMF could do that.

Let's see... init(1) is apparently restarted by the Solaris kernel automatically, unlike on Linux, where the exit of PID 1 causes a kernel panic.

https://github.com/illumos/illumos-gate/blob/master/usr/src/...

  * State File and Restartability
  *   Premature exit by init(1M) is handled as a special case by the kernel:
  *   init(1M) will be immediately re-executed, retaining its original PID.  (PID
  *   1 in the global zone.)  To track the processes it has previously spawned,
  *   as well as other mutable state, init(1M) regularly updates a state file
  *   such that its subsequent invocations have knowledge of its various
  *   dependent processes and duties.
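In other words, the classic checkpoint trick. A rough sketch of the general technique (this is not the actual init(1M) code; the state-file path, format, and atomic-rename dance here are just for illustration):

  /*
   * Sketch only: atomically persist the supervisor's mutable state (here
   * just the spawned PIDs) so that a re-executed instance can reload it.
   * The path and format are invented for illustration.
   */
  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>

  static int
  write_state(const pid_t *pids, size_t n, const char *path)
  {
      char tmp[256];
      FILE *fp;
      size_t i;

      snprintf(tmp, sizeof tmp, "%s.tmp", path);
      if ((fp = fopen(tmp, "w")) == NULL)
          return -1;
      for (i = 0; i < n; i++)
          fprintf(fp, "%ld\n", (long)pids[i]);
      fflush(fp);
      fsync(fileno(fp));          /* make sure it reaches disk ...     */
      fclose(fp);
      return rename(tmp, path);   /* ... then swap it in atomically    */
  }
A re-executed instance reads the file back before doing anything else; per the comment above, init(1M) does essentially this, just with more kinds of state.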
Then init(1) and SMF's svc.startd(1) seem to have a bit of a relationship:

  * Process Contracts
  *   We start svc.startd(1M) in a contract and transfer inherited contracts when
  *   restarting it.  Everything else is started using the legacy contract
  *   template, and the created contracts are abandoned when they become empty.
So init(1) creates the initial contract for svc.startd(1), then the latter creates nested contracts below that. (Aside: doing the equivalent cgroup manipulation on Linux would run afoul of the notorious one-writer rule.)
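
For comparison, the closest moral equivalent on Linux would be roughly the following (a sketch assuming cgroup v2 mounted at /sys/fs/cgroup; the subtree name is made up, and the whole point of the one-writer rule is that you cannot safely do this in a hierarchy some other manager already owns):

  /*
   * Sketch: put a freshly spawned service into its own cgroup (v2) so its
   * descendants can later be found and killed as a group. Paths are
   * examples; a real manager would use a delegated subtree.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/types.h>

  static int
  enroll(pid_t pid, const char *svc)
  {
      char dir[256], procs[256];
      FILE *fp;

      snprintf(dir, sizeof dir, "/sys/fs/cgroup/mymanager/%s", svc);
      snprintf(procs, sizeof procs, "%s/cgroup.procs", dir);
      if (mkdir(dir, 0755) != 0 && errno != EEXIST)
          return -1;
      if ((fp = fopen(procs, "w")) == NULL)
          return -1;
      fprintf(fp, "%d\n", (int)pid);   /* move the child into the group */
      return fclose(fp);
  }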

If svc.startd(1) crashes, init(1) will restart it inside the existing contract of the crashed instance, so it can find its spawned services (in nested contracts), as well as its companion svc.configd(1).

Now during startup, svc.startd(1) calls ct_event_reset(3), and this is really the interesting bit here:

https://github.com/illumos/illumos-gate/blob/master/usr/src/...

  The ct_event_reset() function resets the ___location of the listener to the
  beginning of the queue. This function can be used to re-read events, or
  read events that were sent before the event endpoint was opened.
  Informative and acknowledged critical events, however, might have been
  removed from the queue.
I'm willing to entertain the idea that with this feature, SMF can properly track the state of the services that its previous incarnation launched, even if it crashed in the middle of handling an event.

With any luck it will also handle the case where a supervised process exits after the service manager crashes and before it is restarted, as the contract should buffer the event in the kernel until it is read.
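
If I'm reading the man pages right, the recovery path for a restarted manager would look something like the sketch below (not the actual svc.startd code; error handling and the adoption preconditions are glossed over, following ct_ctl_adopt(3CONTRACT), ct_event_reset(3CONTRACT), and contract(4)):

  /*
   * Sketch: re-adopt an existing process contract and replay its event
   * queue from the start. Endpoint paths per contract(4).
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <libcontract.h>
  #include <sys/contract/process.h>

  static void
  replay(ctid_t ctid)
  {
      char path[64];
      int ctl, ev;
      ct_evthdl_t e;
      pid_t pid;

      snprintf(path, sizeof path, "/system/contract/process/%d/ctl",
          (int)ctid);
      ctl = open(path, O_WRONLY);
      ct_ctl_adopt(ctl);              /* take over the dead instance's contract */

      snprintf(path, sizeof path, "/system/contract/process/%d/events",
          (int)ctid);
      ev = open(path, O_RDONLY | O_NONBLOCK);
      ct_event_reset(ev);             /* rewind to the start of the queue */

      while (ct_event_read(ev, &e) == 0) {    /* drain the buffered events */
          if (ct_event_get_type(e) == CT_PR_EV_EXIT) {
              ct_pr_event_get_pid(e, &pid);
              /* this member died while we were down: schedule a restart */
          }
          ct_event_free(e);
      }
  }
Whether svc.startd does it exactly that way I haven't checked, but the kernel-side pieces are evidently all there.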

Notably, this is a Solaris-specific kernel feature of the contract(4) filesystem; does Linux have anything equivalent in cgroups or elsewhere?

The other SMF process, svc.configd, uses an SQLite database (actually two: a persistent one and a tmpfs one for runtime state), so it's plausible that it's properly transactional.
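
For anyone who hasn't seen the trick, the shape of it is roughly this (sketch only; the table and column names are invented, not svc.configd's actual schema):

  /*
   * Sketch: record a state transition in one SQLite transaction, so a
   * crashed-and-restarted reader never observes a half-applied update.
   * Schema is invented for illustration.
   */
  #include <sqlite3.h>

  static int
  set_state(sqlite3 *db, const char *fmri, const char *state)
  {
      char *sql;
      int rc;

      sql = sqlite3_mprintf(
          "BEGIN;"
          "UPDATE instance SET state = %Q WHERE fmri = %Q;"
          "INSERT INTO history(fmri, state) VALUES (%Q, %Q);"
          "COMMIT;",
          state, fmri, fmri, state);
      rc = sqlite3_exec(db, sql, NULL, NULL, NULL);  /* applied atomically at COMMIT */
      sqlite3_free(sql);
      return rc;
  }
Whether configd is actually that tidy I don't know, but SQLite's journaling at least makes "the repository survived the crash in a consistent state" a reasonable default assumption.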

> If we take your logic to its conclusion, we should just run everything in ring 0 with a single unisolated address space, because hey, anything can fail.

That is an entirely erroneous extrapolation, as I never claimed any single point of failure [in user-space] other than the service manager.


> I never claimed any single point of failure [in user-space] other than the service manager.

If all of one's system and service management relies upon a system-wide software "bus", then another similar problem is what to do when one has restarted the "bus" broker service and it has lost track of all active clients and servers.

* https://bugs.freedesktop.org/show_bug.cgi?id=89847

* https://github.com/NixOS/nixpkgs/issues/7633

A related problem is what to do when one cannot shut down one's log daemon because the only way to reach its control interface is via a "bus" broker service, while the broker in turn relies upon logging being available until it is shut down. Again, this is an example of engineering tradeoffs. Choose one big centralized logger daemon for everything, and this complexity and interdependence is a consequence. A different design is to have multiple log daemons, independent of one another. With the cyclog@dbus service logging to /var/log, and that log daemon's own output plus the service manager's log output being logged by a different daemon to /run/system-manager/log/, one can shut down the separate logging services at separate points in the shutdown procedure.

* https://github.com/systemd/systemd/issues/867

* https://bugzilla.redhat.com/show_bug.cgi?id=1214466



