Package: tech-ctte; Maintainer for tech-ctte is Technical Committee <[email protected]>;
Reported by: Sean Whitton <[email protected]>
Date: Wed, 25 Jul 2018 04:12:02 UTC
Severity: normal
Done: Margarita Manterola <[email protected]>
Bug is archived. No further changes may be made.
Report forwarded
to [email protected], [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 25 Jul 2018 04:12:05 GMT) (full text, mbox, link).
Acknowledgement sent
to Sean Whitton <[email protected]>
:
New Bug report received and forwarded. Copy sent to [email protected], Technical Committee <[email protected]>
.
(Wed, 25 Jul 2018 04:12:05 GMT) (full text, mbox, link).
Message #5 received at [email protected] (full text, mbox, reply):
[Message part 1 (text/plain, inline)]
Package: tech-ctte
X-debbugs-cc: [email protected]
Control: block 780403 by -1

I hereby request advice from the Technical Committee on a decision that I must take in my role as a Debian Policy delegate. To be completely clear, I am not seeking a decision. I refer to the third power of the T.C. listed under section 6.1 of the Debian Constitution: "Any person or body may ... seek advice from [the Technical Committee]."

In bugs #780403 and #802501 the following question has been asked (I quote Daniel Pocock):

    If postinst or one of the other scripts does a service restart and the restart operation fails, should the postinst abort or should it mask the error, continue and return success?

At present the Policy Manual does not answer this question, and thus it is left up to maintainer discretion: whatever the maintainer thinks makes sense for the service in question.

Others have pointed out, however, that this means that users will see inconsistent behaviour. There is no practical way for a user to determine what will happen when installing a given package that starts or restarts a service, if that start or restart attempt fails. So if it were possible to come up with a consistent answer to the question posed, it would be useful to our users.

As a Policy delegate I want to move this issue along, and I can see three ways of doing that:

1. write a patch to explicitly state in Policy that what happens when a service (re)start fails in a maintscript is left up to package maintainer discretion, and close the bugs

2. make a further attempt to establish consensus on a requirement that maintscripts are consistent in the case of a (re)start failure (this is the default option, so to speak, and I cannot see it succeeding)

3. ask the T.C. to decide what maintscripts should do in these cases.

The general question about which I am seeking advice: does the T.C. think that Debian can be consistent on service (re)starts in maintscripts, or is the best we can do to leave it up to package maintainer discretion?

Thanks.

-- 
Sean Whitton
[signature.asc (application/pgp-signature, inline)]
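To make the two behaviours in the quoted question concrete, here is a minimal sketch in POSIX sh. It is an illustration only: the function names are hypothetical, and the restart command is passed in as an argument (standing in for whatever a real postinst would call) so that the example runs without any service manager.

```shell
#!/bin/sh
# Sketch only: "$1" is a stand-in for the real restart command.

# Option A: propagate the failure, so the postinst would exit non-zero.
postinst_abort_on_failure() {
    restart_cmd=$1
    "$restart_cmd" || return 1
    return 0
}

# Option B: mask the failure, warn on stderr, and report success.
postinst_mask_failure() {
    restart_cmd=$1
    "$restart_cmd" || echo "warning: service restart failed" >&2
    return 0
}
```

Calling `postinst_abort_on_failure false` returns a non-zero status, while `postinst_mask_failure false` prints a warning and returns zero; which of the two a given package does is exactly what Policy currently leaves open.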
Added indication that bug 904558 blocks 780403,802501
Request was from Sean Whitton <[email protected]>
to [email protected]
.
(Wed, 25 Jul 2018 04:12:06 GMT) (full text, mbox, link).
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Thu, 09 Aug 2018 19:39:02 GMT) (full text, mbox, link).
Acknowledgement sent
to Tollef Fog Heen <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Thu, 09 Aug 2018 19:39:02 GMT) (full text, mbox, link).
Message #12 received at [email protected] (full text, mbox, reply):
]] Sean Whitton > The general question about which I am seeking advice: does the > T.C. think that Debian can be consistent on service (re)starts in > maintscripts, or is the best we can do to leave it up to package > maintainer discretion? I think we can give advice on what the default should be and that people should not stray from that unless they have particular reasons. That advice might be more appropriate for the developers reference than policy, though. Due to the variety and complexity of daemons in the archive, I would be reluctant to require complete consistency, there are likely various edge cases we have not thought about. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Fri, 10 Aug 2018 09:45:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Sean Whitton <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Fri, 10 Aug 2018 09:45:03 GMT) (full text, mbox, link).
Message #17 received at [email protected] (full text, mbox, reply):
[Message part 1 (text/plain, inline)]
Hello, Thank you for your reply. On Thu 09 Aug 2018 at 09:19pm +0200, Tollef Fog Heen wrote: > ]] Sean Whitton > >> The general question about which I am seeking advice: does the >> T.C. think that Debian can be consistent on service (re)starts in >> maintscripts, or is the best we can do to leave it up to package >> maintainer discretion? > > I think we can give advice on what the default should be and that people > should not stray from that unless they have particular reasons. That > advice might be more appropriate for the developers reference than > policy, though. I disagree -- it's about the contents of packages, so it should go into Policy. We can make it a recommendation rather than a requirement. > Due to the variety and complexity of daemons in the archive, I would be > reluctant to require complete consistency, there are likely various edge > cases we have not thought about. It would be useful to write something like this into Policy, rather than it remaining silent on the issue. It would be a fine resolution for the Policy bug in question. -- Sean Whitton
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Mon, 17 Sep 2018 18:30:06 GMT) (full text, mbox, link).
Acknowledgement sent
to Margarita Manterola <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Mon, 17 Sep 2018 18:30:06 GMT) (full text, mbox, link).
Message #22 received at [email protected] (full text, mbox, reply):
Hi,

Sorry that it took so long to get back to this bug. The other bug took all the attention.

On 2018-07-25 06:07, Sean Whitton wrote:
> If postinst or one of the other scripts does a service restart and
> the restart operation fails, should the postinst abort or should it
> mask the error, continue and return success?

We had some discussion around this subject at the past ctte meeting [1], and after some back and forth we came to the conclusion that in general it's a bad idea for any postinst to purposely fail, regardless of whether it was trying to (re)start a service or not.

If a postinst fails (for whatever reason), the package is left in a broken state (Failed-Config) which in general makes the package management system unhappy.

It seems that the only reason why one may want to do this is to call the attention of the sysadmin so that they can solve the problem. However, in a world where a large number of users are running automatic updates, leaving the package management system in a broken state is pretty sad, not very visible and rather confusing for the user when they finally encounter it.

Is there another use case for leaving the package in Failed-Config that we missed?

[1]: https://salsa.debian.org/debian/tech-ctte/blob/master/meetings/20180815/debian-ctte.2018-08-15.log.txt

> As a Policy delegate I want to move this issue along, and I can see
> three ways of doing that:
>
> 1. write a patch to explicitly state in Policy that what happens when a
> service (re)start fails in a maintscript is left up to package
> maintainer discretion, and close the bugs
>
> 2. make a further attempt to establish consensus on a requirement that
> maintscripts are consistent in the case of a (re)start failure (this
> is the default option, so to speak, and I cannot see it succeeding)
>
> 3. ask the T.C. to decide what maintscripts should do in these cases.

It's unclear why the service (re)start needs to be a special case. Any operation that is performed in a postinst might fall under the same question of what should happen when that operation fails. Operations like creating users, creating directories, changing permissions, running a command to update the contents of a file, and so on.

> The general question about which I am seeking advice: does the
> T.C. think that Debian can be consistent on service (re)starts in
> maintscripts, or is the best we can do to leave it up to package
> maintainer discretion?

We didn't reach this point in our discussion, so this is still an open question.

I personally think that it would make sense for the policy to at least recommend what should happen with regards to maintainer scripts and typical operations that are performed in them.

And, while I'm open to being convinced otherwise, I don't see any benefit from postinst (particularly postinst + configure) ever failing. If the only reason for postinst to fail is so that the user knows what happened, we should devise a better mechanism for informing the user about the failure.

-- 
Regards,
Marga
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Tue, 18 Sep 2018 16:33:05 GMT) (full text, mbox, link).
Acknowledgement sent
to Ian Jackson <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Tue, 18 Sep 2018 16:33:05 GMT) (full text, mbox, link).
Message #27 received at [email protected] (full text, mbox, reply):
Margarita Manterola writes ("Bug#904558: What should happen when maintscripts fail to restart a service"):
> Sorry that it took so long to get back to this bug. The other bug took
> all the attention.
...
> If a postinst fails (for whatever reason), the package is left in a
> broken state (Failed-Config) which in general makes the package
> management system unhappy.

The other effect is that the package's reverse dependencies are not configured, so their postinsts do not experience a broken situation.

> It seems that the only reason why one may want to do this is to call
> the attention of the sysadmin so that they can solve the problem.
> However, in a world where a large number of users are running automatic
> updates, leaving the package management system in a broken state is
> pretty sad, not very visible and rather confusing for the user when
> they finally encounter it.
>
> Is there an another use case for leaving the package in Failed-Config
> that we missed?

If you deliberately cause the postinst to succeed when the package is nonfunctional, then the package's r-dependencies will be configured (ie have their postinsts run) in the broken state. The r-dependencies' postinsts may then do wrong things. They may leave the r-dependencies in anomalous states.

If one takes the argument you make above to its logical conclusion, all those postinsts should also report success. The result is a system where the only thing that is happy is the package management system, and the records of the root cause of the problem, and how the failed operations might be reattempted, have been lost.

I guess you will infer from what I write above that I consider "reporting errors causes the next layer to be unhappy", and "reporting errors causes the user to be unhappy" to be extraordinarily bad arguments.

There may be good reasons not to treat daemon startup failure as a postinst failure, but the argument above is not one of them.

> It's unclear why the service (re)start needs to be a special case.

Service (re)starts are more likely to fail for unrelated reasons. Also some packages are able to provide much of their intended API even without the daemon.

I think the general rule of thumb should be that a daemon startup failure should be treated as a configuration failure. I'm content with a situation where maintainers feel free to diverge from this if there are reasons to do so.

> I personally think that it would make sense for the policy to at least
> recommend what should happen with regards to maintainer scripts and
> typical operations that are performed in them.

There is already a section on error handling in scripts, which (IMO correctly) says that shell scripts should use set -e.

When I wrote that, it didn't occur to me that anyone would think that a failure by a postinst script to perform an intended operation should be treated any other way than a failure of the postinst script.

(In the usual case. There are of course lots of situations where the right approach is some kind of error recovery, or the operation was attempted "just in case", or something, in which case more subtle error handling is called for.)

> And, while I'm open to be convinced otherwise, I don't see any benefit
> from postinst (particularly postinst + configure) ever failing.

Frankly I'm disturbed to be reading this, here. See above.

If the postinst fails, then the user has the opportunity to fix the root cause and rerun dpkg --configure --pending. That will then repair the system completely.

Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
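The effect of `set -e` described in this message can be sketched as follows. This is only an illustration under stated assumptions: `run_postinst_body` is a hypothetical helper, and the script text passed to it stands in for a real maintainer script. The body runs in a fresh shell with errexit enabled, so the first failing command ends the script with a non-zero status and later steps are not attempted.

```shell
#!/bin/sh
# Sketch only: run a script body the way a "set -e" maintainer script behaves.
# "$1" is the script text; "sh -e" enables errexit for the child shell.
run_postinst_body() {
    sh -e -c "$1"
}
```

For example, `run_postinst_body 'echo created-user; false; echo restarted'` prints only `created-user` and exits non-zero, because the failing `false` stops the script before the final step. In the real system, the administrator would then fix the root cause and rerun `dpkg --configure --pending`, as the message says.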
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Tue, 18 Sep 2018 20:45:12 GMT) (full text, mbox, link).
Acknowledgement sent
to Tollef Fog Heen <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Tue, 18 Sep 2018 20:45:12 GMT) (full text, mbox, link).
Message #32 received at [email protected] (full text, mbox, reply):
]] Ian Jackson

Hi,

> There may be good reasons not to treat daemon startup failure as a
> postinst failure, but the argument above is not one of them.

I think this is the core question. I largely agree with Ian here that having postinsts fail is not that big a deal if they can't make forward progress, but also we're being asked to advise on what happens when a maintainer script fails to restart a service. I disagree with him on whether failure to start/restart a service should be considered a configuration failure.

The API provided by a package being in the configured state is not whether the relevant daemon is running or not; that is runtime and can and will change many times while the package is in the configured state, so dpkg dependencies are not useful for expressing «this service must be running». (There's also the case where the service is running on a separate host, which is often the case for services such as databases and where the use of Depends is inappropriate.)

I think the general rule should be that the success/failure of the postinst script should signal whether the package considers itself ready to provide whatever API it exists to provide (disregarding the case of Essential packages here, since those are special).

This means that failure to start a daemon should generally not cause the postinst to fail. At the same time, I think there are exceptions to this rule that should be left to maintainer judgement: sshd comes to mind as a service where if it can't restart, you want the system to make it very clear that something is wrong that you might want to fix sooner rather than later (since failure to do so can lead to you not being able to access it after a reboot).

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
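The rule proposed in this message (restart failure is non-fatal by default, with maintainer-chosen exceptions such as sshd) could be sketched as below. This is an assumption-laden illustration, not an existing helper: `restart_with_policy` and its `yes`/`no` flag are hypothetical names, and the restart command is supplied by the caller.

```shell
#!/bin/sh
# Sketch only: restart_with_policy FATAL CMD [ARGS...]
# Runs CMD; if it fails, the result depends on the maintainer's choice:
#   FATAL=yes -> report an error and fail (the sshd-style exception)
#   otherwise -> warn and still report success (the proposed default)
restart_with_policy() {
    fatal=$1
    shift
    if "$@"; then
        return 0
    fi
    if [ "$fatal" = "yes" ]; then
        echo "error: restart failed; failing configuration" >&2
        return 1
    fi
    echo "warning: restart failed; package still configured" >&2
    return 0
}
```

With this shape, `restart_with_policy no false` warns and returns zero, while `restart_with_policy yes false` returns non-zero, so the exception is a one-word decision per package rather than a different script structure.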
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 19 Sep 2018 02:21:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Stuart Prescott <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Wed, 19 Sep 2018 02:21:03 GMT) (full text, mbox, link).
Message #37 received at [email protected] (full text, mbox, reply):
Ian Jackson wrote:
>> I personally think that it would make sense for the policy to at least
>> recommend what should happen with regards to maintainer scripts and
>> typical operations that are performed in them.
>
> There is already a section on error handling in scripts, which (IMO
> correctly) says that shell scripts should use set -e.
>
> When I wrote that, it didn't occur to me that anyone would think that
> a failure by a postinst script to perform an intended operation should
> be treated any other way than a failure of the postinst script.

That was perhaps also written before we started to realise that maintainer scripts are actually best avoided as they tend to be complicated, fragile, difficult to do right and make upgrades harder for the package manager. In the intervening two decades, we've gone from "maintainer scripts are cool" to "the best maintainer script is the one that doesn't exist".

So yes, ignoring errors seems wrong but…

>> And, while I'm open to be convinced otherwise, I don't see any benefit
>> from postinst (particularly postinst + configure) ever failing.
>
> Frankly I'm disturbed to be reading this, here. See above.
>
> If the postinst fails, then the user has the opportunity to fix the
> root cause and rerun dpkg --configure --pending. That will
> then repair the system completely.

… causing a snowball of errors in an awkward half-upgraded environment is nasty.

The problem comes when you don't yet have the right tools installed to be able to fix the problem. We see that scenario often enough in #debian where someone has a failed upgrade and we try to collect more information via pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover that the relevant tool isn't installed and because apt is sufficiently unhappy about broken packages and a half-completed upgrade, you can't ask it to install the tool at that point in time.

In the upgrade scenario, while you're trying to fix one particular problem, you're also in a completely untested half-upgraded situation and so latent bugs in any number of other tools may also be exposed.

So while ignoring errors is wrong, so is making it harder to fix them. This isn't a question of absolutes.

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   [email protected]
Debian Developer   http://www.debian.org/         [email protected]
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 19 Sep 2018 04:27:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Gunnar Wolf <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Wed, 19 Sep 2018 04:27:03 GMT) (full text, mbox, link).
Message #42 received at [email protected] (full text, mbox, reply):
[Message part 1 (text/plain, inline)]
Stuart Prescott dijo [Wed, Sep 19, 2018 at 12:18:24PM +1000]:
> (...)
> That was perhaps also written before we started to realise that maintainer
> scripts are actually best avoided as they tend to be complicated, fragile,
> difficult to do right and make upgrades harder for the package manager. In
> the intervening two decades, we've gone from "maintainer scripts are cool"
> to "the best maintainer script is the one that doesn't exist".
>
> So yes, ignoring errors seems wrong but…
> (...)
> … causing a snowball of errors in an awkward half-upgraded environment is
> nasty.
>
> The problem comes when you don't yet have the right tools installed to be
> able to fix the problem. We see that scenario often enough in #debian where
> someone has a failed upgrade and we try to collect more information via
> pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover
> that the relevant tool isn't installed and because apt is sufficiently
> unhappy about broken packages and a half-completed upgrade, you can't ask it
> to install the tool at that point in time.
>
> In the upgrade scenario, while you're trying to fix one particular problem,
> you're also in a completely untested half-upgraded situation and so latent
> bugs in any number of other tools may also be exposed.
>
> So while ignoring errors is wrong, so is making it harder to fix them. This
> isn't a question of absolutes.

I completely agree with Stuart here. Yes, of course, there is a reason for maintainer scripts to exist, and if they fail to set up things around the package, of course, the user _needs_ to know something is off in their system. But that should happen _very_ seldom.

As Stuart says, helping non-technical users out of this situation can be quite hard, and quite discouraging for the user. We have to make sure the scripts are as foolproof as possible, and failing to stop or restart a daemon should _never_ cause the system to enter such a state.
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 19 Sep 2018 08:15:06 GMT) (full text, mbox, link).
Acknowledgement sent
to Wouter Verhelst <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Wed, 19 Sep 2018 08:15:06 GMT) (full text, mbox, link).
Message #47 received at [email protected] (full text, mbox, reply):
On Tue, Sep 18, 2018 at 10:04:26PM +0200, Tollef Fog Heen wrote: > ]] Ian Jackson > > Hi, > > > There may be good reasons not to treat daemon startup failure as a > > postinst failure, but the argument above is not one of them. > > I think this is the core question. I largely agree with Ian here that > having postinsts fail is not that big a deal if they can't make forward > progress, but also we're being asked to advice on what happens when a > maintainer script fails to restart a service. I disagree with him on > whether failure to start/restart a service should be considered a > configuration failure. I'm not sure why that position is even being considered valid. > The API provided by a package being in the configured state is not > whether the relevant daemon is running or not; that is runtime and can > and will change many times while the package is in the configured state, > so dpkg dependencies are not useful for expressing «this service must be > running». No. But it *is* a useful way to express "this service must be able to run". Additionally, if something fails to restart, then that is a serious problem that I, as a system administrator, would like to know about. Failure to configure a package signals that there is a serious problem that I need to fix, so that informs me. > (There's also the case where the service is running on a > separate host, which is often the case for services such as databases > and where the use of Depends is inappropriate.) > > I think the general rule should be that the success/failure of the > postinst script should signal whether the package considers itself ready > to provide whatever API it exists to provide (disregarding the case of > Essential packages here, since those are special). > > This means that failure to start a daemon should generally not cause the > postinst to fail. I think it should. 
If the daemon fails to restart, that means its configuration is incomplete or incorrect, which means the package failed to configure correctly. The failure to restart is just a symptom; the actual problem is the broken configuration, which may have further effects beyond just "the daemon won't restart". As such, in the general case, I think failure to restart is something that should cause failure to configure. There are really only two[1] reasons why a daemon could fail to restart: - The maintainer made a mistake in the default configuration, and the user didn't make any changes so the old conffiles are being replaced by the new ones, or the package is being newly installed; now the daemon encounters a syntax error. This is a bug, plain and simple, and catching bugs earlier rather than later is a good idea, which will happen if the daemon restart failure causes a postinst failure. - The maintainer made no mistake, but the upgrading user made some local changes, so the conffile system ensures that the syntactic differences in the configuration are not incorporated and the daemon fails to restart. As a system administrator, I would want to know when something like that happens sooner rather than later, so that I can fix it (also sooner rather than later). Failing to finish postinst correctly ensures that that does happen. This is now being countered by "but some people use tools that don't show failures to system administrators", from which the (wrong) conclusion is drawn "so we shouldn't fail anymore". It would be awesome if we lived in a world where we could avoid bugs in code and thus avoid all possible failures, but alas, we don't. So, given that failures *will* happen, even if we don't fail when daemons fail to restart, the correct conclusion would be "so those tools should be fixed to do their utter best to inform the system administrator when something failed". 
When those tools do that, failure to restart a service is no longer a problem for them, and we can continue to do the right thing. [1] There is also the possibility of "the package ships with incomplete configuration on purpose, because there are no sane defaults to use and installing the package requires manual steps from the maintainer before it can be made to work", but (a) our best practices recommend against doing that if at all possible, and (b) in that case starting the daemon shouldn't even be attempted from postinst, and so failure to start can't be a consideration in the exit state of postinst. -- Could you people please use IRC like normal people?!? -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 19 Sep 2018 12:45:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Ian Jackson <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Wed, 19 Sep 2018 12:45:04 GMT) (full text, mbox, link).
Message #52 received at [email protected] (full text, mbox, reply):
Tollef Fog Heen writes ("Bug#904558: What should happen when maintscripts fail to restart a service"):
> Ian Jackson:
> > There may be good reasons not to treat daemon startup failure as a
> > postinst failure, but the argument above is not one of them.
>
> I think this is the core question. I largely agree with Ian here that
> having postinsts fail is not that big a deal if they can't make forward
> progress, but also we're being asked to advise on what happens when a
> maintainer script fails to restart a service. I disagree with him on
> whether failure to start/restart a service should be considered a
> configuration failure.

I think whether it is a configuration failure depends on ...

> I think the general rule should be that the success/failure of the
> postinst script should signal whether the package considers itself ready
> to provide whatever API it exists to provide (disregarding the case of
> Essential packages here, since those are special).

... that. I think I'm in agreement with you on that. But ...

> This means that failure to start a daemon should generally not cause the
> postinst to fail.

... I disagree with that. I think that in the usual case, if the daemon is broken, and the package's purpose is to provide that daemon service, then the package probably isn't providing its API.

Maybe part of the difficulty we are having with this conversation is that we are lacking in examples. This bug and the "parents" #780403 and #802501 are all entirely abstract. Would someone care to give some examples of packages with both behaviours?

Also:

> The API provided by a package being in the configured state is not
> whether the relevant daemon is running or not; that is runtime and can
> and will change many times while the package is in the configured state,
> so dpkg dependencies are not useful for expressing "this service must be
> running".

I disagree with this. dpkg dependencies are not just about what sets of packages can be coinstalled. They also imply sequencing of package setup. And since starting daemons is part of package setup, dpkg dependencies imply a sequencing of daemon startup. That is actually necessary in the case where the startup of daemon B can only be successfully completed if daemon A is up.

> (There's also the case where the service is running on a
> separate host, which is often the case for services such as databases
> and where the use of Depends is inappropriate.)

In that case, there would be a Recommends or Suggests instead, I would have thought.

Thanks,
Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
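The sequencing point in this message (daemon B can only start once daemon A is up) can be illustrated with a topological sort. The sketch below is not how dpkg orders configuration; it only uses the standard `tsort` utility to show that "prerequisite dependent" pairs determine an order. The function name and the package names in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: read "prerequisite dependent" pairs on stdin and print an
# order in which each prerequisite comes before the packages that need it.
configure_order() {
    tsort
}
```

For instance, `printf '%s\n' 'daemon-a daemon-b' | configure_order` lists `daemon-a` before `daemon-b`, which is the order in which their setup (and therefore their daemon startup) would need to happen.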
Information forwarded
to [email protected], Technical Committee <[email protected]>
:
Bug#904558
; Package tech-ctte
.
(Wed, 19 Sep 2018 12:51:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Ian Jackson <[email protected]>
:
Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>
.
(Wed, 19 Sep 2018 12:51:03 GMT) (full text, mbox, link).
Message #57 received at [email protected] (full text, mbox, reply):
Stuart Prescott writes ("Bug#904558: What should happen when maintscripts fail to restart a service"):
> Ian Jackson wrote:
> > When I wrote that, it didn't occur to me that anyone would think that
> > a failure by a postinst script to perform an intended operation should
> > be treated any other way than a failure of the postinst script.
>
> That was perhaps also written before we started to realise that maintainer
> scripts are actually best avoided

I don't think that makes any difference.

Whether things are implemented by handcoded code in postinst, or dh-generated templatey postinst, or some kind of declarative system, is important for manageability of our codebase etc. etc.

But it doesn't have any bearing on what the error handling should be like. Any kind of declarative or automatic system or whatever ought to have similar error handling: failure to perform an intended function is an error and should not be ignored.

See for example the handling of errors which occur during trigger processing. One of the things that I am most proud of in dpkg is the comprehensive and thoughtful error behaviours.

> > If the postinst fails, then the user has the opportunity to fix the
> > root cause and rerun dpkg --configure --pending. That will
> > then repair the system completely.
>
> … causing a snowball of errors in an awkward half-upgraded
> environment is nasty.
>
> The problem comes when you don't yet have the right tools installed to be
> able to fix the problem. We see that scenario often enough in #debian where
> someone has a failed upgrade and we try to collect more information via
> pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover
> that the relevant tool isn't installed and because apt is sufficiently
> unhappy about broken packages and a half-completed upgrade, you can't ask it
> to install the tool at that point in time.

This is a bug in apt, plain and simple. Of course it is a design error, but that does not make it not a bug. There is nothing conceptually incoherent in installing strace while cupsd and its dependencies are broken. dpkg will happily do it.

I agree that in the absence of a fix to this, some workarounds would be good. Perhaps dpkg --configure --force-postinst-fail broken-package ?

> In the upgrade scenario, while you're trying to fix one particular
> problem, you're also in a completely untested half-upgraded
> situation and so latent bugs in any number of other tools may also
> be exposed.

dpkg is designed so that it is in general only the _configuration_ of other packages which is blocked, not their actual upgrade. So hopefully you should be in a reasonably coherent state.

> So while ignoring errors is wrong, so is making it harder to fix them. This
> isn't a question of absolutes.

As I say I think it is a bug in apt that when you have an error, apt makes it hard to fix the error by insisting that you can't do anything (even install diagnosis tools) until you have fixed the error (which you can't do).

Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Fri, 21 Sep 2018 19:57:02 GMT)
Acknowledgement sent to Tollef Fog Heen <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Fri, 21 Sep 2018 19:57:02 GMT)
Message #62 received at [email protected]:
]] Wouter Verhelst > On Tue, Sep 18, 2018 at 10:04:26PM +0200, Tollef Fog Heen wrote: [...] > > The API provided by a package being in the configured state is not > > whether the relevant daemon is running or not; that is runtime and can > > and will change many times while the package is in the configured state, > > so dpkg dependencies are not useful for expressing «this service must be > > running». > > No. But it *is* a useful way to express "this service must be able to > run". That's not what «configured» means, though. «apt install foo ; rm /etc/foo.conf» and the package will be in a «running, but can't restart» state, but also configured in dpkg terms. > Additionally, if something fails to restart, then that is a serious > problem that I, as a system administrator, would like to know about. > Failure to configure a package signals that there is a serious problem > that I need to fix, so that informs me. I think monitoring should be implemented using monitoring tools, so if you actually care if a service is up, you should monitor it rather than relying on postinsts failing or succeeding. Alternatively, you could just add «systemctl is-system-running» to a post-dpkg-invoke hook, it'll tell you if there are daemons that have failed. [...] > There are really only two[1] reasons why a daemon could fail to restart: > > - The maintainer made a mistake in the default configuration, and the > user didn't make any changes so the old conffiles are being replaced > by the new ones, or the package is being newly installed; now the > daemon encounters a syntax error. This is a bug, plain and simple, and > catching bugs earlier rather than later is a good idea, which will > happen if the daemon restart failure causes a postinst failure. > - The maintainer made no mistake, but the upgrading user made some local > changes, so the conffile system ensures that the syntactic differences > in the configuration are not incorporated and the daemon fails to > restart. 
As a system administrator, I would want to know when > something like that happens sooner rather than later, so that I can > fix it (also sooner rather than later). Failing to finish postinst > correctly ensures that that does happen. In addition to this: Any number of runtime problems. The disk might be full. The service might try to look up a user whose name is in LDAP and the network is down and thus the user lookup fails. Some hardware the service needs is not plugged in or doesn't work correctly. Data files are corrupted. Out of memory. I'm sure you can come up with more. :-) This then also ties into what the semantics of «daemon is started» should be: is it that the service has started, or that it is working? What should happen if you, on a host with no network connectivity (or just heavily firewalled), do «apt install ntp»? Should it wait until the clock is synced (effectively forever in this case? Should the postinst fail until you've fixed the firewall?)? > [1] There is also the possibility of "the package ships with incomplete > configuration on purpose, because there are no sane defaults to use > and installing the package requires manual steps from the maintainer > before it can be made to work", but (a) our best practices recommend > against doing that if at all possible, and (b) in that case starting > the daemon shouldn't even be attempted from postinst, and so failure > to start can't be a consideration in the exit state of postinst. You might still want to restart it on upgrade to ensure you don't run outdated binaries. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
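As an illustration of the hook idea in the message above, the check could live in a small shell script run from an apt DPkg::Post-Invoke entry. This is only a sketch: the function name report_failed_units and the STATE_CMD/LIST_CMD override variables are hypothetical (they exist so the logic can be tried on a machine without systemd); only the two systemctl invocations are the ones named in this thread.

```shell
#!/bin/sh
# Sketch: report failed units after a dpkg run.
# STATE_CMD and LIST_CMD are hypothetical overrides for testing; by default
# they are the systemctl commands discussed in this thread.
STATE_CMD=${STATE_CMD:-"systemctl is-system-running --quiet"}
LIST_CMD=${LIST_CMD:-"systemctl list-units --failed"}

# Return 0 if the system reports a healthy state; otherwise list the
# failed units on stderr and return 1.
report_failed_units() {
    if $STATE_CMD; then
        return 0
    fi
    echo "warning: some units are not in a healthy state:" >&2
    $LIST_CMD >&2 || true
    return 1
}

# A hook script would end with a call such as:
#   report_failed_units
# Whether to append "|| true" depends on whether a degraded system
# should make the hook itself fail.
```

Note that this reports every failed unit on the system, not only those touched by the current upgrade, which is in line with the view that monitoring is a separate concern from the postinst exit status.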
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Fri, 21 Sep 2018 20:09:02 GMT)
Acknowledgement sent to Tollef Fog Heen <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Fri, 21 Sep 2018 20:09:02 GMT)
Message #67 received at [email protected]:
]] Ian Jackson > Tollef Fog Heen writes ("Bug#904558: What should happen when maintscripts fail to restart a service"): [...] > > This means that failure to start a daemon should generally not cause the > > postinst to fail. > > ... I disagree with that. I think that in the usual case, if the > daemon is broken, and the package's purpose is to provide that daemon > service, then the package probably isn't providing its API. I don't think dpkg relationships are a good fit for expressing those kinds of statements. They are not about in-memory and process state management, they're about what's on disk. [...] > Also: > > > The API provided by a package being in the configured state is not > > whether the relevant daemon is running or not; that is runtime and can > > and will change many times while the package is in the configured state, > > so dpkg dependencies are not useful for expressing "this service must be > > running". > > I disagree with this. > > dpkg dependencies are not just about what sets of packages can be > coinstalled. They also imply sequencing of package setup. And since > starting daemons is part of package setup, dpkg dependencies imply a > sequencing of daemon startup. If you include the word «attempted» there, I might agree. policy-rc.d for instance enters the picture here. Blacklisting in the init system does as well, probably others too. The landscape is pretty crowded with actors. > That is actually necessary in the case where the startup of daemon B > can only successfully completed if daemon A is up, That's the job of the init system's dependency resolution mechanisms, not dpkg's. dpkg does not have information about what is running and so can't do this. Ordering is also separate from dependencies, at least for some init systems. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sat, 22 Sep 2018 07:51:03 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sat, 22 Sep 2018 07:51:03 GMT)
Message #72 received at [email protected]:
Hi Tollef, On Fri, Sep 21, 2018 at 09:53:13PM +0200, Tollef Fog Heen wrote: > ]] Wouter Verhelst > > > On Tue, Sep 18, 2018 at 10:04:26PM +0200, Tollef Fog Heen wrote: > > [...] > > > > The API provided by a package being in the configured state is not > > > whether the relevant daemon is running or not; that is runtime and can > > > and will change many times while the package is in the configured state, > > > so dpkg dependencies are not useful for expressing «this service must be > > > running». > > > > No. But it *is* a useful way to express "this service must be able to > > run". > > That's not what «configured» means, though. Disagree. > «apt install foo ; rm /etc/foo.conf» and the package will be in a «running, > but can't restart» state, but also configured in dpkg terms. Well, sure, but that's true for any kind of configuration, and is not specific to daemons: if you blow away a package's configuration, all bets are off, so I fail to see your point. The point is not "what happens after the install run has happened"; it is about finding problems early rather than late. > > Additionally, if something fails to restart, then that is a serious > > problem that I, as a system administrator, would like to know about. > > Failure to configure a package signals that there is a serious problem > > that I need to fix, so that informs me. > > I think monitoring should be implemented using monitoring tools, so if > you actually care if a service is up, you should monitor it rather than > relying on postinsts failing or succeeding. First, the fact that there are tools to deal with this externally from dpkg shouldn't mean that dpkg itself can't deal with it. 
Second, if I manually upgrade something and postinst fails, I know immediately that something is wrong; in contrast, if I upgrade something but postinst does not fail, and then I have to rely on monitoring to notify me, it may take a while before I notice something is wrong, because monitoring tools often only tell me after a few minutes. Third, the person who performs the upgrade is not necessarily the same person as the one who notices something is wrong on the monitoring system; the lack of immediate feedback that the upgrade broke things will make debugging and fixing the problem more involved than it should be. I think "there are tools to do X" is a terrible argument for "postinst shouldn't do X". > Alternatively, you could just add «systemctl is-system-running» to a > post-dpkg-invoke hook, it'll tell you if there are daemons that have > failed. The fact that I can do something to fix the fact that someone (you?) broke reasonable expectations isn't an excuse for breaking those reasonable expectations in the first place. > [...] > > > There are really only two[1] reasons why a daemon could fail to restart: > > > > - The maintainer made a mistake in the default configuration, and the > > user didn't make any changes so the old conffiles are being replaced > > by the new ones, or the package is being newly installed; now the > > daemon encounters a syntax error. This is a bug, plain and simple, and > > catching bugs earlier rather than later is a good idea, which will > > happen if the daemon restart failure causes a postinst failure. > > - The maintainer made no mistake, but the upgrading user made some local > > changes, so the conffile system ensures that the syntactic differences > > in the configuration are not incorporated and the daemon fails to > > restart. As a system administrator, I would want to know when > > something like that happens sooner rather than later, so that I can > > fix it (also sooner rather than later). 
Failing to finish postinst > > correctly ensures that that does happen. > > In addition to this: Any number of runtime problems. The disk might be > full. The service might try to look up a user whose name is in LDAP and > the network is down and thus the user lookup fails. Some hardware the > service needs is not plugged in or doesn't work correctly. Data files > are corrupted. Out of memory. I'm sure you can come up with more. :-) Well, yeah, and I like it if dpkg gives me an error when I try to install something and, say, the disk is full. > This then also ties into what the semantics of «daemon is started» > should be: is it that the service has started, or that it is working? > What should happen if you, on a host with no network connectivity (or > just heavily firewalled), do «apt install ntp»? Should it wait until > the clock is synced (effectively forever in this case? Should the > postinst fail until you've fixed the firewall?)? If the daemon is running and it would work as soon as it can reach the internet? No, it should continue. If the daemon is failing to start because of, say, mandatory access control not being configured yet? Yes, in that case it should fail, because that is a dependency bug, and we want to know about it. > > [1] There is also the possibility of "the package ships with incomplete > > configuration on purpose, because there are no sane defaults to use > > and installing the package requires manual steps from the maintainer > > before it can be made to work", but (a) our best practices recommend > > against doing that if at all possible, and (b) in that case starting > > the daemon shouldn't even be attempted from postinst, and so failure > > to start can't be a consideration in the exit state of postinst. > > You might still want to restart it on upgrade to ensure you don't run > outdated binaries. Sure. This bug isn't about "you might still want to do X" though, it's about "what should we do if X fails". 
Let's stick to the core issue? -- Could you people please use IRC like normal people?!? -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sat, 22 Sep 2018 07:51:05 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sat, 22 Sep 2018 07:51:05 GMT)
Message #77 received at [email protected]:
On Fri, Sep 21, 2018 at 10:07:31PM +0200, Tollef Fog Heen wrote: > ]] Ian Jackson > > > Tollef Fog Heen writes ("Bug#904558: What should happen when maintscripts fail to restart a service"): > > [...] > > > > This means that failure to start a daemon should generally not cause the > > > postinst to fail. > > > > ... I disagree with that. I think that in the usual case, if the > > daemon is broken, and the package's purpose is to provide that daemon > > service, then the package probably isn't providing its API. > > I don't think dpkg relationships are a good fit for expressing those > kinds of statements. They are not about in-memory and process state > management, they're about what's on disk. The point here is that failure to restart the daemon is a *symptom* of breakage of the *on-disk state*, so we're really arguing the same thing? [...] > > I disagree with this. > > > > dpkg dependencies are not just about what sets of packages can be > > coinstalled. They also imply sequencing of package setup. And since > > starting daemons is part of package setup, dpkg dependencies imply a > > sequencing of daemon startup. > > If you include the word «attempted» there, I might agree. policy-rc.d > for instance enters the picture here. Blacklisting in the init system > does as well, probably others too. The landscape is pretty crowded with > actors. Nobody is arguing that if the init system or policy-rc.d block service starts, that then postinst should silently not start the daemon. However, in the absence of such things, if postinst fails to restart the daemon, it knows something is wrong. "something is wrong" should not happen after package upgrade; if it did, we failed. "failed" means postinst should not exit successfully. > > That is actually necessary in the case where the startup of daemon B > > can only successfully completed if daemon A is up, > > That's the job of the init system's dependency resolution mechanisms, > not dpkg's. 
dpkg does not have information about what is running and so > can't do this. Ordering is also separate from dependencies, at least > for some init systems. Some init system dependency resolution mechanisms also only work properly once all the packages involved have been configured, so that's not a valid point. -- Could you people please use IRC like normal people?!? -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sat, 22 Sep 2018 07:57:03 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sat, 22 Sep 2018 07:57:03 GMT)
Message #82 received at [email protected]:
On Sat, Sep 22, 2018 at 09:50:11AM +0200, Wouter Verhelst wrote: > Nobody is arguing that if the init system or policy-rc.d block service > starts, that then postinst should silently not start the daemon. That should read: Nobody is arguing that if the init system or policy-rc.d block service starts, that then postinst should fail for not starting the daemon. I'll go get some IV with caffeine now ;-) -- Could you people please use IRC like normal people?!? -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
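For readers unfamiliar with the policy-rc.d mechanism referred to in this exchange: invoke-rc.d consults /usr/sbin/policy-rc.d, if present, before running an init script action, and an exit status of 101 from it means the action is forbidden by local policy. Below is a minimal sketch of such a policy, written as a shell function so that it can be exercised directly. The function name policy_decision and the service name example-daemon are hypothetical.

```shell
#!/bin/sh
# Sketch of policy-rc.d-style logic. A real policy-rc.d is a standalone
# script that receives the init script ID and the requested action(s) as
# arguments; here the same decision is modelled as a function.
# Status 0 allows the action; status 101 forbids it by policy.
policy_decision() {
    service=$1
    action=$2
    case "$service:$action" in
        example-daemon:start|example-daemon:restart)
            return 101   # forbidden: the admin does not want this started
            ;;
        *)
            return 0     # everything else is allowed
            ;;
    esac
}
```

When policy forbids the action in this way, the corrected statement above is that nobody expects the postinst to fail merely because the daemon was not started.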
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sat, 22 Sep 2018 20:12:06 GMT)
Acknowledgement sent to Anthony DeRobertis <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sat, 22 Sep 2018 20:12:06 GMT)
Message #87 received at [email protected]:
Someone asked for an example, here is one I've often seen when doing a release upgrade on many webservers I administer: Apache will fail to start. I don't recall if that currently causes Apache postinst to fail, but if not, it really ought to continue. Apache has a complicated config, and upstream makes backwards-incompatible changes often enough that every Debian release seems to have some. It's often not possible to automatically update the config (and even if it were, the variety of configuration management systems in use mean you wouldn't want that to happen automatically). It's much easier to fix after the upgrade. And to the extent anything depends on Apache, Apache being completely broken doesn't generally break them (unless they try to restart apache themselves, e.g., apache modules). Now, if my local DNS cache failed to start, that needs to be fixed before continuing (since, e.g., even apt-get won't work). Same with an LDAP (etc.) server, you may no longer have user accounts. Some things definitely lead to a cascade of failures. I think in an ideal world, there would be two separate failure states for postinst: one for failed but probably safe to continue the upgrade, one for failed and probably going to cause a cascade of failures (or worse). dpkg (and the various frontends) would let you know about fail-but-continue errors after finishing, and maybe before starting, but still continue to work. At least for daemon failed to start and with systemd, we already can have pretty close: have the postinst ignore the failed to start error (when it's of the safe to continue the upgrade variety), then use `systemctl --failed` to get the list of daemons that failed to start.
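The two failure classes described above do not exist in dpkg, but a maintainer script can approximate them by deciding per service whether a failed restart is fatal. A minimal sketch, assuming hypothetical severity labels "critical" and "tolerable" and a hypothetical helper name restart_with_severity:

```shell
#!/bin/sh
# Sketch: treat a failed restart as fatal only for services whose failure
# is likely to cause a cascade (e.g. a local DNS cache or LDAP server).
# Usage: restart_with_severity critical|tolerable COMMAND [ARGS...]
restart_with_severity() {
    severity=$1
    shift
    if "$@"; then
        return 0
    fi
    case "$severity" in
        critical)
            echo "error: restart failed, stopping the upgrade here" >&2
            return 1
            ;;
        *)
            echo "warning: restart failed, continuing the upgrade" >&2
            return 0
            ;;
    esac
}
```

For the tolerable case, the warning scrolls by on stderr and can easily be missed, which is why the message above pairs this with `systemctl --failed` to list the daemons that did not come back.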
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sun, 07 Oct 2018 10:51:05 GMT)
Acknowledgement sent to Simon McVittie <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sun, 07 Oct 2018 10:51:05 GMT)
Message #92 received at [email protected]:
Attempting to summarize what was said on this topic in the thread so far, and at the last technical committee meeting: It's perhaps important to note that we are not discussing ideal situations here: any time this conversation becomes relevant, something is already wrong. We're aiming to recommend the lesser evil, rather than something actually desirable. One of the points of view here is Ian and Wouter's assertion that whenever a service fails to restart in a maintainer script, the most important thing is to make sure the sysadmin pays attention and fixes it before proceeding. Julien Cristau made another point in support of "failure to restart implies failure to configure" on IRC, namely that the only straightforward thing for an automated upgrade to do is to look at the successful or failed exit status of the package manager (whether that means dpkg, apt, unattended-upgrades or whatever), and assume that exiting 0 means everything is fine and exiting nonzero means attention is required. At the opposite extreme, Marga's team manages thousands of desktops, and having to do *anything* manual to any significant number of them doesn't scale. We can think of inexperienced users' desktops as a bit like this scenario too, except that instead of having a professional sysadmin, they have to ask volunteers for help through channels like debian-user and #debian (and those volunteers' help doesn't really scale well either). It's also undesirable if the mechanism we use to escalate the failure to the user is one that itself makes it harder to diagnose or fix the problem, and in particular there's a concern that when packages fail to configure, that can make it harder to use apt to install the necessary tools to diagnose what has gone wrong; Stuart points out that in his experience of helping people in #debian, this is a practical problem. 
Ian considers it to be a design flaw in apt that the actions the user can take while a package is unconfigured are so constrained; however, we work with the tools we have, not the tools we'd like to have. We seem to have consensus among the technical committee that it is at least occasionally appropriate for failure to restart to cause failure to configure, although this might be the exception rather than the rule. The examples given where the error path is most important were packages that provide a system-level API to other packages, so their failures are likely to cause other packages to fail to configure (such as local DNS caches and authentication services like LDAP); and packages that provide remote access, so their failures need to be fixed before a potentially remote sysadmin logs out to prevent the sysadmin from being locked out longer-term (like sshd). I'm not sure whether we have a concrete example yet of packages at the opposite extreme, that are the least important to be able to restart. I'd like to propose the game servers that I maintain, like openarena-server, as a concrete example here: I hope we can agree that inability to capture the flag does not justify getting the package management system into a problematic state? :-) (I think this is currently a bug in those packages, but I'm not going to fix it until we have consensus here.) There's a general feeling among the technical committee that a package failing to configure is far from a user-friendly way to signal errors: Phil's memorable analogy was that it's like telling a car driver that they are low on fuel by having the wheels fall off. 
Historically, we had few other ways to manage service failures, and perhaps when all you have is a hammer, everything looks like the Failed-Config state; but in a default Debian installation we now have a service manager that monitors the state of all services at all times (not just when they happen to be upgraded) and collects their stderr at all times (not just writing it to the console during boot, and dpkg's stderr during upgrades). Even before we considered non-sysv init systems, monitoring systems like Nagios were available. It's perhaps also worth noting that most services, if they fail during boot rather than during upgrade, don't cause a drastic reaction. Historically, initscripts would (attempt to) carry on regardless from just about any failure mode, including failure of services that ought to be considered critical-path. With systemd as default, our default init system does have a more dramatic response to certain failures (going to an emergency-mode shell), but it only does that for a very limited subset of services (fsck and mount on required filesystems, according to the man page). As Anthony points out, we could benefit from there being a way for packages to report "something is wrong, but carry on anyway": continuing to get the system into the least-degraded state possible, but then arranging for dpkg/apt to exit with a nonzero status so that automated systems can detect that something is not right. However, this mechanism does not currently exist. One possible implementation for the default init system might be an apt Dpkg::Post-Invoke hook that runs `systemctl is-system-running` and, if the result is not success, `systemctl list-units --failed`. An init-system-agnostic implementation would require some other convention for maintainer scripts to signal partial success (or non-fatal failure, depending how you look at it) to apt/dpkg. 
During the technical committee IRC meeting, we considered whether the recommendation to "set -e" in maintainer scripts was consistent with considering a maintainer script failing to be a Very Bad Thing. We concluded that even if we want to disregard most or all failed service restarts, it is still good to "set -e", because if something does go wrong (for instance a typo in the maintainer script, a system that is already seriously broken, or some other unforeseen circumstance), we want the maintainer script to fail safe: stop what it's doing, rather than carry on regardless. If a particular failure is something we can reasonably predict, reason about and tolerate (as we are arguing failure to restart a service is, at least sometimes) then someone should make a conscious decision to add "|| true" (or preferably "|| some-failure-reporting-mechanism") to that command. Finally, here are the debhelper mechanisms that most packages use to manage their services, which I think represent the status quo: * dh_installinit: defaults to "failure to (re)start is failure to configure", but can be overridden with --error-handler; some packages set the error handler to "true" (e.g. apache2, isc-dhcp) or to a custom shell function (e.g. krb5, samba). This is used for LSB init scripts, and for systemd units that have a corresponding LSB init script. * dh_systemd_start: unconditionally uses "|| true". This is only used for systemd units that *do not* have a corresponding LSB init script. A dh_installinit-style --error-handler would probably be a reasonable feature request. smcv
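The combination of "set -e" with a deliberate, reported exception for the restart step, as described in the summary above, might look like this in a postinst. This is a sketch only: the service name example-daemon and the helper names report_restart_failure and restart_or_report are hypothetical, and `false` stands in for a restart command that fails.

```shell
#!/bin/sh
# Sketch of a postinst fragment: fail safe on unexpected errors, but
# tolerate (and report) a failed service restart.
set -e

# Hypothetical reporting helper: print a warning to stderr and succeed.
report_restart_failure() {
    echo "warning: could not restart $1; continuing" >&2
}

# Run the given restart command; on failure, report instead of aborting.
# Usage: restart_or_report NAME COMMAND [ARGS...]
restart_or_report() {
    name=$1
    shift
    "$@" || report_restart_failure "$name"
}

# In a real postinst this might be:
#   restart_or_report example-daemon invoke-rc.d example-daemon restart
# Here `false` stands in for a restart that fails.
restart_or_report example-daemon false
echo "postinst continues"
```

Any other command in the script that fails unexpectedly still aborts the postinst because of "set -e"; only the restart has been consciously exempted.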
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Sun, 07 Oct 2018 15:33:03 GMT)
Acknowledgement sent to Sam Hartman <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Sun, 07 Oct 2018 15:33:03 GMT)
Message #97 received at [email protected]:
>>>>> "Simon" == Simon McVittie <[email protected]> writes: Simon> the error path is most important were packages that provide a Simon> system-level API to other packages, so their failures are Simon> likely to cause other packages to fail to configure (such as Simon> local DNS caches and authentication services like LDAP); and Simon> packages that provide remote access, so their failures need Simon> to be fixed before a potentially remote sysadmin logs out to Simon> prevent the sysadmin from being locked out longer-term (like Simon> sshd). As a maintainer of one of the more important packages (krb5-kdc and krb5-admin-server), I'd like to chime in here. krb5-kdc provides enterprise level authentication and if it fails may well take out authentication for an entire environment. Even so, I've found that causing upgrades to fail does far more harm than good even for this package. Here is my experience based on my own observations and based on bug reports and helping people diagnose problems in krb5: * The vast majority of failures are when krb5-kdc gets installed on a system where it is not actually needed, or where it was partially configured for a test. In these cases, breaking an upgrade does much more harm than good. It may break other services, because those services may end up in a half-configured state, so a service that is not critical for a given system may break critical services for that system. * When krb5 is a critical service, its failure is going to be quite obvious regardless of whatever the maint script does. * It is almost always the case that debugging the situation involves installing some package and that the first thing I end up doing is walking a user through adding exit 0 at the top of postinst in /var/lib/dpkg/info before going forward. Even if I don't need some additional tool, I've been burned by other parts of the system being in half-configured state. 
* Leaving large chunks of the system in half-configured states is about one of the worst things you can do for system stability. It's not something we test very often, and the interactions are very difficult to predict. If I understood the cause of an error in a maintainer script and knew that it indicated a problem that the sysadmin needed to fix (and one that likely indicated krb5 was important on this system) I would be open to returning a failure in postinst. In almost all other situations I'd rather simply let the service fail to start.
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Tue, 09 Oct 2018 08:54:06 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Tue, 09 Oct 2018 08:54:06 GMT)
Message #102 received at [email protected]:
Hi Simon, Thanks for your summary. On Sun, Oct 07, 2018 at 11:49:09AM +0100, Simon McVittie wrote: > Attempting to summarize what was said on this topic in the thread so > far, and at the last technical committee meeting: > > It's perhaps important to note that we are not discussing ideal situations > here: any time this conversation becomes relevant, something is already > wrong. We're aiming to recommend the lesser evil, rather than something > actually desirable. > > One of the points of view here is Ian and Wouter's assertion that > whenever a service fails to restart in a maintainer script, the most > important thing is to make sure the sysadmin pays attention and fixes > it before proceeding. > > Julien Cristau made another point in support of "failure to restart > implies failure to configure" on IRC, namely that the only straightforward > thing for an automated upgrade to do is to look at the successful or > failed exit status of the package manager (whether that means dpkg, > apt, unattended-upgrades or whatever), and assume that exiting 0 means > everything is fine and exiting nonzero means attention is required. I think this is the core of the issue: it is incorrect to state that when a service restart was unsuccessful, that then everything was fine. There was a problem. We currently don't have a way to distinguish between "there was a terrible problem and the sky is going to fall" and "there was a problem but you might want to ignore it", so technically the only correct thing to do is to exit with a nonzero exit state, signalling a problem. Put otherwise, I think that if the following preconditions are true: 1. The service was running before the package upgrade 2. The package's postinst wants to restart the daemon 3. After the package upgrade, the service fails to start again Then that means the package upgrade broke something, and the system administrator should be informed of that fact. 
We currently have only one *certain* avenue to inform the system administrator, and that is through producing a nonzero exit state from apt. A debconf error or message to stdout or stderr would work too in some cases, but the first is not always shown and the second might scroll by too fast to be noticeable, so it is not a certain way to tell the system administrator. As such, exiting nonzero is the only avenue open to maintainers to do the right thing. Having said all that... > At the opposite extreme, Marga's team manages thousands of desktops, > and having to do *anything* manual to any significant number of them > doesn't scale. We can think of inexperienced users' desktops as a bit > like this scenario too, except that instead of having a professional > sysadmin, they have to ask volunteers for help through channels like > debian-user and #debian (and those volunteers' help doesn't really scale > well either). It's also undesirable if the mechanism we use to escalate > the failure to the user is one that itself makes it harder to diagnose or > fix the problem, and in particular there's a concern that when packages > fail to configure, that can make it harder to use apt to install the > necessary tools to diagnose what has gone wrong; Stuart points out that in > his experience of helping people in #debian, this is a practical problem. It is true that there is a larger picture, and that in some environments, breaking all future upgrades is way more problematic than not restarting a service once. This is arguably a bug in apt though, and it feels wrong to me to "fix" such an issue by introducing what is essentially a workaround in multiple unrelated places; if then the problem gets fixed properly, we would have to go around the whole system to undo the workarounds again, which would be a sad state of affairs. 
I can think of some alternatives that could be done and that would work
towards a resolution (rather than a workaround) for this problem:

- The policy-rc.d interface could be extended to allow it to signal a
  "restart, but do not fail on error" kind of policy. This would work
  for the "we have thousands of desktops and don't care about a service
  failing to restart" kind of environment.
- Apt could be fixed so that when a package fails to configure, it would
  still be impossible to install and/or configure reverse-dependencies
  of the failing package, but not of packages that are unrelated. This
  would help the "users asking in our support channels can't install
  diagnostic tools to investigate" kind of situation.
- A new state could be created in dpkg to signal "configuration failed,
  but package will work for dependencies". When this is the case, apt
  should inform the user that configuration of some package failed and
  that they might want to investigate, but should not refuse to install
  and/or configure other packages, even reverse dependencies of the
  failing package. This feels right, but I can't come up with a good
  example of the kind of situation which this would fix; perhaps that's
  not a good sign.

Some of these will require more work than others; but "requires more
work" by itself has never been a good enough reason not to do something
in Debian.

> Ian considers it to be a design flaw in apt that the actions the user
> can take while a package is unconfigured are so constrained; however,
> we work with the tools we have, not the tools we'd like to have.

I do not think this argument holds merit. By the same argument, the
tools we have are maintainer scripts with nonzero exit state, and we
should keep those and fix the infrastructure around them.

The TC should make a decision based on what the correct technical
outcome is, not based on what the current state of affairs is. If that
means the TC needs to recommend changes beyond what it was originally
asked to do, then it should do so, rather than shirking away from that,
because "the tools just don't work that way". All the tools have source
code, and source code can be fixed.

[...]
> I'm not sure whether we have a concrete example yet of packages at the
> opposite extreme, that are the least important to be able to restart. I'd
> like to propose the game servers that I maintain, like openarena-server,
> as a concrete example here: I hope we can agree that inability to capture
> the flag does not justify getting the package management system into a
> problematic state? :-) (I think this is currently a bug in those packages,
> but I'm not going to fix it until we have consensus here.)

While getting the package management system in a wholly problematic
state is, indeed, a problem, I do think that "failure to restart
openarena-server" might be a critical issue if the only reason you're
paying for a VM or a dedicated server or whatnot is so that you and your
friends (or your customers) can run openarena.

As such, this really depends on the environment, and I think it is wrong
for a maintainer to do anything but signal such failure in the
appropriate way.

> There's a general feeling among the technical committee that a package
> failing to configure is far from a user-friendly way to signal errors:
> Phil's memorable analogy was that it's like telling a car driver that they
> are low on fuel by having the wheels fall off. Historically, we had few
> other ways to manage service failures, and perhaps when all you have is
> a hammer, everything looks like the Failed-Config state; but in a default
> Debian installation we now have a service manager that monitors the state
> of all services at all times (not just when they happen to be upgraded)
> and collects their stderr at all times (not just writing it to the console
> during boot, and dpkg's stderr during upgrades).
> Even before we considered
> non-sysv init systems, monitoring systems like Nagios were available.

Correct, but it is not correct to state that such monitoring systems are
installed and available on *every* Debian system. If they are, then it
is reasonable to reconfigure the system so that service restart is not
considered a failure; this could be done with the policy-rc.d extension
that I suggested earlier. However, in the absence of such configuration,
the default course of action should be to signal that a problem has
occurred, through the one way available (failing to configure the
package).

[...]
> During the technical committee IRC meeting, we considered whether the
> recommendation to "set -e" in maintainer scripts was consistent with
> considering a maintainer script failing to be a Very Bad Thing. We
> concluded that even if we want to disregard most or all failed service
> restarts, it is still good to "set -e", because if something does go wrong
> (for instance a typo in the maintainer script, a system that is already
> seriously broken, or some other unforeseen circumstance), we want the
> maintainer script to fail safe: stop what it's doing, rather than carry
> on regardless. If a particular failure is something we can reasonably
> predict, reason about and tolerate (as we are arguing failure to restart
> a service is, at least sometimes) then someone should make a conscious
> decision to add "|| true" (or preferably
> "|| some-failure-reporting-mechanism") to that command.

It is unclear to me how a typo in one file (postinst script) trumps a
typo in another file (daemon configuration file causing failure to
restart). Care to explain?

> Finally, here are the debhelper mechanisms that most packages use to
> manage their services, which I think represent the status quo:
>
> * dh_installinit: defaults to "failure to (re)start is failure to
>   configure", but can be overridden with --error-handler; some packages
>   set the error handler to "true" (e.g. apache2, isc-dhcp) or to a custom
>   shell function (e.g. krb5, samba).

Perhaps the error handler should also be configurable by policy-rc.d, as
I hinted to before.

[...]
> * dh_systemd_start: unconditionally uses "|| true".
>   This is only used for systemd units that *do not* have a corresponding
>   LSB init script. A dh_installinit-style --error-handler would probably
>   be a reasonable feature request.

Same.

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
     Hacklab
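The pattern quoted above ("set -e" everywhere, plus a deliberate "|| some-failure-reporting-mechanism" on the one command whose failure is tolerated) can be sketched roughly as follows. This is only an illustration, not code from any package: report_restart_failure and restart_or_report are hypothetical names, and the actual restart command (for example invoke-rc.d with a service name and "restart") is passed in as arguments so that the sketch itself has no side effects.

```shell
#!/bin/sh
set -e  # any unexpected failure still aborts the script

# Hypothetical reporting hook: here it only warns on stderr. A real package
# might raise a debconf note or write to the system log instead.
report_restart_failure() {
    echo "warning: could not restart $1; please check the service" >&2
}

# Hypothetical wrapper: run the restart command given after the service
# name, and report (rather than abort) if that command fails.
restart_or_report() {
    service=$1
    shift
    "$@" || report_restart_failure "$service"
}

# Example call in a postinst (commented out so the sketch has no side effects):
# restart_or_report mydaemon invoke-rc.d mydaemon restart
```

Only the wrapped restart is tolerated here; a typo or any other failure elsewhere in the script would still stop it, which is the "fail safe" behaviour the committee describes.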
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Tue, 09 Oct 2018 11:27:06 GMT)
Acknowledgement sent to Ian Jackson <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Tue, 09 Oct 2018 11:27:06 GMT)
Message #107 received at [email protected]:
Wouter Verhelst writes ("Re: Bug#904558: What should happen when maintscripts fail to restart a service"):
> Perhaps the error handler should also be configurable by policy-rc.d, as
> I hinted to before.

I think this is a key point. We do not have to make a single decision
which everyone has to be happy with. We can instead continue to be
all things to all people.

I think the best answer would be:

 * Individual maintainers decide for themselves whether to treat
   service (re)start failure as postinst failure, based on their own
   perception; maintainers may make different decisions for different
   init systems.

 * If the maintainer has no particular reason to diverge, the right
   answer is usually to fail the postinst with init systems that do
   not provide service supervision; but to not fail the postinst with
   ones that do. (I think from earlier messages that this is how the
   default implementations already work.)

 * The administrator should be able to override this policy question
   globally for the whole system, or on a per-package basis.

This is probably a manageable amount of actual work: the prescription
for individual packages is roughly what they do right now. The
support for configuration in something like policy-rc.d has a few
design decisions to be made but doesn't seem really difficult.

Also nothing blocks on it. The TC would simply be saying "this would
be a good thing to have".

Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
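The three-part proposal above could be read as a simple precedence rule, sketched below. Nothing like this exists in Debian as far as this thread shows: the function name, its arguments and the "administrator first, then maintainer, then default" ordering are assumptions made for illustration only.

```shell
#!/bin/sh
# restart_failure_policy ADMIN MAINTAINER SUPERVISED   (hypothetical helper)
#   ADMIN      - administrator override: "fail", "ignore", or "" for none
#   MAINTAINER - maintainer's choice:    "fail", "ignore", or "" for none
#   SUPERVISED - "yes" if the init system supervises services, else "no"
# Prints "fail" (abort the postinst) or "ignore" (carry on).
restart_failure_policy() {
    # The first explicit choice wins: administrator, then maintainer.
    for choice in "$1" "$2"; do
        case "$choice" in
            fail|ignore) echo "$choice"; return 0 ;;
        esac
    done
    # No explicit choice: default on whether services are supervised.
    if [ "$3" = yes ]; then
        echo ignore
    else
        echo fail
    fi
}
```

With no overrides, a supervised init system yields "ignore" and an unsupervised one yields "fail"; an administrator setting always takes priority over the maintainer's.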
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Tue, 09 Oct 2018 15:21:03 GMT)
Acknowledgement sent to Sam Hartman <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Tue, 09 Oct 2018 15:21:03 GMT)
Message #112 received at [email protected]:
>>>>> "Ian" == Ian Jackson <[email protected]> writes:

    Ian> * If the maintainer has no particular reason to diverge the
    Ian> right answer is usually to fail the postinst with init systems
    Ian> that do not provide service supervision; but to not fail the
    Ian> postinst with ones that do. (I think from earlier messages
    Ian> that this is how the default implementations already work.)

So, it's not really the case that this is the default for init systems
today, and that actually has some important historical significance and
implications for perceived user-facing changes.

It's absolutely been the case that if an init script (init.d lsb script)
fails, the default behavior was to fail the postinst.

However, start-stop-daemon did not detect a lot of failures, especially
after fork.
So, there are all sorts of things that caused daemons to fail to start
that used to not cause postinst failures.

I don't know what the default is today, but certainly for Jessie and for
a lot of the stretch cycle, dh_installinit would fail the postinst
whenever systemctl failed to start or restart a service.

Now, depending on how you wrote your service units, you might get the
same behavior as with sysvinit. But you probably didn't do that.
So, suddenly, a whole bunch more conditions started showing up as things
that caused postinst to fail.

If somewhere in stretch and with the migration from dh_installinit for
service units to dh_systemd_*, we managed to change the default, then
we're probably reasonably close to what happened in the pre-systemd
days.
And that was reasonably OK.
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Tue, 09 Oct 2018 18:39:03 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Tue, 09 Oct 2018 18:39:03 GMT)
Message #117 received at [email protected]:
On Tue, Oct 09, 2018 at 10:52:15AM +0200, Wouter Verhelst wrote:
> - The policy-rc.d interface could be extended to allow it to signal a
>   "restart, but do not fail on error" kind of policy. This would work
>   for the "we have thousands of desktops and don't care about a service
>   failing to restart" kind of environment.

Wanting to investigate this a bit further, I find that, actually, such a
possibility already exists.

According to "man invoke-rc.d", policy-rc.d can exit with exit state 106
and provide a number of actions on stdout. These are then actions that
invoke-rc.d must try in order "until one of them succeeds". As such, a
policy-rc.d implementation written like so:

#!/bin/sh

if [ "$1" != ssh ]
then
	exit 0
fi
echo "$2 stop"
exit 106

would result in the system attempting whatever init script action was
being asked for, followed by a "stop" action (except in the case of the
"ssh" service, which must not fail before we close a shell, ever). This
assumes that a "stop" action when the daemon fails to start will be
successful; I don't know whether all init scripts in Debian act that
way, but I do think that they should. If they do, then this will mean
that init scripts which fail will not cause general packaging
unhappiness.

With that background, IMHO the proper reply to this question before the
committee is that yes, postinst scripts should fail when an init script
fails, but we should also better document the policy-rc.d interface to
point out that the above is possible and can be done where it makes
sense. If long-time Debian Developers (not just me, but also the members
of the committee) do not know well how it works, then clearly it is
underdocumented!

(Having said that, I haven't tested any of this, so it is certainly
possible that the implementation does not match the documentation...)

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
     Hacklab
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Wed, 10 Oct 2018 06:57:04 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Wed, 10 Oct 2018 06:57:04 GMT)
Message #122 received at [email protected]:
I must stop writing emails when tired...

On Tue, Oct 09, 2018 at 08:35:33PM +0200, Wouter Verhelst wrote:
> On Tue, Oct 09, 2018 at 10:52:15AM +0200, Wouter Verhelst wrote:
> > - The policy-rc.d interface could be extended to allow it to signal a
> >   "restart, but do not fail on error" kind of policy. This would work
> >   for the "we have thousands of desktops and don't care about a service
> >   failing to restart" kind of environment.
>
> Wanting to investigate this a bit further, I find that, actually, such a
> possibility already exists.
>
> According to "man invoke-rc.d", policy-rc.d can exit with exit state 106
> and provide a number of actions on stdout. These are then actions that
> invoke-rc.d must try in order "until one of them succeeds". As such, a
> policy-rc.d implementation written like so:
>
> #!/bin/sh
>
> if [ "$1" != ssh ]

That is, of course, a logic inversion. Whoops.

> then
> 	exit 0

For clarity, this means that whatever action was requested would be
allowed; and so if things fail they will cause the init script to fail,
too.

> fi
> echo "$2 stop"
> exit 106

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
     Hacklab
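Putting the correction together with the original script, the intended policy might look like the sketch below. It keeps the thread's assumptions ($1 is the init script ID, $2 the requested action, and exit status 106 offers fallback actions on stdout) and is just as untested against invoke-rc.d as the original. The logic is wrapped in a function named policy, a name chosen here only so the behaviour can be exercised without installing anything.

```shell
#!/bin/sh
# Corrected form of the policy-rc.d sketch from this thread.
# $1 = init script ID, $2 = requested action (as assumed in the thread).
policy() {
    if [ "$1" = ssh ]; then
        # ssh: allow the action as requested, so a failure propagates.
        return 0
    fi
    # Everything else: offer fallbacks, the requested action and then "stop".
    echo "$2 stop"
    return 106
}

# When installed as /usr/sbin/policy-rc.d, the script would end with:
#   policy "$@"
```

For example, calling policy with the arguments "apache2 restart" prints "restart stop" and returns 106, while "ssh restart" prints nothing and returns 0.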
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Wed, 17 Oct 2018 20:51:03 GMT)
Acknowledgement sent to Simon McVittie <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Wed, 17 Oct 2018 20:51:03 GMT)
Message #127 received at [email protected]:
On Tue, 09 Oct 2018 at 20:35:33 +0200, Wouter Verhelst wrote:
> According to "man invoke-rc.d", policy-rc.d can exit with exit state 106
> and provide a number of actions on stdout. These are then actions that
> invoke-rc.d must try in order "until one of them succeeds". As such, a
> policy-rc.d implementation written like so:
>
> #!/bin/sh
>
> if [ "$1" = ssh ] # logic error fixed as per subsequent mail
> then
> 	exit 0
> fi
> echo "$2 stop"
> exit 106
>
> would result in the system attempting whatever init script action was
> being asked for, followed by a "stop" action (except in the case of the
> "ssh" service, which must not fail before we close a shell, ever). This
> assumes that a "stop" action when the daemon fails to start will be
> successful

If I'm reading invoke-rc.d correctly, this is implemented (in a cross-init
way), but probably doesn't interact well with the logic that avoids
(re)starting services that are disabled, because that doesn't consider
"restart stop" to match "restart".

Obviously, if I'm right about that limitation, then that's a bug, and bugs
can be fixed. However, it makes me concerned that the exit status 106 thing
is not well-understood or well-tested, even by invoke-rc.d maintainers.

Packages that have systemd units with no corresponding LSB init script
(not necessarily services - timer, socket, path and (auto)mount units are
also units) use deb-systemd-invoke instead of invoke-rc.d.
deb-systemd-invoke doesn't implement the full generality of the
policy-rc.d interface, but only 0, 101 and 104 (in particular not 106).
That would be a reasonable feature request, particularly if we want to
encourage this route, but it isn't currently implemented.

While discussing this on IRC we wondered whether maintainer scripts that
restart services should normally be using an interface that is
analogous to "systemctl try-restart", namely: check whether the service
is running, then restart it if it was. (This can't work for maintainer
scripts that stop the service in prerm and start it in postinst, but that
is no longer the default behaviour in recent debhelper compat levels.)
However, both dh_installinit and dh_installsystemd currently use plain
"restart", so if the service is not running (possibly because it's already
broken), it will usually be started.

> With that background, IMHO the proper reply to this question before the
> committee is that yes, postinst scripts should fail when an init script
> fails, but we should also better document the policy-rc.d interface to
> point out that the above is possible and can be done where it makes
> sense.

This would solve Marga's use case with a very large fleet of machines
maintained by a small number of sysadmins: they can install a policy-rc.d
on all those machines that does the right thing.

However, it leaves the default as "fail hard", which I'm not convinced
is the most appropriate thing for systems that lack an experienced
sysadmin (which are the systems where defaults matter most, because an
inexperienced user is the least able to make an informed decision about
where they should deviate from defaults).

policy-rc.d also has some practical integration issues. It normally relies
on putting an unpackaged file in /usr/sbin (unless you have installed
policyrcd-script-zg2), and it's common for tools like debootstrap and
debian-installer to create and delete policy-rc.d to suppress service
startup while carrying out bootstrap operations. One Debian derivative
that I'm involved in (SteamOS) is *meant* to have a policy-rc.d, but we
recently discovered that it has always been deleted at the end of the
debian-installer run, and so doesn't exist in practice.

    smcv
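The "check whether the service is running, then restart it if it was" idea discussed above could look roughly like this. It is a sketch under assumptions, not what debhelper generates: try_restart, service_is_running and service_restart are hypothetical names, and the two helpers default to plain systemctl calls purely as an example (on systemd, "systemctl try-restart" already provides this behaviour directly).

```shell
#!/bin/sh
# Hypothetical helpers; replace them to match the init system in use.
service_is_running() { systemctl is-active --quiet "$1"; }
service_restart()    { systemctl restart "$1"; }

# Restart the named service only if it is currently running.
try_restart() {
    if service_is_running "$1"; then
        service_restart "$1"
    else
        echo "$1 was not running; leaving it stopped" >&2
    fi
}
```

Unlike a plain "restart", this leaves a service that was already stopped (or already broken) alone, so an upgrade does not start something the administrator had shut down.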
Information forwarded to [email protected], Technical Committee <[email protected]>: Bug#904558; Package tech-ctte. (Thu, 18 Oct 2018 08:57:06 GMT)
Acknowledgement sent to Wouter Verhelst <[email protected]>: Extra info received and forwarded to list. Copy sent to Technical Committee <[email protected]>. (Thu, 18 Oct 2018 08:57:06 GMT)
Message #132 received at [email protected]:
Hi,

On Wed, Oct 17, 2018 at 09:47:57PM +0100, Simon McVittie wrote:
> However, it leaves the default as "fail hard", which I'm not convinced
> is the most appropriate thing for systems that lack an experienced
> sysadmin (which are the systems where defaults matter most, because an
> inexperienced user is the least able to make an informed decision about
> where they should deviate from defaults).

I think that's where we disagree, so allow me to focus on that.

I think everyone would agree that when a service fails to (re)start upon
package installation or upgrade, that there is a problem and that this
problem needs to be reported in whatever way is most appropriate (if
not, we have a bigger disagreement than I thought and we need to take a
step back ;-)

The question that remains is "how". Currently, Debian has four ways of
informing a system administrator of such failures:

- Log a message to stdout and/or stderr. This is liable to scroll by
  unnoticed, and therefore is not a reliable way to inform the system
  administrator. For that reason, I don't think it's a good idea.
- Log a message to syslog and/or the systemd journal. This will not
  scroll by, but relies on the system administrator to actively hunt for
  problems in system logs, which they probably won't do unless and until
  they notice that the daemon isn't running anymore (and by that time it
  may be too late).
- Produce a debconf error note. This is mildly better than the above
  two, since debconf error notes are shown at highest priority, and
  therefore will only be hidden if debconf is configured to be
  noninteractive; in that case, debconf will send an email to root. On
  systems without a configured MTA, this will not help; and for daemons
  where failure to restart is catastrophic and needs to be resolved
  ASAP, such as sshd, this might not be desirable.
- Exit from postinst with nonzero exit state. This is unlikely to be
  missed by system administrators; however, it has several disadvantages
  that were pointed out by other people during this discussion.

I think it is perfectly fine to have the TC say that "failures to
restart a service must be reported, either by exiting nonzero, or by
another appropriate action", without going into detail what those other
actions could be.

> policy-rc.d also has some practical integration issues. It normally relies
> on putting an unpackaged file in /usr/sbin (unless you have installed
> policyrcd-script-zg2), and it's common for tools like debootstrap and
> debian-installer to create and delete policy-rc.d to suppress service
> startup while carrying out bootstrap operations. One Debian derivative
> that I'm involved in (SteamOS) is *meant* to have a policy-rc.d, but we
> recently discovered that it has always been deleted at the end of the
> debian-installer run, and so doesn't exist in practice.

I think that problem is not something that should be resolved by this
discussion. I'll readily admit that I did not actually test any of the
suggestions I made wrt policy-rc.d. There are other issues with it too;
I'm thinking of filing a wishlist bug to have it replaced by something
better.

On top of that, policy-rc.d has always irked me as a bit of an awkward
interface; it is the only type of Debian-specific configuration that
does not go into /etc, but for which you need to write a script in
/usr/sbin. This is confusing, as shown by debian-installer removing it
unconditionally. In an ideal world, the policies currently implementable
through policy-rc.d should be configuration snippets in a run-parts
style directory. The "just drop a script somewhere" idea is a
poorly-defined interface which is inflexible and inappropriate for the
purpose of a distribution, but "policy-rc.d should be replaced by
something better" is not an appropriate response to the question "what
should happen when a service fails to restart in postinst".

Also related to this problem is what happens with postinst failing for
other reasons than "the daemon doesn't restart". While that is probably
the most likely reason for postinst failures today, it is by no means
the only one; so if you say "postinst failing because of daemon restart
failing" is something that should not ever happen, I think you should
then also make guidelines as to when, exactly, a postinst should be
allowed to fail (and muck up the whole system).

-- 
To the thief who stole my anti-depressants: I hope you're happy

  -- seen somewhere on the Internet on a photo of a billboard
Reply sent to Margarita Manterola <[email protected]>: You have taken responsibility. (Wed, 17 Apr 2019 19:45:04 GMT)
Notification sent to Sean Whitton <[email protected]>: Bug acknowledged by developer. (Wed, 17 Apr 2019 19:45:04 GMT)
Message #137 received at [email protected]:
Apologies for the long delay. We discussed this issue in several TC
meetings without being able to make real progress.

After several rounds of discussions we came to the conclusion that the
reason why we can't make progress is that we always end up hitting the
wall of "The Technical Committee does not engage in design of new
proposals and policies". While we recognize that this is a problem worth
fixing, this is not something that we can fix as a body, and we need the
help of the Developers to do it.

On the one hand, maintainers want to be able to notify sysadmins when
things don't go as expected. On the other hand, sysadmins don't want
their systems to be left in weird/broken states because one single thing
didn't go as expected. A failing maintscript is a horrible way of
notifying sysadmins, but it's the only one available up to now and so
package maintainers use it when they think the failure is critical
enough.

So, the TC declines to rule on what maintscripts should do when failing
to (re)start a service (or otherwise encountering a similarly serious
problem).

Instead, we recommend that a work group of developers is formed, to
create a better mechanism of notification that can be used to let
sysadmins know when things don't go as expected on their systems,
without leaving the machines in weird/broken states.

Given that this is a problem faced by many Linux distributions, it would
be nice if this mechanism was developed and published in a non Debian
specific way that made it also available for other distributions to use.

Once that mechanism exists, we would strongly recommend that almost all
failures use this mechanism, instead of failing maintscripts.

-- 
Marga, on behalf of the Technical Committee
Bug archived. Request was from Debbugs Internal Request <[email protected]> to [email protected]. (Thu, 16 May 2019 07:27:01 GMT)