Debian Bug report logs - #985633
warn about watch files that use github and include full refs


Package: lintian; Maintainer for lintian is Debian Lintian Maintainers <[email protected]>; Source for lintian is src:lintian.

Reported by: Jelmer Vernooij <[email protected]>

Date: Sun, 21 Mar 2021 02:27:02 UTC

Severity: normal

Found in version lintian/2.104.0



Report forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Sun, 21 Mar 2021 02:27:03 GMT).


Acknowledgement sent to Jelmer Vernooij <[email protected]>:
New Bug report received and forwarded. Copy sent to Debian Lintian Maintainers <[email protected]>. (Sun, 21 Mar 2021 02:27:03 GMT).


Message #5 received at [email protected]:

From: Jelmer Vernooij <[email protected]>
To: Debian Bug Tracking System <[email protected]>
Cc: [email protected]
Subject: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 02:24:53 +0000
Package: lintian
Version: 2.104.0
Severity: normal

Some watch files are now broken because GitHub archive URLs include the
full ref (refs/tags/...) rather than just the tag name. It would be great
if lintian could warn when this is the case.

See e.g. the watch file for jupyter-core:

https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core

which reports the current upstream version as refs/tags/4.7.1
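
The failure mode can be reproduced without network access. The Python sketch below is illustrative only: the project path and the watch pattern `.*/archive/(.*)\.tar\.gz` are hypothetical (the report does not quote jupyter-core's watch file), and `re.fullmatch` stands in for uscan matching the pattern against the whole link, which is an assumption about uscan's behaviour.

```python
import re
from typing import Optional

# Hypothetical watch-file pattern whose capture group accepts anything,
# including slashes.
PATTERN = r".*/archive/(.*)\.tar\.gz"

# The same release linked in the old and the new GitHub style (hypothetical URLs).
OLD_LINK = "https://github.com/example/project/archive/4.7.1.tar.gz"
NEW_LINK = "https://github.com/example/project/archive/refs/tags/4.7.1.tar.gz"


def upstream_version(pattern: str, link: str) -> Optional[str]:
    """Return the first capture group if the pattern matches the whole link."""
    match = re.fullmatch(pattern, link)
    return match.group(1) if match else None


print(upstream_version(PATTERN, OLD_LINK))  # 4.7.1
print(upstream_version(PATTERN, NEW_LINK))  # refs/tags/4.7.1
```

With the new link layout the capture group swallows the `refs/tags/` prefix, which is the shape of version string the QA watch page reports.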

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-4-amd64 (SMP w/2 CPU threads)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages lintian depends on:
ii  binutils                        2.35.2-2
ii  bzip2                           1.0.8-4
ii  diffstat                        1.64-1
ii  dpkg                            1.20.7.1
ii  dpkg-dev                        1.20.7.1
ii  file                            1:5.39-3
ii  gettext                         0.21-4
ii  gpg                             2.2.27-1
ii  intltool-debian                 0.35.0+20060710.5
ii  libapt-pkg-perl                 0.1.40
ii  libarchive-zip-perl             1.68-1
ii  libcapture-tiny-perl            0.48-1
ii  libclass-xsaccessor-perl        1.19-3+b7
ii  libclone-perl                   0.45-1+b1
ii  libconfig-tiny-perl             2.26-1
ii  libcpanel-json-xs-perl          4.25-1+b1
ii  libdata-dpath-perl              0.58-1
ii  libdata-validate-___domain-perl    0.10-1.1
ii  libdevel-size-perl              0.83-1+b2
ii  libdpkg-perl                    1.20.7.1
ii  libemail-address-xs-perl        1.04-1+b3
ii  libfile-basedir-perl            0.08-1
ii  libfile-find-rule-perl          0.34-1
ii  libfont-ttf-perl                1.06-1.1
ii  libhtml-html5-entities-perl     0.004-1.1
ii  libipc-run3-perl                0.048-2
ii  libjson-maybexs-perl            1.004003-1
ii  liblist-compare-perl            0.55-1
ii  liblist-moreutils-perl          0.430-2
ii  liblist-utilsby-perl            0.11-1
ii  libmoo-perl                     2.004004-1
ii  libmoox-aliases-perl            0.001006-1.1
ii  libnamespace-clean-perl         0.27-1
ii  libpath-tiny-perl               0.118-1
ii  libperlio-gzip-perl             0.19-1+b7
ii  libproc-processtable-perl       0.59-2+b1
ii  libsereal-decoder-perl          4.018+ds-1+b1
ii  libsereal-encoder-perl          4.018+ds-1+b1
ii  libtext-glob-perl               0.11-1
ii  libtext-levenshteinxs-perl      0.03-4+b8
ii  libtext-markdown-discount-perl  0.12-1+b1
ii  libtext-xslate-perl             3.5.8-1+b1
ii  libtime-duration-perl           1.21-1
ii  libtime-moment-perl             0.44-1+b3
ii  libtimedate-perl                2.3300-2
ii  libtry-tiny-perl                0.30-1
ii  libtype-tiny-perl               1.012001-2
ii  libunicode-utf8-perl            0.62-1+b2
ii  liburi-perl                     5.08-1
ii  libxml-libxml-perl              2.0134+dfsg-2+b1
ii  libyaml-libyaml-perl            0.82+repack-1+b1
ii  lzip                            1.22-3
ii  lzop                            1.04-2
ii  man-db                          2.9.4-2
ii  patchutils                      0.4.2-1
ii  perl [libdigest-sha-perl]       5.32.1-3
ii  t1utils                         1.41-4
ii  unzip                           6.0-26
ii  xz-utils                        5.2.5-2

lintian recommends no packages.

Versions of packages lintian suggests:
pn  binutils-multiarch     <none>
ii  libtext-template-perl  1.59-1

-- no debconf information



Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Sun, 21 Mar 2021 04:48:02 GMT).


Acknowledgement sent to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Sun, 21 Mar 2021 04:48:02 GMT).


Message #10 received at [email protected]:

From: Felix Lechner <[email protected]>
To: Jelmer Vernooij <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Sat, 20 Mar 2021 21:44:07 -0700
Hi Jelmer,

On Sat, Mar 20, 2021 at 7:27 PM Jelmer Vernooij <[email protected]> wrote:
>
> https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core

I saw the traffic on IRC where someone suggested we replace

    .*archive/v?([0-9.]*).tar.gz

with

    .*archive/.*/v?([0-9.]*).tar.gz

to fix at least 1,500 affected packages. Unfortunately, that may not
work for jupyter-core, which does not prefix tags with a "v" and for
which "(.*)" catches the slash (or maybe even slashes).
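
One way to sanity-check the suggested substitution is to run both patterns against sample links. This Python sketch copies the two patterns above verbatim (so the `.` before `tar` and `gz` stays unescaped, as in the originals); the links are hypothetical, and `re.fullmatch` approximates uscan's whole-link matching (an assumption). It does not cover the `(.*)` case mentioned for jupyter-core.

```python
import re
from typing import Optional

# Patterns quoted in the message above.
OLD = r".*archive/v?([0-9.]*).tar.gz"
SUGGESTED = r".*archive/.*/v?([0-9.]*).tar.gz"

# Hypothetical links in the new GitHub layout, with and without a "v" tag prefix.
LINKS = [
    "https://github.com/example/project/archive/refs/tags/v1.2.3.tar.gz",
    "https://github.com/example/project/archive/refs/tags/4.7.1.tar.gz",
]


def capture(pattern: str, link: str) -> Optional[str]:
    """Return the version group if the pattern matches the whole link, else None."""
    match = re.fullmatch(pattern, link)
    return match.group(1) if match else None


for link in LINKS:
    print(link.rsplit("/", 1)[-1], "old:", capture(OLD, link),
          "suggested:", capture(SUGGESTED, link))
```

On both sample links the old pattern matches nothing, while the suggested one captures the bare version number.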

As a tool without network access, Lintian is not well positioned to
figure out, in general, whether a URL/regex combination works. Would
it be okay if Lintian instead issues two new classification tags?

The first would occur once per source. It shows the watch file URL and
the regular expression for HTML parsing, possibly followed by "debian
update" (or similar). The second tag would occur once for each of the
options selected, i.e. multiple times. Armed with that information,
the Janitor could probe the URL and figure out which parts need
fixing.

The watch file version is already available in UDD, as you know, so
you could reconstruct the watch file and perhaps even enlist 'uscan'
to help you.

The parsing for these components is in place. If it is time sensitive,
I could provide the new tags via UDD within 48 hours. What do you
think? Thank you!

Kind regards
Felix Lechner



Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Sun, 21 Mar 2021 08:48:07 GMT).


Acknowledgement sent to Gordon Ball <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Sun, 21 Mar 2021 08:48:07 GMT).


Message #15 received at [email protected]:

From: Gordon Ball <[email protected]>
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 08:34:58 +0000
I started a branch for lintian-brush here:
https://salsa.debian.org/chronitis/lintian-brush/-/tree/github-archive-url

(using a nonexistent lintian tag, so having a real one would definitely
be a first step).

However, it turned out to be a bit more complex than I first thought (or
hoped):

* Lots of unrelated test cases get broken (since it rewrites their watch
  files)
* Lots of different ways of spelling the match pattern - amongst my
  packages there were at least three (and subvariants of each)

    - .*/archive/v([0-9.]+)    # now matches nothing
    - .*/archive/(.+)          # now matches refs/tags/x.y.z
    - .*/archive/@ANY_VERSION@ # now matches nothing

  and the discussion on IRC suggested other cases too (adding a wildcard
  for the new /refs/tags/ part, just matching @[email protected], etc.)
* Unpreservable formatting in several of the test cases I was using
  (continuation lines in comments?)
* What new pattern to actually write? The initial idea was just to
  literally replace /archive/ with /archive/refs/tags/, which _should_
  meet the idea of being conservative about what to fix (but might still
  collide with hand-written fixes for this issue like ./archive/.*/v...)
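
The "literal replacement" idea in the last bullet can be sketched in a few lines of Python. The helper name `add_refs_tags` is hypothetical and this is not the code in the lintian-brush branch; to stay conservative it leaves a pattern alone when `/archive/` is already followed by `refs/tags/` or by a `.*`/`.+` wildcard, so hand-written fixes like the one mentioned above are not rewritten a second time.

```python
ARCHIVE = "/archive/"
NEW_PREFIX = "refs/tags/"


def add_refs_tags(pattern: str) -> str:
    """Rewrite '/archive/' to '/archive/refs/tags/' unless already handled."""
    idx = pattern.find(ARCHIVE)
    if idx == -1:
        return pattern  # not a GitHub archive pattern; leave it alone
    head = pattern[: idx + len(ARCHIVE)]
    rest = pattern[idx + len(ARCHIVE):]
    # Skip patterns that were already fixed, literally or with a wildcard.
    if rest.startswith(NEW_PREFIX) or rest.startswith((".*", ".+")):
        return pattern
    return head + NEW_PREFIX + rest


for p in (
    r".*/archive/v([0-9.]+)\.tar\.gz",
    r".*/archive/(.+)\.tar\.gz",
    r".*/archive/.*/v([0-9.]+)\.tar\.gz",
):
    print(add_refs_tags(p))
```

Because already-fixed patterns are skipped, running the rewrite twice is harmless.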

I _think_ a good indicator for lintian (and a fixer) would be if the
matching expression contains "archive" followed by no wildcard pattern
before the capturing group for the version.
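
This indicator is a purely textual check, so it can be approximated without network access. In the Python sketch below the function name and the fragments counted as wildcards are assumptions, `@ANY_VERSION@` is treated like a capturing group, and a literal `refs/tags` is treated as already fixed (an addition beyond the rule as stated, so that fixed files are not flagged). Escaped parentheses and non-capturing groups are not handled.

```python
# Fragments treated as wildcards that could absorb the new refs/tags/ segment
# (an assumed, non-exhaustive list).
WILDCARDS = (".*", ".+", r"\S*", r"\S+")


def needs_refs_tags_fix(pattern: str) -> bool:
    """True if 'archive' is followed by the version capture with no wildcard in between."""
    idx = pattern.find("archive")
    if idx == -1:
        return False  # not an archive-style pattern
    rest = pattern[idx + len("archive"):]
    # The version is captured by an explicit group or by @ANY_VERSION@.
    starts = [i for i in (rest.find("("), rest.find("@ANY_VERSION@")) if i != -1]
    if not starts:
        return False
    between = rest[: min(starts)]
    if "refs/tags" in between:
        return False  # already spelled out literally
    return not any(w in between for w in WILDCARDS)


for p in (
    r".*/archive/v([0-9.]+)\.tar\.gz",
    r".*/archive/(.+)\.tar\.gz",
    r".*/archive/@ANY_VERSION@\.tar\.gz",
    r".*/archive/.*/v?([0-9.]+)\.tar\.gz",
):
    print(needs_refs_tags_fix(p), p)
```

The three spellings listed above are flagged, while a hand-written `.*/archive/.*/v?...` fix is not.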

Let me know if this makes sense to develop further.

Gordon



Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Mon, 22 Mar 2021 01:33:02 GMT).


Acknowledgement sent to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Mon, 22 Mar 2021 01:33:03 GMT).


Message #20 received at [email protected]:

From: Jelmer Vernooij <[email protected]>
To: Felix Lechner <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Mon, 22 Mar 2021 01:30:18 +0000
Hi Felix,

On Sat, Mar 20, 2021 at 09:44:07PM -0700, Felix Lechner wrote:
> On Sat, Mar 20, 2021 at 7:27 PM Jelmer Vernooij <[email protected]> wrote:
> >
> > https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core
> 
> I saw the traffic on IRC where someone suggested we replace
> 
>     .*archive/v?([0-9.]*).tar.gz
> 
> with
> 
>     .*archive/.*/v?([0-9.]*).tar.gz
> 
> to fix at least 1,500 affected packages. Unfortunately, that may not
> work for jupyter-core, which does not prefix tags with a "v" and for
> which "(.*)" catches the slash (or maybe even slashes).
> 
> As a tool without network access, Lintian is not well positioned to
> figure out, in general, whether a URL/regex combination works. Would
> it be okay if Lintian instead issues two new classification tags?
> 
> The first would occur once per source. It shows the watch file URL and
> the regular expression for HTML parsing, possibly followed by "debian
> update" (or similar). The second tag would occur once for each of the
> options selected, i.e. multiple times. Armed with that information,
> the Janitor could probe the URL and figure out which parts need
> fixing.
I was hoping that lintian could verify that there is at least
something after "/archive/" in the matching pattern that could
match slashes without relying on the main regex group - that could be
done without querying GitHub. That said, that code would have to be
updated if GitHub changes again in the future and it may be somewhat
tricky code.
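
This variant asks whether the part of the pattern after `/archive/` could itself match slashes. One rough way to test that without querying GitHub is to cut out that segment and try it against sample paths shaped like the new layout. The sample paths, the function name `missing_refs_tags_slot`, and the choice to give up on segments that are not valid regexes on their own are all assumptions; as noted above, any such check would need updating if GitHub changes its URLs again.

```python
import re

# Assumed shapes of the path between "/archive/" and the version number.
SAMPLE_PATHS = ("refs/tags/", "refs/tags/v")


def missing_refs_tags_slot(pattern: str) -> bool:
    """True if nothing before the version group can match a refs/tags/ path."""
    idx = pattern.find("/archive/")
    if idx == -1:
        return False  # not a GitHub archive pattern
    rest = pattern[idx + len("/archive/"):]
    ends = [i for i in (rest.find("("), rest.find("@ANY_VERSION@")) if i != -1]
    if not ends:
        return False
    segment = rest[: min(ends)]
    try:
        return not any(re.fullmatch(segment, path) for path in SAMPLE_PATHS)
    except re.error:
        return False  # segment is not a standalone regex; leave it to a human


for p in (
    r".*/archive/v([0-9.]+)\.tar\.gz",
    r".*/archive/(.+)\.tar\.gz",
    r".*/archive/.*/v?([0-9.]*).tar.gz",
    r".*/archive/refs/tags/v([0-9.]+)\.tar\.gz",
):
    print(missing_refs_tags_slot(p), p)
```

Unlike a purely textual wildcard check, this accepts any segment that actually matches the sample paths, including a literal `refs/tags/`.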

The offer for informational tags is appreciated, but as you say - the
data is already available in UDD so just providing the pure uscan
contents wouldn't help much.

The alternative is to just let lintian-brush work without a signal
from lintian, and gradually grind through the archive. That'll work
too, though it'll take a few months - and we lose the verification
from lintian after the fix.

Jelmer

-- 
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc

Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Mon, 22 Mar 2021 04:12:02 GMT).


Acknowledgement sent to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Mon, 22 Mar 2021 04:12:02 GMT).


Message #25 received at [email protected]:

From: Jelmer Vernooij <[email protected]>
To: Gordon Ball <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: warn about watch files that use github and include full refs
Date: Mon, 22 Mar 2021 04:08:56 +0000
On Sun, Mar 21, 2021 at 08:34:58AM +0000, Gordon Ball wrote:
> I started a branch for lintian-brush here:
> https://salsa.debian.org/chronitis/lintian-brush/-/tree/github-archive-url
> 
> (using a nonexistent lintian tag, so having a real one would definitely
> be a first step).
> 
> However, it turned out to be a bit more complex than I first thought (or
> hoped):
> 
> * Lots of unrelated test cases get broken (since it rewrites their watch
>   files)
> * Lots of different ways of spelling the match pattern - amongst my
>   packages there were at least three (and subvariants of each)
> 
>     - .*/archive/v([0-9.]+)    # now matches nothing
>     - .*/archive/(.+)          # now matches refs/tags/x.y.z
>     - .*/archive/@ANY_VERSION@ # now matches nothing
> 
>   and the discussion on IRC suggested other cases too (adding a wildcard
>   for the new /refs/tags/ part, just matching @[email protected], etc.)
> * Unpreservable formatting in several of the test cases I was using
>   (continuation lines in comments?)
> * What new pattern to actually write? The initial idea was just to
>   literally replace /archive/ with /archive/refs/tags/, which _should_
>   meet the idea of being conservative about what to fix (but might still
>   collide with hand-written fixes for this issue like ./archive/.*/v...)
> 
> I _think_ a good indicator for lintian (and a fixer) would be if the
> matching expression contains "archive" followed by no wildcard pattern
> before the capturing group for the version.
Thanks! I've merged your branch with some additional changes:

* reformatted the watch files in the examples to use a format that
  debian/watch can preserve
* changed the logic to follow your last suggestion

Ideally the other ways of formatting debian/watch would be handled
too, but that's something that is a work in progress and needs to be
fixed in debmutate.watch.

Jelmer

-- 
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc

Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Mon, 22 Mar 2021 04:45:02 GMT).


Acknowledgement sent to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Mon, 22 Mar 2021 04:45:03 GMT).


Message #30 received at [email protected]:

From: Felix Lechner <[email protected]>
To: Jelmer Vernooij <[email protected]>
Cc: Gordon Ball <[email protected]>, [email protected]
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 21:40:32 -0700
Hi Jelmer,

On Sun, Mar 21, 2021 at 6:30 PM Jelmer Vernooij <[email protected]> wrote:
>
> I was hoping that lintian could verify that there is at least
> something after "/archive/" in the matching pattern

Could Lintian-brush or the Janitor do so, when Lintian provides the string?

Kind regards
Felix Lechner



Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Mon, 22 Mar 2021 13:21:02 GMT).


Acknowledgement sent to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Mon, 22 Mar 2021 13:21:02 GMT).


Message #35 received at [email protected]:

From: Jelmer Vernooij <[email protected]>
To: Felix Lechner <[email protected]>
Cc: Gordon Ball <[email protected]>, [email protected]
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Mon, 22 Mar 2021 13:17:22 +0000
On Sun, Mar 21, 2021 at 09:40:32PM -0700, Felix Lechner wrote:
> On Sun, Mar 21, 2021 at 6:30 PM Jelmer Vernooij <[email protected]> wrote:
> >
> > I was hoping that lintian could verify that there is at least
> > something after "/archive/" in the matching pattern
> Could Lintian-brush or the Janitor do so, when Lintian provides the string?
It could, but at that point there isn't much value in having lintian
involved - the janitor could just get that directly from UDD.

The main value in having lintian in the loop here is:

 * lintian actively goes out and discovers issues (especially on fully
   built packages), so that lintian-brush runs can be prioritized.
 * so we can verify that issues that lintian found and lintian-brush
   thought it fixed are actually fixed

Not all issues can be fixed by the janitor either, so it would be
useful to have a lintian tag for those packages that are affected but
can't be fixed.

-- 
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc

Information forwarded to [email protected], Debian Lintian Maintainers <[email protected]>:
Bug#985633; Package lintian. (Mon, 22 Mar 2021 15:39:02 GMT).


Acknowledgement sent to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>. (Mon, 22 Mar 2021 15:39:02 GMT).


Message #40 received at [email protected]:

From: Felix Lechner <[email protected]>
To: Jelmer Vernooij <[email protected]>
Cc: Gordon Ball <[email protected]>, [email protected], Mattia Rizzolo <[email protected]>
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Mon, 22 Mar 2021 08:35:03 -0700
Hi Jelmer,

On Mon, Mar 22, 2021 at 6:17 AM Jelmer Vernooij <[email protected]> wrote:
>
> the janitor could just get that directly from UDD.

That might be better. According to mapreri:

> there is a column with uscan errors
> that information is actually in the "warnings" column, apparently
> udd=> select distinct warnings from upstream where warnings like '%%github%%';
> 2598 rows

Kind regards
Felix Lechner





Debian bug tracking system administrator <[email protected]>. Last modified: Tue May 13 09:31:19 2025; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU General Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.