Acknowledgement sent
to Jelmer Vernooij <[email protected]>:
New Bug report received and forwarded. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Sun, 21 Mar 2021 02:27:03 GMT)
Subject: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 02:24:53 +0000
Package: lintian
Version: 2.104.0
Severity: normal
Some watch files are now broken because GitHub archive URLs now include the
full ref (e.g. refs/tags/<tag>) rather than just the tag name. It would be great if lintian could warn when this is the case.
See e.g. the watch file for jupyter-core:
https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core
which reports the current upstream version as refs/tags/4.7.1
-- System Information:
Debian Release: bullseye/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 5.10.0-4-amd64 (SMP w/2 CPU threads)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages lintian depends on:
ii binutils 2.35.2-2
ii bzip2 1.0.8-4
ii diffstat 1.64-1
ii dpkg 1.20.7.1
ii dpkg-dev 1.20.7.1
ii file 1:5.39-3
ii gettext 0.21-4
ii gpg 2.2.27-1
ii intltool-debian 0.35.0+20060710.5
ii libapt-pkg-perl 0.1.40
ii libarchive-zip-perl 1.68-1
ii libcapture-tiny-perl 0.48-1
ii libclass-xsaccessor-perl 1.19-3+b7
ii libclone-perl 0.45-1+b1
ii libconfig-tiny-perl 2.26-1
ii libcpanel-json-xs-perl 4.25-1+b1
ii libdata-dpath-perl 0.58-1
ii libdata-validate-domain-perl 0.10-1.1
ii libdevel-size-perl 0.83-1+b2
ii libdpkg-perl 1.20.7.1
ii libemail-address-xs-perl 1.04-1+b3
ii libfile-basedir-perl 0.08-1
ii libfile-find-rule-perl 0.34-1
ii libfont-ttf-perl 1.06-1.1
ii libhtml-html5-entities-perl 0.004-1.1
ii libipc-run3-perl 0.048-2
ii libjson-maybexs-perl 1.004003-1
ii liblist-compare-perl 0.55-1
ii liblist-moreutils-perl 0.430-2
ii liblist-utilsby-perl 0.11-1
ii libmoo-perl 2.004004-1
ii libmoox-aliases-perl 0.001006-1.1
ii libnamespace-clean-perl 0.27-1
ii libpath-tiny-perl 0.118-1
ii libperlio-gzip-perl 0.19-1+b7
ii libproc-processtable-perl 0.59-2+b1
ii libsereal-decoder-perl 4.018+ds-1+b1
ii libsereal-encoder-perl 4.018+ds-1+b1
ii libtext-glob-perl 0.11-1
ii libtext-levenshteinxs-perl 0.03-4+b8
ii libtext-markdown-discount-perl 0.12-1+b1
ii libtext-xslate-perl 3.5.8-1+b1
ii libtime-duration-perl 1.21-1
ii libtime-moment-perl 0.44-1+b3
ii libtimedate-perl 2.3300-2
ii libtry-tiny-perl 0.30-1
ii libtype-tiny-perl 1.012001-2
ii libunicode-utf8-perl 0.62-1+b2
ii liburi-perl 5.08-1
ii libxml-libxml-perl 2.0134+dfsg-2+b1
ii libyaml-libyaml-perl 0.82+repack-1+b1
ii lzip 1.22-3
ii lzop 1.04-2
ii man-db 2.9.4-2
ii patchutils 0.4.2-1
ii perl [libdigest-sha-perl] 5.32.1-3
ii t1utils 1.41-4
ii unzip 6.0-26
ii xz-utils 5.2.5-2
lintian recommends no packages.
Versions of packages lintian suggests:
pn binutils-multiarch <none>
ii libtext-template-perl 1.59-1
-- no debconf information
Acknowledgement sent
to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Sun, 21 Mar 2021 04:48:02 GMT)
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Sat, 20 Mar 2021 21:44:07 -0700
Hi Jelmer,
On Sat, Mar 20, 2021 at 7:27 PM Jelmer Vernooij <[email protected]> wrote:
>
> https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core
I saw the traffic on IRC where someone suggested we replace
.*archive/v?([0-9.]*).tar.gz
with
.*archive/.*/v?([0-9.]*).tar.gz
to fix at least 1,500 affected packages. Unfortunately, that may not
work for jupyter-core, which does not prefix tags with a "v" and for
which "(.*)" catches the slash (or maybe even slashes).
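The effect can be reproduced with a small Python sketch. The two hrefs and the greedy pattern below are assumptions modelled on the jupyter-core case (only the version 4.7.1 and the refs/tags/ prefix come from this report), and whole-string matching only approximates how uscan applies its Perl regexes.

```python
import re

# Hypothetical hrefs; the path layout is an assumption, not copied from GitHub.
OLD_HREF = "/jupyter/jupyter_core/archive/4.7.1.tar.gz"
NEW_HREF = "/jupyter/jupyter_core/archive/refs/tags/4.7.1.tar.gz"

OLD_PATTERN = r".*archive/v?([0-9.]*).tar.gz"      # pattern quoted from IRC
IRC_PATTERN = r".*archive/.*/v?([0-9.]*).tar.gz"   # proposed replacement
GREEDY_PATTERN = r".*archive/(.*).tar.gz"          # assumed stand-in for a "(.*)" watch file


def version(pattern, href):
    """Return the captured version if the whole href matches, else None."""
    match = re.fullmatch(pattern, href)
    return match.group(1) if match else None


for name, pattern in [("old", OLD_PATTERN), ("irc", IRC_PATTERN), ("greedy", GREEDY_PATTERN)]:
    print(name, version(pattern, OLD_HREF), version(pattern, NEW_HREF))
```

Under these assumptions the old pattern stops matching the new href, the IRC replacement recovers 4.7.1, and the greedy group captures refs/tags/4.7.1, which is what the QA page reports for jupyter-core.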
As a tool without network access, Lintian is not well positioned to
figure out, in general, whether a URL/regex combination works. Would
it be okay if Lintian instead issues two new classification tags?
The first would occur once per source. It shows the watch file URL and
the regular expression for HTML parsing, possibly followed by "debian
update" (or similar). The second tag would occur once for each of the
options selected, i.e. multiple times. Armed with that information,
the Janitor could probe the URL and figure out which parts need
fixing.
The watch file version is already available in UDD, as you know, so
you could reconstruct the watch file and perhaps even enlist 'uscan'
to help you.
The parsing for these components is in place. If it is time sensitive,
I could provide the new tags via UDD within 48 hours. What do you
think? Thank you!
Kind regards
Felix Lechner
Acknowledgement sent
to Gordon Ball <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Sun, 21 Mar 2021 08:48:07 GMT)
Subject: Re: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 08:34:58 +0000
I started a branch for lintian-brush here:
https://salsa.debian.org/chronitis/lintian-brush/-/tree/github-archive-url
(using a nonexistent lintian tag, so having a real one would definitely
be a first step).
However, it turned out to be a bit more complex than I first thought (or
hoped):
* Lots of unrelated test cases get broken (since it rewrites their watch
files)
* Lots of different ways of spelling the match pattern - amongst my
packages there were at least three (and subvariants of each)
- .*/archive/v([0-9.]+) # now matches nothing
- .*/archive/(.+) # now matches refs/tags/x.y.z
- .*/archive/@ANY_VERSION@ # now matches nothing
and the discussion on IRC suggested other cases too (adding a wildcard
for the new /refs/tags/ part, just matching @[email protected], etc.)
* Unpreservable formatting in several of the test cases I was using
(continuation lines in comments?)
* What new pattern to actually write? The initial idea was just to
literally replace /archive/ with /archive/refs/tags/, which _should_
meet the idea of being conservative about what to fix (but might still
collide with hand-written fixes for this issue like ./archive/.*/v...)
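As a rough sketch of that conservative rewrite (not the code in the lintian-brush branch; the function name and the skip rule are assumptions made for illustration), one could insert refs/tags/ only when the pattern does not already show signs of a hand-written fix:

```python
import re

# Assumed signs of an existing fix: a literal refs/tags/ or a wildcard
# directory such as ".*/" directly after /archive/.
ALREADY_FIXED = re.compile(r"/archive/(refs/tags/|\.[*+]/)")


def add_refs_tags(pattern):
    """Insert refs/tags/ after the first /archive/ unless it looks fixed."""
    if ALREADY_FIXED.search(pattern):
        return pattern
    return pattern.replace("/archive/", "/archive/refs/tags/", 1)


# The three spellings listed above, plus a hypothetical hand-fixed pattern.
for p in [".*/archive/v([0-9.]+)",
          ".*/archive/(.+)",
          ".*/archive/@ANY_VERSION@",
          ".*/archive/.*/v?([0-9.]+)"]:
    print(p, "->", add_refs_tags(p))
```

Patterns that hand-fix the issue in some other way would still be rewritten, so this is only as conservative as the skip rule.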
I _think_ a good indicator for lintian (and a fixer) would be if the
matching expression contains "archive" followed by no wildcard pattern
before the capturing group for the version.
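A minimal sketch of that indicator, needing no network access, might look like the following. The function name is hypothetical, only ".*" and ".+" are treated as wildcards, escaped parentheses are not handled, and treating an @...@ substitution as the version group is an assumption.

```python
import re

WILDCARD = re.compile(r"\.[*+]")            # ".*" or ".+" can span refs/tags/
VERSION_GROUP = re.compile(r"\(|@[A-Z_]+@")  # first "(" or an @...@ substitution


def looks_affected(pattern):
    """True if nothing between 'archive/' and the version group can match refs/tags/."""
    start = pattern.find("archive/")
    if start == -1:
        return False  # not an archive-style pattern
    rest = pattern[start + len("archive/"):]
    group = VERSION_GROUP.search(rest)
    if group is None:
        return False  # no version group found; nothing to judge
    between = rest[:group.start()]
    if "refs/tags/" in between:
        return False  # already spells out the new path component
    return WILDCARD.search(between) is None


print(looks_affected(".*/archive/v([0-9.]+)"))
print(looks_affected(".*/archive/.*/v?([0-9.]*).tar.gz"))
```

With these rules the three spellings listed above are all flagged, while a pattern with ".*/" or a literal refs/tags/ after archive/ is not.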
Let me know if this makes sense to develop further.
Gordon
Acknowledgement sent
to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Mon, 22 Mar 2021 01:33:03 GMT)
Hi Felix,
On Sat, Mar 20, 2021 at 09:44:07PM -0700, Felix Lechner wrote:
> On Sat, Mar 20, 2021 at 7:27 PM Jelmer Vernooij <[email protected]> wrote:
> >
> > https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core
>
> I saw the traffic on IRC where someone suggested we replace
>
> .*archive/v?([0-9.]*).tar.gz
>
> with
>
> .*archive/.*/v?([0-9.]*).tar.gz
>
> to fix at least 1,500 affected packages. Unfortunately, that may not
> work for jupyter-core, which does not prefix tags with a "v" and for
> which "(.*)" catches the slash (or maybe even slashes).
>
> As a tool without network access, Lintian is not well positioned to
> figure out, in general, whether a URL/regex combination works. Would
> it be okay if Lintian instead issues two new classification tags?
>
> The first would occur once per source. It shows the watch file URL and
> the regular expression for HTML parsing, possibly followed by "debian
> update" (or similar). The second tag would occur once for each of the
> options selected, i.e. multiple times. Armed with that information,
> the Janitor could probe the URL and figure out which parts need
> fixing.
I was hoping that lintian could verify that there is at least
something after "/archive/" in the matching pattern that could
match slashes without relying on the main regex group - that could be
done without querying GitHub. That said, that code would have to be
updated if GitHub changes again in the future and it may be somewhat
tricky code.
The offer for informational tags is appreciated, but as you say - the
data is already available in UDD so just providing the pure uscan
contents wouldn't help much.
The alternative is to just let lintian-brush work without a signal
from lintian, and gradually grind through the archive. That'll work
too, though it'll take a few months - and we lose the verification
from lintian after the fix.
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
Acknowledgement sent
to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Mon, 22 Mar 2021 04:12:02 GMT)
On Sun, Mar 21, 2021 at 08:34:58AM +0000, Gordon Ball wrote:
> I started a branch for lintian-brush here:
> https://salsa.debian.org/chronitis/lintian-brush/-/tree/github-archive-url
>
> (using a nonexistent lintian tag, so having a real one would definitely
> be a first step).
>
> However, it turned out to be a bit more complex than I first thought (or
> hoped):
>
> * Lots of unrelated test cases get broken (since it rewrites their watch
> files)
> * Lots of different ways of spelling the match pattern - amongst my
> packages there were at least three (and subvariants of each)
>
> - .*/archive/v([0-9.]+) # now matches nothing
> - .*/archive/(.+) # now matches refs/tags/x.y.z
> - .*/archive/@ANY_VERSION@ # now matches nothing
>
> and the discussion on IRC suggested other cases too (adding a wildcard
> for the new /refs/tags/ part, just matching @[email protected], etc.)
> * Unpreservable formatting in several of the test cases I was using
> (continuation lines in comments?)
> * What new pattern to actually write? The initial idea was just to
> literally replace /archive/ with /archive/refs/tags/, which _should_
> meet the idea of being conservative about what to fix (but might still
> collide with hand-written fixes for this issue like ./archive/.*/v...)
>
> I _think_ a good indicator for lintian (and a fixer) would be if the
> matching expression contains "archive" followed by no wildcard pattern
> before the capturing group for the version.
Thanks! I've merged your branch with some additional changes:
* reformatted the watch files in the examples to use a format that
debian/watch can preserve
* changed the logic to follow your last suggestion
Ideally the other ways of formatting debian/watch would be handled
too, but that's something that is a work in progress and needs to be
fixed in debmutate.watch.
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
Acknowledgement sent
to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Mon, 22 Mar 2021 04:45:03 GMT)
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Sun, 21 Mar 2021 21:40:32 -0700
Hi Jelmer,
On Sun, Mar 21, 2021 at 6:30 PM Jelmer Vernooij <[email protected]> wrote:
>
> I was hoping that lintian could verify that there is at least
> something after "/archive/" in the matching pattern
Could Lintian-brush or the Janitor do so, when Lintian provides the string?
Kind regards
Felix Lechner
Acknowledgement sent
to Jelmer Vernooij <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Mon, 22 Mar 2021 13:21:02 GMT)
On Sun, Mar 21, 2021 at 09:40:32PM -0700, Felix Lechner wrote:
> On Sun, Mar 21, 2021 at 6:30 PM Jelmer Vernooij <[email protected]> wrote:
> >
> > I was hoping that lintian could verify that there is at least
> > something after "/archive/" in the matching pattern
> Could Lintian-brush or the Janitor do so, when Lintian provides the string?
It could, but at that point there isn't much value in having lintian
involved - the janitor could just get that directly from UDD.
The main value in having lintian in the loop here is:
* lintian actively goes out and discovers issues (especially on fully
built packages), so that lintian-brush runs can be prioritized.
* so we can verify that issues that lintian found and lintian-brush
thought it fixed are actually fixed
Not all issues can be fixed by the janitor either, so it would be
useful to have a lintian tag for those packages that are affected but
can't be fixed.
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
Acknowledgement sent
to Felix Lechner <[email protected]>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <[email protected]>.
(Mon, 22 Mar 2021 15:39:02 GMT)
Subject: Re: Bug#985633: warn about watch files that use github and include full refs
Date: Mon, 22 Mar 2021 08:35:03 -0700
Hi Jelmer,
On Mon, Mar 22, 2021 at 6:17 AM Jelmer Vernooij <[email protected]> wrote:
>
> the janitor could just get that directly from UDD.
That might be better. According to mapreri:
> there is a column with uscan errors
> that information is actually in the "warnings" column, apparently
> udd=> select distinct warnings from upstream where warnings like '%%github%%';
> 2598 rows
Kind regards
Felix Lechner