Debian Bug report logs - #1037929
git grep fails with "Invalid collation character" for valid collation sequences in []

Package: git; Maintainer for git is Jonathan Nieder <[email protected]>; Source for git is src:git.

Reported by: наб <[email protected]>

Date: Wed, 14 Jun 2023 13:30:02 UTC

Severity: normal

Found in version git/1:2.40.1-1


Report forwarded to [email protected], Jonathan Nieder <[email protected]>:
Bug#1037929; Package git. (Wed, 14 Jun 2023 13:30:04 GMT)


Acknowledgement sent to наб <[email protected]>:
New Bug report received and forwarded. Copy sent to Jonathan Nieder <[email protected]>. (Wed, 14 Jun 2023 13:30:04 GMT)


Message #5 received at [email protected]:

From: наб <[email protected]>
To: Debian Bug Tracking System <[email protected]>
Subject: git grep fails with "Invalid collation character" for valid collation sequences in []
Date: Wed, 14 Jun 2023 15:26:09 +0200
Package: git
Version: 1:2.39.2-1.1
Version: 1:2.40.1-1
Severity: normal

Dear Maintainer,

  $ git grep '[а-я]'
  fatal: command line, '[а-я]': Invalid collation character
and even
  nabijaczleweli@tarta:~/code/voreutils$ locale
  LANG=en_GB.UTF-8
  LANGUAGE=en_GB:en
  LC_CTYPE="en_GB.UTF-8"
  LC_NUMERIC="en_GB.UTF-8"
  LC_TIME="en_GB.UTF-8"
  LC_COLLATE="en_GB.UTF-8"
  LC_MONETARY="en_GB.UTF-8"
  LC_MESSAGES="en_GB.UTF-8"
  LC_PAPER="en_GB.UTF-8"
  LC_NAME="en_GB.UTF-8"
  LC_ADDRESS="en_GB.UTF-8"
  LC_TELEPHONE="en_GB.UTF-8"
  LC_MEASUREMENT="en_GB.UTF-8"
  LC_IDENTIFICATION="en_GB.UTF-8"
  LC_ALL=
  $ git grep '[а-я]'
  fatal: command line, '[а-я]': Invalid collation character
  $ LC_ALL=ru_RU.UTF-8 locale
  LANG=en_GB.UTF-8
  LANGUAGE=en_GB:en
  LC_CTYPE="ru_RU.UTF-8"
  LC_NUMERIC="ru_RU.UTF-8"
  LC_TIME="ru_RU.UTF-8"
  LC_COLLATE="ru_RU.UTF-8"
  LC_MONETARY="ru_RU.UTF-8"
  LC_MESSAGES="ru_RU.UTF-8"
  LC_PAPER="ru_RU.UTF-8"
  LC_NAME="ru_RU.UTF-8"
  LC_ADDRESS="ru_RU.UTF-8"
  LC_TELEPHONE="ru_RU.UTF-8"
  LC_MEASUREMENT="ru_RU.UTF-8"
  LC_IDENTIFICATION="ru_RU.UTF-8"
  LC_ALL=ru_RU.UTF-8
  $ LC_ALL=ru_RU.UTF-8 git grep '[а-я]'
  fatal: command line, '[а-я]': Invalid collation character

This ought to match the entire modern Russian alphabet,
and correctly does so under GNU grep and glibc regcomp(3)
on bookworm and sid.

I don't really see how а or я are "invalid" here;
oddly, git grep does accept '[ая]' &c.
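
To check the regcomp(3) side outside git, a minimal sketch follows. It assumes glibc, a POSIX basic regular expression, and an environment that names an installed locale. Both function names are made up for this example and belong to neither git nor glibc. The first helper returns regcomp's status together with the regerror(3) text; the second adopts every locale category from the environment before compiling.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not from git or glibc: compile `pattern` as a POSIX
 * basic regex and return regcomp()'s status (0 on success). On failure the
 * regerror() text is written to msg; on success msg is left empty. */
int compile_status(const char *pattern, char *msg, size_t msglen) {
	regex_t re;
	int rc = regcomp(&re, pattern, REG_NOSUB);

	if (msglen)
		msg[0] = '\0';
	if (rc != 0)
		regerror(rc, &re, msg, msglen);
	else
		regfree(&re);
	return rc;
}

/* Adopt all locale categories from the environment (LANG, LC_*), then
 * compile. Assumption: the environment names an installed locale. */
int compile_status_env_locale(const char *pattern, char *msg, size_t msglen) {
	setlocale(LC_ALL, "");
	return compile_status(pattern, msg, msglen);
}
```

Calling compile_status_env_locale() once with "[а-я]" and once with "[ая]" shows whether the range and the plain set are treated differently in a given environment.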

Best,
наб

-- System Information:
Debian Release: 12.0
  APT prefers stable-security
  APT policy: (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-9-amd64 (SMP w/24 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages git depends on:
ii  git-man          1:2.39.2-1.1
ii  libc6            2.36-9
ii  libcurl3-gnutls  7.88.1-10
ii  liberror-perl    0.17029-2
ii  libexpat1        2.5.0-1
ii  libpcre2-8-0     10.42-1
ii  perl             5.36.0-7
ii  zlib1g           1:1.2.13.dfsg-1

Versions of packages git recommends:
ii  ca-certificates              20230311
ii  less                         590-2
ii  openssh-client [ssh-client]  1:9.2p1-2
ii  patch                        2.7.6-7

Versions of packages git suggests:
ii  gettext-base                          0.21-12
pn  git-cvs                               <none>
pn  git-daemon-run | git-daemon-sysvinit  <none>
pn  git-doc                               <none>
pn  git-email                             <none>
pn  git-gui                               <none>
pn  git-mediawiki                         <none>
pn  git-svn                               <none>
pn  gitk                                  <none>
pn  gitweb                                <none>

-- no debconf information

Information forwarded to [email protected], Jonathan Nieder <[email protected]>:
Bug#1037929; Package git. (Wed, 14 Jun 2023 13:54:06 GMT)


Acknowledgement sent to наб <[email protected]>:
Extra info received and forwarded to list. Copy sent to Jonathan Nieder <[email protected]>. (Wed, 14 Jun 2023 13:54:06 GMT)


Message #10 received at [email protected]:

From: наб <[email protected]>
To: [email protected]
Subject: Re: Bug#1037929: git grep fails with "Invalid collation character" for valid collation sequences in []
Date: Wed, 14 Jun 2023 15:52:21 +0200
  $ cat dumpregex.c
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <locale.h>
  #include <regex.h>
  #include <stdio.h>
  
  /* Interposed regcomp(): log the current locale categories and the call's arguments, then forward to the real regcomp(3). */
  int regcomp(regex_t * restrict preg, const char * restrict regex, int cflags) {
  	/* setlocale(category, NULL) only queries the current setting; it changes nothing. */
  	fprintf(stderr, "LC_ALL=%s\n", setlocale(LC_ALL, NULL));
  	fprintf(stderr, "LC_ADDRESS=%s\n", setlocale(LC_ADDRESS, NULL));
  	fprintf(stderr, "LC_COLLATE=%s\n", setlocale(LC_COLLATE, NULL));
  	fprintf(stderr, "LC_CTYPE=%s\n", setlocale(LC_CTYPE, NULL));
  	fprintf(stderr, "LC_IDENTIFICATION=%s\n", setlocale(LC_IDENTIFICATION, NULL));
  	fprintf(stderr, "LC_MEASUREMENT=%s\n", setlocale(LC_MEASUREMENT, NULL));
  	fprintf(stderr, "LC_MESSAGES=%s\n", setlocale(LC_MESSAGES, NULL));
  	fprintf(stderr, "LC_MONETARY=%s\n", setlocale(LC_MONETARY, NULL));
  	fprintf(stderr, "LC_NAME=%s\n", setlocale(LC_NAME, NULL));
  	fprintf(stderr, "LC_NUMERIC=%s\n", setlocale(LC_NUMERIC, NULL));
  	fprintf(stderr, "LC_PAPER=%s\n", setlocale(LC_PAPER, NULL));
  	fprintf(stderr, "LC_TELEPHONE=%s\n", setlocale(LC_TELEPHONE, NULL));
  	fprintf(stderr, "LC_TIME=%s\n", setlocale(LC_TIME, NULL));
  
  	/* Print the pattern and decode the cflags bits that regcomp(3) defines. */
  	fprintf(stderr, "regcomp(%p, '%s', %d %s %s %s %s)=", preg, regex, cflags, (cflags & REG_EXTENDED) ? "REG_EXTENDED" : "",
  	        (cflags & REG_ICASE) ? "REG_ICASE" : "", (cflags & REG_NEWLINE) ? "REG_NEWLINE" : "", (cflags & REG_NOSUB) ? "REG_NOSUB" : "");
  	/* RTLD_NEXT skips this shim and resolves the next regcomp in search order; __func__ is "regcomp" here. */
  	int ret = ((int (*)(regex_t * restrict preg, const char * restrict regex, int cflags))dlsym(RTLD_NEXT, __func__))(preg, regex, cflags);
  	fprintf(stderr, "%d\n", ret);
  	return ret;
  }
  $ cc dumpregex.c -ldl -shared -odumpregex.so
  $ LD_PRELOAD=./dumpregex.so git grep '[а-я]'
and indeed
  $ ltrace -esetlocale -e newlocale git grep '[а-я]'
  git->setlocale(LC_CTYPE, "")                                                                              = "en_GB.UTF-8"
  git->setlocale(LC_MESSAGES, "")                                                                           = "en_GB.UTF-8"
  git->setlocale(LC_TIME, "")                                                                               = "en_GB.UTF-8"
  fatal: command line, '[а-я]': Invalid collation character
  +++ exited (status 128) +++
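
The trace shows git calling setlocale only for LC_CTYPE, LC_MESSAGES and LC_TIME, so LC_COLLATE presumably stays at the startup default, "C". Whether that mismatch is what makes regcomp reject the range is an assumption here, not something the trace proves. A minimal sketch for testing it in isolation follows; the helper name and its parameters are hypothetical.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: reset every category to "C", switch LC_CTYPE to
 * `name`, optionally switch LC_COLLATE to `name` as well, then compile
 * `pattern` as a POSIX basic regex.
 * Returns 0 and stores regcomp()'s result in *status when the locale setup
 * worked, or -1 if a setlocale() call failed (e.g. `name` is not installed). */
int regcomp_with_categories(const char *name, int with_collate,
                            const char *pattern, int *status) {
	regex_t re;

	if (!setlocale(LC_ALL, "C"))
		return -1;
	if (!setlocale(LC_CTYPE, name))
		return -1;
	if (with_collate && !setlocale(LC_COLLATE, name))
		return -1;

	*status = regcomp(&re, pattern, REG_NOSUB);
	if (*status == 0)
		regfree(&re);
	return 0;
}
```

Comparing regcomp_with_categories("ru_RU.UTF-8", 0, "[а-я]", &st) with the same call and with_collate set to 1 would show whether the unset LC_COLLATE is what makes the difference on a given system.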

Why on earth would you do this or want that?

Best,
наб


Debian bug tracking system administrator <[email protected]>. Last modified: Tue May 13 09:17:32 2025; Machine Name: bembo
