Debian Bug report logs - #1037929
git grep fails with "Invalid collation character" for valid collation sequences in []
Reported by: наб <[email protected]>
Date: Wed, 14 Jun 2023 13:30:02 UTC
Severity: normal
Found in version git/1:2.40.1-1
Message #5 received at [email protected]:
Package: git
Version: 1:2.39.2-1.1
Version: 1:2.40.1-1
Severity: normal
Dear Maintainer,
$ git grep '[а-я]'
fatal: command line, '[а-я]': Invalid collation character
and even
nabijaczleweli@tarta:~/code/voreutils$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
$ git grep '[а-я]'
fatal: command line, '[а-я]': Invalid collation character
$ LC_ALL=ru_RU.UTF-8 locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=ru_RU.UTF-8
$ LC_ALL=ru_RU.UTF-8 git grep '[а-я]'
fatal: command line, '[а-я]': Invalid collation character
This ought to match the entire modern Russian alphabet,
and correctly does so under GNU grep and glibc regcomp(3)
on bookworm and sid.
I don't really see how а or я are "invalid" here;
oddly, git grep does accept '[ая]' &c.
Best,
наб
-- System Information:
Debian Release: 12.0
APT prefers stable-security
APT policy: (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.1.0-9-amd64 (SMP w/24 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages git depends on:
ii git-man 1:2.39.2-1.1
ii libc6 2.36-9
ii libcurl3-gnutls 7.88.1-10
ii liberror-perl 0.17029-2
ii libexpat1 2.5.0-1
ii libpcre2-8-0 10.42-1
ii perl 5.36.0-7
ii zlib1g 1:1.2.13.dfsg-1
Versions of packages git recommends:
ii ca-certificates 20230311
ii less 590-2
ii openssh-client [ssh-client] 1:9.2p1-2
ii patch 2.7.6-7
Versions of packages git suggests:
ii gettext-base 0.21-12
pn git-cvs <none>
pn git-daemon-run | git-daemon-sysvinit <none>
pn git-doc <none>
pn git-email <none>
pn git-gui <none>
pn git-mediawiki <none>
pn git-svn <none>
pn gitk <none>
pn gitweb <none>
-- no debconf information
Message #10 received at [email protected]:
$ cat dumpregex.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>
/* Interposed regcomp(): log the locale state and arguments, then call the real one. */
int regcomp(regex_t * restrict preg, const char * restrict regex, int cflags) {
    /* Locale in effect for each category at the time of the call. */
    fprintf(stderr, "LC_ALL=%s\n", setlocale(LC_ALL, NULL));
    fprintf(stderr, "LC_ADDRESS=%s\n", setlocale(LC_ADDRESS, NULL));
    fprintf(stderr, "LC_COLLATE=%s\n", setlocale(LC_COLLATE, NULL));
    fprintf(stderr, "LC_CTYPE=%s\n", setlocale(LC_CTYPE, NULL));
    fprintf(stderr, "LC_IDENTIFICATION=%s\n", setlocale(LC_IDENTIFICATION, NULL));
    fprintf(stderr, "LC_MEASUREMENT=%s\n", setlocale(LC_MEASUREMENT, NULL));
    fprintf(stderr, "LC_MESSAGES=%s\n", setlocale(LC_MESSAGES, NULL));
    fprintf(stderr, "LC_MONETARY=%s\n", setlocale(LC_MONETARY, NULL));
    fprintf(stderr, "LC_NAME=%s\n", setlocale(LC_NAME, NULL));
    fprintf(stderr, "LC_NUMERIC=%s\n", setlocale(LC_NUMERIC, NULL));
    fprintf(stderr, "LC_PAPER=%s\n", setlocale(LC_PAPER, NULL));
    fprintf(stderr, "LC_TELEPHONE=%s\n", setlocale(LC_TELEPHONE, NULL));
    fprintf(stderr, "LC_TIME=%s\n", setlocale(LC_TIME, NULL));
    /* Arguments, with the cflags bits spelled out. */
    fprintf(stderr, "regcomp(%p, '%s', %d %s %s %s %s)=", (void *)preg, regex, cflags, (cflags & REG_EXTENDED) ? "REG_EXTENDED" : "",
            (cflags & REG_ICASE) ? "REG_ICASE" : "", (cflags & REG_NEWLINE) ? "REG_NEWLINE" : "", (cflags & REG_NOSUB) ? "REG_NOSUB" : "");
    /* __func__ is "regcomp", so RTLD_NEXT resolves to the next regcomp in load order (libc's). */
    int ret = ((int (*)(regex_t * restrict preg, const char * restrict regex, int cflags))dlsym(RTLD_NEXT, __func__))(preg, regex, cflags);
    fprintf(stderr, "%d\n", ret);
    return ret;
}
$ cc dumpregex.c -ldl -shared -odumpregex.so
$ LD_PRELOAD=./dumpregex.so git grep '[а-я]'
and indeed
$ ltrace -esetlocale -e newlocale git grep '[а-я]'
git->setlocale(LC_CTYPE, "") = "en_GB.UTF-8"
git->setlocale(LC_MESSAGES, "") = "en_GB.UTF-8"
git->setlocale(LC_TIME, "") = "en_GB.UTF-8"
fatal: command line, '[а-я]': Invalid collation character
+++ exited (status 128) +++
Why on earth would you do this or want that?
Best,
наб