Debian Bug report logs - #208816
considers UTF-16 files to be binary


Package: subversion; Maintainer for subversion is James McCoy <[email protected]>; Source for subversion is src:subversion (PTS, buildd, popcon).

Reported by: Wichert Akkerman <[email protected]>

Date: Fri, 5 Sep 2003 10:03:02 UTC

Severity: wishlist

Tags: upstream

Found in version 0.26.0-1




X-Loop: [email protected]
Subject: Bug#208816: considers UTF-16 files to be binary
Reply-To: Philip Martin <[email protected]>, [email protected]
Resent-From: Philip Martin <[email protected]>
Original-Sender: Philip Martin <[email protected]>
Resent-To: [email protected]
Resent-CC: David Kimdon <[email protected]>
Resent-Date: Wed, 29 Oct 2003 00:33:19 UTC
Resent-Message-ID: <[email protected]>
Resent-Sender: [email protected]
X-Debian-PR-Message: report 208816
X-Debian-PR-Package: subversion
X-Debian-PR-Keywords: 
Received: via spool by [email protected] id=B208816.10673873912161
          (code B ref 208816); Wed, 29 Oct 2003 00:33:19 UTC
Received: (at 208816) by bugs.debian.org; 29 Oct 2003 00:29:51 +0000
Received: from cpc5-flee1-4-0-cust176.glfd.cable.ntl.com (debian2) [81.109.200.176] 
	by master.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1AEeCU-0000N7-00; Tue, 28 Oct 2003 18:28:38 -0600
Received: from pm by debian2 with local (Exim 3.35 #1 (Debian))
	id 1AEeCP-0007iC-00; Wed, 29 Oct 2003 00:28:33 +0000
To: Wichert Akkerman <[email protected]>, [email protected]
From: Philip Martin <[email protected]>
Date: Wed, 29 Oct 2003 00:28:33 +0000
Message-ID: <[email protected]>
User-Agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Common Lisp, linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: Philip Martin <[email protected]>
Delivered-To: [email protected]
When Subversion updates a text file (while local changes are present),
or merges or diffs one, it does so line by line, using the single
byte '\n' (0x0a) to identify line endings.

UTF-16 is a variable-width encoding with 2-byte code units.  I'm not
sure which byte sequences are valid: is it possible for a 0x0a byte to
appear anywhere other than in the last byte of a UTF-16 character?  If
so, then were Subversion to treat a UTF-16 file as text, it might split
such characters and so generate an invalid UTF-16 file.
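
As a rough check of the question above, the following Python sketch
splits raw bytes after every 0x0a byte and then tests whether each piece
still decodes.  The helper names are made up for illustration and this is
not how Subversion itself is implemented.  It relies on two facts about
UTF-16: in big-endian order U+0A05 is encoded as 0a 05, so 0x0a can be
the first byte of a code unit, and in little-endian order a newline is
0a 00, so a split after 0x0a leaves a stray 00 at the start of the next
piece.

```python
def split_after_lf_byte(data: bytes) -> list[bytes]:
    """Split raw bytes after every 0x0a byte, keeping the terminator."""
    pieces = []
    start = 0
    for i, byte in enumerate(data):
        if byte == 0x0A:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])
    return pieces


def pieces_decode(data: bytes, encoding: str) -> bool:
    """Return True if every byte-split piece is still valid in `encoding`."""
    for piece in split_after_lf_byte(data):
        try:
            piece.decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


# Illustrative samples: a real newline in both byte orders, and a
# character (U+0A05) whose big-endian encoding starts with 0x0a.
samples = [
    ("a\nb", "utf-16-be"),
    ("a\nb", "utf-16-le"),
    ("a\u0a05b", "utf-16-be"),
]
for text, encoding in samples:
    data = text.encode(encoding)
    print(encoding, data.hex(), pieces_decode(data, encoding))
```

Only the big-endian newline case survives the byte-wise split intact; the
other two leave pieces with an odd number of bytes, which is the kind of
damage described above.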

If you want UTF-16 files treated as text, I suggest you configure
Subversion to use external diff/diff3 commands, choosing ones that
do the right thing on UTF-16 files.
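
One possible shape for such an external diff command is sketched below
in Python.  It is only an illustration under stated assumptions: the
function name diff_utf16 is invented here, the files are assumed to be
UTF-16 (ideally with a byte order mark), and the script assumes
Subversion invokes the command GNU diff style, with the old and new file
paths as the last two arguments.  It covers diff only; merging would
still need a suitable diff3 replacement.

```python
import difflib
import sys


def diff_utf16(old: bytes, new: bytes, old_label: str, new_label: str) -> str:
    """Decode two UTF-16 byte strings and return a unified diff as text."""
    # The "utf-16" codec honours a leading byte order mark; without one
    # it falls back to the platform's native byte order.
    old_lines = old.decode("utf-16").splitlines(keepends=True)
    new_lines = new.decode("utf-16").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, old_label, new_label)
    )


# Assumption: the last two command-line arguments are the old and new
# file paths; any earlier options (such as -u or -L labels) are ignored.
if __name__ == "__main__" and len(sys.argv) >= 3:
    old_path, new_path = sys.argv[-2], sys.argv[-1]
    with open(old_path, "rb") as f_old, open(new_path, "rb") as f_new:
        output = diff_utf16(f_old.read(), f_new.read(), old_path, new_path)
    sys.stdout.write(output)
    # Mirror GNU diff's convention: exit 1 when the files differ.
    sys.exit(1 if output else 0)
```

Because the comparison happens on decoded text, line boundaries fall
between characters rather than between bytes.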

-- 
Philip Martin





Debian bug tracking system administrator <[email protected]>. Last modified: Fri May 16 01:12:25 2025; Machine Name: buxtehude
