From owner-ntemacs-users@june Tue Aug 27 17:24:38 1996
From: "George V. Reilly" <georger@microcrafts.com>
To: ntemacs-users
Subject: RE: More ctrl-M stuff
Date: Tue, 27 Aug 96 16:45:00 PDT

The solution that Vim uses, which works well in practice, is to have two variables, textauto (global) and textmode (buffer-local). If textmode is set, a file is written DOS-style (with CR-LF line separators); if it is off, the file is written Unix-style (with LF line separators). By default, textmode is set on all new buffers on DOS-like systems (DOS, OS/2, Win32) and cleared on all other systems. If textauto is set, then when a file is read in, textmode is set for its buffer if every line is separated by CR-LF, and cleared otherwise. In either case, the file looks fine on screen. If you edit and write a file, the line separator settings remain the same unless you explicitly override them.
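To make the textauto/textmode rules above concrete, here is a minimal Emacs Lisp sketch that models them on strings. It is only an illustration under stated assumptions: the my- function names are hypothetical, this is not Vim's implementation or an Emacs API, and DEFAULT-TEXTMODE stands in for the per-system default (set on DOS, OS/2 and Win32).

```elisp
;; Hypothetical sketch of the textauto/textmode rules; not Vim's code
;; and not an Emacs API.
(defun my-textauto-detect (contents)
  "Return non-nil if CONTENTS has line separators and all of them are CR-LF."
  (and (string-match-p "\n" contents)
       ;; A bare LF is one at the very start or one not preceded by CR.
       (not (string-match-p "\\(?:\\`\\|[^\r]\\)\n" contents))))

(defun my-read-contents (raw textauto default-textmode)
  "Return (TEXT . TEXTMODE) for RAW file contents.
With TEXTAUTO non-nil, TEXTMODE is detected from RAW; otherwise it is
DEFAULT-TEXTMODE.  In textmode, CR-LF pairs are shown as plain LFs."
  (let ((textmode (if textauto (my-textauto-detect raw) default-textmode)))
    (cons (if textmode (replace-regexp-in-string "\r\n" "\n" raw t t) raw)
          textmode)))

(defun my-write-contents (text textmode)
  "Return TEXT as written to disk: each LF becomes CR-LF in textmode."
  (if textmode (replace-regexp-in-string "\n" "\r\n" text t t) text))
```

Reading "a\r\nb\r\n" with textauto on gives ("a\nb\n" . t), and writing that text back with textmode t restores the CR-LFs. An LF-only file read with textauto on gets textmode nil and is written back unchanged, even when DEFAULT-TEXTMODE is t, which is the round trip described above.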
NT Emacs does not preserve them, which I find very annoying, especially when I diff a modified file against an original that came from Unix and diff reports that the whole file has changed. In Vim, if a file has non-standard separator settings for the OS (e.g., LFs on NT), you'll see a note about it in the message line.

--
/George V. Reilly
MicroCrafts, Inc., 17371 NE 67th Ct #205, Redmond, WA 98052, USA.
Tel: +1 206/250-0014  Fax: 206/250-0100  Web: www.microcrafts.com
Vim 4 (vi clone) for NT & Windows 95: http://www.halcyon.com/gvr/
pgp fingerprint: e2 b4 83 64 11 52 21 ea bf d8 51 c2 11 00 78 fc

From owner-ntemacs-users@june Fri Nov 1 08:18:26 1996
From: Frederic Corne <frederic.corne@erli.fr>
To: ntemacs-users@cs.washington.edu
Subject: Pb of crlf with Samba and untranslate
Date: Fri, 1 Nov 1996 16:34:38 +0100

NOTE: This is a
repost. It seems my previous message was lost.

I have installed Samba 1.9.16p7 on my Unix box, and I use untranslate.el with Emacs 19.31.1 on my NT machine, with

    (load "untranslate")
    (add-untranslated-filesystem "E:")

at the top of my .emacs file.

When I read and write a simple file (for example, a README file), everything is OK: there are no CR-LFs before or after. But when the file is in a particular mode (C, C++, Text, ...), reading is correct (no CR-LF), but when I save the file after modifying it, CR-LFs are added.

Any ideas?

FC
--
**** Frederic CORNE GSI-ERLI frederic.corne@erli.fr ****

From da@dcs.ed.ac.uk Wed Jan 22 04:21:09 1997
From: David Aspinall <da@dcs.ed.ac.uk>
To: voelker@cs.washington.edu (Geoff Voelker)
Cc: da@dcs.ed.ac.uk
Subject: Re: DOS (text) mode
Date: Wed, 22 Jan 1997 12:20:18 +0000

> I'm unfamiliar with format-alist; what support is missing?

format-alist: "List of information about understood file formats."

I think it was added to deal with enriched mode, where text properties are saved to the file. I don't know much about it; I just read the doc string.
From that, it seems it might cope nicely with DOS text files if a regular expression could be used to match the start of a file. (If not, perhaps format-alist could be extended to accept a regexp or a function argument.) Emacs would then automatically call the hooks to encode and decode the buffer.

I don't think this would add anything new to the existing mechanisms (whether the built-in handling of binary files or the "DOS" minor mode), but since Emacs now provides a hook for decoding different file formats, it might be wise to integrate with it. After the discussions on the list about various DOS translation ideas, I thought I should mention this variable.

Personally, I dislike the current mechanism: I would rather that files were handled in "binary" mode by default, and only in DOS-text mode if they can be deduced to be DOS text when visited. (Perhaps some file extensions should trigger DOS-text mode, but I am not convinced.) There should be an easy way to switch to DOS-text mode, just as with enriched mode. I think this would be nice behaviour for those of us who use mixed text formats; for people who use only DOS text, perhaps a variable could enable the current DOS-loving behaviour.

- David.

From waider@autodealing.com Wed Mar 5 04:25:54 1997
From: Ronan Waide <waider@autodealing.com>
To: Geoff Voelker, Andrew Innes
Subject: bug in load from ange-ftp directory?
Date: Wed, 5 Mar 97 11:24 GMT

Hiho,

I'm using the recent patched version of Emacs 19.34 on Win95 at the moment. In an attempt to consolidate disparate Emacs src and lib directories, I've put a lot of stuff on a local FTP-able machine, and I load it from there. However, Emacs seems to have some trouble loading .elc files over the FTP link: it successfully downloads them to the local drive, but then fails to load them, usually complaining of a missing bracket. Doing a find-file followed by eval-current-buffer works fine, however.

I suspect it may be loading the downloaded file in text mode, since ange-ftp creates the downloaded file as a temporary file with no extension. Could either of you confirm this suspicion?

Regards,
Waider.

I'll try hacking ange-ftp-load (again!) in the meantime.
--
waider@autodealing.com / AutoDealing Software Ltd / +353-1-6766455
Never attribute to malloc that which can be adequately explained by stupidity

From owner-ntemacs-users@trout Tue Apr 8 06:12:18 1997
From: Andrew Innes <andrewi@harlequin.co.uk>
To: kyle_jones@wonderworks.com
CC: gray@austin.apc.slb.com, info-vm@uunet.uu.net, ntemacs-users@cs.washington.edu
Subject: Re: Attachments via ange-ftp
Date: Tue, 8 Apr 1997 13:29:59 +0100 (BST)

On Tue, 1 Apr 1997 13:42:36 -0600, gray@austin.apc.slb.com (Douglas Gray Stephens) said:
>I suspect that my problem is PC related, but I'm not sure if it
>can/should be fixed in VM, or nt-emacs, hence I'm cross posting this
>to ntemacs-users@cs.washington.edu to see if the nt-emacs side have
>any suggestions.

Yes, this problem is PC specific (for the most part).

On Tue, 1 Apr 1997 21:34:19 -0500 (EST), Kyle Jones said:

>Douglas Gray Stephens writes:
>>[...]
>>This ^M will be causing vm to encode the message in base64.
>>
>>I am not sure why you've used
>>insert-file-contents-literally
>>instead of
>>insert-file-contents
>
>To avoid problems with file handlers uncompressing or otherwise
>fiddling with the input. Maybe this is the wrong thing to do.
>I'm willing to switch to insert-file-contents and see if that
>works better.

Given that we are talking about including files as MIME attachments, I think using insert-file-contents-literally is, in principle, the right thing to do; the "problem" in this context is that it disables the (imperfect) file type detection code used on Windows as well as inhibiting the various handlers and hook functions.

Strictly speaking, if the original file uses DOS line endings, then that is what should be transmitted (in base64 encoding if required). However, if it is simply a plain text file, it would generally be more helpful to treat it as such and convert it to whatever line ending convention is most suitable; in this case, convert to Unix line endings so that the contents are transmitted in the clear.

So, although insert-file-contents-literally is strictly correct, in this instance it would be more helpful to use a modified version that only inhibits the handlers and hook functions but leaves the file type code in place. Such a change should be safe to make, since it will only affect Windows, where it will generally do the right thing.

Aside: The whole issue of how text files are handled, by the DOS and Windows ports of Emacs at least, is really overdue for a major rethink.
The current method for determining whether a file is text (implicitly meaning DOS text) or binary is based on regular expression matching against the file name. This leads to all sorts of hassles, most of which could easily be avoided by using a simple content-scanning heuristic to identify whether a file is text or binary and, if text, its line ending convention (DOS, Mac, Unix).

Personally, I would like to see this heuristic incorporated into Emacs (on all platforms, not just DOS and Windows); it would make editing and manipulating text files from different sources mostly transparent. I don't know how likely this is to happen, though, since the Mule capabilities currently being added to Emacs (which must deal with the more general language/charset encoding properties of files and other data streams) will probably subsume this issue, and may do so in a completely different and more general way. Still, I expect that the line ending convention is usually orthogonal to charset encoding, so maybe there is a chance to do this anyway.

AndrewI

From owner-ntemacs-users@trout Tue Apr 8 09:48:12 1997
From: "John R. Dennis" <jdennis@ultranet.com>
To: andrewi@harlequin.co.uk, John Dennis
CC: kyle_jones@wonderworks.com, gray@austin.apc.slb.com, ntemacs-users@cs.washington.edu
Subject: Re: Attachments via ange-ftp
Date: Tue, 8 Apr 1997 12:06:41 -0400 (EDT)

>>>>> "Andrew" == Andrew Innes writes:

Andrew> Given that we are talking about including files as MIME
Andrew> attachments, I think using insert-file-contents-literally
Andrew> is, in principle, the right thing to do; the "problem" in
Andrew> this context is that it disables the (imperfect) file type
Andrew> detection code used on Windows as well as inhibiting the
Andrew> various handlers and hook functions.
Andrew> The whole issue of how text files are handled, by the DOS
Andrew> and Windows ports of Emacs at least, is really overdue for
Andrew> a major rethink.

I cannot believe how topical this issue is. I just spent all Friday morning debugging a similar problem in mime.el. Even though I had set all the variables I knew of that control CRLF translation when inserting into a buffer...

    (let ((start (point))
          (emx-binary-mode t)         ;Stop LF to CRLF conversion in OS/2
          (buffer-file-type t)        ;Stop LF to CRLF conversion in DOS/NT
          (binary-process-input t))   ;Stop LF to CRLF conversion in DOS/NT
      ...)

the conversion was still happening, because in fileio.c the implementation of insert-file-contents overwrites the user-supplied value of buffer-file-type:

    current_buffer->buffer_file_type = call1 (Qfind_buffer_file_type, filename);

The elisp code knew it wanted to insert the contents of the file as binary, so it explicitly set buffer-file-type, but the implementation of insert-file-contents ignored that setting and tried to determine the translation mode by a regular expression match on the filename. I fixed the problem by calling insert-file-contents-literally, which undefines find-buffer-file-type so that the call to find-buffer-file-type in insert-file-contents won't succeed. But I don't think the C code in insert-file-contents should ignore the documented variable (buffer-file-type) that is supposed to toggle the CRLF translation!

All of this is pretty ugly, prone to failure, and, more to the point, mostly undocumented as far as I can tell. After spending the better part of a day digging through the binary vs. text issues, I was left with the distinct impression that most of this code is a "hack" waiting to break. I absolutely agree with Andrew that this is in need of a major rethink. To begin the discussion, I will make the following observations:

* Determining binary/text based on regular expression matching of filenames is fundamentally flawed.
  There is not enough naming discipline with filenames and extensions to make this work reliably. I have been burned by this more times than I care to remember.
* The only way to tell if a file is binary is to scan the file and look for non-ASCII bytes.
* The documentation on the text/binary issues is woefully inadequate, and the implementation is inconsistent.
* The binary/text translation should be controlled by a user-settable variable that is ALWAYS respected. After all, the user is ultimately more knowledgeable about the contents of a file than the implementation.
* There should be a second user-settable variable that toggles whether the translation variable is automatically set based on the contents of the file. In this way you get automatic translation in the 99% of cases where you want it AND you can force the translation on or off when you have to.
* We'd all be happier without operating systems that make the artificial distinction between text and binary files and that insert/delete/modify bytes that are not in the actual file to undo the damage introduced by this ill-conceived distinction in the first place (sorry, this last point was a completely personal soapbox comment :-)

John Dennis

From owner-ntemacs-users@trout Wed Mar 26 09:53:47 1997
From: Don Erway <derway@ndc.com>
To: ntemacs-users@cs.washington.edu
Subject: Re: > toggle binary/text mode of current buffer
Date: Wed, 26 Mar 1997 09:07:47 -0800

>>>>> "db" == David Biesack writes:

db> suggested:

db> (defvar binary-mode-distance 500
db>   "Number of characters to search for CR/LF when looking for a binary file.")

db> (defun check-buffer-file-type (filename)
db>   (if (and (looking-at ".*\r\n")   ;; It has CR-LF sequence
db>            ;; and has no LF w/o CR within sight
db>            (not (re-search-forward "[^\r]\n" binary-mode-distance t)))
db>       nil   ;; so use text mode
db>     t))     ;; else use binary mode

This works fine. However, auto-detection still does not work under Unix. I am running 19.32 on NT and 19.33 on Solaris. In the 19.33 Solaris version, file-name-buffer-file-type-alist is not defined, so without this alist, and some code to process it, it is no surprise that it doesn't work.

Is the idea to use winnt.el even when running on Unix? If I load winnt.el into the Unix version, it complains that the set-message-beep function doesn't exist. But I can always work around that if this is even the right approach.
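The check-buffer-file-type suggestion quoted above only separates pure CR-LF files from everything else. The content-scanning heuristic Andrew Innes proposes earlier in the thread (text or binary, then DOS, Mac or Unix line endings), bounded the way binary-mode-distance and dos-mode-distance bound the search, can be sketched as below. This is a hypothetical helper under stated assumptions: my-eol-scan-distance and my-guess-file-format are illustrative names, it works on a string of raw contents rather than through find-buffer-file-type, and treating a NUL byte as the sign of a binary file is an assumption (narrower than John Dennis's non-ASCII scan, so that accented text is not called binary).

```elisp
;; Hypothetical helper, not part of any Emacs release: classify the
;; first `my-eol-scan-distance' characters of a file's raw contents.
(defvar my-eol-scan-distance 500
  "Maximum number of characters to examine, like binary-mode-distance.")

(defun my-guess-file-format (contents)
  "Guess the format of CONTENTS, a string of raw file contents.
Return `binary', `dos', `mac', `unix', or `mixed'."
  (let ((head (substring contents 0 (min (length contents)
                                         my-eol-scan-distance))))
    (cond
     ;; Assumption: a NUL byte means binary.
     ((string-match-p "\0" head) 'binary)
     ;; CR-LF present and no LF without a CR before it.
     ((and (string-match-p "\r\n" head)
           (not (string-match-p "\\(?:\\`\\|[^\r]\\)\n" head)))
      'dos)
     ;; CRs but no LFs at all: old Mac convention.
     ((and (string-match-p "\r" head)
           (not (string-match-p "\n" head)))
      'mac)
     ;; No CRs at all (this includes text with no line ending yet).
     ((not (string-match-p "\r" head)) 'unix)
     ;; Some other mixture: safest to leave the bytes alone.
     (t 'mixed))))
```

With the default limit, (my-guess-file-format "a\r\nb\nc\n") returns mixed, but with my-eol-scan-distance bound to 4 the same string is classified as dos, because the bare LF lies beyond the scan limit. That is the speed-versus-accuracy trade-off a bounded search accepts.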
Don

From owner-ntemacs-users@trout Wed Mar 26 06:03:34 1997
From: David Biesack <sasdjb@unx.sas.com>
To: ntemacs-users@cs.washington.edu
Subject: > toggle binary/text mode of current buffer
Date: Wed, 26 Mar 1997 08:20:33 -0500

> ;;; This examines the actual contents of the loaded file to see if
> ;;; it should use text mode or binary:
> (defun check-buffer-file-type (filename)
>   (if (and (looking-at ".*\r\n")                  ;; It has CR-LF sequence
>            (not (search-forward "[^\r]\n" nil t))) ;; and has no LF w/o CR
>       nil   ;; so use text mode
>     t))     ;; else use binary mode
Someone else pointed out that the search-forward should be a re-search-forward. Also note, however, that passing nil as the bound makes the search inspect the entire buffer, which is not always negligible. It might be better to make the limit a variable, as is done in dos-mode.el:

    ;;; LCD Archive Entry:
    ;;; dos-mode|Andy Norman|ange@hplb.hpl.hp.com
    ;;; |MSDOS minor mode for GNU Emacs
    ;;; |$Date: 2001/02/13 00:53:57 $|$Revision: 1.1 $|

which passes (min (point-max) dos-mode-distance) to re-search-forward, where

    (defvar dos-mode-distance 200
      "Number of characters to search for RETURN when looking for a DOS file.")

to determine whether a file is in DOS CR/LF mode. You can change dos-mode-distance to 1000 or some other reasonable value in your .emacs.

suggested:

    (defvar binary-mode-distance 500
      "Number of characters to search for CR/LF when looking for a binary file.")

    (defun check-buffer-file-type (filename)
      (if (and (looking-at ".*\r\n")   ;; It has CR-LF sequence
               ;; and has no LF w/o CR within sight
               (not (re-search-forward "[^\r]\n" binary-mode-distance t)))
          nil   ;; so use text mode
        t))     ;; else use binary mode

From owner-ntemacs-users@trout Tue Mar 25 18:23:53 1997
From: Geoff Odhner <odhner@recom.com>
To: Don Erway
CC: kin@isi.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Tue, 25 Mar 1997 20:20:21 -0500

Don Erway wrote:
> > The one funny is that files to which I have only read-only access come up as
> > writeable.
>
> I spoke too soon. It appears that the visiting read-only files works on NT,
> but under unix, a read-only file is not translated correctly.
> check-buffer-file-type works, but translation does not occur.
>
> On NT, the files do get translated, and do not come up as writeable.

Try my latest version. It should address this problem. It works on Win95; I haven't yet tested it on Unix, though I expect no problems.

BTW, one caveat about using this on Unix: though this code works to toggle the buffer type, the mode line indicator doesn't work on Unix, at least not on SunOS. If you add the %t mode line indicator, it always shows T. I expect that requires a fix to the C code and a recompile. I guess they figured no one would ever use it on Unix. :-)

Happy editing...
-Geoff

And here's the new version, as promised:

    ;;; If you have loaded a file as binary that actually has the ^M's in it,
    ;;; then switching to text mode will remove them in the buffer.  Of course
    ;;; now that it's in text mode, it will save with the ^M's inserted.
    ;;; Switching to binary mode does NOT have a reverse effect.
    ;;; If you want to disable that change on entering text mode, then use
    ;;; a negative prefix argument, as described below.

    ;;; A prefix argument will force the mode change in a particular
    ;;; direction.  A positive prefix argument forces it to binary.  A zero
    ;;; prefix argument forces text mode allowing the removal of ^M's (only
    ;;; those preceding ^J's).  A negative prefix argument forces text mode
    ;;; disallowing the removal of ^M's.

    ;;; When the mode is changed, the state of modification of the buffer is
    ;;; preserved, even if the ^M's are removed.

    (defun toggle-buffer-file-type (arg)
      "Alternate value of buffer-file-type."
      (interactive "P")
      ;; Turn the raw prefix (nil, -, (4) or a number) into a number.
      (let ((n (and arg (prefix-numeric-value arg)))
            (old buffer-file-type)
            (mod (buffer-modified-p))
            (buffer-read-only nil))
        (setq buffer-file-type (if n (>= n 1) (not buffer-file-type)))
        ;; Strip ^M's only when going from binary to text, and not when a
        ;; negative prefix argument was given.
        (if (and old (not buffer-file-type) (or (not n) (>= n 0)))
            (save-excursion
              (goto-char (point-min))
              (while (search-forward "\r\n" nil t)
                (replace-match "\n" nil t))
              (set-buffer-modified-p mod))))
      (force-mode-line-update))

    ;; And my preferred key binding:
    (global-set-key [?\A-t] 'toggle-buffer-file-type)

From kin@isi.com Tue Mar 25 14:28:47 1997
From: Kin Cho <kin@isi.com>
To: odhner@recom.com, voelker@cs.washington.edu
CC: derway@ndc.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Tue, 25 Mar 1997 14:29:49 -0800
Thanks, this is good! A real solution, compared to the workarounds that came before. If only this worked in Unix as well! Please put it in the FAQ or, even better, integrate it into the mainline code.

-kin

p.s., this is my mod (the cons is wrapped in list, so that append adds a proper ("" . check-buffer-file-type) entry to the alist):

        (list (cons "" 'check-buffer-file-type))))

    ;;; Associate the universal match regexp "" with the
    ;;; function check-buffer-file-type, so any file will be
    ;;; examined to automatically select the appropriate mode.
    ;;; Add this check only after known filename patterns are
    ;;; treated the way they should be.  (That's why we append
    ;;; to the list, instead of replacing it).  You might want
    ;;; to use more restrictive pattern(s) for doing this
    ;;; check.
    (setq file-name-buffer-file-type-alist
          (append file-name-buffer-file-type-alist
                  (list (cons "" 'check-buffer-file-type))))

    ;;; This examines the actual contents of the loaded file to see if
    ;;; it should use text mode or binary:
    (defun check-buffer-file-type (filename)
      (if (and (looking-at ".*\r\n")                  ;; It has CR-LF sequence
               (not (search-forward "[^\r]\n" nil t))) ;; and has no LF w/o CR
          nil   ;; so use text mode
        t))     ;; else use binary mode

From owner-ntemacs-users@trout Sun Mar 23 09:13:15 1997
From: Geoff Odhner <odhner@recom.com>
To: Don Erway
CC: kin@isi.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Sun, 23 Mar 1997 11:34:55 -0500

Don Erway wrote:
> Finally, it needs an auto option, to make it possible to
> automatically go into text mode if the content is strictly
> text and crlfs are already present in a file. This should
> not be based on file name extensions or file systems, but
> only on file content.

I have written a few more bits of code that help automate binary/text mode selection. This approach is possible thanks to Geoff Voelker's foresight in designing the infrastructure to be configurable in this way. Thanks, Geoff.

    ;;; Associate the universal match regexp "" with the
    ;;; function check-buffer-file-type, so any file will be
    ;;; examined to automatically select the appropriate mode.
    ;;; Add this check only after known filename patterns are
    ;;; treated the way they should be.  (That's why we append
    ;;; to the list, instead of replacing it).  You might want
    ;;; to use more restrictive pattern(s) for doing this
    ;;; check.
(setq file-name-buffer-file-type-alist
      (append file-name-buffer-file-type-alist
              (cons "" 'check-buffer-file-type)))
;;; This examines the actual contents of the loaded file to see if
;;; it should use text mode or binary:
(defun check-buffer-file-type (filename)
  (save-excursion
    (if (and (looking-at ".*\r\n")                     ;; It has CR-LF sequence
             (not (re-search-forward "[^\r]\n" nil t))) ;; and has no LF w/o CR
        nil                                             ;; so use text mode
      t)))                                              ;; else use binary mode
From owner-ntemacs-users@trout Sat Mar 22 13:28:17 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sat" "22" "March" "1997" "12:55:31" "-0800" "Don Erway" "derway@ndc.com" nil "23" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil] nil) Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id NAA14305 for ; Sat, 22 Mar 1997 13:28:17 -0800 Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id NAA23216 for ; Sat, 22 Mar 1997 13:28:15 -0800 Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id MAA24518 for ; Sat, 22 Mar 1997 12:56:39 -0800 (PST) Received: from maya.ndc.com (maya.ndc.com [192.101.92.41]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA13538 for ; Sat, 22 Mar 1997 12:56:38 -0800 Received: from heidi.ndc-new.com (heidi [192.101.92.15]) by maya.ndc.com (8.7.5/8.7.3) with SMTP id MAA05675; Sat, 22 Mar 1997 12:54:00 -0800 (PST) Received: from HAL.ndc.com by heidi.ndc-new.com (SMI-8.6/SMI-SVR4) id MAA14453; Sat, 22 Mar 1997 12:55:31 -0800 Message-Id: <199703222055.MAA14453@heidi.ndc-new.com> In-reply-to: <33343F98.2A7@recom.com> (message from Geoff Odhner on Sat, 22 Mar 1997 15:22:48 -0500) Mime-Version: 1.0 (generated by tm-edit 7.92) Content-Type: text/plain; charset=US-ASCII From: Don Erway To: odhner@recom.com CC: kin@isi.com, 
ntemacs-users@cs.washington.edu Subject: Re: toggle binary/text mode of current buffer Date: Sat, 22 Mar 1997 12:55:31 -0800 This is good. I can now happily make everything binary by default, and use your toggle function for the few cases where it is really needed. This is better than using crypt's DOS mode, because it is faster. Now, if only it would work in unix emacs, we could completely share files either way. Finally, it needs an auto option, to make it possible to automatically go into text mode if the content is strictly text and crlfs are already present in a file. This should not be based on file name extensions or file systems, but only on file content. Thanks for the useful hack. Don Don Erway derway@ndc.com NDC Systems 818-939-3847 5314 N. Irwindale Ave Fax:939-3870 Irwindale, CA, 91706 From owner-ntemacs-users@trout Sat Mar 22 13:01:33 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sat" "22" "March" "1997" "15:22:48" "-0500" "Geoff Odhner" "odhner@recom.com" nil "52" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil] nil) Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id NAA13680 for ; Sat, 22 Mar 1997 13:01:33 -0800 Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id NAA25772 for ; Sat, 22 Mar 1997 13:01:31 -0800 Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id MAA23716 for ; Sat, 22 Mar 1997 12:22:12 -0800 (PST) Received: from recom.recom.com (recom.recom.com [204.213.88.1]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id MAA12359 for ; Sat, 22 Mar 1997 12:22:12 -0800 Received: from odhner (dial5.mt-holly.emanon.net [204.213.88.105]) by recom.recom.com (8.6.12/8.6.9) with SMTP id PAA01331; Sat, 22 Mar 1997 15:27:36 -0500 Message-ID: 
<33343F98.2A7@recom.com> X-Mailer: Mozilla 2.01Gold (Win95; I) MIME-Version: 1.0 References: <199703212025.MAA03727@sampras.isi.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit From: Geoff Odhner To: Kin Cho CC: ntemacs-users@cs.washington.edu Subject: Re: toggle binary/text mode of current buffer Date: Sat, 22 Mar 1997 15:22:48 -0500 Kin Cho wrote: > > Is that a function that does this? > Trying to work around yet another PC<->UNIX integration problem. > > Thanks. > > -kin I have yet another version of my toggle-buffer-file-type function. This one updates the status bar, which is necessary if you bind it to a key. -Geoff ;;; If you have loaded a file as binary that actually has the ^M's in it, ;;; then switching to text mode will remove them in the buffer. Of course ;;; now that it's in text mode, it will save with the ^M's inserted. ;;; Switching to binary mode does NOT have a reverse effect. If you want ;;; to disable that change on entering text mode, then use a negative ;;; prefix argument, as described below. ;;; A prefix argument will force the mode change in a particular ;;; direction. A positive prefix argument forces it to binary. A zero ;;; prefix argument forces text mode allowing the removal of ^M's (only ;;; preceding ^J's). A negative prefix argument forces text mode ;;; disallowing the removal of ^M's. ;;; When the mode is changed the state of modification of the buffer is ;;; preserved, even if the ^M's are removed. 
(defun toggle-buffer-file-type (arg)
  "Toggle `buffer-file-type' between text (nil) and binary (t).
A positive prefix ARG forces binary mode.  A zero ARG forces text mode
and removes ^M's that precede ^J's.  A negative ARG forces text mode
without removing any ^M's."
  (interactive "P")
  (let ((old buffer-file-type)
        (mod (buffer-modified-p))
        ;; Raw prefix may be nil, `-', (4), or a number; normalize it.
        (n (and arg (prefix-numeric-value arg))))
    (setq buffer-file-type (if n (>= n 1) (not buffer-file-type)))
    ;; Strip CRs only on a binary -> text switch without a negative ARG.
    (if (and old (not buffer-file-type) (or (not n) (>= n 0)))
        (save-excursion
          (goto-char (point-min))
          (while (search-forward "\r\n" nil t)
            (replace-match "\n" nil t))
          (set-buffer-modified-p mod))))
  (force-mode-line-update))
;;; Here's my personal selection for a key binding for this function:
(global-set-key [?\A-t] 'toggle-buffer-file-type)
From andrewi@harlequin.co.uk Tue Apr 15 06:15:20 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "15" "April" "1997" "14:14:36" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "38" "Questions about MULE" "^From:" nil nil "4" nil nil nil nil] nil) Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id GAA22624 for ; Tue, 15 Apr 1997 06:15:19 -0700 Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50]) by holly.cam.harlequin.co.uk (8.8.4/8.7.3) with ESMTP id OAA28494; Tue, 15 Apr 1997 14:15:10 +0100 (BST) Received: from elan.long.harlequin.co.uk (elan.long.harlequin.co.uk [193.128.93.78]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id OAA25514; Tue, 15 Apr 1997 14:14:36 +0100 (BST) Message-Id: <199704151314.OAA25514@propos.long.harlequin.co.uk> In-reply-to: <199704150439.AAA16856@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 15 Apr 1997 00:39:58 -0400) From: Andrew Innes To: rms@gnu.ai.mit.edu cc: voelker@cs.washington.edu Subject: Questions about MULE Date: Tue, 15 Apr 1997 14:14:36 +0100 (BST) On Tue, 15 Apr 1997 00:39:58 -0400, Richard Stallman said: >I see nothing problematical in these changes.
>The ones that have to do with cr conversion will have to be >redone totally differently for the next Emacs release, though, >because MULE affects this very much. I am only dimly aware of what the MULE work for 19.35 entails, so if you have time I would like to ask a few questions about it. Since the issue of DOS vs Unix line ending conventions for text files is currently handled poorly in 19.34 (in the DOS and Windows ports), there has been a fair bit of discussion recently on the ntemacs-users mailing list about possible mechanisms for improving this in the future. This applies primarily in the context of working with a mixture of text files using both line ending conventions. The main difficulties at present are that Emacs doesn't, in general, correctly identify text vs. binary files, and for text files doesn't remember which line ending convention was used. The general thrust of suggestions for improvement is to implement some kind of mostly-automatic mechanism to detect which files are text, and remember the line ending convention in use (DOS, Unix or possibly Mac). Obvious heuristics based on scanning the first part of each file when loaded for "funny" characters could be used. More sophisticated extensions which detect mistakes in the assumed format follow on from that. I know this issue overlaps somewhat with the more general language and character encoding issues that are handled by MULE, but I'm not sure exactly how. Is there any documentation about MULE, as being implemented in 19.35, that I could read? It seems to me that the line ending convention employed by text files is often orthogonal to the character encoding convention (at least for single- and multi-byte encodings, and for Unicode as well after allowing for wider characters), and so a mechanism for automatically detecting and propagating the convention in use could still be of value.
AndrewI From rms@gnu.ai.mit.edu Tue Jul 1 17:55:36 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 1" "July" "1997" "20:55:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "20" "New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA13267 for ; Tue, 1 Jul 1997 17:55:35 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA26226; Tue, 1 Jul 1997 20:55:47 -0400 Message-Id: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> From: Richard Stallman To: eliz@is.elta.co.il, voelker@cs.washington.edu Subject: New way of handling CRLF Date: Tue, 1 Jul 1997 20:55:47 -0400 The MULE features include a new way of handling CRLF conversion. It detects the need to convert CRLF using the same mechanism that detects the need to convert international character sets. One consequence of this is that it ought to succeed in editing files that use LF or files that use CRLF. Regardless of what type of system you are on and what type of file system you are using, the file will appear in the normal Emacs way, with newlines between the lines. Does this mean that some of the features for text vs binary files and untranslated file systems are now unnecessary? Can I simplify the "Text Files and Binary Files" in the manual? Please answer me as soon as you can; I am trying to finish the manual. Note: currently there is a bug: when you visit a file on Unix which uses CRLF between lines, it recognizes that, but buffer-file-coding-system is set to nil, which is not right. I will forward you the fix for this as soon as I get it. 
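A minimal sketch of the kind of detection discussed here, written in C (the language of the fileio.c and callproc.c patches later in this thread). It is not MULE's actual algorithm; it assumes the same rule that Geoff Odhner's check-buffer-file-type uses (a file is CR-LF text only if it has CR-LF and no LF without a CR). The names detect_eol and enum eol_type are hypothetical.

```c
#include <stddef.h>

/* Possible end-of-line conventions found in a buffer (hypothetical names). */
enum eol_type { EOL_NONE, EOL_LF, EOL_CRLF, EOL_MIXED };

/* Scan LEN bytes of BUF and classify its line endings:
   EOL_CRLF  - every LF is preceded by CR (DOS-style text),
   EOL_LF    - no LF is preceded by CR (Unix-style text),
   EOL_MIXED - both kinds occur,
   EOL_NONE  - the buffer contains no LF at all. */
enum eol_type detect_eol(const char *buf, size_t len)
{
    size_t lf = 0, crlf = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            lf++;
            if (i > 0 && buf[i - 1] == '\r')
                crlf++;
        }
    }
    if (lf == 0)
        return EOL_NONE;
    if (crlf == lf)
        return EOL_CRLF;
    if (crlf == 0)
        return EOL_LF;
    return EOL_MIXED;
}
```

Counting every LF, rather than stopping at the first CR-LF, is what lets a mixed file be told apart from a clean DOS file. A binary file whose only LF happens to follow a CR would still be classified EOL_CRLF, which is the case Eli Zaretskii raises in his reply.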
From eliz@is.elta.co.il Wed Jul 2 01:03:33 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Wed" " 2" "July" "1997" "11:03:09" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "34" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id BAA28595 for ; Wed, 2 Jul 1997 01:03:30 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id LAA27499; Wed, 2 Jul 1997 11:03:10 +0300 X-Sender: eliz@is In-Reply-To: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu Subject: Re: New way of handling CRLF Date: Wed, 2 Jul 1997 11:03:09 +0300 (IDT) On Tue, 1 Jul 1997, Richard Stallman wrote: The following is purely theoretical, based on what you said in your message. I haven't had time yet to download and install the pretest, nor do I know how MULE detects and converts the file format. > Does this mean that some of the features for text vs binary files > and untranslated file systems are now unnecessary? Can I simplify > the "Text Files and Binary Files" in the manual? I would guess that the manual needs to be changed, but not necessarily simplified. The text vs binary thing has two aspects: reading files into Emacs and writing them back to the filesystem. No matter how smart the CRLF detection mechanism is, there will be cases where users will want a buffer to be written in a specific format of their preference, which might be different from the format of the original file as read by Emacs. I'm also not sure that the CRLF detection can be made fully automatic. Imagine a binary file (like an executable program) that includes a CRLF pair somewhere; would Emacs 20 strip the CR from it when it reads that file and treat it as text?
So I think Emacs 20 will need to keep the special varieties of `find-file' that specify text or binary explicitly (btw, it seems as if they aren't mentioned anywhere in the 19.34 manual). There should also be a way to tell Emacs to write a buffer (or a region) with LFs translated to CRLFs. In particular, the (un)?translated filesystem feature should be kept IMHO. If the above reasoning is true, there should be minor changes to the manual (to explain the automatic CRLF detection feature), but the bulk of the text should be kept. From eliz@is.elta.co.il Thu Jul 3 08:39:40 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" " 3" "July" "1997" "18:36:11" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id IAA18394 for ; Thu, 3 Jul 1997 08:39:39 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id SAA01572; Thu, 3 Jul 1997 18:36:11 +0300 X-Sender: eliz@is In-Reply-To: <199707030040.UAA03584@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu Subject: Re: New way of handling CRLF Date: Thu, 3 Jul 1997 18:36:11 +0300 (IDT) On Wed, 2 Jul 1997, Richard Stallman wrote: > We have two mechanisms for deciding whether a file should have LF, not > CRLF based on the file name. One looks for "binary files" and one > looking for untranslated file systems. > > Could these be unified, I wonder? On "translated" file systems, Emacs should decide whether the file is text (and then convert CRLF -> LF) or binary. On "untranslated" file systems, all files should be read and written verbatim. 
From rms@gnu.ai.mit.edu Wed Jul 2 17:39:07 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 2" "July" "1997" "20:39:37" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA17875 for ; Wed, 2 Jul 1997 17:39:06 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA03573; Wed, 2 Jul 1997 20:39:37 -0400 Message-Id: <199707030039.UAA03573@psilocin.gnu.ai.mit.edu> In-reply-to: <199707021953.MAA19844@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> <199707021953.MAA19844@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu Subject: Re: New way of handling CRLF Date: Wed, 2 Jul 1997 20:39:37 -0400 I agree with Eli that users will still want a mechanism by which files are written in a format automatically determined by Emacs. I agree. Still, I would really really appreciate it if you would tell me how things DO work now! Does Emacs automatically figure out whether a file has CRLF or LF? (There is a bug in the pretest that fails to save a file with CRLF if it was recognized with CRLF. That has been fixed.) 
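The bug RMS mentions (a file recognized as CR-LF not being saved with CR-LF) concerns the round trip: a CR-LF file is decoded to LF-only text in the buffer and must be encoded back when written. A minimal C sketch of that pair of conversions, assuming plain byte buffers supplied by the caller; decode_crlf and encode_crlf are hypothetical helpers, not Emacs functions.

```c
#include <stddef.h>

/* Decode: copy SRC to DST, turning each CR-LF pair into a single LF.
   A lone CR (not followed by LF) is kept as-is.
   DST must have room for LEN bytes.  Returns the bytes written. */
size_t decode_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;           /* drop the CR; the LF is copied next */
        dst[out++] = src[i];
    }
    return out;
}

/* Encode: copy SRC to DST, turning each LF into CR-LF.
   DST must have room for 2 * LEN bytes.  Returns the bytes written. */
size_t encode_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

The round trip is exact only for files in which every LF was preceded by a CR. Decoding a mixed file and re-encoding it adds CRs that were not there originally, and decoding a binary file alters its bytes, which is why an explicit no-conversion choice remains necessary.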
From rms@gnu.ai.mit.edu Wed Jul 2 17:40:15 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 2" "July" "1997" "20:40:45" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA17925 for ; Wed, 2 Jul 1997 17:40:14 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA03584; Wed, 2 Jul 1997 20:40:45 -0400 Message-Id: <199707030040.UAA03584@psilocin.gnu.ai.mit.edu> In-reply-to: <199707021953.MAA19844@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> <199707021953.MAA19844@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: eliz@is.elta.co.il Subject: Re: New way of handling CRLF Date: Wed, 2 Jul 1997 20:40:45 -0400 We have two mechanisms for deciding whether a file should have LF, not CRLF based on the file name. One looks for "binary files" and one looking for untranslated file systems. Could these be unified, I wonder? And could they both be done using file-coding-system-alist now? 
From rms@gnu.ai.mit.edu Thu Jul 3 12:18:41 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Thu" " 3" "July" "1997" "15:19:06" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707031919.PAA10787@psilocin.gnu.ai.mit.edu>" "12" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA07559 for ; Thu, 3 Jul 1997 12:18:39 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA10787; Thu, 3 Jul 1997 15:19:06 -0400 Message-Id: <199707031919.PAA10787@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Thu, 3 Jul 1997 18:36:11 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu Subject: Re: New way of handling CRLF Date: Thu, 3 Jul 1997 15:19:06 -0400 On "translated" file systems, Emacs should decide whether the file is text (and then convert CRLF -> LF) or binary. On "untranslated" file systems, all files should be read and written verbatim. That is what it does now--doesn't it? So what are you trying to say? Perhaps you misunderstood my question and answered a completely different one. Right now we have two separate mechanisms to do two similar jobs. Can we replace them with one mechanism that can do both jobs? 
From eliz@is.elta.co.il Sun Jul 6 07:22:25 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" " 6" "July" "1997" "17:22:00" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA24478 for ; Sun, 6 Jul 1997 07:22:23 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id RAA08656; Sun, 6 Jul 1997 17:22:01 +0300 X-Sender: eliz@is In-Reply-To: <199707031919.PAA10787@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu Subject: Re: New way of handling CRLF Date: Sun, 6 Jul 1997 17:22:00 +0300 (IDT) On Thu, 3 Jul 1997, Richard Stallman wrote: > On "translated" file systems, Emacs should decide whether the file is > text (and then convert CRLF -> LF) or binary. > > On "untranslated" file systems, all files should be read and written > verbatim. > > That is what it does now--doesn't it? So what are you trying to > say? I was trying to say that the two should be combined. (un)?translated says whether the CRLF<->LF conversion is at all an issue, and the detection of the file type says whether this particular file needs the conversion, given that it belongs to a "translated" filesystem. If it already works this way, then my comments are redundant. 
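The combined rule Eli describes, where the (un)translated file-system setting decides whether CR-LF conversion is an issue at all and per-file detection decides the rest, might look like the following minimal C sketch. The names should_decode_crlf and all_lines_crlf are hypothetical, and the detection test is assumed to be the same every-LF-has-a-CR rule used by check-buffer-file-type.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if BUF contains at least one LF and every LF is preceded by CR. */
static bool all_lines_crlf(const char *buf, size_t len)
{
    size_t lf = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            if (i == 0 || buf[i - 1] != '\r')
                return false;   /* an LF without CR: not DOS-style text */
            lf++;
        }
    }
    return lf > 0;
}

/* Decide whether CR-LF pairs in a file should be decoded to LF.
   UNTRANSLATED_FS is true when the file lives on a file system listed
   as "untranslated", where all I/O is done verbatim. */
bool should_decode_crlf(bool untranslated_fs, const char *buf, size_t len)
{
    if (untranslated_fs)
        return false;
    return all_lines_crlf(buf, len);
}
```

On a GNU system, as RMS points out later in the thread, the list would more naturally name the translated file systems instead; only the meaning of the flag would change, not the shape of the decision.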
From eliz@is.elta.co.il Sun Jul 6 07:23:08 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" " 6" "July" "1997" "17:22:44" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA24484 for ; Sun, 6 Jul 1997 07:23:05 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id RAA08662; Sun, 6 Jul 1997 17:22:44 +0300 X-Sender: eliz@is In-Reply-To: <199707042102.OAA34672@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, Andrew Innes Subject: Re: New way of handling CRLF Date: Sun, 6 Jul 1997 17:22:44 +0300 (IDT) On Fri, 4 Jul 1997, Geoff Voelker wrote: > correctly (e.g., on a text file with CRLF, both a find-file and a > find-file-binary create a buffer with the text file stripped of > CRLF, What about binary files with embedded CRLFs? How can Emacs tell which files are and which aren't ``text''? If it can't, then the above behavior is wrong: I might want to use `find-file-binary' to read a binary file (e.g., an executable program) that just happens to have embedded CRLF pairs, either as part of text messages or even just an opcode that happens to look like CRLF. Will I then be presented with the file with all CRs in CRLF pairs removed? > a text file without CRLF, Emacs reads it in correctly, but the > buffer-file-type is "text", and so the file gets written out with LF > converted to CRLF. This is not a bug in the coding-system code, but > rather due to the fact that, internally, Emacs under DOS_NT looks at > the buffer-file-type, sees "text", and opens the file in text mode, > and the operating system changes LF to CRLF. I'm not sure this is a bug, either. I can imagine cases where the user would like Unix-style text files be written as DOS-style text. 
I haven't decided yet what the default should be here, but at least a user-definable option should be available to get the non-default behavior. > Given the new coding-system framework, I think that all file I/O under > DOS_NT should now be done in binary mode since the data that Emacs > gives to the operating system does not need any conversion. If that is the case, how would a user tell Emacs that a file which originally had no CRs should have them added on output (assuming that you agree that such cases are possible)? From rms@gnu.ai.mit.edu Sun Jul 6 17:08:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" " 6" "July" "1997" "20:08:26" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707070008.UAA15380@psilocin.gnu.ai.mit.edu>" "15" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA07584 for ; Sun, 6 Jul 1997 17:08:02 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA15380; Sun, 6 Jul 1997 20:08:26 -0400 Message-Id: <199707070008.UAA15380@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Sun, 6 Jul 1997 17:22:44 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: New way of handling CRLF Date: Sun, 6 Jul 1997 20:08:26 -0400 What about binary files with embedded CRLFs? Specifying that a file is binary means specifying no conversion. Therefore, CRLF in these files will not be converted. > Given the new coding-system framework, I think that all file I/O under > DOS_NT should now be done in binary mode since the data that Emacs > gives to the operating system does not need any conversion. 
If that is the case, how would a user tell Emacs that a file which originally had no CRs should have them added on output (assuming that you agree that such cases are possible)? You can certainly do this by specifying a different coding system when you save the file. From eliz@is.elta.co.il Sun Jul 13 10:59:56 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "13" "July" "1997" "20:59:41" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "58" "New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id KAA13469 for ; Sun, 13 Jul 1997 10:59:55 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id UAA28768; Sun, 13 Jul 1997 20:59:41 +0300 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: Richard Stallman , Andrew Innes Subject: New way to handle CRLF in Emacs 20.0 Date: Sun, 13 Jul 1997 20:59:41 +0300 (IDT) > Actually, the new coding-system framework appears to obviate the need > for buffer-file-type; file-coding-system-alist and > buffer-file-coding-system appear to be flexible enough to supercede > it. I will need to think more about this, though, since it is a > rather drastic change under DOS_NT. (Eli and Andrew, if you get a > chance to look at the coding system support, I would like to hear what > you think about doing away with buffer-file-type, too.) Here's what I think, after spending an evening reading the code and playing with Emacs. I also think that the coding system can supercede buffer-file-type. We need to make a list of filename patterns that will automatically guess the coding system given a filename. If a given file is not in the list, Emacs should try to guess its EOL format, like it does now. 
Since this guess might be wrong (for example, Emacs decides that the file is CRLF-style when it sees the first CRLF pair, and might thus be fooled by a binary file), it would be nice to have options e.g. to ask the user whether the guess is correct, or require more than a single CRLF before a decision is made. (I didn't think about this too much, so I might be wrong.) Emacs should only do the above for filesystems that aren't in the untranslated list (for which all file I/O should be unconverted). I'd like to see user options (other than telling them to set the coding system) to have Emacs write files in a specific (CRLF or LF) format. The default behavior of preserving the original EOL encodings seems reasonable. The options would of course just set the coding system, but I'd rather people who need to do this don't have to know too much about coding systems. I also think that the (un)?translated filesystem feature might be useful to Unix users as well. I can imagine NT or even DOS disks mounted via networks, or people who run Linux-based systems and access DOS partitions there (I actually see quite a few complaints from the latter on gnu.emacs.help). These might benefit from adding such disks to the translated systems' list and having Emacs handle the conversion. So maybe it's a good idea to move this feature to lisp/files.el? > Currently, the default for file-coding-system-alist is 'undecided. > Under DOS_NT, this should probably be 'emacs-mule so that CRLF is > decoded and encoded by default. I agree. But shouldn't we also set coding.eol_type, for the EOL conversion to take place? I thought 'emacs-mule is not enough, no? > file-name-buffer-file-type-alist and the untranslate functions. The > last issue is whether to remove buffer-file-type, but I won't do > anything about that until more people agree that it is no longer > necessary. I think that it can go once the coding system handles everything.
We need to decide whether the T: or B: in the modeline is necessary (it seems that the coding system characters show the same information). From eliz@is.elta.co.il Sun Jul 13 11:02:20 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "13" "July" "1997" "21:02:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "65" "CRLF on DOS_NT" "^From:" nil nil "7" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id LAA13569 for ; Sun, 13 Jul 1997 11:02:19 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id VAA28793; Sun, 13 Jul 1997 21:02:01 +0300 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: Geoff Voelker , Andrew Innes Subject: CRLF on DOS_NT Date: Sun, 13 Jul 1997 21:02:01 +0300 (IDT) The following changes are required to make CRLF <-> LF conversion work in most common cases. I have deliberately not tried to get them into final shape, since I need to learn more about the coding systems, and because Geoff said he will work on that. I didn't install these changes, for these reasons (and also because I didn't have enough time to do that today). See also my other mail about the file format translation. (Geoff, the `callproc.c' patch is DOS-specific, since that fragment is for DOS only; you might consider looking up the relevant code for the NT subprocess support.) 1997-07-10 Eli Zaretskii * fileio.c (Fwrite_region) [DOS_NT]: Always use binary mode since coding conversion now takes care of NL -> CRLF. *** src/fileio.c~0 Tue Jul 8 11:36:00 1997 --- src/fileio.c Thu Jul 10 23:16:14 1997 *************** to the file, instead of any buffer conte *** 3799,3806 **** struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5; struct buffer *given_buffer; #ifdef DOS_NT ! int buffer_file_type ! = NILP (current_buffer->buffer_file_type) ? 
O_TEXT : O_BINARY; #endif /* DOS_NT */ struct coding_system coding; --- 3799,3805 ---- struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5; struct buffer *given_buffer; #ifdef DOS_NT ! int buffer_file_type = O_BINARY; #endif /* DOS_NT */ struct coding_system coding; 1997-07-11 Eli Zaretskii * callproc.c (Fcall_process) [MSDOS]: Request EOL conversion of the process output, unless we were promised it is binary. *** src/callproc.c~0 Mon Jul 7 00:56:00 1997 --- src/callproc.c Fri Jul 11 21:48:30 1997 *************** If you quit, the process is killed with *** 295,300 **** --- 295,311 ---- val = Qnil; } setup_coding_system (Fcheck_coding_system (val), &process_coding); + #ifdef MSDOS + /* FIXME: this probably should be moved into the guts of + `Ffind_operation_coding_system' for the case of `call-process'. */ + if (NILP (Vbinary_process_output)) + { + process_coding.eol_type = CODING_EOL_CRLF; + if (process_coding.type == coding_type_no_conversion) + /* FIXME: should we set type to undecided? 
*/ + process_coding.type = coding_type_emacs_mule; + } + #endif } } From rms@gnu.ai.mit.edu Sun Jul 13 14:41:06 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "13" "July" "1997" "17:41:39" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20208 for ; Sun, 13 Jul 1997 14:41:05 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25670; Sun, 13 Jul 1997 17:41:39 -0400 Message-Id: <199707132141.RAA25670@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: New way to handle CRLF in Emacs 20.0 Date: Sun, 13 Jul 1997 17:41:39 -0400 be nice to have options e.g. to ask the user whether the guess is correct, or require more than a single CRLF before a decision is made. (I didn't think about this too much, so I might be wrong.) I think that is not worth the trouble, given that we still have find-file-text and find-file-binary. 
From rms@gnu.ai.mit.edu Sun Jul 13 14:43:43 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "13" "July" "1997" "17:44:11" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20278 for ; Sun, 13 Jul 1997 14:43:42 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25707; Sun, 13 Jul 1997 17:44:11 -0400 Message-Id: <199707132144.RAA25707@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: New way to handle CRLF in Emacs 20.0 Date: Sun, 13 Jul 1997 17:44:11 -0400 I also think that the (un)?translated filesystem feature might be useful to Unix users as well. I can imagine NT or even DOS disks mounted via networks, A feature like this could be useful; but some of the present details don't fit this new context. If you are running Emacs on a GNU system, "untranslated" file systems are the usual case; file systems for which new files should be translated are the special case. This is the opposite of the situation for MSDOS. 
From rms@gnu.ai.mit.edu Sun Jul 13 14:44:49 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Sun" "13" "July" "1997" "17:45:24" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707132145.RAA25719@psilocin.gnu.ai.mit.edu>" "16" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20297 for ; Sun, 13 Jul 1997 14:44:49 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25719; Sun, 13 Jul 1997 17:45:24 -0400 Message-Id: <199707132145.RAA25719@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: New way to handle CRLF in Emacs 20.0 Date: Sun, 13 Jul 1997 17:45:24 -0400 > Currently, the default for file-coding-system-alist is 'undecided. > Under DOS_NT, this should probably be 'emacs-mule No, definitely not. so that CRLF is > decoded and encoded by default. CRLF encoding is supposed to happen just the same for undecided as it does for emacs-mule. We need to decide whether the T: or B: in the modeline is necessary (it seems that the coding system characters show the same information). Yes, that is something we should decide right now. 
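RMS's point rests on the end-of-line handling of a coding system being independent of its character encoding: in Emacs 20 the EOL variant is selected by a `-unix`, `-dos` or `-mac` suffix on the name (as in `undecided-dos` or `iso-latin-1-dos` elsewhere in this thread), and a name without such a suffix leaves the EOL convention to be detected when reading. A small C sketch of splitting a name into those two parts follows; `split_coding_system` is a hypothetical helper, not part of Emacs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* EOL_AUTO: no suffix, so the convention is detected when reading. */
enum eol_type { EOL_AUTO, EOL_UNIX, EOL_DOS, EOL_MAC };

/* Nonzero if S ends with SUFFIX. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Split NAME (e.g. "undecided-dos") into its base coding system,
   copied to BASE, and its EOL variant.  CAP is the size of BASE. */
enum eol_type split_coding_system(const char *name, char *base, size_t cap)
{
    static const struct { const char *suffix; enum eol_type eol; } tab[] = {
        { "-unix", EOL_UNIX }, { "-dos", EOL_DOS }, { "-mac", EOL_MAC },
    };
    size_t len = strlen(name), base_len = len;
    enum eol_type eol = EOL_AUTO;

    assert(cap > len);
    for (size_t i = 0; i < sizeof tab / sizeof tab[0]; i++) {
        if (ends_with(name, tab[i].suffix)) {
            eol = tab[i].eol;
            base_len = len - strlen(tab[i].suffix);
            break;
        }
    }
    memcpy(base, name, base_len);
    base[base_len] = '\0';
    return eol;
}
```

Under this model `undecided` and `emacs-mule` differ only in the base part, so the same EOL rules apply to both.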
From rms@gnu.ai.mit.edu Fri Jul 18 20:12:22 1997 X-VM-v5-Data: ([nil nil nil nil t t nil nil nil] [nil "Fri" "18" "July" "1997" "23:13:02" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707190313.XAA18973@psilocin.gnu.ai.mit.edu>" "71" "Re: CRLF on DOS_NT" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA22532 for ; Fri, 18 Jul 1997 20:12:21 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA18973; Fri, 18 Jul 1997 23:13:02 -0400 Message-Id: <199707190313.XAA18973@psilocin.gnu.ai.mit.edu> In-reply-to: <199707182334.QAA39176@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199707132042.QAA25113@psilocin.gnu.ai.mit.edu> <199707160153.SAA23676@joker.cs.washington.edu> <199707182305.TAA17659@psilocin.gnu.ai.mit.edu> <199707182334.QAA39176@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu Subject: Re: CRLF on DOS_NT Date: Fri, 18 Jul 1997 23:13:02 -0400 I've always interpreted the semantics of specifying 'nil' (text) in file-name-buffer-file-type-alist as being that you explicitly want CRLF separating lines. For example, no matter what, you want config.sys to have CRLFs between the lines. That is a good point. So here's the change I've made. But I wonder whether emacs-mule-dos is the right coding system in other respects. You've argued that -dos is right, but is emacs-mule right?

*** dos-w32.el 1997/07/18 22:54:23 1.6
--- dos-w32.el 1997/07/19 03:10:17
***************
*** 102,128 ****
    If the match is nil (for text): 'emacs-mule-dos'
    Otherwise:
      If the file exists: 'undecided'
!     If the file does not exist: 'emacs-mule-dos'
  If COMMAND is 'write-region', the coding system is chosen based
  upon the value of 'buffer-file-type': If t, the coding system is
  'no-conversion', otherwise it is 'emacs-mule-dos'."
    (let ((op (nth 0 command))
          (target)
!         (binary) (undecided nil))
      (cond ((eq op 'insert-file-contents)
             (setq target (nth 1 command))
             (setq binary (find-buffer-file-type target))
!            (if (not binary)
!                (setq undecided
!                      (and (file-exists-p target)
!                           (not (find-buffer-file-type-match target))))))
            ((eq op 'write-region)
             (setq binary buffer-file-type)))
      (cond (binary '(no-conversion . no-conversion))
            (undecided '(undecided . undecided))
!           (t '(emacs-mule-dos . emacs-mule-dos)))))

  (modify-coding-system-alist 'file "" 'find-buffer-file-type-coding-system)
--- 102,129 ----
    If the match is nil (for text): 'emacs-mule-dos'
    Otherwise:
      If the file exists: 'undecided'
!     If the file does not exist: 'undecided-dos'
  If COMMAND is 'write-region', the coding system is chosen based
  upon the value of 'buffer-file-type': If t, the coding system is
  'no-conversion', otherwise it is 'emacs-mule-dos'."
    (let ((op (nth 0 command))
          (target)
!         (binary nil) (text nil) (undecided nil))
      (cond ((eq op 'insert-file-contents)
             (setq target (nth 1 command))
             (setq binary (find-buffer-file-type target))
!            (unless binary
!              (if (find-buffer-file-type-match target)
!                  (setq text t)
!                (setq undecided (file-exists-p target)))))
            ((eq op 'write-region)
             (setq binary buffer-file-type)))
      (cond (binary '(no-conversion . no-conversion))
+           (text '(emacs-mule-dos . emacs-mule-dos))
            (undecided '(undecided . undecided))
!           (t '(undecided-dos . undecided-dos)))))

  (modify-coding-system-alist 'file "" 'find-buffer-file-type-coding-system)

From Marc.Fleischeuers@kub.nl Fri Aug 1 01:07:26 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "" " 1" "August" "1997" "10:07:17" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" "" "69" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA21136 for ; Fri, 1 Aug 1997 01:07:24 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA27228; Fri, 1 Aug 1997 10:07:19 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> In-Reply-To: Richard Stallman's message of Thu, 31 Jul 1997 19:42:35 -0400 Message-ID: Lines: 69 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Richard Stallman Cc: voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 01 Aug 1997 10:07:17 +0200 > I'm not sure I understand what you are trying to do. When a file is > inside of Emacs, line are always terminated by newlines. The line > termination that exists when the file is in the filesystem is only > placed there when the file is written out. There is no need to > explicitly place CR or LF characters in a file to change the > termination used. > > You're right--but perhaps this can be a clue to finding a place > where the documentation needs to be made clearer. So it is worth figuring > out why Marc got the wrong idea. My intent was too straightforward, obviously.
I noticed some problems with msdos-type files (note: these files were not created with emacs, but with other programs, most notably netscape). I knew about the cr-lf line ending convention, so in an attempt to create a msdos file in emacs, I ended lines with an explicit `C-q C-m C-q C-j'. Please note that this works as expected in emacs 19.33 (i386-*-Win NT 4.0). The variables Geoff mentioned (buffer-file-type and coding-system-for-write) have sent me off on a chase through emacs' help. Skip to the last paragraph if you are not interested in the dead ends. First, `C-h v buffer-file-type' mentions that this is an MS-DOG and Windows NT-only variable, and that its value is nil. I tried to set the variable with M-x set-variable RET buffer-file-type but when I press return all I get is [no match]. I don't think this is a great loss though; surely with so many advanced encoding and decoding facilities, there is no more need for MSDOG as a special case? On to `coding-system-for-write'. The documentation mentions that this is a variable for internal use only. Setting it would probably require lisp. The appropriate values for this variable should be taken from `coding-system-alist'. There is, however, no documentation for that variable (`C-h v coding-system-alist' -> [no match]). Still, an internal variable is not the first thing to use if I want to create an ms-dos file. Apropos'ing around, I found another promising variable, `buffer-file-format', valid values for which are found in `format-alist'. In this alist there seems to be an appropriate format, `ibm'. However, `M-x set-variable RET buffer-file-format' again gives [no match]. What I should have used all along was `M-x set-buffer-file-coding-system RET iso-latin-1-dos'. This function is accessible from the menu ([menu-bar mule set-various-coding-systems set-buffer-file-coding-system]) and from the C-x RET keymap.
However, it was only from the resulting file that I could see that it was what I wanted (in fact there may still be a better way). The documentation for `set-buffer-file-coding-system' does not mention to what values it can be set, and the description in `M-x describe-coding-system' does not mention what any of the listed coding systems do. In fact, after I selected iso-latin-1-dos, it was described as Current buffer file: buffer-file-coding-system - -- undecided-dos The short answer is the documentation for describe-coding-system and set-*-coding-system could be improved upon. For describe-coding-system, why is it necessary to mention the priority of coding systems? Instead, use the space to explain what the selected coding systems do. For `set-*-coding-systems', it could be mentioned to what values it can be set, and perhaps what they do. Marc -- Computer! End program! Computer! Create _new_ program! From rms@gnu.ai.mit.edu Sat Aug 2 03:18:33 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sat" " 2" "August" "1997" "06:18:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "13" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24743 for ; Sat, 2 Aug 1997 03:18:33 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id GAA13689; Sat, 2 Aug 1997 06:18:47 -0400 Message-Id: <199708021018.GAA13689@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 01 Aug 1997 10:07:17 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> From: Richard Stallman To: Marc.Fleischeuers@kub.nl cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: 
[Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Sat, 2 Aug 1997 06:18:47 -0400 What I should have used all along was `M-x set-buffer-file-coding-system RET iso-latin-1-dos'. This function is accessible from the menu ([menu-bar mule set-various-coding-systems set-buffer-file-coding-system]) and from the C-x RET keymap. However, it was only from the resulting file that I could see that it was what I improved the doc of this command. But that won't fully solve the problem. What I really should do is to point you at this command from somewhere else that you would naturally look. Any suggestions for where that could be? From rms@gnu.ai.mit.edu Sat Aug 2 21:23:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" " 3" "August" "1997" "00:23:09" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA20521 for ; Sat, 2 Aug 1997 21:23:03 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA25764; Sun, 3 Aug 1997 00:23:09 -0400 Message-Id: <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 01 Aug 1997 10:07:17 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> From: Richard Stallman To: Marc.Fleischeuers@kub.nl CC: voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Sun, 3 Aug 1997 00:23:09 -0400 I knew about the cr-lf line ending convention so in an attempt to create a msdos file in emacs, 
I ended lines with an explicit `C-q Cm C-q C-j'. Please note that this works as expected in emacs 19.33 (i386-*-Win NT 4.0). Is that really true? What algorithm does 19.33 use for LF to CRLF conversion? Maybe we should change the Emacs 20 EOL conversion to do the same thing. From handa@etl.go.jp Sun Aug 3 18:32:56 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "10:33:49" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA20790 for ; Sun, 3 Aug 1997 18:32:53 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA06878; Mon, 4 Aug 1997 10:32:33 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00812; Mon, 4 Aug 1997 10:32:33 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA04718; Mon, 4 Aug 1997 10:33:49 +0900 Message-Id: <199708040133.KAA04718@etlken.etl.go.jp> In-reply-to: <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Sun, 3 Aug 1997 00:23:09 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 10:33:49 +0900 Date: Sun, 3 Aug 1997 00:23:09 -0400 From: Richard Stallman 
I knew about the cr-lf line ending convention so in an attempt to create a msdos file in emacs, I ended lines with an explicit `C-q Cm C-q C-j'. Please note that this works as expected in emacs 19.33 (i386-*-Win NT 4.0). Is that really true? What algorithm does 19.33 use for LF to CRLF conversion? Maybe we should change the Emacs 20 EOL conversion to do the same thing. Since the above is the first mail I get about this thread, this reply may fail to catch the point... I don't know why the above doesn't work for Emacs 20. I've just tried the following. 1) At first, visit a new file. 2) type `a b c C-q C-m C-q C-j' 3) save it. 4) visit it again. Then the file is read as `undecided-dos' and the buffer contents are 4-byte of: abc\C-j This means that CR LF is decoded to single LF. But, since buffer-file-coding-system is undecided-dos, when I edit this file and save it, all LFs are encoded back to CR LF. --- Ken'ichi HANDA handa@etl.go.jp From Marc.Fleischeuers@kub.nl Mon Aug 4 01:42:41 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 4" "August" "1997" "10:42:27" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "25" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA03503 for ; Mon, 4 Aug 1997 01:42:36 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA27167; Mon, 4 Aug 1997 10:42:25 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708021018.GAA13689@psilocin.gnu.ai.mit.edu> In-Reply-To: Richard Stallman's message of Sat, 2 Aug 1997 06:18:47 -0400 Message-ID: Lines: 25 X-Mailer: Gnus v5.3/Emacs 19.33 
From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Richard Stallman Cc: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 04 Aug 1997 10:42:27 +0200 Richard Stallman writes: > What I really should do is to point you at this command from somewhere > else that you would naturally look. > > Any suggestions for where that could be? The command is in the menu and in the advertised C-x RET keymap; I think anyone investigating emacs' new features should find these functions easily (I didn't go there straightaway because I first followed the suggestions by Geoff Voelker). In the menu-bar there is even a corresponding `describe' entry for input methods and coding systems. I think this is a good thing; this is where I would look if I were a user. In fact I did look there when I first started emacs 20; it's just that the descriptions are not very informative about what the functions actually do for me (input methods do not work (yet?) so I cannot comment on that). If the documentation for set-buffer-file-coding-system and `M-x describe-coding-system' gave information about the available and the selected coding systems, respectively, and what they do for me, I think that would do it.
Marc From Marc.Fleischeuers@kub.nl Mon Aug 4 02:40:35 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 4" "August" "1997" "11:40:11" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "45" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA04516 for ; Mon, 4 Aug 1997 02:40:28 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA00950; Mon, 4 Aug 1997 11:40:09 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 10:33:49 +0900 Message-ID: Lines: 45 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Kenichi Handa Cc: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 04 Aug 1997 11:40:11 +0200 Kenichi Handa writes: > I've just tried the following. > 1) At first, visit a new file. > 2) type `a b c C-q C-m C-q C-j' > 3) save it. > 4) visit it again. > > Then the file is read as `undecided-dos' and the buffer contents are > 4-byte of: > abc\C-j > This means that CR LF is decoded to single LF. But, since > buffer-file-coding-system is undecided-dos, when I edit this file and > save it, all LFs are encoded back to CR LF. This is the way it should be, unfortunately it is not for me. I have repeated the four steps above. 
When I first open a new file, the buffer-file-coding-system is nil and the mode-line indicator is `:'. If I insert `a b c C-q C-m C-q C-j' in the buffer and then save the file, the buffer contains the five bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the contents of the file are `6162 630d 0d0a'. If I then re-visit the file (`C-x C-v RET') it contains six bytes `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac', and the mode-line indicator is `/'. I started emacs with `c:\emacs\bin\emacs.bat --no-site-file --no-init-file'. The batch file sets a number of environment variables. It is not modified from the one generated by the install process. I use emacs 20.0.92 on Windows NT 4.0, compiled with MS VC++ 4.2. I have also used the following version, started the same way, to perform exactly the same steps: In GNU Emacs 19.33.1 (i386-*-nt4.0) of Wed Aug 14 1996 on BANANA-FISH configured using `configure NT'. The file is written and read back in as the five bytes `abc\C-m\C-j'. There is a mode-line indicator `(T:', both when I first open the file and when I read it back in.
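Marc's six bytes can be reproduced with a small decoder. The C sketch below is illustrative only (`decode_eol` is a hypothetical name, and the EOL type is passed in rather than detected): under the Mac convention each CR becomes LF and an LF is copied unchanged, while under the DOS convention only a CR immediately followed by LF collapses to LF. This follows Kenichi Handa's description of how Emacs 20 reads `a b c CR CR LF`.

```c
#include <assert.h>
#include <stddef.h>

enum eol_type { EOL_UNIX, EOL_DOS, EOL_MAC };

/* Copy the N file bytes in IN to OUT, converting line ends of type EOL
   to the single LF used inside Emacs.  OUT must hold N bytes, since
   decoding never makes the text longer.  Returns the bytes written. */
size_t decode_eol(const unsigned char *in, size_t n,
                  unsigned char *out, enum eol_type eol)
{
    size_t j = 0;

    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (eol == EOL_MAC && in[i] == '\r') {
            out[j++] = '\n';             /* each CR is a line end */
        } else if (eol == EOL_DOS && in[i] == '\r'
                   && i + 1 < n && in[i + 1] == '\n') {
            out[j++] = '\n';             /* CR LF collapses to one LF */
            i++;                         /* skip the LF just consumed */
        } else {
            out[j++] = in[i];            /* copied as is, including an LF
                                            under EOL_MAC and a lone CR
                                            under EOL_DOS */
        }
    }
    return j;
}
```

Decoding the same file bytes with `EOL_DOS` instead yields the five bytes `abc\r\n`, which is consistent with the five-byte result Marc reports from 19.33.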
Marc From handa@etl.go.jp Mon Aug 4 04:43:24 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "20:37:31" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "69" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA06611 for ; Mon, 4 Aug 1997 04:43:23 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id UAA11557; Mon, 4 Aug 1997 20:36:15 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA07947; Mon, 4 Aug 1997 20:36:15 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id UAA05301; Mon, 4 Aug 1997 20:37:31 +0900 Message-Id: <199708041137.UAA05301@etlken.etl.go.jp> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 11:40:11 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> From: Kenichi Handa To: Marc.Fleischeuers@kub.nl CC: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 20:37:31 +0900 From: Marc Fleischeuers Date: 04 Aug 1997 11:40:11 +0200 Kenichi Handa writes: > I've just tried the following. > 1) At first, visit a new file. > 2) type `a b c C-q C-m C-q C-j' > 3) save it. > 4) visit it again. 
> > Then the file is read as `undecided-dos' and the buffer contents are > 4-byte of: > abc\C-j > This means that CR LF is decoded to single LF. But, since > buffer-file-coding-system is undecided-dos, when I edit this file and > save it, all LFs are encoded back to CR LF. This is the way it should be, unfortunately it is not for me. I have repeated the four steps above. When I first open a new file, the buffer-file-coding-system is nil and the mode-line indicator is `:'. If I insert `a b c C-q C-m C-q C-j' in the buffer and then save the file, the buffer contains the five bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the contents of the file is `6162 630d 0d0a'. Hmm, the sequence CR LF was written out as CR CR LF. It seems that the file is opened by O_TEXT instead of O_BINARY. But, this should have been fixed in 20.0.92 already. Strange... Could you please check the file src/fileio.c? Is it applied the following patch made by ?
------------------------------------------------------------
RCS file: RCS/fileio.c,v
retrieving revision 1.250
retrieving revision 1.251
diff -u -r1.250 -r1.251
--- fileio.c 1997/07/12 06:43:08 1.250
+++ fileio.c 1997/07/13 20:37:01 1.251
@@ -3799,8 +3799,7 @@
   struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;
   struct buffer *given_buffer;
 #ifdef DOS_NT
-  int buffer_file_type
-    = NILP (current_buffer->buffer_file_type) ? O_TEXT : O_BINARY;
+  int buffer_file_type = O_BINARY;
 #endif /* DOS_NT */
   struct coding_system coding;
------------------------------------------------------------
If I then re-visit the file (`C-x C-v RET') it contains six bytes `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac', and the mode-line indicator is `/'. This is the expected behaviour when Emacs reads `a b c CR CR LF'. When Emacs encounters a CR not followed by LF, it concludes that the end-of-line format for the file is CR (Mac's convention), and translates CR to LF.
LF is read as is. So, the buffer contains three LFs. So, the problem seems to be in writing a file. Anyway, I don't have Windows NT. I asked a person who is an expert of Windows to check the code. --- Ken'ichi HANDA handa@etl.go.jp From Marc.Fleischeuers@kub.nl Mon Aug 4 05:21:28 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 4" "August" "1997" "14:18:51" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "22" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA07716 for ; Mon, 4 Aug 1997 05:21:26 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id OAA10741; Mon, 4 Aug 1997 14:18:49 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <19 <199708041137.UAA05301@etlken.etl.go.jp> In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 20:37:31 +0900 Message-ID: Lines: 22 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Kenichi Handa Cc: Marc.Fleischeuers@kub.nl, rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 04 Aug 1997 14:18:51 +0200 Kenichi Handa writes: > Hmm, the sequence CR LF was written out as CR CR LF. It seems that > the file is opened by O_TEXT instead of O_BINARY. But, this should > have been fixed in 20.0.92 already. Strange... > > Could you please check the file src/fileio.c? Is it applied the > following patch made by ? 
This patch is applied (that is, it says ``int buffer_file_type = O_BINARY'', that's what it should be I think) > So, the problem seems to be in writing a file. In a previous post today, I described how 19.33 and 20.0.92 both write the same bytes to disk. If the way in which this file is read in is correct (as I understand from your and RMS' posts), then a) the way the cr and lf sequences are interpreted differs between 19.33 and 20.0.92, and b) this difference in reading is indeed not matched by an appropriate difference in writing. From handa@etl.go.jp Mon Aug 4 05:54:20 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "21:54:40" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA08453 for ; Mon, 4 Aug 1997 05:54:19 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id VAA13372; Mon, 4 Aug 1997 21:53:24 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA09960; Mon, 4 Aug 1997 21:53:25 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id VAA05378; Mon, 4 Aug 1997 21:54:40 +0900 Message-Id: <199708041254.VAA05378@etlken.etl.go.jp> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 14:18:51 +0200) From: Kenichi Handa To: Marc.Fleischeuers@kub.nl CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 21:54:40 +0900 From: Marc Fleischeuers Date: 04 Aug 1997 14:18:51 +0200 > Could you please check the file src/fileio.c?
Is it applied the > following patch made by ? This patch is applied (that is, it says ``int buffer_file_type = O_BINARY'', that's what it should be I think) I have just found that lisp/dos-w32.el does something to decide the coding system. Although I have not yet read the code in detail, I suspect that it decides that the coding system for writing a file on NT is undecided-dos. If that is true, it explains everything, because Emacs writes CR as is and converts LF to CR LF when it writes a file with undecided-dos. Perhaps Mr. Voelker wrote this code so that NT users don't have to do anything special to create a DOS file. In your case, you don't have to insert \C-m by hand to create a DOS file. Please just try the following: 1) visit a new file 2) type `a b c RET' 3) save the file. 4) visit the file again. You should be able to create a file of `a b c CR LF' in step 3, and buffer-file-coding-system should be set to undecided-dos in step 4. Mr. Voelker? Is this correct? --- Ken'ichi HANDA handa@etl.go.jp From Marc.Fleischeuers@kub.nl Mon Aug 4 05:59:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 4" "August" "1997" "14:58:42" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA08582 for ; Mon, 4 Aug 1997 05:58:53 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id OAA14494; Mon, 4 Aug 1997 14:58:40 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <19 <199708041137.UAA05301@etlken.etl.go.jp> In-Reply-To: Marc Fleischeuers's message of 04 Aug 1997 14:18:51 +0200
Message-ID: Lines: 33 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Marc Fleischeuers Cc: Kenichi Handa , rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 04 Aug 1997 14:58:42 +0200 Marc Fleischeuers writes: > Kenichi Handa writes: > > > Hmm, the sequence CR LF was written out as CR CR LF. It seems that > > the file is opened by O_TEXT instead of O_BINARY. But, this should > > have been fixed in 20.0.92 already. Strange... > > > > Could you please check the file src/fileio.c? Is it applied the > > following patch made by ? > > This patch is applied (that is, it says ``int buffer_file_type = > O_BINARY'', that's what it should be I think) > > > So, the problem seems to be in writing a file. I have examined the value of the lisp variable `buffer-file-type' at several stages after reading and writing files containing cr and lf sequences. The value of this variable was always nil, indicating a text (i.e., non-binary) file. In buffer-file-type-alist, files with the extension '.tpu' are marked as binary, so I tried C-x C-f new.tpu a b c C-q C-m C-q C-j C-x C-s C-x C-v RET. This file is created containing the intended 5 bytes, and it is read back in "correctly" (buffer contains `abc^M'). After reading the file in, the mode line indicator is `=:', and buffer-file-coding-system is `= -- no-conversion (alias: binary)'. May I argue that the concept of `buffer-file-type' and its associated variables and functions should be removed from emacs 20?
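Marc's `.tpu` experiment depends on a file-name lookup table (called `file-name-buffer-file-type-alist` in RMS's July 18 message), where a value of t means binary and nil means text. A simplified C model is below; it matches literal suffixes instead of the regular expressions the real alist uses, and `lookup_file_type` and `type_rule` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MATCH_BINARY stands for an alist value of t, MATCH_TEXT for nil. */
enum file_type_match { MATCH_NONE, MATCH_TEXT, MATCH_BINARY };

struct type_rule {
    const char *suffix;            /* the real alist holds regexps */
    enum file_type_match type;
};

/* Return the type given by the first rule whose suffix ends NAME
   (case-sensitive here, a simplification), or MATCH_NONE. */
enum file_type_match lookup_file_type(const char *name,
                                      const struct type_rule *rules,
                                      size_t nrules)
{
    assert(name != NULL);
    size_t ln = strlen(name);
    for (size_t i = 0; i < nrules; i++) {
        size_t ls = strlen(rules[i].suffix);
        if (ln >= ls && strcmp(name + ln - ls, rules[i].suffix) == 0)
            return rules[i].type;
    }
    return MATCH_NONE;
}
```

A name such as Marc's plain `abc` matches no rule, so the decision is left to whatever logic the caller applies to unmatched files.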
Marc From Marc.Fleischeuers@kub.nl Mon Aug 4 06:36:36 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 4" "August" "1997" "15:36:27" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "14" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id GAA10461 for ; Mon, 4 Aug 1997 06:36:35 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id PAA16644; Mon, 4 Aug 1997 15:36:25 +0200 (MET DST) References: <199708041254.VAA05378@etlken.etl.go.jp> In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 21:54:40 +0900 Message-ID: Lines: 14 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Kenichi Handa Cc: Marc.Fleischeuers@kub.nl, rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 04 Aug 1997 15:36:27 +0200 Kenichi Handa writes: > I have just found that lisp/dos-w32.el is doing something about > deciding coding system. Although I have not yet read the code in > detail, I suspect that the code decides that coding system for writing > a file on NT is undecided-dos. If it is true, it explains everything, I was reading there too. It appears that emacs does a lot of thinking for me. I have done some light testing, and it looks like everything acts like I expect it to, when (untranslated-file-p filename) returns t. This is what I'll be doing for a while, until something else breaks.
Marc From rms@gnu.ai.mit.edu Mon Aug 4 12:46:38 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "15:46:48" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA03249 for ; Mon, 4 Aug 1997 12:46:37 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA22721; Mon, 4 Aug 1997 15:46:48 -0400 Message-Id: <199708041946.PAA22721@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 14:18:51 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <19 <199708041137.UAA05301@etlken.etl.go.jp> From: Richard Stallman To: Marc.Fleischeuers@kub.nl CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 15:46:48 -0400 > Hmm, the sequence CR LF was written out as CR CR LF. It seems that > the file is opened by O_TEXT instead of O_BINARY. I would expect this is because of the usual DOS EOL conversion, and not because of O_TEXT. This patch is applied (that is, it says ``int buffer_file_type = O_BINARY'', that's what it should be I think) I am not surprised. The same code in Emacs that converts just LF to CR LF will of course do so when the preceding character is a CR-- unless there is something special to stop it. As far as I know, there is nothing special to avoid encoding LF as CR LF based on the preceding character. Handa, is there?
From rms@gnu.ai.mit.edu Mon Aug 4 12:55:40 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "15:55:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA03818 for ; Mon, 4 Aug 1997 12:55:39 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA22807; Mon, 4 Aug 1997 15:55:46 -0400 Message-Id: <199708041955.PAA22807@psilocin.gnu.ai.mit.edu> In-reply-to: <199708041254.VAA05378@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 4 Aug 1997 21:54:40 +0900) References: <199708041254.VAA05378@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 15:55:46 -0400 If it is true, it explains everything, because Emacs writes CR as is and converts LF to CR LF when it writes a file by undecided-dos. Yes, of course. I've been telling both of you this over and over. You've been trying to unravel a mystery which is not a mystery at all. The real question is, should we put in a special feature to override that behavior when the buffer contains a CR? Should DOS-style EOL conversion recognize when the buffer contains CR LF, and output it as CR LF (rather than CR CR LF)? 
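RMS's explanation, and the override he asks about, can be sketched on a plain byte buffer. This is a hypothetical illustration, not the encoder in src/coding.c (which also handles multibyte text): `encode_dos_eol` and its `preserve_crlf` flag are made-up names. With the flag off, every LF gains a CR, so a buffer holding `abc CR LF' is written as `abc CR CR LF', which is what Marc observed; with the flag on, an LF that already follows a CR is written unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of DOS end-of-line encoding on a byte buffer.
   Copies SRC_LEN bytes from SRC to DST, writing each LF as CR LF.
   If PRESERVE_CRLF is nonzero, an LF whose preceding buffer byte is
   already CR is copied alone, so a buffer CR LF stays CR LF (the
   special case discussed above, not something Emacs 20.0.92 does).
   DST must have room for 2 * SRC_LEN bytes.  Returns the number of
   bytes written to DST.  */
size_t
encode_dos_eol (const unsigned char *src, size_t src_len,
                unsigned char *dst, int preserve_crlf)
{
  size_t i, n = 0;

  assert (src_len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < src_len; i++)
    {
      unsigned char c = src[i];
      int follows_cr = (i > 0 && src[i - 1] == '\r');

      if (c == '\n' && !(preserve_crlf && follows_cr))
        dst[n++] = '\r';        /* Emit the CR of the CR LF pair.  */
      dst[n++] = c;
    }
  return n;
}
```

The flag only inspects the preceding buffer byte; a lone CR elsewhere in the buffer is copied through unchanged in either mode.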
From rms@gnu.ai.mit.edu Mon Aug 4 13:05:08 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 4" "August" "1997" "16:05:23" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA04450 for ; Mon, 4 Aug 1997 13:05:07 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA22977; Mon, 4 Aug 1997 16:05:23 -0400 Message-Id: <199708042005.QAA22977@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 15:36:27 +0200) References: <199708041254.VAA05378@etlken.etl.go.jp> From: Richard Stallman To: Marc.Fleischeuers@kub.nl CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Mon, 4 Aug 1997 16:05:23 -0400 I was reading there too. It appears that emacs does a lot of thinking for me. I have done some light testing, and it looks like everything acts like I expect it to, when (untranslated-file-p filename) returns t. That sentence is ambiguous; it could mean there is no problem, or it could mean there is a serious problem. untranslated-file-p is supposed to return t only when the file resides on a file system that is mounted on a Unix-like system. That is an unusual case for an MSDOS user; therefore, it is not the really important case. The really important case is when untranslated-file-p returns nil. So let's focus on the most important question first: what happens when untranslated-file-p returns nil? Do you get correct behavior in all cases? If not, can you tell us precisely which case is still incorrect?
From handa@etl.go.jp Mon Aug 4 17:28:58 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "09:29:44" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "34" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA20984 for ; Mon, 4 Aug 1997 17:28:57 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id JAA25767; Tue, 5 Aug 1997 09:28:29 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA29964; Tue, 5 Aug 1997 09:28:29 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id JAA05934; Tue, 5 Aug 1997 09:29:44 +0900 Message-Id: <199708050029.JAA05934@etlken.etl.go.jp> In-reply-to: <199708041946.PAA22721@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Mon, 4 Aug 1997 15:46:48 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <19 <199708041137.UAA05301@etlken.etl.go.jp> <199708041946.PAA22721@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 09:29:44 +0900 Date: Mon, 4 Aug 1997 15:46:48 -0400 From: Richard Stallman The same code in Emacs that converts just LF to CR LF will of course do so when the preceding character is a CR-- unless there is something special to stop it. 
As far as I know, there is nothing special to avoid encoding LF as CR LF based on the preceding character. Handa, is there? You are right. I didn't write any such special code. If it is true, it explains everything, because Emacs writes CR as is and converts LF to CR LF when it writes a file by undecided-dos. Yes, of course. I've been telling both of you this over and over. You've been trying to unravel a mystery which is not a mystery at all. Please note that I joined this discussion halfway through. The real question is, should we put in a special feature to override that behavior when the buffer contains a CR? Should DOS-style EOL conversion recognize when the buffer contains CR LF, and output it as CR LF (rather than CR CR LF)? I don't like it because it's too kluge (can I use this word as an adjective?). But if DOS users want it, it's not that difficult to implement. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Mon Aug 4 23:31:17 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "02:30:07" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "24" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA03522 for ; Mon, 4 Aug 1997 23:31:16 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA31410; Tue, 5 Aug 1997 02:30:07 -0400 Message-Id: <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 11:40:11 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> From: Richard Stallman To:
Marc.Fleischeuers@kub.nl CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 02:30:07 -0400 When I first open a new file, the buffer-file-coding-system is nil and the mode-line indicator is `:'. If I insert `a b c C-q C-m C-q C-j' in the buffer and then save the file, the buffer contains the five bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the contents of the file are `6162 630d 0d0a'. This is the right behavior, as Emacs is currently designed. It may not be quite the best feature, but it is not a bug. If I then re-visit the file (`C-x C-v RET') it contains six bytes `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac', and the mode-line indicator is `/'. This is a bug. The presence of CR CR LF in the file should not cause mac EOL conversion to be used. I think that Emacs is being too quick to use mac EOL conversion. I suspect that right now any CR not followed by LF does this. If the file contains a LF anywhere near the beginning, then mac EOL conversion should not be used. Handa, can you fix this?
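Marc's six-byte result follows directly from Mac EOL decoding, in which CR is the line terminator. The sketch below works on a plain byte buffer and also models the detection rule RMS suspects (any CR not followed by LF selects Mac format). Both `decode_mac_eol` and `naive_looks_like_mac` are hypothetical names; the second is a guess at the behaviour being described, not the code in src/coding.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of Mac end-of-line decoding on a byte buffer:
   CR is the line terminator, so each CR becomes LF, and an LF that is
   already in the file is kept as is.  DST needs room for SRC_LEN
   bytes; the result always has the same length as the input.  */
size_t
decode_mac_eol (const unsigned char *src, size_t src_len,
                unsigned char *dst)
{
  size_t i;

  assert (src_len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < src_len; i++)
    dst[i] = (src[i] == '\r') ? '\n' : src[i];
  return src_len;
}

/* Hypothetical model of the detection rule suspected above: report Mac
   format as soon as a CR is seen that is not followed by LF.  */
int
naive_looks_like_mac (const unsigned char *src, size_t src_len)
{
  size_t i;

  for (i = 0; i < src_len; i++)
    if (src[i] == '\r' && (i + 1 >= src_len || src[i + 1] != '\n'))
      return 1;
  return 0;
}
```

For `abc CR CR LF' the first CR is followed by another CR, so the naive rule picks Mac format, and decoding then yields `abc LF LF LF', the six bytes Marc reported.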
From rms@gnu.ai.mit.edu Mon Aug 4 23:35:32 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "02:31:37" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA03608 for ; Mon, 4 Aug 1997 23:35:32 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA31418; Tue, 5 Aug 1997 02:31:37 -0400 Message-Id: <199708050631.CAA31418@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 04 Aug 1997 11:40:11 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> From: Richard Stallman To: Marc.Fleischeuers@kub.nl CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 02:31:37 -0400 I have also used the following version, started the same way, to perform exactly the same steps: In GNU Emacs 19.33.1 (i386-*-nt4.0) of Wed Aug 14 1996 on BANANA-FISH configured using `configure NT' The file is written and read back in as the five bytes `abc\C-m\C-y'. There is a mode-line indicator `(T:', both when I first open the file and when I read it back in. If you write the file out and then visit it again, you are performing two experiments in series and you are telling only the result of the two of them. That isn't really useful. You need to tell us the result of each experiment. 
In other words, what exactly is in the file when you write it with Emacs 19 in this way? Is it a b c CR CR LF or a b c CR LF or what? From handa@etl.go.jp Tue Aug 5 01:10:29 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "17:10:48" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06992 for ; Tue, 5 Aug 1997 01:10:27 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA23247; Tue, 5 Aug 1997 17:09:34 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA26427; Tue, 5 Aug 1997 17:09:34 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA06812; Tue, 5 Aug 1997 17:10:48 +0900 Message-Id: <199708050810.RAA06812@etlken.etl.go.jp> In-reply-to: <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 5 Aug 1997 02:30:07 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 17:10:48 +0900 Richard Stallman writes: > If I then re-visit the file (`C-x C-v RET') it contains six bytes 
> `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac', > and the mode-line indicator is `/'. > This is a bug. The presence of CR CR LF in the file > should not cause mac EOL conversion to be used. > I think that Emacs is being too quick to use mac EOL conversion. > I suspect that right now any CR not followed by LF does this. Right. > If the file contains a LF anywhere near the beginning, > then mac EOL conversion should not be used. > Handa, can you fix this? Yes. But how about LF CR LF or CR LF LF? Should they be recognized as DOS format or Unix format? Hmmm, how about accumulating how many times each possible end-of-line format appears, and selecting the one which first occurs 3 times? If none occurs 3 times, perhaps we should select the one that occurs last. Then,
  CR CR LF -> DOS
  LF CR LF -> DOS
  CR LF LF -> Unix
  CR LF CR -> Mac
--- Ken'ichi HANDA handa@etl.go.jp From Marc.Fleischeuers@kub.nl Tue Aug 5 01:11:15 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 5" "August" "1997" "10:11:00" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "42" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA07028 for ; Tue, 5 Aug 1997 01:11:09 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA22959; Tue, 5 Aug 1997 10:11:00 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050631.CAA31418@psilocin.gnu.ai.mit.edu> In-Reply-To: Richard Stallman's message of Tue, 5 Aug 1997 02:31:37 -0400 Message-ID: Lines: 42
X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Richard Stallman Cc: Marc.Fleischeuers@kub.nl, handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 05 Aug 1997 10:11:00 +0200 Richard Stallman writes: > If you write the file out and then visit it again, > you are performing two experiments in series > and you are telling only the result of the two of them. > > That isn't really useful. You need to tell us the result > of each experiment. The emacsen are on different machines, hence I can safely use the same pathnames.

GNU Emacs 19.33.1 (i386-*-nt4.0)          GNU Emacs 20.0.92.1 (i386-*-nt4.0)

started with:                             started with:
C:\> c:\emacs\bin\emacs.bat -nw           C:\> c:\emacs\bin\emacs.bat -nw
  --no-site-file --no-init-file             --no-site-file --no-init-file

Input:                                    Input:
C-x C-f t . t RET a b c C-q RET RET       C-x C-f t . t RET a b c C-q RET RET
d e f C-q RET RET C-x C-s                 d e f C-q RET RET C-x C-s

Buffer looks like:                        Buffer looks like:
abc^M                                     abc^M
def^M                                     def^M

File contents:                            File contents:
6162 630d 0d0a 6465 660d 0d0a             6162 630d 0d0a 6465 660d 0d0a

Input:                                    Input:
C-x C-v RET                               C-x C-v RET

Buffer looks like:                        Buffer looks like:
abc^M                                     abc
def^M
-------
                                          def


                                          -------

Marc From Marc.Fleischeuers@kub.nl Tue Aug 5 01:23:15 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 5" "August" "1997" "10:23:02" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "14" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA07239 for ; Tue, 5 Aug 1997 01:23:14 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA23774; Tue, 5 Aug 1997 10:23:02 +0200 (MET DST) References:
<199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708050810.RAA06812@etlken.etl.go.jp> In-Reply-To: Kenichi Handa's message of Tue, 5 Aug 1997 17:10:48 +0900 Message-ID: Lines: 14 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Kenichi Handa Cc: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 05 Aug 1997 10:23:02 +0200 Kenichi Handa writes: > Yes. But how about LF CR LF or CR LF LF? Should they be recognized > as DOS format or Unix format? And what about CR CR LF LF? LF CR CR LF? Yes, I'm joking. However, after spending two days chasing a bug and eventually finding myself outwitted by emacs' intelligence in dos-w32.el, I tend to think that emacs should not be too smart. If the distribution of CR and LF throughout the file does not form a clear pattern, would `no conversion' be an option?
Marc From rms@gnu.ai.mit.edu Tue Aug 5 01:39:26 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "04:38:02" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "22" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA08178 for ; Tue, 5 Aug 1997 01:39:25 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id EAA00520; Tue, 5 Aug 1997 04:38:02 -0400 Message-Id: <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> In-reply-to: <199708040133.KAA04718@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 4 Aug 1997 10:33:49 +0900) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 04:38:02 -0400 I've just tried the following.
1) At first, visit a new file.
2) type `a b c C-q C-m C-q C-j'
3) save it.
4) visit it again.
Then the file is read as `undecided-dos' and the buffer contains the 4 bytes: abc\C-j This means that CR LF is decoded to single LF. Are you doing this on DOS, or on Unix? If you are on Unix, this behavior is correct, because on Unix new files are normally written with no EOL conversion. But if this happened on DOS, it would be a bug. I think Marc told us this is not what happens for him. Rather, the file is written with DOS EOL conversion, which is correct according to the current specs.
Then reading the file again mistakenly uses Mac EOL conversion. From handa@etl.go.jp Tue Aug 5 05:01:43 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "21:00:37" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "103" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA11469 for ; Tue, 5 Aug 1997 05:01:42 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id UAA02607; Tue, 5 Aug 1997 20:59:22 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA06694; Tue, 5 Aug 1997 20:59:22 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id VAA07192; Tue, 5 Aug 1997 21:00:37 +0900 Message-Id: <199708051200.VAA07192@etlken.etl.go.jp> In-reply-to: <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 5 Aug 1997 04:38:02 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 21:00:37 +0900 Richard Stallman writes: > I've just tried the following. > 1) At first, visit a new file. > 2) type `a b c C-q C-m C-q C-j' > 3) save it. > 4) visit it again. 
> Then the file is read as `undecided-dos' and the buffer contains > the 4 bytes: > abc\C-j > This means that CR LF is decoded to single LF. > Are you doing this on DOS, or on Unix? I'm using Unix. > If you are on Unix, this behavior is correct, because on Unix new > files are normally written with no EOL conversion. > But if this happened on DOS, it would be a bug. Right. > I think Marc told us this is not what happens for him. > Rather, the file is written with DOS EOL conversion, > which is correct according to the current specs. > Then reading the file again mistakenly uses Mac EOL conversion. Yes, I understand that now. Marc Fleischeuers writes: >> Yes. But how about LF CR LF or CR LF LF? Should they be recognized >> as DOS format or Unix format? > And what about CR CR LF LF? LF CR CR LF? > Yes, I'm joking. However, after spending two days chasing a bug and > eventually finding myself outwitted by emacs' intelligence in > dos-w32.el, I tend to think that emacs should not be too smart. I agree. > If the distribution of CR and LF throughout the file does not form a > clear pattern, would `no conversion' be an option? Deciding whether the pattern is clear or not is very difficult. In addition, we had better not do exhaustive scanning throughout the file. So, I suggest the following code. This scans the buffer until it encounters 3 end-of-lines. If it finds two different patterns while scanning, it decides not to decode end-of-line (by returning CODING_EOL_LF). So, in any of the following cases, it doesn't decode end-of-line: CR CR LF, LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF. I think it is clear enough, and users won't be surprised that much. What do you all think? --- Ken'ichi HANDA handa@etl.go.jp

--in src/coding.c------------------------------------------------------------
/* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC is
   encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF, CODING_EOL_CR,
   and CODING_EOL_UNDECIDED.  */

#define MAX_EOL_CHECK_COUNT 3

int
detect_eol_type (src, src_bytes)
     unsigned char *src;
     int src_bytes;
{
  unsigned char *src_end = src + src_bytes;
  unsigned char c;
  int total = 0;                /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;
  int this_eol_type;

  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
    {
      c = *src++;
      if (c == '\n' || c == '\r')
        {
          total++;
          if (c == '\n')
            this_eol_type = CODING_EOL_LF;
          else if (src >= src_end || *src != '\n')
            this_eol_type = CODING_EOL_CR;
          else
            this_eol_type = CODING_EOL_CRLF, src++;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
            /* The type found differs from the one found before.
               We had better not decode end-of-line.  */
            return CODING_EOL_LF;
        }
    }

  return (total ? eol_type : CODING_EOL_UNDECIDED);
}

From andrewi@harlequin.co.uk Tue Aug 5 09:01:30 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "17:00:38" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "63" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21469 for ; Tue, 5 Aug 1997 09:01:26 -0700 Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50]) by holly.cam.harlequin.co.uk (8.8.4/8.8.4) with ESMTP id RAA15639; Tue, 5 Aug 1997 17:01:13 +0100 (BST) Received: from woozle.long.harlequin.co.uk (woozle.long.harlequin.co.uk [193.128.93.77]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id RAA14620; Tue, 5 Aug 1997 17:00:38 +0100 (BST) Message-Id: <199708051600.RAA14620@propos.long.harlequin.co.uk> In-reply-to: <199708051200.VAA07192@etlken.etl.go.jp> (message from Kenichi Handa on Tue, 5 Aug 1997 21:00:37 +0900) From: Andrew Innes To: handa@etl.go.jp
CC: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 17:00:38 +0100 (BST) (Replying to several messages.) Aside: I haven't had a chance to study the new coding support yet, so my comments may be based on misunderstanding or ignorance of it. On Mon, 4 Aug 1997 15:55:46 -0400, Richard Stallman said: >The real question is, should we put in a special feature to override >that behavior when the buffer contains a CR? Should DOS-style EOL >conversion recognize when the buffer contains CR LF, and output it as >CR LF (rather than CR CR LF)? I agree with Handa that Emacs should not do this. Nearly all text files will use a single end-of-line convention throughout, and thus pose no problem. I can't think of circumstances in which a user would encounter text files containing extra CR characters like this. If such circumstances really are rare, then having to edit in "binary" mode where all CRs are explicit seems reasonable. (BTW, does Emacs 20 distinguish between text files using CODING_EOL_LF and binary files? I think such a distinction is useful: a binary file might contain all sorts of odd combinations of CR and LF, but a text file should normally use a single convention throughout.) >On Tue, 5 Aug 1997 21:00:37 +0900, Kenichi Handa said: >>If the distribution of CR and LF throughout the file does not form a >>clear pattern, would `no conversion' be an option? > >Deciding whether the pattern is clear or not is very difficult. In addition, >we had better not do exhaustive scanning throughout the file. We don't want to do exhaustive scanning, but we can easily detect if the end-of-line convention we choose based on the initial scan is not used uniformly. I would want any text file which appears not to use a single convention to be handled using an information-preserving convention, i.e.
CODING_EOL_LF (or preferably marked as a binary file, not a text file using Unix line-endings, if that distinction is made). >So, I suggest the following code. This scans the buffer until it >encounters 3 end-of-lines. If it finds two different patterns while >scanning, it decides not to decode end-of-line (by returning >CODING_EOL_LF). So, in any of the following cases, it doesn't decode >end-of-line: > CR CR LF, LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF. >I think it is clear enough, and users won't be surprised that much. > >What do you all think? I like this proposal. In addition, or perhaps instead, I would want insert-file-contents to notice if the chosen convention is not used uniformly, and either report an error or possibly reread the file using CODING_EOL_LF. (Indeed, the existing buffer contents could be patched up without rereading since it would be known that all previous lines used the original convention.) If it is not appropriate to signal an error, or revert the coding in situ, then at least insert-file-contents should indicate this in some way (e.g. by setting a variable to the number of non-conforming end-of-lines) so that other functions could inform the user about the discrepancy. I'm not sure whether the same sort of argument applies to subprocess output or not.
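Andrew's suggestion amounts to a cheap second pass over the whole text that counts end-of-lines disagreeing with the convention chosen from the initial scan. The sketch below is hypothetical: `count_nonconforming_eols` and the `eol_type` enum are illustrative names working on a plain byte buffer, and the enum values are placeholders, not the CODING_EOL_* codes from src/coding.h.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder codes, not the CODING_EOL_* values from src/coding.h.  */
enum eol_type { EOL_LF, EOL_CRLF, EOL_CR };

/* Count the end-of-lines in SRC (LEN bytes) whose type differs from
   CHOSEN, the type picked from the initial scan.  A CR immediately
   followed by LF counts as one CR LF end-of-line; any other CR or LF
   is a one-byte end-of-line.  */
size_t
count_nonconforming_eols (const unsigned char *src, size_t len,
                          enum eol_type chosen)
{
  size_t i, mismatches = 0;

  assert (len == 0 || src != NULL);
  for (i = 0; i < len; i++)
    {
      enum eol_type t;

      if (src[i] == '\n')
        t = EOL_LF;
      else if (src[i] != '\r')
        continue;               /* Ordinary text byte.  */
      else if (i + 1 < len && src[i + 1] == '\n')
        {
          t = EOL_CRLF;
          i++;                  /* Skip the LF of the pair.  */
        }
      else
        t = EOL_CR;

      if (t != chosen)
        mismatches++;
    }
  return mismatches;
}
```

Under the proposal, a nonzero count is what insert-file-contents would report in a variable, or treat as the signal to fall back to CODING_EOL_LF; the sketch only computes the number and leaves that policy to the caller.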
AndrewI From rms@gnu.ai.mit.edu Tue Aug 5 10:30:38 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "13:30:33" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "7" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA28221 for ; Tue, 5 Aug 1997 10:30:37 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA05204; Tue, 5 Aug 1997 13:30:33 -0400 Message-Id: <199708051730.NAA05204@psilocin.gnu.ai.mit.edu> In-reply-to: <199708050810.RAA06812@etlken.etl.go.jp> (message from Kenichi Handa on Tue, 5 Aug 1997 17:10:48 +0900) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708050810.RAA06812@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 13:30:33 -0400 Yes. But how about LF CR LF or CR LF LF? Should they be recognized as DOS format or Unix format? This question is less important. Neither choice is horrible. So please fix the Mac-format problem right away and don't let it be delayed by other questions like this one. 
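Handa's proposal, as quoted in Andrew Innes's message above, is concrete enough to sketch. The C below is a minimal illustration under stated assumptions, not the code that went into coding.c: the enum names are hypothetical stand-ins for the CODING_EOL_* values discussed in the thread, and the scan looks at no more than three end-of-lines, returning EOL_LF (no decoding) as soon as two of them disagree. It also sketches Andrew's follow-up idea of counting the end-of-lines in the whole text that do not match the chosen convention, again with made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CODING_EOL_UNDECIDED, _LF, _CRLF, _CR. */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* Find the next end-of-line at or after *POS, advance *POS past it and
   return its type; return EOL_UNDECIDED when no end-of-line remains. */
static enum eol_type next_eol(const char *buf, size_t len, size_t *pos)
{
    while (*pos < len) {
        char c = buf[(*pos)++];
        if (c == '\n')
            return EOL_LF;
        if (c == '\r') {
            if (*pos < len && buf[*pos] == '\n') {
                (*pos)++;               /* consume the LF of CR LF */
                return EOL_CRLF;
            }
            return EOL_CR;              /* bare CR */
        }
    }
    return EOL_UNDECIDED;
}

/* Handa's rule: examine at most the first three end-of-lines; if they
   do not all agree, give up and return EOL_LF (no EOL decoding). */
enum eol_type detect_eol(const char *buf, size_t len)
{
    const int max_eols = 3;
    enum eol_type first = EOL_UNDECIDED;
    size_t pos = 0;

    assert(buf != NULL || len == 0);
    for (int seen = 0; seen < max_eols; seen++) {
        enum eol_type t = next_eol(buf, len, &pos);
        if (t == EOL_UNDECIDED)
            break;                      /* ran out of text */
        if (first == EOL_UNDECIDED)
            first = t;
        else if (t != first)
            return EOL_LF;              /* mixed patterns */
    }
    return first;
}

/* Andrew's addition: after choosing CHOSEN from the initial scan, count
   the end-of-lines anywhere in the text that use another convention. */
size_t count_nonconforming(const char *buf, size_t len, enum eol_type chosen)
{
    size_t pos = 0, bad = 0;
    enum eol_type t;

    assert(buf != NULL || len == 0);
    while ((t = next_eol(buf, len, &pos)) != EOL_UNDECIDED)
        if (t != chosen)
            bad++;
    return bad;
}
```

With these definitions the five patterns Handa lists all yield EOL_LF, while a file whose first three lines end in CR LF is decoded as DOS even if a later line ends in a bare LF; count_nonconforming is the kind of figure Andrew suggests insert-file-contents could expose for such stragglers.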
From rms@gnu.ai.mit.edu Tue Aug 5 10:34:47 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "13:34:45" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA28453 for ; Tue, 5 Aug 1997 10:34:46 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA05225; Tue, 5 Aug 1997 13:34:45 -0400 Message-Id: <199708051734.NAA05225@psilocin.gnu.ai.mit.edu> In-reply-to: <199708050810.RAA06812@etlken.etl.go.jp> (message from Kenichi Handa on Tue, 5 Aug 1997 17:10:48 +0900) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708050810.RAA06812@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 13:34:45 -0400 Hmmm, how about accumulating how many times each possible end-of-line format appears, and select the one which first occurs 3 times? If none occurs 3 times, perhaps we should select the one that occurs last. Then, This does not fit the practical needs. The most important practical need is to avoid ever using mac format for a file which really should be in dos format. Therefore, the right solution is never to use mac format if you can see any linefeed at all. So if all you can find is CR, never LF, then use mac format.
Otherwise, if every LF has a CR before it, use dos format. Otherwise, use Unix format. From rms@gnu.ai.mit.edu Tue Aug 5 11:17:53 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Tue" " 5" "August" "1997" "14:18:03" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708051818.OAA05876@psilocin.gnu.ai.mit.edu>" "9" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA01720 for ; Tue, 5 Aug 1997 11:17:52 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA05876; Tue, 5 Aug 1997 14:18:03 -0400 Message-Id: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> In-reply-to: <199708051200.VAA07192@etlken.etl.go.jp> (message from Kenichi Handa on Tue, 5 Aug 1997 21:00:37 +0900) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 14:18:03 -0400 So, I suggest the following code. This scans buffer until it encounters 3 end-of-lines. If it finds two different patterns while scanning, it decides not to decode end-of-line (by returning CODING_EOL_LF). So, in any of the following cases, it doesn't decode end-of-line. CR CR LF, LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF. I think it is clear enough, and users won't be surprised that much.
I think this is good enough. I'll install it now. From rms@gnu.ai.mit.edu Tue Aug 5 12:53:09 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 5" "August" "1997" "15:53:20" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA09274 for ; Tue, 5 Aug 1997 12:53:09 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA07564; Tue, 5 Aug 1997 15:53:20 -0400 Message-Id: <199708051953.PAA07564@psilocin.gnu.ai.mit.edu> In-reply-to: <199708051600.RAA14620@propos.long.harlequin.co.uk> (message from Andrew Innes on Tue, 5 Aug 1997 17:00:38 +0100 (BST)) References: <199708051600.RAA14620@propos.long.harlequin.co.uk> From: Richard Stallman To: andrewi@harlequin.co.uk CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Tue, 5 Aug 1997 15:53:20 -0400 (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF and binary files? There is a distinction which perhaps you could interpret in this way: whether no-conversion is specified as the coding system. I think such a distinction is useful - a binary file might contain all sorts of odd combinations of CR and LF, but a text file should normally use a single convention throughout.) What, specifically, is it useful for?
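RMS's counter-rule from his earlier reply (never choose Mac format if any LF is present; choose DOS only if every LF is preceded by a CR; otherwise Unix) can be sketched the same way. This is a hypothetical illustration with made-up names, not Emacs code, and it scans whatever bytes it is handed rather than any particular prefix of the file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* values in the thread. */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* RMS's rule: Mac (CR) only when there is no LF at all; DOS (CR LF)
   when every LF has a CR right before it; otherwise Unix (LF). */
enum eol_type detect_eol_rms(const char *buf, size_t len)
{
    int saw_cr = 0, saw_lf = 0, every_lf_after_cr = 1;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            saw_cr = 1;
        } else if (buf[i] == '\n') {
            saw_lf = 1;
            if (i == 0 || buf[i - 1] != '\r')
                every_lf_after_cr = 0;  /* a bare LF rules out DOS */
        }
    }
    if (!saw_lf)
        return saw_cr ? EOL_CR : EOL_UNDECIDED;
    return every_lf_after_cr ? EOL_CRLF : EOL_LF;
}
```

On CR CR LF this rule returns EOL_CRLF, because the only LF is preceded by a CR, whereas Handa's three-end-of-line rule quoted earlier treats that same pattern as mixed and falls back to no conversion.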
From eliz@is.elta.co.il Tue Aug 5 21:11:10 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Wed" " 6" "August" "1997" "07:10:51" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "18" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id VAA04658 for ; Tue, 5 Aug 1997 21:11:08 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id HAA04550; Wed, 6 Aug 1997 07:10:52 +0300 X-Sender: eliz@is In-Reply-To: <199707132141.RAA25670@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: New way to handle CRLF in Emacs 20.0 Date: Wed, 6 Aug 1997 07:10:51 +0300 (IDT) On Sun, 13 Jul 1997, Richard Stallman wrote: > be nice to have options e.g. to ask the user whether the guess is > correct, or require more than a single CRLF before a decision is > made. (I didn't think about this too much, so I might be wrong.) > > I think that is not worth the trouble, given that we still have > find-file-text and find-file-binary. A typical DOS/NT user doesn't even know these functions exist. I think that `find-file' should in most of the cases do the right thing automatically, leaving the text- and binary-specific functions for the marginal cases. I will return to this issue when the more urgent problems are solved. Hopefully, by then I will also have enough experience to judge this objectively. 
From eliz@is.elta.co.il Tue Aug 5 21:11:56 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Wed" " 6" "August" "1997" "07:11:46" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "8" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id VAA04692 for ; Tue, 5 Aug 1997 21:11:54 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id HAA04561; Wed, 6 Aug 1997 07:11:47 +0300 X-Sender: eliz@is In-Reply-To: <199707170631.XAA39945@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: New way to handle CRLF in Emacs 20.0 Date: Wed, 6 Aug 1997 07:11:46 +0300 (IDT) On Wed, 16 Jul 1997, Geoff Voelker wrote: > I vote for removing the T:/B: from the modeline since they will be > redundant to the coding system characters. I agree. I didn't yet have time to download 20.0.92, but if this change isn't already there, I can install it. 
From handa@etl.go.jp Tue Aug 5 18:09:55 1997 X-VM-v5-Data: ([nil nil nil t nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "10:10:27" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA28619 for ; Tue, 5 Aug 1997 18:09:40 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA20137; Wed, 6 Aug 1997 10:09:11 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00905; Wed, 6 Aug 1997 10:09:11 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA07868; Wed, 6 Aug 1997 10:10:27 +0900 Message-Id: <199708060110.KAA07868@etlken.etl.go.jp> In-reply-to: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 5 Aug 1997 14:18:03 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 10:10:27 +0900 Richard Stallman writes: > So, I suggest the following code. This scans buffer until it > encounters 3 end-of-lines. 
If it finds two different patterns while > scanning, it decides not to decode end-of-line (by returning > CODING_EOL_LF). So, in any of the following cases, it doesn't decode > end-of-line. > CR CR LF, LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF. > I think it is clear enough, and users won't be surprised that much. > I think this is good enough. I'll install it now. I have just made a small change as below in FSF's code:

    diff -c -r1.30 coding.c
    *** coding.c    1997/08/05 18:19:33    1.30
    --- coding.c    1997/08/06 01:06:38
    ***************
    *** 2739,2745 ****
            }
        }

    !   return (total ? eol_type : CODING_EOL_UNDECIDED);
      }

      /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
    --- 2739,2745 ----
            }
        }

    !   return eol_type;
      }

      /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC

--- Ken'ichi HANDA handa@etl.go.jp
From rms@gnu.ai.mit.edu Thu Jul 31 16:42:11 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "31" "July" "1997" "19:42:35" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "10" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id QAA01460 for ; Thu, 31 Jul 1997 16:42:11 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id TAA19727; Thu, 31 Jul 1997 19:42:35 -0400 Message-Id: <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> In-reply-to: <199707312038.NAA15222@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Thu, 31 Jul 1997 19:42:35 -0400
I'm not sure I understand what you are trying to do. When a file is inside of Emacs, lines are always terminated by newlines. The line termination that exists when the file is in the filesystem is only placed there when the file is written out. There is no need to explicitly place CR or LF characters in a file to change the termination used. You're right--but perhaps this can be a clue to finding a place where the documentation needs to be made clearer. So it is worth figuring out why Marc got the wrong idea. From eliz@is.elta.co.il Thu Aug 14 09:31:56 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Thu" "14" "August" "1997" "19:31:42" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "19" "EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA24370 for ; Thu, 14 Aug 1997 09:31:55 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id TAA08084; Thu, 14 Aug 1997 19:31:43 +0300 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: Andrew Innes , Richard Stallman Subject: EOL conversion on MSDOS and MS-Windows Date: Thu, 14 Aug 1997 19:31:42 +0300 (IDT) Geoff, there's something that bothers me in the way Emacs 20.0.93 computes and displays the EOL conversion. There is a seeming inconsistency between the setting of the coding system for reading and for writing. When Emacs reads a file that is not in the alist of known file types, it sets the coding system to undecided. If the file happens to be in Unix style (like those in the Emacs source distribution), it ends up being *-unix, and Emacs displays `:' in the modeline. However, if you then save the file, the coding system is set to undecided-dos, and Emacs adds CR characters. But the coding system displayed on the modeline does not change.
It will only change if you revert the buffer or restart Emacs. I think this is confusing. When users look at the modeline, they should be able to determine how the file will be written to disk. So I think the coding system should by default be set to undecided-dos on reading the file (unless the file matches an entry in the buffer-type-alist, is on an untranslated filesystem, etc.). From eliz@is.elta.co.il Mon Aug 18 08:31:44 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Mon" "18" "August" "1997" "18:31:36" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "35" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id IAA18273 for ; Mon, 18 Aug 1997 08:31:42 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id SAA17735; Mon, 18 Aug 1997 18:31:37 +0300 X-Sender: eliz@is In-Reply-To: <199708162215.PAA28649@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 18 Aug 1997 18:31:36 +0300 (IDT) On Sat, 16 Aug 1997, Geoff Voelker wrote:

    > +   if (CODING_REQUIRE_EOL_CONVERSION (&coding))

This doesn't compile: there is no macro named CODING_REQUIRE_EOL_CONVERSION (or thereabouts). I replaced this line with this:

    if (coding.eol_type == CODING_EOL_CRLF)

Geoff, is this what you meant, or did you miss something from the diffs? After the above change, a quick test indicates that this now works as Richard suggested. But there is another problem: `write-region' always takes the coding system from `buffer-file-type' even if I write the region to another file. To reproduce:

    emacs -q
    C-x C-f src/xfns.c
    C-SPC
    C-u 10 C-n
    M-x write-region RET xyzzy RET

Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix format.
I think this is wrong. Since xyzzy did not exist, it should have been created in DOS format. Do you agree? Richard, how would this behave on Unix if src/xfns.c was in DOS format and xyzzy didn't exist? What about the case where xyzzy already exists and Emacs is overwriting it? From rms@gnu.ai.mit.edu Mon Aug 18 21:34:38 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "00:35:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "27" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA27454 for ; Mon, 18 Aug 1997 21:34:37 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA21653; Tue, 19 Aug 1997 00:35:46 -0400 Message-Id: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 18 Aug 1997 18:31:36 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk cc: handa@etl.go.jp, rms@gnu.ai.mit.edu Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 00:35:46 -0400 emacs -q C-x C-f src/xfns.c C-SPC C-u 10 C-n M-x write-region RET xyzzy RET Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix format. I think this is wrong. Since xyzzy did not exist, it should have been created in DOS format. I am not sure. Note that you can use C-x RET c to specify a different coding system for this command. Richard, how would this behave on Unix if src/xfns.c was in DOS format and xyzzy didn't exist? As far as I know, it would do exactly the same thing as on DOS. What about the case where xyzzy already exists and Emacs is overwriting it? Emacs would not notice whether the file already exists. Maybe it should, but I am not sure. 
From handa@etl.go.jp Mon Aug 18 22:27:15 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Tue" "19" "August" "1997" "14:28:03" "+0900" "Kenichi Handa" "handa@etl.go.jp" "<199708190528.OAA24843@etlken.etl.go.jp>" "40" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id WAA29533 for ; Mon, 18 Aug 1997 22:27:14 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id OAA09429; Tue, 19 Aug 1997 14:26:57 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id OAA21185; Tue, 19 Aug 1997 14:26:55 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id OAA24843; Tue, 19 Aug 1997 14:28:03 +0900 Message-Id: <199708190528.OAA24843@etlken.etl.go.jp> In-reply-to: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 19 Aug 1997 00:35:46 -0400) References: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 14:28:03 +0900 Richard Stallman writes: > emacs -q > C-x C-f src/xfns.c > C-SPC > C-u 10 C-n > M-x write-region RET xyzzy RET > Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix > format. > I think this is wrong. Since xyzzy did not exist, it should have been > created in DOS format. > I am not sure. I think write-region should write in the format of buffer-file-coding-system of the current buffer if no coding system is specified explicitly by C-x RET c or in file-coding-system-alist.
> Note that you can use C-x RET c to specify a different coding system > for this command. > Richard, how would this behave on Unix if src/xfns.c was > in DOS format and xyzzy didn't exist? > As far as I know, it would do exactly the same thing as on DOS. In this case, "the same thing as on DOS" is "to write in the format of buffer-file-coding-system". So, xyzzy is written in DOS format. > What about the case where xyzzy already exists and Emacs is overwriting > it? > Emacs would not notice whether the file already exists. > Maybe it should, but I am not sure. I think write-region should not follow the format of an already existing file, but append-to-file had better follow the format. --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Tue Aug 19 05:59:27 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "15:59:17" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "13" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id FAA12235 for ; Tue, 19 Aug 1997 05:59:25 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id PAA21411; Tue, 19 Aug 1997 15:59:18 +0300 X-Sender: eliz@is In-Reply-To: <199708190707.AAA25484@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 15:59:17 +0300 (IDT) On Mon, 18 Aug 1997, Geoff Voelker wrote:

    > !   if (coding.eol_type != CODING_EOL_UNDECIDED
    > !       && coding.eol_type != CODING_EOL_LF)
    >         current_buffer->buffer_file_type = Qnil;
    >       else
    >         current_buffer->buffer_file_type = Qt;

Hmm... If the EOL coding is still undecided, why should the file be marked as binary? Shouldn't it be text by default?
The above code means that if I read a file which has no newlines, it will be treated as binary. Is this correct? From eliz@is.elta.co.il Tue Aug 19 06:05:56 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "15:56:43" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA12367 for ; Tue, 19 Aug 1997 06:05:54 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id PAA21383; Tue, 19 Aug 1997 15:56:43 +0300 X-Sender: eliz@is In-Reply-To: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 15:56:43 +0300 (IDT) On Tue, 19 Aug 1997, Richard Stallman wrote: > I think this is wrong. Since xyzzy did not exist, it should have been > created in DOS format. > > I am not sure. I'm not sure either, so let me explain why I think it's wrong. First, there's a general rule about non-existent files: they are created with the default coding system. This is documented behavior, and users could think it applies to this case as well. The other consideration that I think goes against inheriting the coding system from the current buffer is that Emacs tailors many aspects of its operation using the *name* of the file. For example, suppose Emacs were to do something special when writing a C source file. Would we expect that action to take place when I write a region of an e-mail message to a .c file? I think we would. If you agree with that, I think we should also expect the coding system (for writing) to be derived from the name of the file the region is being written to.
This means that if the file's name to which the region is written is not found in the various alists which define specific coding systems, Emacs should fall back to the default coding system, which is undecided-dos on DOS_NT platforms (unless Emacs is customized). Here's another aspect of this dilemma. Suppose I visit a file that is in Unix EOL format, and then use `C-x C-w' to write the entire buffer to a file whose name *is* found in `file-name-buffer-type-alist'. Would we expect the coding system to change to reflect the change in the buffer's filename? I think we would, but Emacs doesn't behave this way now, because Emacs consults `file-name-buffer-type-alist' only when it reads files. From rms@gnu.ai.mit.edu Tue Aug 19 08:23:57 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "11:25:18" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "5" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id IAA17840 for ; Tue, 19 Aug 1997 08:23:57 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id LAA29210; Tue, 19 Aug 1997 11:25:18 -0400 Message-Id: <199708191525.LAA29210@psilocin.gnu.ai.mit.edu> In-reply-to: <199708190711.AAA25742@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> <199708190528.OAA24843@etlken.etl.go.jp> <199708190711.AAA25742@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: handa@etl.go.jp, eliz@is.elta.co.il, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 11:25:18 -0400 Perhaps we should add a warning message to the user when a write-region will write a buffer in an eol format that is not in the default format of the file system being used. I added text about this. Thanks. 
From rms@gnu.ai.mit.edu Tue Aug 19 09:08:53 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "12:10:08" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA20726 for ; Tue, 19 Aug 1997 09:08:52 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29916; Tue, 19 Aug 1997 12:10:08 -0400 Message-Id: <199708191610.MAA29916@psilocin.gnu.ai.mit.edu> In-reply-to: <199708190528.OAA24843@etlken.etl.go.jp> (message from Kenichi Handa on Tue, 19 Aug 1997 14:28:03 +0900) References: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> <199708190528.OAA24843@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 12:10:08 -0400 I think write-region should not follow the format of already existing file, but append-to-file had better follow the format. I agree, append-to-file needs to do this. Does it already, or is this a bug that needs fixing? 
From eliz@is.elta.co.il Tue Aug 19 10:05:35 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "20:05:05" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id KAA23786 for ; Tue, 19 Aug 1997 10:05:33 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id UAA21957; Tue, 19 Aug 1997 20:05:06 +0300 X-Sender: eliz@is In-Reply-To: <199708191610.MAA29916@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 20:05:05 +0300 (IDT) On Tue, 19 Aug 1997, Richard Stallman wrote: > I think write-region should not follow the format of already existing > file, but append-to-file had better follow the format. > > I agree, append-to-file needs to do this. > > Does it already, or is this a bug that needs fixing? No, it doesn't. It just seeks to the end of the file and writes the region with the coding system determined as usual. I tried to append a portion of a Unix-style file to a DOS-style file and got the appended part in Unix format. Unfortunately, I don't have time to fix this right now. I will fix it by tomorrow, unless somebody does before then.
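Handa's suggestion that append-to-file follow the format of the existing file, which Eli reports is not implemented yet, amounts to detecting the target's convention and encoding the appended region to match. A minimal sketch, assuming the existing file's contents (or at least its first line) are already in memory and that buffer text uses a bare LF as its newline, as described earlier in the thread; the function names and the fallback parameter are hypothetical, not Emacs APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* values in the thread. */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* Guess the existing file's convention from its first end-of-line. */
static enum eol_type existing_eol(const char *file, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (file[i] == '\n')
            return EOL_LF;
        if (file[i] == '\r')
            return (i + 1 < len && file[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
    }
    return EOL_UNDECIDED;
}

/* Encode REGION (buffer text, newline = LF) for appending to FILE,
   using FILE's own convention, or FALLBACK if FILE has no end-of-line.
   OUT must hold at least 2 * REGION_LEN bytes; returns bytes written. */
size_t encode_for_append(const char *file, size_t file_len,
                         const char *region, size_t region_len,
                         enum eol_type fallback, char *out)
{
    enum eol_type eol = existing_eol(file, file_len);
    size_t n = 0;

    assert(out != NULL);
    if (eol == EOL_UNDECIDED)
        eol = fallback;                 /* empty or newline-free target */
    for (size_t i = 0; i < region_len; i++) {
        if (region[i] != '\n') {
            out[n++] = region[i];
            continue;
        }
        switch (eol) {
        case EOL_CRLF:
            out[n++] = '\r';
            out[n++] = '\n';
            break;
        case EOL_CR:
            out[n++] = '\r';
            break;
        default:                        /* EOL_LF, or undecided fallback */
            out[n++] = '\n';
            break;
        }
    }
    return n;
}
```

Appending LF-terminated buffer text to a DOS-format file thus produces CR LF line ends, the opposite of what Eli observed; a target with no end-of-line at all falls back to whatever default the caller passes in.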
From rms@gnu.ai.mit.edu Tue Aug 19 11:41:24 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "14:42:28" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "3" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA00204 for ; Tue, 19 Aug 1997 11:41:23 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA32279; Tue, 19 Aug 1997 14:42:28 -0400 Message-Id: <199708191842.OAA32279@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Tue, 19 Aug 1997 15:56:43 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 14:42:28 -0400 It seems to me that this is one of those cases where the concept of what is "really right" is so complex that it may be better to do something simple and not try to do what is "really right".
From rms@gnu.ai.mit.edu Tue Aug 19 11:44:14 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "19" "August" "1997" "14:45:24" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "13" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA00404 for ; Tue, 19 Aug 1997 11:44:14 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA32310; Tue, 19 Aug 1997 14:45:24 -0400 Message-Id: <199708191845.OAA32310@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Tue, 19 Aug 1997 15:59:17 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu, handa@etl.go.jp Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Tue, 19 Aug 1997 14:45:24 -0400 Hmm... If the EOL coding is still undecided, why should the file be marked as binary? Shouldn't it be text by default? The above code means that if I read a file which has no newlines, it will be treated as binary. Is this correct? If this question concerns ONLY the case of a file with no newlines, then I agree with you, that should be considered a text file by default. But there is a related question: what will happen with the actual writing of the file? If buffer-file-coding-system is undecided as regards the eol conversion, what will happen if the user inserts some newlines and then saves the file? What eol convention will be used for saving the file? 
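The condition in Geoff's quoted patch and Eli's proposed alternative differ in exactly one case, which a side-by-side sketch makes visible. The enum and function names are hypothetical; a return value of 1 stands for buffer-file-type t (binary) and 0 for nil (text).

```c
#include <assert.h>

/* Hypothetical stand-ins for the CODING_EOL_* values in the thread. */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* The condition in Geoff's patch as quoted: a decided, non-LF
   convention means text (Qnil); everything else, including
   EOL_UNDECIDED, means binary (Qt). */
int binary_per_patch(enum eol_type eol)
{
    assert(eol <= EOL_CR);
    return !(eol != EOL_UNDECIDED && eol != EOL_LF);
}

/* Eli's suggestion: a file with no end-of-line at all (EOL_UNDECIDED)
   defaults to text, so only EOL_LF marks the buffer as binary. */
int binary_per_suggestion(enum eol_type eol)
{
    assert(eol <= EOL_CR);
    return eol == EOL_LF;
}
```

Only EOL_UNDECIDED, the result for a file read with no end-of-line at all, is classified differently; that is the case RMS agrees should default to text.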
From eliz@is.elta.co.il Wed Aug 20 07:26:55 1997
In-Reply-To: <199708191842.OAA32279@psilocin.gnu.ai.mit.edu>
From: Eli Zaretskii
To: Richard Stallman
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 17:25:19 +0300 (IDT)

On Tue, 19 Aug 1997, Richard Stallman wrote:

> It seems to me that this is one of those cases where the concept of
> what is "really right" is so complex that it may be better to do
> something simple and not try to do what is "really right".

I agree, but computing EOL conversion for writing from the filename when it's not the default buffer file name doesn't seem too complex.
From eliz@is.elta.co.il Wed Aug 20 07:37:51 1997
In-Reply-To: <199708191845.OAA32310@psilocin.gnu.ai.mit.edu>
From: Eli Zaretskii
To: Richard Stallman
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 17:37:14 +0300 (IDT)

On Tue, 19 Aug 1997, Richard Stallman wrote:

> But there is a related question: what will happen with the actual
> writing of the file?  If buffer-file-coding-system is undecided
> as regards the eol conversion, what will happen if the user
> inserts some newlines and then saves the file?  What eol convention
> will be used for saving the file?

I think Emacs will use the EOL convention that is determined by buffer-file-type.  Here's the last fragment of find-buffer-file-type-coding-system (in lisp/dos-w32.el) in its current incarnation:

    ((eq op 'write-region)
     (if buffer-file-coding-system
         (cons buffer-file-coding-system
               buffer-file-coding-system)
       (if buffer-file-type
           '(no-conversion . no-conversion)
         '(undecided-dos . undecided-dos)))))))

So if the buffer type is text (nil), Emacs will add CR characters; if it's binary (t), it won't.

Personally, I tend to make it a text file, so it will be written in DOS text format.  But I'm afraid that this tendency is a left-over from the previous Emacs behavior, whereby it would rewrite LF-only files with CRLF EOLs.
And we have changed that behavior.  So now I'm not sure whether my gut feelings are correct.  Geoff, what do you think?

From rms@gnu.ai.mit.edu Wed Aug 20 09:39:43 1997
Message-Id: <199708201640.MAA11228@psilocin.gnu.ai.mit.edu>
In-reply-to: (message from Eli Zaretskii on Wed, 20 Aug 1997 17:25:19 +0300 (IDT))
From: Richard Stallman
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 12:40:57 -0400

    I agree, but computing EOL conversion for writing from the filename when it's not the default buffer file name doesn't seem too complex.

I am not sure that is always right either.
From eliz@is.elta.co.il Wed Aug 20 09:54:59 1997
From: Eli Zaretskii
To: Richard Stallman, handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 19:54:14 +0300 (IDT)

On Tue, 19 Aug 1997, Eli Zaretskii wrote:

> On Tue, 19 Aug 1997, Richard Stallman wrote:
>
> > I think write-region should not follow the format of already existing
> > file, but append-to-file had better follow the format.
>
> I agree, append-to-file needs to do this.

When you say that appending to a file should follow the format, do you mean only the EOL encoding, or the entire coding system?  It seems to me that if the EOLs are taken from the file to which the region is appended, the rest of the coding system should be also, no?
From rms@gnu.ai.mit.edu Wed Aug 20 10:21:50 1997
Message-Id: <199708201720.NAA11905@psilocin.gnu.ai.mit.edu>
In-reply-to: (message from Eli Zaretskii on Wed, 20 Aug 1997 17:37:14 +0300 (IDT))
From: Richard Stallman
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp, rms@gnu.ai.mit.edu
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 13:20:31 -0400

    I think Emacs will use the EOL convention that is determined by buffer-file-type.

Looking at the code, I think that is true only if buffer-file-coding-system is nil:

    ((eq op 'write-region)
     (if buffer-file-coding-system
         (cons buffer-file-coding-system
               buffer-file-coding-system)
       (if buffer-file-type
           '(no-conversion . no-conversion)
         '(undecided-dos . undecided-dos)))))))

If buffer-file-coding-system is non-nil, that overrides buffer-file-type.  And buffer-file-coding-system should always be non-nil if you have visited an existing file.

When you visit a file that contains no newlines, buffer-file-coding-system gets set to undecided.  It will still be undecided when you save the buffer.  So the question is, what does saving the buffer do when buffer-file-coding-system is undecided?  Handa, can you tell us?
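As a minimal sketch of the precedence described here, the write-region branch quoted above can be rendered in C. The enum and the function name `write_region_coding` are hypothetical stand-ins, not Emacs internals. The Lisp returns a cons of identical read/write coding systems, so the sketch collapses that to a single value.

```c
/* Hypothetical C model of the write-region branch of
   find-buffer-file-type-coding-system (lisp/dos-w32.el).
   The enum is an illustrative stand-in, not an Emacs type.  */
enum coding
{
  CODING_NIL,            /* buffer-file-coding-system is nil */
  CODING_UNDECIDED,
  CODING_UNDECIDED_DOS,
  CODING_NO_CONVERSION
};

/* Return the coding system used for writing.  BINARY models
   buffer-file-type: non-zero means the buffer holds a binary file.  */
enum coding
write_region_coding (enum coding buffer_file_coding_system, int binary)
{
  /* A non-nil buffer-file-coding-system overrides buffer-file-type.  */
  if (buffer_file_coding_system != CODING_NIL)
    return buffer_file_coding_system;

  /* Only when it is nil does buffer-file-type decide.  */
  return binary ? CODING_NO_CONVERSION : CODING_UNDECIDED_DOS;
}
```

For a visited file with no newlines (buffer-file-coding-system set to undecided), the sketch returns CODING_UNDECIDED whatever buffer-file-type says, which is why the open question is what an undecided coding system does on save.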
From eliz@is.elta.co.il Wed Aug 20 11:37:01 1997
In-Reply-To: <199708201720.NAA11905@psilocin.gnu.ai.mit.edu>
From: Eli Zaretskii
To: Richard Stallman
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 21:36:12 +0300 (IDT)

On Wed, 20 Aug 1997, Richard Stallman wrote:

> When you visit a file that contains no newlines,
> buffer-file-coding-system gets set to undecided.  It will still be
> undecided when you save the buffer.

Hmm... for some reason when I read a file without newlines, the coding system gets set to undecided-dos, although that function in dos-w32.el indeed sets it to undecided.  I will have to debug this.

> So the question is, what does
> saving the buffer do when buffer-file-coding-system is undecided?
> Handa, can you tell us?

I have set the coding system manually to undecided, added a newline and saved it.  It got a Unix-style linefeed (as I'd expect, since that is the default when EOL type is not set).
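The experiment above can be modelled with a small C encoder, assuming the buffer stores bare LFs and that an undecided EOL type is written Unix-style, as Eli observed. `encode_eol` and `enum eol_type` are hypothetical names for illustration only, not the actual Emacs encoder.

```c
#include <stddef.h>

/* EOL part of a coding system; EOL_UNDECIDED means none was determined.  */
enum eol_type { EOL_UNDECIDED, EOL_UNIX, EOL_DOS };

/* Copy LEN bytes of buffer text SRC (lines separated by bare LFs) into DST,
   converting line ends according to EOL.  DST must have room for 2 * LEN
   bytes.  Returns the number of bytes written.  Only EOL_DOS adds CRs, so
   an undecided EOL type is written Unix-style.  */
size_t
encode_eol (const char *src, size_t len, char *dst, enum eol_type eol)
{
  size_t out = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (src[i] == '\n' && eol == EOL_DOS)
        dst[out++] = '\r';      /* DOS: emit CR before every LF */
      dst[out++] = src[i];
    }
  return out;
}
```

Encoding "a\nb\n" with EOL_UNDECIDED produces the same four bytes as EOL_UNIX, matching the linefeed-only file reported here; only EOL_DOS yields six bytes.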
From rms@gnu.ai.mit.edu Wed Aug 20 16:33:38 1997
Message-Id: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
In-reply-to: (message from Eli Zaretskii on Wed, 20 Aug 1997 19:54:14 +0300 (IDT))
From: Richard Stallman
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 19:35:08 -0400

    When you say that appending to a file should follow the format, do you
    mean only the EOL encoding, or the entire coding system?
    It seems to me that if the EOLs are taken from the file to which the
    region is appended, the rest of the coding system should be also, no?

I think so too.
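The rule agreed on here (append-to-file reuses the whole coding system of the target file, not just its EOL convention), combined with Eli's later point in this thread that an explicit coding-system-for-write must still win, might be sketched in C as follows. Coding systems are plain strings and NULL stands for nil; `append_coding` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: coding systems are strings, NULL stands for nil.
   EXISTING_FILE_CODING is the coding system detected for the file being
   appended to, or NULL if the file does not exist or left it undecided.  */
const char *
append_coding (const char *coding_system_for_write,
               const char *existing_file_coding,
               const char *fallback_coding)
{
  /* An explicit binding of coding-system-for-write always wins.  */
  if (coding_system_for_write != NULL)
    return coding_system_for_write;

  /* Otherwise reuse the target file's whole coding system,
     not just its EOL convention.  */
  if (existing_file_coding != NULL)
    return existing_file_coding;

  /* Nothing decided: use the normal write-region choice.  */
  return fallback_coding;
}
```

Taking the whole coding system in one step avoids mixing, say, a DOS EOL taken from the file with a character encoding taken from the buffer.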
From handa@etl.go.jp Wed Aug 20 17:38:10 1997
Message-Id: <199708210039.JAA27152@etlken.etl.go.jp>
In-reply-to: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Wed, 20 Aug 1997 19:35:08 -0400)
References: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 09:39:02 +0900

Richard Stallman writes:
>     When you say that appending to a file should follow the format, do you
>     mean only the EOL encoding, or the entire coding system?
>     It seems to me that if the EOLs are taken from the file to which the
>     region is appended, the rest of the coding system should be also, no?

> I think so too.

I agree too.
---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu Wed Aug 20 17:42:47 1997
Message-Id: <199708210044.UAA18067@psilocin.gnu.ai.mit.edu>
In-reply-to: (message from Eli Zaretskii on Wed, 20 Aug 1997 21:36:12 +0300 (IDT))
From: Richard Stallman
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 20:44:10 -0400

    Hmm... for some reason when I read a file without newlines, the coding system gets set to undecided-dos,

That might be a good default on DOS.

Or perhaps, on DOS, if the buffer-file-coding-system is still undecided when you first save the file, save it using undecided-dos instead.  More precisely: on DOS, if the eol conversion is still undecided when saving the file, then save it using the DOS eol conversion.

(I am assuming that this will have no effect on files whose names or file systems are recognized as determining which eol convention to use; I'm assuming that in those cases buffer-file-coding-system will specify the eol convention precisely.  If that's not true, my proposal won't be good.)
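A minimal C sketch of the proposal above, assuming the EOL part of a coding system can be examined on its own: a precisely specified EOL type is kept, and only a still-undecided one is resolved to the platform convention at save time. `eol_for_save` is a hypothetical name, not an Emacs function.

```c
/* EOL part of a coding system; EOL_UNDECIDED means not yet determined.  */
enum eol_type { EOL_UNDECIDED, EOL_UNIX, EOL_DOS };

/* Return the EOL convention to use when saving a buffer whose coding
   system has EOL part BUFFER_EOL.  RUNNING_ON_DOS is non-zero on
   MS-DOS and MS-Windows.  */
enum eol_type
eol_for_save (enum eol_type buffer_eol, int running_on_dos)
{
  /* A precisely specified EOL type (detected when the file was read,
     or implied by its name or file system) is kept as is.  */
  if (buffer_eol != EOL_UNDECIDED)
    return buffer_eol;

  /* Still undecided at save time: use the platform's convention.  */
  return running_on_dos ? EOL_DOS : EOL_UNIX;
}
```

The caveat in the parenthetical maps onto the first test: the proposal only helps if files with a name- or file-system-determined convention never reach the save step with EOL_UNDECIDED.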
From handa@etl.go.jp Wed Aug 20 17:45:41 1997
Message-Id: <199708210046.JAA27166@etlken.etl.go.jp>
In-reply-to: (message from Eli Zaretskii on Wed, 20 Aug 1997 21:36:12 +0300 (IDT))
From: Kenichi Handa
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 09:46:23 +0900

Eli Zaretskii writes:
> I have set the coding system manually to undecided, added a newline and
> saved it.  It got a Unix-style linefeed (as I'd expect, since that is the
> default when EOL type is not set).

How about setting default-buffer-file-coding-system to 'undecided-dos
on DOS?
---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu Wed Aug 20 22:56:34 1997
Message-Id: <199708210554.BAA21551@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708210046.JAA27166@etlken.etl.go.jp> (message from Kenichi Handa on Thu, 21 Aug 1997 09:46:23 +0900)
References: <199708210046.JAA27166@etlken.etl.go.jp>
From: Richard Stallman
To: handa@etl.go.jp
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 01:54:49 -0400

    How about setting default-buffer-file-coding-system to 'undecided-dos
    on DOS?

That might be right, but I am not sure.
What effect would that have, in the various cases?

For example, what effect would this have when you visit a file
with no line separators in it?
What would happen if you add some newlines and save the file?

What effect would this have when you visit a file
that uses the Unix EOL convention?

What effect would this have when you create a buffer
with C-x b, and then save it in a file?
From handa@etl.go.jp Thu Aug 21 03:59:57 1997
Message-Id: <199708211100.UAA27712@etlken.etl.go.jp>
In-reply-to: <199708210554.BAA21551@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Thu, 21 Aug 1997 01:54:49 -0400)
References: <199708210046.JAA27166@etlken.etl.go.jp> <199708210554.BAA21551@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 20:00:51 +0900

Richard Stallman writes:
>     How about setting default-buffer-file-coding-system to 'undecided-dos
>     on DOS?

> That might be right, but I am not sure.
> What effect would that have, in the various cases?

> For example, what effect would this have when you visit a file
> with no line separators in it?

buffer-file-coding-system of the new buffer is set to 'undecided-dos if the file contains only ASCII.  If the file contains Japanese text encoded in iso-2022-7bit, buffer-file-coding-system is set to iso-2022-7bit-dos.
Thus,

> What would happen if you add some newlines and save the file?

the file is saved with the DOS EOL convention.

> What effect would this have when you visit a file
> that uses the Unix EOL convention?

The Unix EOL convention is detected correctly, and buffer-file-coding-system of the new buffer is set to XXXX-unix.

> What effect would this have when you create a buffer
> with C-x b, and then save it in a file?

buffer-file-coding-system of the new buffer is still nil, but since default-buffer-file-coding-system is undecided-dos, it is saved with the DOS EOL convention.

I think all these behaviours are appropriate.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu Thu Aug 21 14:47:03 1997
Message-Id: <199708212148.RAA30605@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708211100.UAA27712@etlken.etl.go.jp> (message from Kenichi Handa on Thu, 21 Aug 1997 20:00:51 +0900)
References: <199708210046.JAA27166@etlken.etl.go.jp> <199708210554.BAA21551@psilocin.gnu.ai.mit.edu> <199708211100.UAA27712@etlken.etl.go.jp>
From: Richard Stallman
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 17:48:24 -0400

    > How about setting default-buffer-file-coding-system to 'undecided-dos
    > on DOS?

Ok, this sounds like a good idea.  Could someone please make the change?
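Handa's answers to the three cases can be summarized in a short C sketch, assuming default-buffer-file-coding-system is undecided-dos. The detection heuristic in `detect_eol` (the first line end decides) is a simplification chosen for illustration, not Emacs's actual detector; both function names are hypothetical.

```c
#include <stddef.h>

enum eol_type { EOL_UNDECIDED, EOL_UNIX, EOL_DOS };

/* Simplified detector (an assumption, not Emacs's algorithm): the first
   line end in TEXT decides.  Returns EOL_UNDECIDED if there is no LF.  */
enum eol_type
detect_eol (const char *text, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (text[i] == '\n')
      return (i > 0 && text[i - 1] == '\r') ? EOL_DOS : EOL_UNIX;
  return EOL_UNDECIDED;
}

/* EOL convention used when the buffer is saved.  DETECTED is what was
   found when visiting (EOL_UNDECIDED for a file without newlines, or for
   a buffer made with C-x b); DEFAULT_EOL is the EOL part of
   default-buffer-file-coding-system (EOL_DOS for undecided-dos).  */
enum eol_type
eol_when_saving (enum eol_type detected, enum eol_type default_eol)
{
  return detected != EOL_UNDECIDED ? detected : default_eol;
}
```

A file with no newlines and a fresh C-x b buffer both fall through to the default (DOS), while a Unix file keeps its detected convention, which is the combination Handa describes as appropriate.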
From eliz@is.elta.co.il Sun Aug 24 22:08:33 1997
In-Reply-To: <199708212148.RAA30605@psilocin.gnu.ai.mit.edu>
From: Eli Zaretskii
To: Richard Stallman
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 08:07:49 +0300 (IDT)

On Thu, 21 Aug 1997, Richard Stallman wrote:

> > How about setting default-buffer-file-coding-system to 'undecided-dos
> > on DOS?
>
> Ok, this sounds like a good idea.  Could someone please make the change?

No need to do anything; it is already set this way.  See lisp/dos-w32.el.
From eliz@is.elta.co.il Sun Aug 24 22:24:58 1997
From: Eli Zaretskii
To: Richard Stallman
cc: Geoff Voelker, Andrew Innes, Kenichi Handa
Subject: Coding system issues (1)
Date: Mon, 25 Aug 1997 08:24:22 +0300 (IDT)

It seems that (setq-default enable-multibyte-characters nil) also disables part of the DOS EOL conversions.  Specifically, if you create a new buffer, type text there, then save the buffer, you get Unix-style linefeeds at EOL, although the mode line quite deceptively says "\".  E.g., try this:

    emacs -q
    M-: (setq-default enable-multibyte-characters nil) RET
    C-x b my-own-buffer RET

Now type a few lines of text, then press C-x C-s foobar RET.  Exit or suspend Emacs and look at the file foobar; you will see a Unix-style file.

Is this so by design?  Disabling EOL conversion when multibyte characters aren't supported might make sense on Unix (since it returns to the pre-20 behavior), but not on DOS_NT, I think.

If you agree, then when multibyte character support is disabled, Emacs on DOS_NT needs either to bind coding-system-for-read/write or to call find-operation-coding-system (and then the latter should test the value of enable-multibyte-characters and return emacs-mule-dos/unix when it's nil).  I prefer the latter solution.
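The behavior reported above is consistent with the order of tests in the fileio.c fragment Eli posts later in this thread. Below is a hypothetical C model of that order: coding systems are strings, NULL stands for nil, and `struct write_ctx` and `choose_write_coding` are invented for illustration. With multibyte characters disabled, a buffer created with C-x b has no buffer-local buffer-file-coding-system, so the result is nil, which would match the Unix-style file Eli observed.

```c
#include <stddef.h>

/* Hypothetical model of the "Decide the coding-system to be encoded to"
   block of fileio.c.  Coding systems are strings; NULL stands for nil.  */
struct write_ctx
{
  int auto_saving;
  const char *coding_system_for_write;   /* dynamic binding, or NULL */
  int enable_multibyte_characters;
  int bfcs_is_local;                     /* buffer-local value exists? */
  const char *buffer_file_coding_system; /* value seen in this buffer */
  const char *operation_coding;          /* cdr of find-operation-coding-system */
};

const char *
choose_write_coding (const struct write_ctx *c)
{
  if (c->auto_saving)
    return NULL;
  if (c->coding_system_for_write != NULL)
    return c->coding_system_for_write;
  if (!c->enable_multibyte_characters)
    /* Unibyte buffers only honor a buffer-local value; a fresh C-x b
       buffer has none, so the result is nil.  */
    return c->bfcs_is_local ? c->buffer_file_coding_system : NULL;
  /* Multibyte: prefer find-operation-coding-system's answer.  */
  return c->operation_coding != NULL ? c->operation_coding
                                     : c->buffer_file_coding_system;
}
```

In this model the default value (undecided-dos) shown in the mode line is never consulted in the unibyte branch, which is the gap Eli proposes to close via find-operation-coding-system.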
From eliz@is.elta.co.il Sun Aug 24 22:28:45 1997
From: Eli Zaretskii
To: Richard Stallman
cc: Geoff Voelker, Andrew Innes, Kenichi Handa
Subject: Coding system issues (2)
Date: Mon, 25 Aug 1997 08:28:23 +0300 (IDT)

I wonder whether insert-file-contents needs to inherit the coding system from the buffer if it is already set (i.e., not undecided).  Right now, the coding system is computed afresh every time, even if REPLACE is non-nil, or if we are inserting into a buffer which already has some text in it.  (I'm not talking merely about DOS_NT EOL conversion here.)

One case where this subtlety might bite you is when you byte-compile a .el file.  The byte compiler erases the buffer and re-reads the file before it begins the compilation (why, btw?), so even if you had set the coding system before that, you need to set it again with C-x RET c before compiling.  If you forget, you might get subtle bugs when running the .elc file, because the strings get written into it in converted form.

I had this problem with lisp/term/internal.el, which leads Emacs to believe it's encoded in sjis.  Even if I set the coding to emacs-mule when I visit the file, Emacs will use sjis when it re-reads the file before compiling it.  The converted strings were used to set case-conversion tables, so the effect of this was that Fdowncase mysteriously stopped working for some characters: a particularly nasty and hard-to-debug problem.
Do you agree that inserting a file into a buffer that already has a decided coding system should use the same coding system?  If not, what about the case of byte-compiling?  Should the coding system be bound to that of the buffer during the compilation?

I think users should not be required to know whether a certain command calls insert-file-contents or not.  When I set the coding system for a buffer, I'd expect that all the operations thereafter will use that coding system.  Wouldn't you?

From eliz@is.elta.co.il Sun Aug 24 22:33:11 1997
In-Reply-To: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
From: Eli Zaretskii
To: Richard Stallman
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 08:30:10 +0300 (IDT)

On Wed, 20 Aug 1997, Richard Stallman wrote:

>     It seems to me that if the EOLs are taken from the file to which the
>     region is appended, the rest of the coding system should be also, no?
>
> I think so too.

To make append-to-file use the coding system of that file, I need to decide where to put the test for this.  The relevant fragment from fileio.c is attached below for your reference.

I think coding-system-for-write should take precedence over the file to which we are appending; otherwise there would be no way for the caller to force a specific coding system for this operation.
When enable-multibyte-characters is nil, we shouldn't look at the coding system of the file either, even if buffer-file-coding-system is local.  Do you agree?

If so, it seems to me that the correct way is to test for the file's coding system before the else clause, and to fall back to Ffind_operation_coding_system if the file leaves the coding undecided.

----------- from fileio.c ------------------------------------------
  /* Decide the coding-system to be encoded to.  */
  {
    Lisp_Object val;

    if (auto_saving)
      val = Qnil;
    else if (!NILP (Vcoding_system_for_write))
      val = Vcoding_system_for_write;
    else if (NILP (current_buffer->enable_multibyte_characters))
      val = (NILP (Flocal_variable_p (Qbuffer_file_coding_system, Qnil))
             ? Qnil
             : Fsymbol_value (Qbuffer_file_coding_system));
    else
      {
        Lisp_Object args[7], coding_systems;

        args[0] = Qwrite_region, args[1] = start, args[2] = end,
          args[3] = filename, args[4] = append, args[5] = visit,
          args[6] = lockname;
        coding_systems = Ffind_operation_coding_system (7, args);
        val = (CONSP (coding_systems) && !NILP (XCONS (coding_systems)->cdr)
               ? XCONS (coding_systems)->cdr
               : current_buffer->buffer_file_coding_system);
      }
    setup_coding_system (Fcheck_coding_system (val), &coding);
    if (!STRINGP (start) && !NILP (current_buffer->selective_display))
      coding.selective = 1;
  }

From eliz@is.elta.co.il Sun Aug 24 22:37:01 1997
From: Eli Zaretskii
To: Richard Stallman
cc: Geoff Voelker, Andrew Innes, Kenichi Handa
Subject: Coding system issues (3)
Date: Mon, 25 Aug 1997 08:36:03 +0300 (IDT)

There's something in the autodetection of a file's coding system which I find deeply disturbing: it gets in my way when I edit, e.g., C sources with strings that include ASCII characters with the high bit set.  For example, try to load src/msdos.c or lisp/term/internal.el.  You will get no-conversion in the first case (which means CRLFs won't be converted if that file is in DOS format), and in the second you get sjis.  Which is dead wrong in both cases: these are just tables of ASCII characters with codes beyond 127.

Now, I understand that Emacs cannot possibly know what I meant when I put such strings into the file.  These strings might as well be text in some language other than English, right?  But what annoys me is that I need to set the coding system explicitly each time I visit these files to see them as God intended.

Am I missing some function or variable?  If not, then do I have any other way except local file variables to tell Emacs these are ASCII files?
With major modes, we can specify the mode on the first nonblank line if we don't like Emacs' choice, but there seems to be no such feature for coding systems.

From handa@etl.go.jp Sun Aug 24 23:00:50 1997
Message-Id: <199708250601.PAA01874@etlken.etl.go.jp>
In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 08:36:03 +0300 (IDT))
From: Kenichi Handa
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 15:01:28 +0900

Eli Zaretskii writes:
> Now, I understand that Emacs cannot possibly know what I meant when
> I put such strings into the file.  These strings might as well be text
> in some language other than English, right?  But what annoys me is that
> I need to set the coding system explicitly each time I visit these files
> to see them as God intended.

> Am I missing some function or variable?
> If not, then do I have any other way except local file variables to
> tell Emacs these are ASCII files?
With major modes, we can specify > the mode on the first nonblank line if we don't like Emacs' choice, > but there seems to be no such feature for coding systems. In the latest pretest, Richard Stallman writes: > I have made a new pretest, which is tarring up now. > It will soon be in gnu/emacs/{emacs.xtar.gz,leim.xtar.gz} > on alpha.gnu.ai.mit.edu. > It has an important new feature: > You can specify the coding system for a file using the -*- > construct. Include `coding: CODINGSYSTEM;' inside the -*-...-*- > to specify use of coding system CODINGSYSTEM. So, we can use this feature for src/msdos.c and lisp/term/internal.el. But since I followed the way the `mode' tag is handled, the `coding' tag must also be on the first line of a file, which would make the first line of internal.el very long. Richard, shouldn't we loosen this restriction, at least for the `coding' tag? How about consulting at least the first three lines? By the way, src/msdos.c has Unix-like EOL now, doesn't it? --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Sun Aug 24 23:35:15 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "15:36:07" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "47" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA15294 for ; Sun, 24 Aug 1997 23:35:13 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id PAA16683; Mon, 25 Aug 1997 15:34:57 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id PAA11933; Mon, 25 Aug 1997 15:34:56 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id PAA01967; Mon, 25 Aug 1997 15:36:07 +0900 Message-Id: <199708250636.PAA01967@etlken.etl.go.jp> In-reply-to: (message
from Eli Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 15:36:07 +0900 Eli Zaretskii writes: > I wonder whether insert-file-contents needs to inherit the coding > system from the buffer, if it is set already (not undecided)? Right > now, the coding system is computed afresh every time, even if REPLACE > is non-nil, or if we are inserting into a buffer which already has > some text in it. (I'm not talking merely about DOS_NT EOL conversion > here.) I don't agree. The coding system of a file being read should be decided only by the file contents unless the coding system is specified explicitly. > One case where this subtlety might bite you is when you byte-compile a > .el file. The byte compiler erases the buffer and re-reads the file > before it begins the compilation (why, btw?), so even if you had set > the coding system before that, you need to set it again with C-x RET c > before compiling. If you forget, you might get subtle bugs when > running the .elc file, because the strings get written into it in > converted form. > I had this problem with lisp/term/internal.el which leads Emacs to > believe it's encoded in sjis. Even if I set the coding to emacs-mule > when I visit the file, Emacs will use sjis when it re-reads the file > before compiling it. The converted strings were used to set > case-conversion tables, so the effect of this was that Fdowncase > mysteriously stopped working for some characters: a particularly nasty > and hard-to-debug problem. This can be avoided by putting `coding' tag at the head of a file as I wrote before. By the way, I don't think it is a good idea to have random binary codes in a source file. For instance, in the case of internal.el (I have just noticed the existence of this file), we can use backslash notation (e.g.
"\207") instead of putting raw binary codes in strings to keep the case information. With the current file, if a user of a Japanese or Chinese version of Windows opens the file in their own editor (not Emacs), they will surely corrupt its contents. And, I think the file name "internal.el" is not appropriate. Something like "codepage.el" is better. What do you think? --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Mon Aug 25 00:08:41 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "02:42:12" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "3" "New pretest" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA16339 for ; Mon, 25 Aug 1997 00:08:40 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA12978; Mon, 25 Aug 1997 02:42:12 -0400 Message-Id: <199708250642.CAA12978@psilocin.gnu.ai.mit.edu> Sent-via-bcc-to: Emacs pretesters From: Richard Stallman To: rms@gnu.ai.mit.edu Subject: New pretest Date: Mon, 25 Aug 1997 02:42:12 -0400 There is a new pretest in the usual place: gnu/emacs/emacs.xtar.gz and gnu/emacs/leim.xtar.gz on alpha.gnu.ai.mit.edu.
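Handa's suggestion is to spell the 8-bit bytes in internal.el as backslash escapes such as "\207", so that the file itself stays pure 7-bit ASCII. That is a mechanical transformation, shown in the C sketch below. It is illustrative only: `escape_high_bytes` is a hypothetical helper, not an Emacs function. It assumes its input is the body of a Lisp string literal whose own backslashes and double quotes are already escaped, so it rewrites only bytes 0x80 to 0xFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs.  Copies the N bytes at IN to
   OUT, rewriting every byte in the range 0x80..0xFF as a three-digit
   octal escape (0x87 -> \207), so the result is pure 7-bit ASCII.
   Assumes IN is the body of a Lisp string literal whose backslashes and
   double quotes are already escaped; 7-bit bytes are copied unchanged.
   OUT must have room for 4 * N + 1 bytes.  Returns the number of bytes
   written, not counting the terminating NUL.  */
size_t
escape_high_bytes (const unsigned char *in, size_t n, char *out)
{
  static const char digits[] = "01234567";
  size_t o = 0;

  assert (out != NULL && (in != NULL || n == 0));
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = in[i];
      if (c < 0x80)
        out[o++] = (char) c;
      else
        {
          out[o++] = '\\';
          out[o++] = digits[(c >> 6) & 7];   /* high octal digit */
          out[o++] = digits[(c >> 3) & 7];
          out[o++] = digits[c & 7];          /* low octal digit */
        }
    }
  out[o] = '\0';
  return o;
}
```

For example, the byte 0x87 comes out as the four characters \207, which is the notation Handa cites, and 0xFF comes out as \377.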
From handa@etl.go.jp Mon Aug 25 01:38:49 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "17:39:23" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "18" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA20012 for ; Mon, 25 Aug 1997 01:38:47 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA23762; Mon, 25 Aug 1997 17:38:14 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA18640; Mon, 25 Aug 1997 17:38:12 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA02071; Mon, 25 Aug 1997 17:39:23 +0900 Message-Id: <199708250839.RAA02071@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 08:30:10 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 17:39:23 +0900 Eli Zaretskii writes: > I think coding-system-for-write should take precedence over the file > to which we are appending, otherwise there would be no way for the > caller to force a specific coding system for this operation. I agree. > When enable-multibyte-characters is nil, we shouldn't look at the > coding system of the file either, even if buffer-file-coding-system > is local. Do you agree? I'm not sure. Even if enable-multibyte-characters is nil, at least EOL format is detected by insert-file-contents. So, append-to-file had better detect at least EOL format. What do you think? 
--- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Mon Aug 25 02:04:55 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "12:04:22" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA21974 for ; Mon, 25 Aug 1997 02:04:53 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id MAA03044; Mon, 25 Aug 1997 12:04:23 +0300 X-Sender: eliz@is In-Reply-To: <199708250839.RAA02071@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 12:04:22 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > > When enable-multibyte-characters is nil, we shouldn't look at the > > coding system of the file either, even if buffer-file-coding-system > > is local. Do you agree? > > I'm not sure. Even if enable-multibyte-characters is nil, at least > EOL format is detected by insert-file-contents. So, append-to-file > had better detect at least EOL format. What do you think? Maybe I just don't understand well enough why that test for buffer-file-coding-system being a local variable is at all required? Can you explain it to me? 
From eliz@is.elta.co.il Mon Aug 25 03:10:41 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "13:10:16" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA24018 for ; Mon, 25 Aug 1997 03:10:39 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id NAA03366; Mon, 25 Aug 1997 13:10:17 +0300 X-Sender: eliz@is In-Reply-To: <199708250601.PAA01874@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 13:10:16 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > > construct. Include `coding: CODINGSYSTEM;' inside the -*-...-*- > > to specify use of coding system CODINGSYSTEM. > > So, we can use this feature for src/msdos.c and lisp/term/internal.el. I've seen Richard's announcement after I wrote the message. I will add "coding: " settings to those two files. > By the way, src/msdos.c has Unix-like EOL now, doesn't it? Yes. So if you stay with the same version of Emacs, you won't see any problem. But I also loaded msdos.c edited by a previous version of Emacs, which always added CRs, and then I saw all those ^M characters. Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the modeline, which means binary file. This is not quite right.
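Eli plans to add `coding:' settings, and Handa proposes honoring the tag anywhere in the first three lines instead of only the first. A scanner like the following could check for such a tag. This is a hypothetical C sketch, not the parser Emacs uses. It assumes both `-*-' markers sit on one line and that the coding-system name ends at a semicolon, a blank, or the closing marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Emacs's parser.  Searches the first
   MAX_LINES lines of TEXT (N bytes) for a "-*- ... coding: NAME ... -*-"
   cookie.  On success copies NAME, NUL-terminated, into OUT (which has
   OUT_SIZE bytes) and returns 1; otherwise returns 0.  */
int
find_coding_tag (const char *text, size_t n, int max_lines,
                 char *out, size_t out_size)
{
  static const char key[] = "coding:";
  const size_t keylen = sizeof key - 1;
  const char *p = text, *end = text + n;

  assert (out != NULL && out_size > 0);
  for (int line = 0; line < max_lines && p < end; line++)
    {
      const char *eol = memchr (p, '\n', (size_t) (end - p));
      if (eol == NULL)
        eol = end;

      /* Locate the opening and closing "-*-" markers on this line.  */
      const char *body = NULL, *body_end = NULL;
      for (const char *q = p; q + 3 <= eol; q++)
        if (q[0] == '-' && q[1] == '*' && q[2] == '-')
          {
            if (body == NULL)
              {
                body = q + 3;
                q += 2;            /* skip the rest of the marker */
              }
            else
              {
                body_end = q;
                break;
              }
          }

      /* Look for "coding:" at the start of a variable spec.  */
      if (body != NULL && body_end != NULL)
        for (const char *q = body; q + keylen <= body_end; q++)
          if (memcmp (q, key, keylen) == 0
              && (q == body || q[-1] == ' ' || q[-1] == '\t' || q[-1] == ';'))
            {
              const char *name = q + keylen;
              while (name < body_end && (*name == ' ' || *name == '\t'))
                name++;
              size_t len = 0;
              while (name + len < body_end && name[len] != ';'
                     && name[len] != ' ' && name[len] != '\t')
                len++;
              if (len == 0 || len >= out_size)
                return 0;
              memcpy (out, name, len);
              out[len] = '\0';
              return 1;
            }

      p = eol < end ? eol + 1 : end;
    }
  return 0;
}
```

With MAX_LINES set to 1, the function mimics the first-line-only rule. With 3, it also accepts a cookie on line two, which would spare internal.el an overly long first line.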
From eliz@is.elta.co.il Mon Aug 25 03:22:20 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "13:21:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "36" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA24251 for ; Mon, 25 Aug 1997 03:22:18 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id NAA03393; Mon, 25 Aug 1997 13:21:55 +0300 X-Sender: eliz@is In-Reply-To: <199708250636.PAA01967@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 13:21:54 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > > I had this problem with lisp/term/internal.el which leads Emacs to > > believe it's encoded in sjis. Even if I set the coding to emacs-mule > > when I visit the file, Emacs will use sjis when it re-reads the file > > before compiling it. The converted strings were used to set > > case-conversion tables, so the effect of this was that Fdowncase > > mysteriously stopped working for some characters: a particularly nasty > > and hard-to-debug problem. > > This can be avoided by putting `coding' tag at the head of a file as I > wrote before. No, it's not good enough. Users can override the coding tag with C-x c RET when they loaded the file. It is IMHO not nice to request that they use C-x c RET again before invoking the byte compiler. > By the way, I don't think it is a good idea to have random binary > codes in a source file. For instance, in the case of internal.el (I > have just noticed the existence of this file), we can use backslash > notation (e.g. "\207") instead of putting raw binary codes in strings to > keep the case information.
Sure, but this is much harder for the programmer ;-). > And, I think the file name "internal.el" is not appropriate. > Something like "codepage.el" is better. What do you think? The truth is, those case tables should be nuked. As soon as I learn enough about international languages support in Emacs, I will change that part to set a specific language environment according to the DOS codepage. Other than that, internal.el does perform terminal-specific stuff, like key remapping etc. From handa@etl.go.jp Mon Aug 25 03:34:17 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "19:34:59" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "26" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24541 for ; Mon, 25 Aug 1997 03:34:16 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id TAA00163; Mon, 25 Aug 1997 19:33:51 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA24346; Mon, 25 Aug 1997 19:33:50 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id TAA02242; Mon, 25 Aug 1997 19:34:59 +0900 Message-Id: <199708251034.TAA02242@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 12:04:22 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 19:34:59 +0900 Eli Zaretskii writes: >> I'm not sure. Even if enable-multibyte-characters is nil, at least >> EOL format is detected by insert-file-contents. So, append-to-file >> had better detect at least EOL format. What do you think? 
> Maybe I just don't understand well enough why that test for > buffer-file-coding-system being a local variable is at all > required? Can you explain it to me? buffer-file-coding-system being set locally means that the file was read with some kind of code conversion regardless of the current value of enable-multibyte-characters. There are two cases which cause this situation. 1) The file was read before enable-multibyte-characters is set to nil. 2) EOL format of the file was not that of Unix files. In both cases, I thought it was safer to encode the file with the same coding system used for decoding. Does this explanation help you? --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Mon Aug 25 03:53:45 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "19:54:42" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "28" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24934 for ; Mon, 25 Aug 1997 03:53:45 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id TAA01114; Mon, 25 Aug 1997 19:53:32 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA25158; Mon, 25 Aug 1997 19:53:31 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id TAA02279; Mon, 25 Aug 1997 19:54:42 +0900 Message-Id: <199708251054.TAA02279@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 13:10:16 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 19:54:42 +0900 Eli Zaretskii writes: > I've seen Richard's announcement after I wrote the message. 
I will add > "codong: " settings to those two files. I recommend to use backslash notation in those files. Then, there's no nead of `coding:' tags. >> By the way, src/msdos.c has Unix-like EOL now, doesn't it? > Yes. So if you stay with the same version of Emacs, you won't see any > problem. But I also loaded msdos.c edited by a previous version of > Emacs, which always added CRs, and then I saw all those ^M characters. Yah! Hmmm. If a file contains random 8-bit code which doesn't fit the coding system emacs-mule, it is detected as binary file. This is a difficult problem. How can we distinguish such a file from a truely binnary file which doesn't require any EOL conversion? > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the > modeline, which means binary file. This is not quite right. Why? The file doesn't need EOL conversion. In addition, the file contains random 8bit codes. So, it should be read/written without any code conversion. --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Mon Aug 25 03:58:16 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "13:57:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA25028 for ; Mon, 25 Aug 1997 03:58:14 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id NAA03490; Mon, 25 Aug 1997 13:57:55 +0300 X-Sender: eliz@is In-Reply-To: <199708251034.TAA02242@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 13:57:54 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > 1) The file was read before enable-multibyte-characters 
is set to nil. > > 2) EOL format of the file was not that of Unix files. > > In both cases, I thought it was safer to encode the file with the same > coding system used for decoding. > > Does this explanation help you? Yes, thanks. I will make the patches when I build the next pretest and send them to you all for review. From eliz@is.elta.co.il Mon Aug 25 04:07:44 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "14:07:13" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "24" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id EAA25222 for ; Mon, 25 Aug 1997 04:07:42 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id OAA03537; Mon, 25 Aug 1997 14:07:14 +0300 X-Sender: eliz@is In-Reply-To: <199708251054.TAA02279@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 14:07:13 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > Yah! Hmmm. If a file contains random 8-bit code which doesn't fit > the coding system emacs-mule, it is detected as binary file. This is > a difficult problem. How can we distinguish such a file from a truly > binary file which doesn't require any EOL conversion? This has been my concern since I first looked at src/coding.c. The `coding' tag will have to be the stopgap for now. Another solution is to use `find-file-text'. Perhaps some heuristic could be added in future based on the relative frequency of CRLF pairs and the binary characters. > > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the > > modeline, which means binary file. This is not quite right. > > Why? The file doesn't need EOL conversion.
In addition, the file > contains random 8bit codes. So, it should be read/written without any > code conversion. The modeline is not for Emacs, it's for the user. The user should NOT see "=" when the file is a text file. Otherwise, we will need to resurrect the T: and B: that we have just nuked in 20.0.93 (because we agreed that the coding system tells enough). From handa@etl.go.jp Mon Aug 25 04:14:30 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "20:14:49" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "49" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA25420 for ; Mon, 25 Aug 1997 04:14:29 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id UAA02229; Mon, 25 Aug 1997 20:13:39 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA26297; Mon, 25 Aug 1997 20:13:38 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id UAA02309; Mon, 25 Aug 1997 20:14:49 +0900 Message-Id: <199708251114.UAA02309@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 13:21:54 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 20:14:49 +0900 Eli Zaretskii writes: >> > I had this problem with lisp/term/internal.el which leads Emacs to >> > believe it's encoded in sjis. Even if I set the coding to emacs-mule >> > when I visit the file, Emacs will use sjis when it re-reads the file >> > before compiling it. 
The converted strings were used to set >> > case-conversion tables, so the effect of this was that Fdowncase >> > mysteriously stopped working for some characters: a particularly nasty >> > and hard-to-debug problem. >> >> This can be avoided by putting `coding' tag at the head of a file as I >> wrote before. > No, it's not good enough. Users can override the coding tag with > C-x c RET when they loaded the file. It is IMHO not nice to request > that they use C-x c RET again before invoking the byte compiler. I don't understand why they would load the file with C-x RET c (not C-x c RET). Anyway, if there's a reason to use C-x RET c, it means that the coding tag is wrong and should be corrected. >> By the way, I don't think it is a good idea to have random binary >> codes in a source file. For instance, in the case of internal.el (I >> have just noticed the existence of this file), we can use backslash >> notation (e.g. "\207") instead of putting raw binary codes in strings to >> keep the case information. > Sure, but this is much harder for the programmer ;-). I don't know why putting those raw 8bit codes is easier for programmers. When you add support for more DOS codepages (e.g. Slavic, Turkish), you can't see the correct characters anyway. Also, the current code of internal.el doesn't work for multibyte characters. To make it work for them, I think the better way is to change those strings to vectors of multibyte characters. > The truth is, those case tables should be nuked. As soon as I learn > enough about international languages support in Emacs, I will change that > part to set a specific language environment according to the DOS > codepage. > Other than that, internal.el does perform terminal-specific stuff, like > key remapping etc. I see.
--- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Mon Aug 25 05:38:34 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "21:39:16" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "56" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA27871 for ; Mon, 25 Aug 1997 05:38:33 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id VAA05277; Mon, 25 Aug 1997 21:38:04 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA29258; Mon, 25 Aug 1997 21:38:04 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id VAA02414; Mon, 25 Aug 1997 21:39:16 +0900 Message-Id: <199708251239.VAA02414@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 14:07:13 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 21:39:16 +0900 Eli Zaretskii writes: >> Yah! Hmmm. If a file contains random 8-bit code which doesn't fit >> the coding system emacs-mule, it is detected as binary file. This is >> a difficult problem. How can we distinguish such a file from a truly >> binary file which doesn't require any EOL conversion? > This has been my concern since I first looked at src/coding.c. The > `coding' tag will have to be the stopgap for now. Another solution is to > use `find-file-text'. Perhaps some heuristic could be added in future > based on the relative frequency of CRLF pairs and the binary characters.
Hmm, perhaps we must give up detecting a file's coding system incrementally, as is done now, and instead read the whole file with no conversion, detect the coding system by running sophisticated Emacs Lisp code on the whole buffer, and then decode the whole buffer at once. This takes much more memory and time when reading a huge file, but the benefit of more accurate detection may outweigh that cost. >> > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the >> > modeline, which means binary file. This is not quite right. >> >> Why? The file doesn't need EOL conversion. In addition, the file >> contains random 8bit codes. So, it should be read/written without any >> code conversion. > The modeline is not for Emacs, it's for the user. The user should NOT > see "=" when the file is a text file. Otherwise, we will need to > resurrect the T: and B: that we have just nuked in 20.0.93 (because we > agreed that the coding system tells enough). The modeline doesn't say anything about whether the file is text or binary. It just says how the file was encoded. They are different things. Although we have a coding system `binary' (an alias of no-conversion), the term `binary' there doesn't mean the DOS file type. But, hmmm, perhaps DOS users are too familiar with the concept of file type (text or binary). For Unix users, usually there's no difference. Anyway, these discussions suggest that we have to detect EOL type even after we detect that a text contains random 8-bit code. How about adding new coding systems raw-text, raw-text-dos, raw-text-unix, and raw-text-mac, and setting coding-category-binary to raw-text unless we are in a language environment, such as Vietnamese, that uses such random 8-bit files for its own language? Please try the following: (make-coding-system 'raw-text 0 ?t "Raw text") (setq coding-category-binary 'raw-text) and find-file msdos.c of LF format and of CRLF format.
--- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Mon Aug 25 06:56:24 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "16:56:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "10" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA01050 for ; Mon, 25 Aug 1997 06:56:22 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id QAA03868; Mon, 25 Aug 1997 16:56:02 +0300 X-Sender: eliz@is In-Reply-To: <199708251114.UAA02309@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 16:56:01 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > > Sure, but this is much harder for the programmer ;-). > > I don't know why putting those raw 8bit codes is easier for > programmers. Because when it's my own codepage, I just type the characters on my keyboard ;-).
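Earlier in this thread, Handa proposed reading the whole file with no conversion and only then deciding. Eli suggested weighing CRLF pairs against binary characters. Combined, the two ideas might look roughly like this for a file already known to contain 8-bit bytes that fit no multibyte coding system. Only the coding-system names come from the thread. The enum, the set of control bytes treated as binary, and the one-in-a-hundred threshold are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes; the names mirror Handa's proposed coding systems.  */
enum raw_class { RC_BINARY, RC_RAW_TEXT, RC_RAW_UNIX, RC_RAW_DOS, RC_RAW_MAC };

static const char *const raw_class_names[] =
  { "no-conversion", "raw-text", "raw-text-unix", "raw-text-dos",
    "raw-text-mac" };

/* Sketch only: classify an entire file that was read with no conversion.
   A NUL byte, or more than one suspicious control byte per 100 bytes
   (an assumed threshold), means binary.  Otherwise the EOL type is
   decided from counts over the whole buffer.  Bytes >= 0x80 are
   deliberately not held against the file.  */
enum raw_class
classify_buffer (const unsigned char *buf, size_t n)
{
  size_t lf = 0, crlf = 0, cr = 0, ctrl = 0;

  assert (buf != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = buf[i];
      if (c == '\r')
        {
          if (i + 1 < n && buf[i + 1] == '\n')
            {
              crlf++;
              i++;              /* the LF belongs to this CRLF pair */
            }
          else
            cr++;
        }
      else if (c == '\n')
        lf++;
      else if (c == 0)
        return RC_BINARY;
      else if (c < 0x20 && c != '\t' && c != '\f' && c != 033)
        ctrl++;                 /* TAB, FF and ESC are common in text */
    }

  if (ctrl * 100 > n)
    return RC_BINARY;
  if (lf == 0 && crlf == 0 && cr == 0)
    return RC_RAW_TEXT;         /* no line ends: EOL type undecided */
  if (crlf > 0 && lf == 0 && cr == 0)
    return RC_RAW_DOS;
  if (cr > 0 && lf == 0 && crlf == 0)
    return RC_RAW_MAC;
  return RC_RAW_UNIX;           /* LF only, or mixed: stray CRs stay visible */
}

const char *
raw_class_name (enum raw_class rc)
{
  return raw_class_names[rc];
}
```

The decision is made only after the whole file has been scanned, so a stray high byte early in the file cannot lock in the wrong answer. The price is holding the entire file in memory, which is the trade-off Handa mentions.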
From eliz@is.elta.co.il Mon Aug 25 07:10:18 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "17:09:56" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "31" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA02011 for ; Mon, 25 Aug 1997 07:10:15 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id RAA03892; Mon, 25 Aug 1997 17:09:57 +0300 X-Sender: eliz@is In-Reply-To: <199708251239.VAA02414@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 17:09:56 +0300 (IDT) On Mon, 25 Aug 1997, Kenichi Handa wrote: > The modeline doesn't say anything about whether the file is text or binary. > It just says how the file was encoded. They are different things. Originally, yes. Previous versions of Emacs on MS-DOS would display T: or B: for text and binary files, respectively. In Emacs 20, we all agreed that the coding system/EOL info on the modeline makes those T:/B: redundant. So now they aren't displayed by default. But this means that the coding system and EOL part of the modeline now has an additional meaning on DOS and NT. > Anyway, these discussions suggest that we have to detect EOL type > even after we detect that a text contains random 8-bit code. Yep, this seems like a solution that won't slow down file loading too much. > (make-coding-system 'raw-text 0 ?t "Raw text") > (setq coding-category-binary 'raw-text) > > and find-file msdos.c of LF format and of CRLF format. This works (displays "t:" and "t\" respectively on the modeline). I will have to dig deeper to understand what this does exactly.
Richard, Geoff and Andrew, do you agree that creating such a coding system is the way to go? From rms@gnu.ai.mit.edu Mon Aug 25 10:49:40 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "13:51:00" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA15521 for ; Mon, 25 Aug 1997 10:49:39 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA15665; Mon, 25 Aug 1997 13:51:00 -0400 Message-Id: <199708251751.NAA15665@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 13:21:54 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 13:51:00 -0400 No, it's not good enough. Users can override the coding tag with C-x c RET when they loaded the file. I do not see a problem with this. Users can do whatever they want to. It is IMHO not nice to request that they use C-x c RET again before invoking the byte compiler. Could you explain what you are talking about? 
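Eli's test of the raw-text coding system showed "t:" for the LF copy of msdos.c and "t\" for the CRLF copy, whereas no-conversion had shown "=". The tiny hypothetical C helper below makes that mode-line mapping explicit: the coding system's mnemonic character (?t for raw-text in Handa's make-coding-system call) followed by an EOL mnemonic. The ':' and '\' characters follow Eli's report. The characters used for CR-only and undecided line ends are assumptions.

```c
#include <assert.h>

/* Hypothetical EOL kinds for this sketch only.  */
enum eol_kind { EOL_KIND_UNDECIDED, EOL_KIND_LF, EOL_KIND_CRLF, EOL_KIND_CR };

/* Write the coding system's mnemonic followed by an EOL mnemonic into
   OUT (3 bytes, NUL-terminated).  ':' for LF and '\\' for CRLF follow
   Eli's report of "t:" and "t\"; '/' for CR and '-' for an undecided
   EOL type are assumptions.  */
void
mode_line_indicator (char coding_mnemonic, enum eol_kind eol, char out[3])
{
  char eol_char;

  assert (coding_mnemonic != '\0');
  switch (eol)
    {
    case EOL_KIND_LF:   eol_char = ':';  break;
    case EOL_KIND_CRLF: eol_char = '\\'; break;
    case EOL_KIND_CR:   eol_char = '/';  break;   /* assumption */
    default:            eol_char = '-';  break;   /* assumption */
    }
  out[0] = coding_mnemonic;
  out[1] = eol_char;
  out[2] = '\0';
}
```

With such an indicator, a DOS user sees a text-looking "t:" or "t\" for msdos.c instead of the "=" that Eli read as "binary file".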
From rms@gnu.ai.mit.edu Mon Aug 25 13:54:30 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "16:55:55" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA28556 for ; Mon, 25 Aug 1997 13:54:24 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA16638; Mon, 25 Aug 1997 16:55:55 -0400 Message-Id: <199708252055.QAA16638@psilocin.gnu.ai.mit.edu> In-reply-to: <199708250636.PAA01967@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 25 Aug 1997 15:36:07 +0900) References: <199708250636.PAA01967@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 16:55:55 -0400 The coding system of a file being read should be decided only by the file contents unless the coding system is specified explicitly. Yes, that is right. Eli, if you disagree, would you please describe *in full* a case where you think it is wrong? It isn't useful to have a discussion if we are not sure we are talking about the same thing. 
From rms@gnu.ai.mit.edu Mon Aug 25 13:56:16 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "16:57:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA28679 for ; Mon, 25 Aug 1997 13:56:15 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA16652; Mon, 25 Aug 1997 16:57:47 -0400 Message-Id: <199708252057.QAA16652@psilocin.gnu.ai.mit.edu> In-reply-to: <199708250839.RAA02071@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 25 Aug 1997 17:39:23 +0900) References: <199708250839.RAA02071@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 16:57:47 -0400 > When enable-multibyte-characters is nil, we shouldn't look at the > coding system of the file either, even if buffer-file-coding-system > is local. Do you agree? I'm not sure. Even if enable-multibyte-characters is nil, at least EOL format is detected by insert-file-contents. So, append-to-file had better detect at least EOL format. What do you think? That is right. 
From rms@gnu.ai.mit.edu Mon Aug 25 14:10:53 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "17:12:25" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA29570 for ; Mon, 25 Aug 1997 14:10:52 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA16778; Mon, 25 Aug 1997 17:12:25 -0400 Message-Id: <199708252112.RAA16778@psilocin.gnu.ai.mit.edu> In-reply-to: <199708251034.TAA02242@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 25 Aug 1997 19:34:59 +0900) References: <199708251034.TAA02242@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL conversion on MSDOS and MS-Windows Date: Mon, 25 Aug 1997 17:12:25 -0400 1) The file was read before enable-multibyte-characters was set to nil. 2) The EOL format of the file was not that of Unix files. In both cases, I thought it was safer to encode the file with the same coding system used for decoding. This reasoning makes sense to me.
From rms@gnu.ai.mit.edu Mon Aug 25 14:54:48 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "17:55:58" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "17" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA02218 for ; Mon, 25 Aug 1997 14:54:47 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA17030; Mon, 25 Aug 1997 17:55:58 -0400 Message-Id: <199708252155.RAA17030@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 17:55:58 -0400 I wonder whether insert-file-contents needs to inherit the coding system from the buffer, if it is set already (not undecided)? This would be incorrect. If I insert file foo into a buffer visiting bar, foo should be decoded using the right coding system for file foo, which has absolutely nothing to do with the coding system of file bar. The byte compiler erases the buffer and re-reads the file before it begins the compilation (why, btw?), Because that buffer probably had some other text in it, perhaps from another file that you compiled. Remember that byte-compile-file does NOT visit the input file. It uses a temporary buffer. This is a normal technique; many Emacs commands that look at files use temporary buffers. Some read many different files.
From rms@gnu.ai.mit.edu Mon Aug 25 14:55:19 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "17:56:43" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "4" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA02250 for ; Mon, 25 Aug 1997 14:55:18 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA17045; Mon, 25 Aug 1997 17:56:43 -0400 Message-Id: <199708252156.RAA17045@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: Coding system issues (2) Date: Mon, 25 Aug 1997 17:56:43 -0400 Do you agree that inserting a file into a buffer that already has a decided coding system should use the same coding system? That would be completely wrong. 
From rms@gnu.ai.mit.edu Mon Aug 25 15:00:22 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "18:01:49" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "24" "Re: Coding system issues (1)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id PAA02642 for ; Mon, 25 Aug 1997 15:00:21 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id SAA17074; Mon, 25 Aug 1997 18:01:49 -0400 Message-Id: <199708252201.SAA17074@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 25 Aug 1997 08:24:22 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il, handa@etl.go.jp CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: Coding system issues (1) Date: Mon, 25 Aug 1997 18:01:49 -0400 It seems that (setq-default enable-multibyte-characters nil) also disables part of the DOS EOL conversions. Specifically, if you create a new buffer, type text there, then save the buffer, you get Unix-style linefeeds at EOL, although the modeline quite deceptively says "\". E.g., try this: emacs -q M-: (setq-default enable-multibyte-characters nil) RET C-x b my-own-buffer RET Now type a few lines of text, then press C-x C-s foobar RET. Exit or suspend Emacs and look at the file foobar; you will see a Unix-style file. This is definitely a bug. Saving this buffer should perform EOL conversion even though enable-multibyte-characters is nil. Handa, can you please work on this with highest priority? Disabling EOL conversion when multibyte characters aren't supported might make sense on Unix (since it returns to the pre-20 behavior), It is wrong on Unix too. EOL conversion should work for all formats on all systems, regardless of enable-multibyte-characters.
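Stallman's point is that EOL encoding depends only on the chosen EOL type, not on the character set. A minimal sketch in C, assuming a plain byte buffer and a hypothetical `encode_eol` helper (not the real coding.c API): every LF becomes CR-LF or CR, and every other byte, 8-bit ones included, is copied unchanged, so nothing here depends on enable-multibyte-characters.

```c
#include <stddef.h>

/* Hypothetical EOL types, loosely modelled on coding.c's CODING_EOL_* names. */
enum eol_type { EOL_LF, EOL_CRLF, EOL_CR };

/* Encode LF-separated text SRC (LEN bytes) into DST for the EOL type EOL.
   DST must have room for 2 * LEN bytes (the CR-LF worst case).
   Returns the number of bytes written.  Only LF bytes are rewritten;
   all other bytes, including 8-bit ones, are copied unchanged, so the
   result does not depend on whether the buffer is multibyte. */
size_t encode_eol (const unsigned char *src, size_t len,
                   unsigned char *dst, enum eol_type eol)
{
  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (src[i] != '\n' || eol == EOL_LF)
        dst[out++] = src[i];      /* ordinary byte, or Unix EOL kept as-is */
      else if (eol == EOL_CRLF)
        {
          dst[out++] = '\r';
          dst[out++] = '\n';
        }
      else                        /* EOL_CR */
        dst[out++] = '\r';
    }
  return out;
}
```

Encoding "ab\ncd\n" with EOL_CRLF yields the 8 bytes "ab\r\ncd\r\n"; a byte such as 0x81 passes through untouched in every mode.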
From handa@etl.go.jp Mon Aug 25 18:00:52 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "10:01:51" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "26" "Re: Coding system issues (1)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA13003 for ; Mon, 25 Aug 1997 18:00:51 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA20440; Tue, 26 Aug 1997 10:00:41 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA22888; Tue, 26 Aug 1997 10:00:40 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA03036; Tue, 26 Aug 1997 10:01:51 +0900 Message-Id: <199708260101.KAA03036@etlken.etl.go.jp> In-reply-to: <199708252201.SAA17074@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Mon, 25 Aug 1997 18:01:49 -0400) References: <199708252201.SAA17074@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: Coding system issues (1) Date: Tue, 26 Aug 1997 10:01:51 +0900 Richard Stallman writes: > emacs -q > M-: (setq-default enable-multibyte-characters nil) RET > C-x b my-own-buffer RET > Now type a few lines of text, then press C-x C-s foobar RET. Exit or > suspend Emacs and look at the file foobar; you will see a Unix-style > file. > This is definitely a bug. Saving this buffer should perform EOL conversion > even though enable-multibyte-characters is nil. > Handa, can you please work on this with highest priority? Ok, I'm now working on that. > Disabling EOL conversion when multibyte characters aren't supported > might make sense on Unix (since it returns to the pre-20 behavior), > It is wrong on Unix too.
EOL conversion should work for all formats > on all systems, regardless of enable-multibyte-characters. I agree. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Mon Aug 25 20:58:05 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "25" "August" "1997" "23:59:33" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "12" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA20132 for ; Mon, 25 Aug 1997 20:58:04 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA18563; Mon, 25 Aug 1997 23:59:33 -0400 Message-Id: <199708260359.XAA18563@psilocin.gnu.ai.mit.edu> In-reply-to: <199708251054.TAA02279@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 25 Aug 1997 19:54:42 +0900) References: <199708251054.TAA02279@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 25 Aug 1997 23:59:33 -0400 > Yes. So if you stay with the same version of Emacs, you won't see any > problem. But I also loaded msdos.c edited by a previous version of > Emacs, which always added CRs, and then I saw all those ^M characters. That was a bug in the old version of Emacs. Let's not worry about old bugs that have been fixed. Yah! Hmmm. If a file contains random 8-bit code which doesn't fit the coding system emacs-mule, it is detected as binary file. Could you be more precise? What does "detected as binary file" really mean? What does Emacs DO in this case? 
From handa@etl.go.jp Mon Aug 25 22:08:07 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "14:08:25" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "58" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id WAA22058 for ; Mon, 25 Aug 1997 22:08:06 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id OAA06286; Tue, 26 Aug 1997 14:07:14 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id OAA10146; Tue, 26 Aug 1997 14:07:14 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id OAA03270; Tue, 26 Aug 1997 14:08:25 +0900 Message-Id: <199708260508.OAA03270@etlken.etl.go.jp> In-reply-to: <199708260359.XAA18563@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Mon, 25 Aug 1997 23:59:33 -0400) References: <199708251054.TAA02279@etlken.etl.go.jp> <199708260359.XAA18563@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Tue, 26 Aug 1997 14:08:25 +0900 Richard Stallman writes: >> Yes. So if you stay with the same version of Emacs, you won't see any >> problem. But I also loaded msdos.c edited by a previous version of >> Emacs, which always added CRs, and then I saw all those ^M characters. > That was a bug in the old version of Emacs. Let's not worry about old > bugs that have been fixed. No. This bug (feature?) still remains. 
If a file is formatted with DOS-like EOLs, and there are random 8-bit codes somewhere in the file, the current version detects it as coding-category-binary and does no code conversion, because coding-category-binary is set to `no-conversion' by default. > Yah! Hmmm. If a file contains random 8-bit code which doesn't fit > the coding system emacs-mule, it is detected as binary file. > Could you be more precise? What does "detected as binary file" really > mean? What does Emacs DO in this case? It does what I wrote above, because decode_coding (in coding.c) has the following code. ---------------------------------------------------------------------- if (coding->type == coding_type_undecided) detect_coding (coding, source, src_bytes); if (coding->eol_type == CODING_EOL_UNDECIDED) detect_eol (coding, source, src_bytes); ---------------------------------------------------------------------- So, Emacs first tries to detect the text coding. At this time, if the file contains random 8-bit code, Emacs decides that the coding category is coding-category-binary and sets up the coding system no-conversion in the structure `coding' (coding->eol_type is set to CODING_EOL_LF). So, it skips detect_eol. Even now, if we set coding-category-binary to `emacs-mule', this problem is avoided, but that is like setting coding-category-sjis to iso-latin-1 (Richard, do you remember the previous discussion about handling Microsoft's extra Latin codes?), and not the right thing. In addition, the mnemonic of `emacs-mule' is `=' (the same as no-conversion), which won't help DOS users. So, I proposed a new coding system `raw-text' (though I'm not sure whether this is a good name), which performs only EOL conversion, and proposed setting coding-category-binary to raw-text by default. Then, the call of detect_coding sets up raw-text in coding (coding->eol_type is set to CODING_EOL_UNDECIDED), and Emacs calls detect_eol, which may set coding->eol_type correctly.
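The decode order Handa quotes can be modelled in a few lines of C to show why a raw-text category changes the outcome. Everything here is a simplification for illustration: the struct and enum names are hypothetical, and "random 8-bit code" is approximated as any byte of 0x80 or above, whereas the real detect_coding tests candidate coding systems.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for coding.c's types. */
enum coding_kind { KIND_UNDECIDED, KIND_TEXT, KIND_NO_CONVERSION, KIND_RAW_TEXT };
enum eol_kind { EOLK_UNDECIDED, EOLK_LF, EOLK_CRLF };

struct coding_sketch
{
  enum coding_kind kind;
  enum eol_kind eol;
};

/* Stand-in for detect_coding.  Any byte >= 0x80 is treated as random
   8-bit code.  With the old default (binary -> no-conversion) the EOL
   type is fixed to LF; with a raw-text category it stays undecided. */
static void sketch_detect_coding (struct coding_sketch *c,
                                  const unsigned char *src, size_t len,
                                  bool binary_is_raw_text)
{
  for (size_t i = 0; i < len; i++)
    if (src[i] >= 0x80)
      {
        if (binary_is_raw_text)
          c->kind = KIND_RAW_TEXT;
        else
          {
            c->kind = KIND_NO_CONVERSION;
            c->eol = EOLK_LF;
          }
        return;
      }
  c->kind = KIND_TEXT;
}

/* Stand-in for detect_eol: classify by the first line ending found. */
static void sketch_detect_eol (struct coding_sketch *c,
                               const unsigned char *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (src[i] == '\n')
      {
        c->eol = (i > 0 && src[i - 1] == '\r') ? EOLK_CRLF : EOLK_LF;
        return;
      }
}

/* Mirrors the two tests quoted from decode_coding.  The caller starts
   with both fields undecided, as when no coding system is given. */
void sketch_decode_setup (struct coding_sketch *c,
                          const unsigned char *src, size_t len,
                          bool binary_is_raw_text)
{
  if (c->kind == KIND_UNDECIDED)
    sketch_detect_coding (c, src, len, binary_is_raw_text);
  if (c->eol == EOLK_UNDECIDED)
    sketch_detect_eol (c, src, len);
}
```

With `binary_is_raw_text` false, a DOS file containing a byte such as 0xE9 ends up as no-conversion with an LF EOL type, so its CRs would appear as ^M; with it true, the same bytes yield raw-text with CR-LF.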
The drawback of this method is that a truly binary file is detected as `raw-text-XXX'. But this can be avoided, except in very rare cases, by changing detect_eol so that it sets up no-conversion in the struct `coding' if the EOL format is not consistent. --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Mon Aug 25 23:53:17 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "09:52:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA25266 for ; Mon, 25 Aug 1997 23:53:15 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id JAA05300; Tue, 26 Aug 1997 09:52:55 +0300 X-Sender: eliz@is In-Reply-To: <199708251801.OAA15763@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Tue, 26 Aug 1997 09:52:54 +0300 (IDT) On Mon, 25 Aug 1997, Richard Stallman wrote: > > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the > > modeline, which means binary file. This is not quite right. > > Why? The file doesn't need EOL conversion. In addition, the file > contains random 8bit codes. So, it should be read/written without any > code conversion. > > Eli, please send him a precise test case; tell Handa *exactly* what to > type so he can observe this. Describing actions in abstract ways is a > VERY bad idea, almost guaranteed to lead to misunderstandings. I did send a precise test case: `C-x C-f src/msdos.c RET'. After that, look at the modeline: it says the coding is no-conversion ("="), as if this were a binary file. If the file has DOS EOLs, you will see ^M characters at the end of each line.
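Handa's suggested refinement, making detect_eol fall back to no-conversion when a file's line endings are inconsistent so that a truly binary file is not reported as raw-text, can be sketched in C as a classifier over all line endings. The function name and result codes are hypothetical, and "every line ending must agree" is an assumed reading of "not consistent"; as Handa notes, rare cases would still slip through.

```c
#include <stddef.h>

/* Hypothetical result codes for the sketch. */
enum eol_result { EOLR_NONE, EOLR_LF, EOLR_CRLF, EOLR_CR, EOLR_INCONSISTENT };

/* Classify every line ending in SRC.  Returns EOLR_NONE if there are no
   line endings, the common type if all of them agree, and
   EOLR_INCONSISTENT otherwise (the case in which the caller would fall
   back to no-conversion instead of raw-text).  In this sketch a CR at
   the very end of the data counts as a CR line ending. */
enum eol_result sketch_detect_eol_consistent (const unsigned char *src,
                                              size_t len)
{
  enum eol_result seen = EOLR_NONE;
  for (size_t i = 0; i < len; i++)
    {
      enum eol_result this_eol;
      if (src[i] == '\r')
        {
          if (i + 1 < len && src[i + 1] == '\n')
            {
              this_eol = EOLR_CRLF;
              i++;                /* skip the LF of the CR-LF pair */
            }
          else
            this_eol = EOLR_CR;
        }
      else if (src[i] == '\n')
        this_eol = EOLR_LF;
      else
        continue;                 /* not a line-ending byte */

      if (seen == EOLR_NONE)
        seen = this_eol;
      else if (seen != this_eol)
        return EOLR_INCONSISTENT;
    }
  return seen;
}
```

A DOS text file with stray 8-bit bytes still classifies as EOLR_CRLF, while data mixing CR-LF and bare LF is reported as inconsistent.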
From eliz@is.elta.co.il Mon Aug 25 23:59:28 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "09:58:49" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "40" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA25410 for ; Mon, 25 Aug 1997 23:59:26 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id JAA05307; Tue, 26 Aug 1997 09:58:49 +0300 X-Sender: eliz@is In-Reply-To: <199708251751.NAA15665@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (2) Date: Tue, 26 Aug 1997 09:58:49 +0300 (IDT) On Mon, 25 Aug 1997, Richard Stallman wrote: > It is IMHO not nice to request > that they use C-x c RET again before invoking the byte compiler. > > Could you explain what you are talking about? I did that, in my original message. This message was one of a series that was generated by a discussion which followed. The problem (which took me a couple of days to debug, btw) was that lisp/term/internal.el, when byte-compiled and loaded, would cause Fdowncase to behave erratically. Specifically, the letters A, O, and U would stay in upper case. To reproduce: emacs -q M-x load-file lisp/term/internal.elc RET C-x b *scratch* RET (downcase "AOU")^J (I hope that internal.el can be loaded on Unix with no problems, so you could try it.) It turned out that internal.el looks to Emacs like an sjis-encoded file. I then set the coding system to emacs-mule manually when I visited that file (`C-x RET c emacs-mule RET C-x C-f lisp/term/internal.el RET'), but the byte-compiled file was still wrong, because `emacs-lisp-byte-compile' re-reads the file, and when it does, it decodes it again as sjis.
So the user must set the coding system again before compiling the file. This is IMHO counter-intuitive, since users might have no idea that the byte compiler reads the file again. With the introduction of the `coding' tag in the -*- line, this problem with internal.el is solved. But I'm still concerned about the more general case, where setting the coding system for the .el file is not enough to make the byte compiler decode it with that coding system. In the case where a user needs to override Emacs's coding detection, this might lead to subtle bugs. From eliz@is.elta.co.il Tue Aug 26 00:06:29 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "10:06:05" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "10" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA25732 for ; Tue, 26 Aug 1997 00:06:27 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA05319; Tue, 26 Aug 1997 10:06:06 +0300 X-Sender: eliz@is In-Reply-To: <199708252155.RAA17030@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: Coding system issues (2) Date: Tue, 26 Aug 1997 10:06:05 +0300 (IDT) On Mon, 25 Aug 1997, Richard Stallman wrote: > Remember that byte-compile-file does NOT visit the input file. > It uses a temporary buffer. I was using `emacs-lisp-byte-compile' (also available from the menu bar), which is supposed to compile the file in the current buffer. I understand that it calls `byte-compile-file' internally, but it is still not obvious that it should re-read a file which, in this case, is already visited.
From rms@gnu.ai.mit.edu Tue Aug 26 09:55:56 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" "26" "August" "1997" "12:56:31" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "1" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA16283 for ; Tue, 26 Aug 1997 09:55:52 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA23352; Tue, 26 Aug 1997 12:56:31 -0400 Message-Id: <199708261656.MAA23352@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Tue, 26 Aug 1997 10:06:05 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: Coding system issues (2) Date: Tue, 26 Aug 1997 12:56:31 -0400 byte-compile-file should not be influenced by the current buffer. From rms@gnu.ai.mit.edu Tue Aug 26 21:29:52 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "00:31:18" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA27973 for ; Tue, 26 Aug 1997 21:29:51 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA26954; Wed, 27 Aug 1997 00:31:18 -0400 Message-Id: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Tue, 26 Aug 1997 09:52:54 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 00:31:18 -0400 I did send a precise test case: `C-x C-f src/msdos.c RET'. 
After that, look at the modeline: it says the coding is no-conversion ("="), as if this were a binary file. Yes, I see this. The buffer has buffer-file-coding-system = no-conversion, but enable-multibyte-characters is t. This is not right. If the file happens to have a \201 in it, Emacs could get quite confused. The right thing to do, for a file which has byte codes 200-377 which Emacs can't understand, is to turn off enable-multibyte-characters. That way it is safe to read in the file no matter what byte values it has. In addition, I agree with Eli that it should do EOL conversion according to the data in the file. Handa, can you please implement this? From eliz@is.elta.co.il Mon Sep 29 01:43:40 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "29" "September" "1997" "10:43:12" "+0200" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id BAA13207 for ; Mon, 29 Sep 1997 01:43:38 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA02087; Mon, 29 Sep 1997 10:43:13 +0200 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: Kenichi Handa , Geoff Voelker , Andrew Innes Subject: EOL encoding and C-x i Date: Mon, 29 Sep 1997 10:43:12 +0200 (IST) `C-x i' changes the EOL encoding in a way that I find unexpected. This is in Emacs 20.2 on MS-DOS. To reproduce: emacs -q C-x C-f foobar.txt (I assume `foobar.txt' doesn't exist, but I don't think that it matters.) Put some text into the buffer, then save it. The file foobar.txt is saved with DOS EOLs, like it should. Now insert another file into the buffer: C-x i foo.bar If `foo.bar' has Unix EOLs, the coding system of the current buffer is changed to *-dos, and the file is saved as such. Is this done on purpose? 
If so, I would like to know the reason. I would expect that, at least for a buffer which has been saved already in a file with DOS EOLs and have enough of them to qualify as a DOS text file, `C-x i' won't change the EOL encodings. From handa@etl.go.jp Mon Sep 29 04:29:08 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" "29" "September" "1997" "20:28:54" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "79" "Re: EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA16676 for ; Mon, 29 Sep 1997 04:29:06 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id UAA06440; Mon, 29 Sep 1997 20:27:51 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA12486; Mon, 29 Sep 1997 20:27:50 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id UAA23617; Mon, 29 Sep 1997 20:28:54 +0900 Message-Id: <199709291128.UAA23617@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Mon, 29 Sep 1997 10:43:12 +0200 (IST)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL encoding and C-x i Date: Mon, 29 Sep 1997 20:28:54 +0900 Eli Zaretskii writes: > `C-x i' changes the EOL encoding in a way that I find unexpected. This > is in Emacs 20.2 on MS-DOS. > To reproduce: > emacs -q > C-x C-f foobar.txt > (I assume `foobar.txt' doesn't exist, but I don't think that it matters.) > Put some text into the buffer, then save it. The file foobar.txt is > saved with DOS EOLs, like it should. 
> Now insert another file into the buffer: > C-x i foo.bar > If `foo.bar' has Unix EOLs, the coding system of the current buffer is > changed to *-dos, and the file is saved as such. > Is this done on purpose? If so, I would like to know the reason. This is not done on purpose. But the reason for this behaviour is simple: saving a buffer does not set buffer-file-coding-system locally. > I would expect that, at least for a buffer which has been saved > already in a file with DOS EOLs and have enough of them to qualify > as a DOS text file, `C-x i' won't change the EOL encodings. I agree that what you expect is quite reasonable, and agree that the buffer-file-coding-system of a buffer once saved should not be changed by inserting something later. This can be achieved by binding buffer-file-coding-system locally in that buffer. The question is at which point we should bind it. How about the following change to basic-save-buffer? ---lisp/ChangeLog--------------------------------------------------------- * files.el (basic-save-buffer): Set buffer-file-coding-system to the coding system actually used for saving. ---patch for lisp/files.el------------------------------------------------ diff -acrN --exclude=ChangeLog --exclude=*.elc --exclude=*~ --exclude=TAGS --exclude=loaddefs.el ../emacs-20.2.fsf/lisp/files.el ../emacs-20.2/lisp/files.el *** ../emacs-20.2.fsf/lisp/files.el Tue Sep 9 14:32:49 1997 --- ../emacs-20.2/lisp/files.el Mon Sep 29 19:59:39 1997 *************** *** 2181,2186 **** --- 2181,2190 ---- ;; If a hook returned t, file is already "written". ;; Otherwise, write it the usual way now. (setq setmodes (basic-save-buffer-1))) + ;; Now we have saved the current buffer. Let's make sure + ;; that buffer-file-coding-system is fixed to what + ;; actually used for saving by binding it locally.
+ (setq buffer-file-coding-system last-coding-system-used) (setq buffer-file-number (nthcdr 10 (file-attributes buffer-file-name))) (if setmodes ------------------------------------------------------------------------ I've just tested this change on Unix by: (1) First, I set the default value of buffer-file-coding-system to undecided-dos. (2) Then, I visited a new file. At this moment, buffer-file-coding-system was not set locally. (3) I entered "abc\n" in the buffer and saved it. Now buffer-file-coding-system was set to undecided-dos locally. (4) I inserted an ASCII file with Unix-style end-of-line codes. buffer-file-coding-system was still undecided-dos. (5) I inserted a file in iso-latin-1-unix. Then buffer-file-coding-system was changed to iso-latin-1-dos. So, it seems that this change works well. If you test it in your environment and find no problem, could someone please update the FSF's code? Now, I'm using a very narrow line, and it's quite difficult to do the job of updating. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Tue Sep 30 19:14:13 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Tue" "30" "September" "1997" "22:15:00" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199710010215.WAA12036@psilocin.gnu.ai.mit.edu>" "17" "Re: EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id TAA29544 for ; Tue, 30 Sep 1997 19:14:12 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id WAA12036; Tue, 30 Sep 1997 22:15:00 -0400 Message-Id: <199710010215.WAA12036@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Mon, 29 Sep 1997 10:43:12 +0200 (IST)) Reply-to: rms@gnu.ai.mit.edu From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: EOL encoding and C-x i Date: Tue, 30 Sep 1997 22:15:00 -0400 Now insert
another file into the buffer: C-x i foo.bar If `foo.bar' has Unix EOLs, the coding system of the current buffer is changed to *-dos, and the file is saved as such. Do you mean it is changed to *-unix? Is this done on purpose? If so, I would like to know the reason. I would expect that, at least for a buffer which has been saved already in a file with DOS EOLs and have enough of them to qualify as a DOS text file, `C-x i' won't change the EOL encodings. In that situation you are mixing the two kinds of EOLs, which means that it is hard to be sure that either alternative is really right. But I tend to agree with you. Handa, how hard would this be? From rms@gnu.ai.mit.edu Sun Jul 6 01:33:31 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Sun" " 6" "July" "1997" "04:33:59" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707060833.EAA10252@psilocin.gnu.ai.mit.edu>" "45" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA18916 for ; Sun, 6 Jul 1997 01:33:30 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id EAA10252; Sun, 6 Jul 1997 04:33:59 -0400 Message-Id: <199707060833.EAA10252@psilocin.gnu.ai.mit.edu> In-reply-to: <199707042102.OAA34672@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199707031919.PAA10787@psilocin.gnu.ai.mit.edu> <199707042102.OAA34672@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk Subject: Re: New way of handling CRLF Date: Sun, 6 Jul 1997 04:33:59 -0400 Given the new coding-system framework, I think that all file I/O under DOS_NT should now be done in binary mode. Yes, that is true. If it doesn't work that way now, could someone send me a fix?
Actually, the new coding-system framework appears to obviate the need for buffer-file-type; file-coding-system-alist and buffer-file-coding-system appear to be flexible enough to supersede it. I will need to think more about this, though, since it is a rather drastic change under DOS_NT. Another alternative would be to modify some of the Mule functions so that, on DOS/NT, they look at the same variables which now control the decision about the buffer file type. For example, the list of special extensions and the list of untranslated file systems. > (There is a bug in the pretest that fails to save a file with CRLF if > it was recognized with CRLF. That has been fixed.) Can you send me the patches for this so that I can test assuming that this case should work? I don't know what the patch is. If you want, you can log in here and try to figure it out. Currently, the default for file-coding-system-alist is 'undecided. Under DOS_NT, this should probably be 'emacs-mule so that CRLF is decoded and encoded by default. I don't follow the reasoning. Why would changing from undecided to emacs-mule have any effect on EOL conversion? Perhaps you're being fooled by the bug that Handa fixed (see above) which made EOL conversion not work when saving a file, since that may have been only for `undecided'. At this point, I don't think we should try to eliminate file-name-buffer-file-type-alist and untranslated file systems, but rather just make them have their effect via the coding system mechanism which is the right way for now. file-name-buffer-file-type-alist and untranslated file systems are documented.
From handa@etl.go.jp Tue Aug 5 18:09:55 1997 X-VM-v5-Data: ([nil nil nil t nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "10:10:27" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA28619 for ; Tue, 5 Aug 1997 18:09:40 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA20137; Wed, 6 Aug 1997 10:09:11 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00905; Wed, 6 Aug 1997 10:09:11 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA07868; Wed, 6 Aug 1997 10:10:27 +0900 Message-Id: <199708060110.KAA07868@etlken.etl.go.jp> In-reply-to: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Tue, 5 Aug 1997 14:18:03 -0400) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 10:10:27 +0900 Richard Stallman writes: > So, I suggest the following code. This scans buffer until it > encounters 3 end-of-lines. 
If it finds two different patterns while > scanning, it decides not to decode end-of-line (by returning > CODING_EOL_LF). So, in any of the following cases, it doesn't decode > end-of-line. > CR CR LF, LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF. > I think it is clear enough, and users won't be surprised that much. > I think this is good enough. I'll install it now. I have just made a small change as below in FSF's code:
diff -c -r1.30 coding.c
*** coding.c 1997/08/05 18:19:33 1.30
--- coding.c 1997/08/06 01:06:38
***************
*** 2739,2745 ****
        }
      }

!   return (total ? eol_type : CODING_EOL_UNDECIDED);
  }

  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
--- 2739,2745 ----
        }
      }

!   return eol_type;
  }

  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
--- Ken'ichi HANDA handa@etl.go.jp From Marc.Fleischeuers@kub.nl Wed Aug 6 01:32:59 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 6" "August" "1997" "10:32:13" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "50" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA14030 for ; Wed, 6 Aug 1997 01:32:53 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA20618; Wed, 6 Aug 1997 10:32:12 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708051721.NAA05128@psilocin.gnu.ai.mit.edu> In-Reply-To: Richard Stallman's message of Tue, 5 Aug 1997 13:21:52 -0400 Message-ID: Lines: 50 X-Mailer:
Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Richard Stallman Cc: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 06 Aug 1997 10:32:13 +0200 voelker@cs.washington.edu (Geoff Voelker) writes: > But at some point, Marc and Handa expressed dismay at the code I added > in dos-w32.el for determining the coding system for a file. Do you > still think this? Yes, I still think this. But I think this little exercise has helped in making the issue clearer. Richard Stallman writes: > I would like this behaviour to be dependent on the > buffer-file-coding-system in effect for a buffer, *not* the operating > system emacs runs on. > > It already is, I believe. The default choice for buffer-file-coding-system > is different on MSDOS. > > Does this seem to be untrue in your experience? My gripe comes down to this: I understand a different default for buffer-file-coding-system; the problem is that as a simple user, I don't *see* this. M-x describe-coding-system says buffer-file-coding-system is nil. The mode line indicator is `:'. I only found out what was going on when I looked at `find-buffer-file-type-coding-system' in lisp/dos-w32.el. At this point, I added the following definitions to my ~/.emacs:
(defun untranslated-file-p (filename)
  "Return t if FILENAME is on a filesystem that does not require CR/LF translation, and nil otherwise."
  t)
(setq-default buffer-file-coding-system 'undecided-dos)
These definitions are a gross hack and by no means do I recommend using them in Emacs. However, my purpose was the following: first, I wanted to disable the selection of a coding system based on file system, file name and file existence in `find-buffer-file-type-coding-system', second, I wanted to have an explicit, user-visible, default. The above definitions serve this purpose well.
This is how I like emacs to be. Marc -- Computer! End program! Computer! Create _new_ program! From Marc.Fleischeuers@kub.nl Wed Aug 6 02:01:42 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 6" "August" "1997" "11:01:38" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "40" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA14752 for ; Wed, 6 Aug 1997 02:01:40 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA22366; Wed, 6 Aug 1997 11:01:38 +0200 (MET DST) References: <199708051600.RAA14620@propos.long.harlequin.co.uk> In-Reply-To: Andrew Innes's message of Tue, 5 Aug 1997 17:00:38 +0100 (BST) Message-ID: Lines: 40 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: Andrew Innes Cc: handa@etl.go.jp, rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 06 Aug 1997 11:01:38 +0200 Andrew Innes writes: > (Replying to several messages.) > I agree with Handa that Emacs should not do this. Nearly all text files > will use a single end-of-line convention throughout, and thus pose no > problem. I can't think of circumstances in which a user would encounter > text files containing extra CR characters like this. If such > circumstances really are rare, then having to edit in "binary" mode where > all CRs are explicit seems reasonable. True, this is what you would like to see. If emacs is in "no-conversion" or "unix" eol-mode (`buffer-file-coding-system' matches "-unix", and the mode line indicator is `:'), then I think it is not unreasonable to have to enter `C-q C-m C-q C-j' if you want to make a "dos" file.
Incidentally, this actually works if `untranslated-file-p' returns `t' indiscriminately, and the default value of `buffer-file-coding-system' is `undecided-dos' (that is, my .emacs settings since yesterday). Given the input C-x C-f M-backspace M-backspace n e w . f i l e return C-x return f u n d e c i d e d - u n i x return a b c C-q RET C-q C-j a b c C-q RET C-q C-j C-x C-s C-x C-v return File new.file is written containing exactly the bytes I input (as I intended, i.e., abc\C-m\C-jabc\C-m\C-j) and it's read in the way it should be: as a dos-file. Heck, I can even create a Mac-file on an MS-DOS machine like this! > (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and > binary files? I think such a distinction is useful - a binary file > might contain all sorts of odd combinations of CR and LF, but a text > file should normally use a single convention throughout.) No. There is a `CODING_EOL_UNDECIDED', but in src/coding.c this case is treated the same as `CODING_EOL_LF'.
Marc From Marc.Fleischeuers@kub.nl Wed Aug 6 02:13:59 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 6" "August" "1997" "11:11:30" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "24" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA15107 for ; Wed, 6 Aug 1997 02:13:56 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA22961; Wed, 6 Aug 1997 11:11:30 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> <199708051840.LAA26604@joker.cs.washington.edu> In-Reply-To: voelker@cs.washington.edu's message of Tue, 05 Aug 1997 11:34:09 -0700 (PDT) Message-ID: Lines: 24 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: voelker@cs.washington.edu (Geoff Voelker) Cc: Marc.Fleischeuers@kub.nl, handa@etl.go.jp, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 06 Aug 1997 11:11:30 +0200 voelker@cs.washington.edu (Geoff Voelker) writes: > I've been gone from Friday until this morning, and so I've been > catching up on the mail that's been sent on this thread. From what I > can tell, Handa's latest patch fixes Marc's problem (have you had a > chance to try this, Marc?). Yes I installed this new function. 
Emacs is a little more predictable and I have seen no big surprises. The (admittedly incorrectly made) msdos-file that contains \C-m\C-m\C-j line separator chars is still not displayed "correctly", it is shown as abc^M^M abc^M^M and buffer-file-coding-system is nil. I think however that this is ok. For one, \C-m\C-m\C-j as a line separator does not agree with any convention so the way that a file containing these characters is displayed is arbitrary anyway. Instead, I think it is more fruitful to avoid situations where files with these erroneous line-endings are created, i.e. making it clearer for users what the eol-conventions at any given time are. Marc From Marc.Fleischeuers@kub.nl Wed Aug 6 02:58:58 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "" " 6" "August" "1997" "11:58:54" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "30" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA16012 for ; Wed, 6 Aug 1997 02:58:56 -0700 Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA26126 for ; Wed, 6 Aug 1997 11:58:54 +0200 (MET DST) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708051851.LAA18429@joker.cs.washington.edu> In-Reply-To: voelker@cs.washington.edu's message of Tue, 05 Aug 1997 11:46:18 -0700 (PDT) Message-ID: Lines: 30 X-Mailer: Gnus v5.3/Emacs 19.33 From: Marc Fleischeuers Sender: marcf@PI0737.kub.nl To: voelker@cs.washington.edu (Geoff Voelker) Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: 06 Aug 1997 11:58:54 +0200 voelker@cs.washington.edu (Geoff 
Voelker) writes: > > the variable with M-x set-variable RET buffer-file-type but when I > > press return all I get is [no match]. > > > > ... > > > > Apropos'ing around I found another promising variable, > > `buffer-file-format', valid values for which are found in > > `format-alist'. In this alist there seems to be an appropriate format, > > `ibm'. However, `M-x set-variable RET buffer-file-format' again gives > > [no match]. > > Marc, > > I couldn't quite tell if you had figured this out yet or not, but > set-variable works on variables that have been defvar'd (which these > have not). For these, you would want to use setq. > > -geoff Er, yes, I was mimicking an average user. I think your advice was addressed to an experienced emacs debugger so I was not really fair. I do think however that end-of-line stuff should be under the control of a user, without having to resort to lisp. Marc -- Computer! End program! Computer! Create _new_ program! From andrewi@harlequin.co.uk Wed Aug 6 05:32:07 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "13:30:20" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "42" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA16523 for ; Wed, 6 Aug 1997 05:32:04 -0700 Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50]) by holly.cam.harlequin.co.uk (8.8.4/8.8.4) with ESMTP id NAA25198; Wed, 6 Aug 1997 13:30:56 +0100 (BST) Received: from woozle.long.harlequin.co.uk (woozle.long.harlequin.co.uk [193.128.93.77]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id NAA02181; Wed, 6 Aug 1997 13:30:20 +0100 (BST) Message-Id: <199708061230.NAA02181@propos.long.harlequin.co.uk> In-reply-to: <199708051953.PAA07564@psilocin.gnu.ai.mit.edu>
(message from Richard Stallman on Tue, 5 Aug 1997 15:53:20 -0400) From: Andrew Innes To: rms@gnu.ai.mit.edu CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 13:30:20 +0100 (BST) On Tue, 5 Aug 1997 15:53:20 -0400, Richard Stallman said: > (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and > binary files? > >There is a distinction which perhaps you could interpret in this way: >whether no-conversion is specified as the coding system. I'm not familiar with this; does no-conversion imply no character set conversion, or no EOL conversion (or both)? If both (no charset and no EOL conversion), then that nicely expresses the distinction I have between binary and text. > I think such a distinction is useful - a binary file > might contain all sorts of odd combinations of CR and LF, but a text > file should normally use a single convention throughout.) > >What, specifically, is it useful for? It is probably only useful (in practical terms) in small ways, such as if insert-file-contents were to check whether the chosen/specified EOL coding is used consistently for all lines in text files; obviously that would not be appropriate for binary files. I guess the distinction is meaningful to me as a user, though of little consequence to the way Emacs handles files. That is, I think of certain types of file (eg. .gz, .tar, .zip, .obj files etc) as binary, and I therefore expect to do different things with such files in Emacs than I would with text files. So, while I would think it quite natural to change the EOL coding for a buffer visiting a text file, I would want Emacs to query me if I tried to do the same for a buffer visiting a binary file.
If I visited a text file that had mixed EOL coding, I would want to be told about it, and probably would want the option to have all lines converted to the same coding. I might like to have a find-file-hooks function that sets truncate-lines to t for text files, and to nil for binary files. That sort of thing. It is not a big deal, but to my mind makes sense. AndrewI From rms@gnu.ai.mit.edu Wed Aug 6 10:51:10 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "13:50:52" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA20275 for ; Wed, 6 Aug 1997 10:51:10 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA24101; Wed, 6 Aug 1997 13:50:52 -0400 Message-Id: <199708061750.NAA24101@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Marc Fleischeuers on 06 Aug 1997 10:32:13 +0200) References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> <199707310605.XAA15156@joker.cs.washington.edu> <199707312038.NAA15222@joker.cs.washington.edu> <199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708051721.NAA05128@psilocin.gnu.ai.mit.edu> From: Richard Stallman To: Marc.Fleischeuers@kub.nl CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 13:50:52 -0400 second, I wanted to have an explicit, user-visible, default. 
I agree it is better to make the decision about the default coding system for a new file when the buffer is created--not delay it until saving the file. Can someone write this? From rms@gnu.ai.mit.edu Wed Aug 6 11:06:18 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "14:06:15" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA21460 for ; Wed, 6 Aug 1997 11:06:17 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA24241; Wed, 6 Aug 1997 14:06:15 -0400 Message-Id: <199708061806.OAA24241@psilocin.gnu.ai.mit.edu> In-reply-to: <199708061230.NAA02181@propos.long.harlequin.co.uk> (message from Andrew Innes on Wed, 6 Aug 1997 13:30:20 +0100 (BST)) References: <199708061230.NAA02181@propos.long.harlequin.co.uk> From: Richard Stallman To: andrewi@harlequin.co.uk CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 14:06:15 -0400 >What, specifically, is it useful for? It is probably only useful (in practical terms) in small ways, such as if insert-file-contents were to check whether the chosen/specified EOL coding is used consistently for all lines in text files; obviously that would not be appropriate for binary files. no-conversion is useful for that. 
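Andrew's suggested check, that insert-file-contents verify the chosen EOL convention is used on every line of a text file, could look roughly like the C sketch below. The function `count_mismatched_eols' and the `eol_type' enum are hypothetical. Following RMS's reply, a caller would skip the check when the buffer's coding system is no-conversion, since a binary file may legitimately contain any mix of CR and LF.

```c
#include <stddef.h>

/* Hypothetical classification of a single line end.  */
enum eol_type { EOL_LF, EOL_CRLF, EOL_CR };

/* Count the line ends in BUF that do not use EXPECTED.  A result of 0
   means every line uses the same convention; a non-zero result for a
   text file means it has mixed EOLs and the user could be told.
   Unlike a quick detector, this scans the whole buffer.  */
size_t
count_mismatched_eols (const char *buf, size_t len, enum eol_type expected)
{
  size_t i, mismatches = 0;

  for (i = 0; i < len; i++)
    {
      enum eol_type this_eol;

      if (buf[i] == '\n')
        this_eol = EOL_LF;
      else if (buf[i] == '\r')
        {
          if (i + 1 < len && buf[i + 1] == '\n')
            {
              this_eol = EOL_CRLF;
              i++;              /* consume the LF of the pair */
            }
          else
            this_eol = EOL_CR;  /* bare CR, Mac style */
        }
      else
        continue;               /* not a line end */

      if (this_eol != expected)
        mismatches++;
    }
  return mismatches;
}
```

The cost is one extra linear pass over the text, which is why it is only worth doing for buffers the user treats as text.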
From rms@gnu.ai.mit.edu Wed Aug 6 11:07:10 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" " 6" "August" "1997" "14:07:17" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "5" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA21544 for ; Wed, 6 Aug 1997 11:07:09 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA24249; Wed, 6 Aug 1997 14:07:17 -0400 Message-Id: <199708061807.OAA24249@psilocin.gnu.ai.mit.edu> In-reply-to: <199708061230.NAA02181@propos.long.harlequin.co.uk> (message from Andrew Innes on Wed, 6 Aug 1997 13:30:20 +0100 (BST)) References: <199708061230.NAA02181@propos.long.harlequin.co.uk> From: Richard Stallman To: andrewi@harlequin.co.uk CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf] Date: Wed, 6 Aug 1997 14:07:17 -0400 I might like to have a find-file-hooks function that sets truncate-lines to t for text files, and to nil for binary files. That sort of thing. I think it would work, now, to do this by testing whether buffer-file-coding-system is no-conversion. 
From eliz@is.elta.co.il Tue Aug 26 23:55:55 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Wed" "27" "August" "1997" "09:55:17" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "29" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA02550 for ; Tue, 26 Aug 1997 23:55:53 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id JAA07948; Wed, 27 Aug 1997 09:55:18 +0300 X-Sender: eliz@is In-Reply-To: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 09:55:17 +0300 (IDT) On Wed, 27 Aug 1997, Richard Stallman wrote: > The right thing to do, for a file which has byte codes 200-377 which > Emacs can't understand, is to turn off enable-multibyte-characters. > That way it is safe to read in the file no matter what byte values it > has. When Emacs detects binary characters that don't fit into any known coding system, it assumes that coding-category-binary has been assigned an appropriate coding system. Currently, this is no-conversion by default. Handa suggested to change that default and assign to it the (new) coding system to be called raw-text that would still do EOL conversions. Will this solve the problem? If it will, then the only problem that remains is how do we make sure that a truly binary file that happens to have a few CRLF pairs doesn't get detected as raw-text-dos. This seems to call for some kind of heuristic in detect_eol_type, slightly more complicated than what's there today (which just compares the number of CRLF pairs against a compile-time threshold, currently set to 3).
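The detection rule Handa proposed earlier in this thread scans only until it has seen 3 end-of-lines and refuses to decode EOLs if it meets two different patterns. A minimal C sketch of that rule follows, assuming three EOL patterns (LF, CR LF and bare CR). The names `detect_eol' and `MAX_EOL_CHECK_COUNT' are illustrative; this is not the actual detect_eol_type from src/coding.c.

```c
#include <stddef.h>

enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* Stop after this many line ends, mirroring the compile-time
   threshold of 3 mentioned above.  */
#define MAX_EOL_CHECK_COUNT 3

enum eol_type
detect_eol (const char *src, size_t len)
{
  enum eol_type eol = EOL_UNDECIDED;
  int total = 0;
  size_t i = 0;

  while (i < len && total < MAX_EOL_CHECK_COUNT)
    {
      enum eol_type this_eol;
      char c = src[i++];

      if (c == '\n')
        this_eol = EOL_LF;
      else if (c == '\r')
        {
          if (i < len && src[i] == '\n')
            {
              this_eol = EOL_CRLF;
              i++;              /* consume the LF of the pair */
            }
          else
            this_eol = EOL_CR;
        }
      else
        continue;               /* ordinary byte */

      if (eol == EOL_UNDECIDED)
        eol = this_eol;
      else if (this_eol != eol)
        /* Two different patterns: do not decode EOLs at all.  */
        return EOL_LF;
      total++;
    }
  /* If no line end was seen, this is still EOL_UNDECIDED.  */
  return eol;
}
```

With this rule, each of Handa's mixed cases (CR CR LF, LF CR LF, CR LF LF, ...) comes back as EOL_LF. A buffer that starts with three CR LF pairs is classified as DOS no matter what follows, which is exactly the binary-file case Eli raises.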
I also think the solution proposed by Handa is better than turning off enable-multibyte-characters, because the latter would probably mean that users won't be able to assign something different to coding-category-binary, in case they need to customize the handling of binary files. From eliz@is.elta.co.il Wed Aug 27 00:28:27 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "10:28:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04042 for ; Wed, 27 Aug 1997 00:28:24 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA08074; Wed, 27 Aug 1997 10:28:01 +0300 X-Sender: eliz@is In-Reply-To: <199708270721.AAA16471@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, handa@etl.go.jp, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 10:28:01 +0300 (IDT) On Wed, 27 Aug 1997, Geoff Voelker wrote: > This does not seem correct since a file could be on an "untranslated" > filesystem and still need a coding system (the untranslated only > refers to EOL). Do people agree that "binary" in this context really > means use LFs for EOL? If so, then undecided-unix should probably be > used instead of no-conversion. I agree, but only for files created by Emacs. An existing file should set the EOL type according to its content when it is visited and use that EOL type when it is saved, even on untranslated systems, because that's what Emacs would do on Unix (where all filesystems are currently treated as untranslated). I think the automatic decoding of EOLs has taken most of the sting out of the untranslated filesystems feature, except for the case of files created by Emacs.
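Eli's rule for when "binary" should mean LF line ends can be stated as a small decision function. The C sketch below works under stated assumptions: `eol_for_save' and its parameters are hypothetical, CR LF is taken as the DOS_NT default for new files on ordinary (translated) filesystems, and an existing file in which no line end was detected is treated like a new one, which the thread does not specify.

```c
#include <stdbool.h>

enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* Hypothetical: pick the EOL convention for saving a buffer on DOS/NT.
   - An existing file keeps the convention detected when it was
     visited, even on an untranslated filesystem (as on Unix).
   - A file created by Emacs gets LF on an untranslated filesystem
     (Geoff's undecided-unix) and CR LF elsewhere.  */
enum eol_type
eol_for_save (bool file_existed, enum eol_type detected, bool untranslated_fs)
{
  if (file_existed && detected != EOL_UNDECIDED)
    return detected;
  /* Assumption: an existing file with no line ends is treated like a
     new one.  */
  return untranslated_fs ? EOL_LF : EOL_CRLF;
}
```

Keeping the filesystem check only in the new-file branch reflects Eli's point that automatic EOL decoding already covers existing files.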
From handa@etl.go.jp Wed Aug 27 00:33:02 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "16:33:26" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04164 for ; Wed, 27 Aug 1997 00:32:57 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id QAA01585; Wed, 27 Aug 1997 16:32:31 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA16416; Wed, 27 Aug 1997 16:32:29 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id QAA04792; Wed, 27 Aug 1997 16:33:26 +0900 Message-Id: <199708270733.QAA04792@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 09:55:17 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 16:33:26 +0900 Eli Zaretskii writes: > When Emacs detects binary characters that don't fit into any known > coding system, it assumes that coding-category-binary has been > assigned an appropriate coding system. Currently, this is > no-conversion by default. Handa suggested to change that default and > assign to it the (new) coding system to be called raw-text that would > still do EOL conversions. > Will this solve the problem? I think so, and I believe Richard does too. Richard asked me to implement my suggestion, and also asked me to turn off enable-multibyte-characters when Emacs detects a file is raw-text. But, of course, we notice that this can't solve all of the situations. I've just done it. But, I found one problem.
When we set enable-multibyte-characters to nil, mode-line doesn't show any information about the coding system (except for EOL format). Perhaps, we have to modify mode-line-format so that it shows `B' if buffer-file-coding-system is no-conversion and `T' in the other cases. > If it will, then the only problem that remains is how do we make sure > that a truly binary file that happens to have a few CRLF pairs > doesn't get detected as raw-text-dos. This seems to call for some > kind of heuristic in detect_eol_type, slightly more complicated than > what's there today (which just compares the number of CRLF pairs > against a compile-time threshold, currently set to 3). I don't think it's worth implementing such kind of heuristics, because there's anyway a case that we can't detect correctly. In addition, such code makes Emacs slower on reading a normal file. Or, do you have any idea on detecting EOL format without making Emacs much slower? --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Wed Aug 27 00:45:42 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "16:46:50" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "32" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04483 for ; Wed, 27 Aug 1997 00:45:41 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id QAA02426; Wed, 27 Aug 1997 16:45:39 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA17012; Wed, 27 Aug 1997 16:45:39 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id QAA04804; Wed, 27 Aug 1997 16:46:50 +0900 Message-Id: <199708270746.QAA04804@etlken.etl.go.jp> In-reply-to: <199708270721.AAA16471@joker.cs.washington.edu>
(voelker@cs.washington.edu) References: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> <199708270721.AAA16471@joker.cs.washington.edu> From: Kenichi Handa To: voelker@cs.washington.edu CC: eliz@is.elta.co.il, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 16:46:50 +0900 voelker@cs.washington.edu (Geoff Voelker) writes: > This discussion about no-conversion has made me rethink part of > find-buffer-file-type-coding-system. For files that are specified to > be "binary" in file-name-buffer-file-type-alist or > untranslated-filesystem-list, the no-conversion coding system is used. > This does not seem correct since a file could be on an "untranslated" > filesystem and still need a coding system (the untranslated only > refers to EOL). Do people agree that "binary" in this context really > means use LFs for EOL? If so, then undecided-unix should probably be > used instead of no-conversion. I don't know what you mean by "binary" in this context. If you mean coding system `binary' or you use the word in a normal context, e.g. "This is a text file, that is a binary file", then `binary' means "no code conversion (including EOL format) required". But, now I set coding-category-binary to raw-text. So, in the context of coding-category-binary, `binary' refers only to the text part, and EOL is automatically detected. So, your mail makes me think that we had better:
o treat `coding-category-binary' as truly binary even for EOL format,
o set it to `no-conversion',
o make a new category `coding-category-raw-text',
o and set it to raw-text.
What do you think?
--- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Wed Aug 27 00:46:31 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "10:46:08" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "39" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04520 for ; Wed, 27 Aug 1997 00:46:29 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA08109; Wed, 27 Aug 1997 10:46:09 +0300 X-Sender: eliz@is In-Reply-To: <199708270733.QAA04792@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 10:46:08 +0300 (IDT) On Wed, 27 Aug 1997, Kenichi Handa wrote: > Richard asked me to implement > my suggestion, and also asked me to turn off > enable-multibyte-characters when Emacs detects a file is raw-text. I thought setting coding-category-binary to raw-text is enough. Your two-line test was all I needed to read msdos.c correctly. So why do you also need to turn enable-multibyte-characters off? > I've just done it. Could you please send me the diffs so I could test this? Thanks. > > If it will, then the only problem that remains is how do we make sure > > that a truly binary file that happens to have a few CRLF pairs > > doesn't get detected as raw-text-dos. This seems to call for some > > kind of heuristic in detect_eol_type, slightly more complicated than > > what's there today (which just compares the number of CRLF pairs > > against a compile-time threshold, currently set to 3). > > I don't think it's worth implementing such kind of heuristics, because > there's anyway a case that we can't detect correctly.
Then how would you suggest to solve the case of a true binary file (say, an executable program) that happens to have 3 or more CRLF pairs in it? As far as I understand, Emacs will convert the CRLF pairs on input and add a CR to any LF on output, which is disastrous in such cases. > Or, do you have any idea on detecting EOL format without making Emacs > much slower? The idea is to not give up checking the file after you've seen the first 3 CRLF pairs, but look into the file some more. I didn't think about this enough to have a working solution. I wanted first to be sure that people agree that this is the way to go. But generally, I don't think this would make the input much slower than it is already, if the heuristic is implemented in C (inside decode_coding or thereabouts). From handa@etl.go.jp Wed Aug 27 00:47:52 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "16:48:35" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "11" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04554 for ; Wed, 27 Aug 1997 00:47:47 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id QAA02490; Wed, 27 Aug 1997 16:47:25 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA17064; Wed, 27 Aug 1997 16:47:24 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id QAA04817; Wed, 27 Aug 1997 16:48:35 +0900 Message-Id: <199708270748.QAA04817@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 10:28:01 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 
1997 16:48:35 +0900 Eli Zaretskii writes: > I think the automatic decoding of EOLs has taken most of the sting out of > untranslated filesystems feature, except for the case of files created by > Emacs. I'm sorry I don't understand the wording "has taken most of the sting out of ...". Could you please tell it in an easier English? ^.^;;; --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Wed Aug 27 00:54:41 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "10:54:00" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04724 for ; Wed, 27 Aug 1997 00:54:40 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA08125; Wed, 27 Aug 1997 10:54:01 +0300 X-Sender: eliz@is In-Reply-To: <199708270746.QAA04804@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 10:54:00 +0300 (IDT) On Wed, 27 Aug 1997, Kenichi Handa wrote: > But, now I set coding-category-binary to raw-text. So, in the context > of coding-category-binary, `binary' refers only to the text part, and EOL > is automatically detected. > > So, your mail makes me think that we had better: > o treat `coding-category-binary' as truly binary even for EOL format, > o set it to `no-conversion', > o make a new category `coding-category-raw-text', > o and set it to raw-text. This might be an OK solution, but I'm afraid I don't understand how would Emacs distinguish between these two coding categories (binary and raw-text)? Let's take msdos.c and emacs.exe as two examples.
The former is a text file where EOLs should be decoded, the latter is a binary file where EOLs should NOT be converted. Assuming that emacs.exe has at least 3 CRLF pairs in it, how would Emacs know which conversion to apply in each of these two cases? From eliz@is.elta.co.il Wed Aug 27 00:56:07 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "10:55:45" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "13" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04746 for ; Wed, 27 Aug 1997 00:56:05 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id KAA08130; Wed, 27 Aug 1997 10:55:45 +0300 X-Sender: eliz@is In-Reply-To: <199708270748.QAA04817@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 10:55:45 +0300 (IDT) On Wed, 27 Aug 1997, Kenichi Handa wrote: > Eli Zaretskii writes: > > I think the automatic decoding of EOLs has taken most of the sting out of > > untranslated filesystems feature, except for the case of files created by > > Emacs. > > I'm sorry I don't understand the wording "has taken most of the sting > out of ...". Could you please tell it in an easier English? ^.^;;; I mean it made the untranslated feature unnecessary in many cases where it would be used in Emacs 19.
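Eli's question can be made concrete with a toy model of the rule quoted earlier in this thread, where the EOL format is declared DOS once the number of CRLF pairs reaches a compile-time threshold of 3. This is a hypothetical sketch, not the actual detect_eol_type code: the names guess_eol_by_threshold, enum eol_guess and CRLF_THRESHOLD are invented for illustration, and the handling of bare LFs is an assumption.

```c
#include <stddef.h>

/* Hypothetical EOL verdicts for this sketch (not Emacs's own enum). */
enum eol_guess { EOL_UNDECIDED, EOL_UNIX, EOL_DOS };

/* Threshold described in the thread: 3 CRLF pairs. */
#define CRLF_THRESHOLD 3

/* Toy threshold detector: the verdict is DOS as soon as CRLF_THRESHOLD
   CRLF pairs have been seen, whatever the rest of the data looks like.
   Otherwise any bare LF means Unix, and no decision is made if neither
   rule fires. */
enum eol_guess
guess_eol_by_threshold (const unsigned char *buf, size_t len)
{
  size_t crlf = 0, bare_lf = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] != '\n')
        continue;
      if (i > 0 && buf[i - 1] == '\r')
        {
          if (++crlf >= CRLF_THRESHOLD)
            return EOL_DOS;     /* stop looking at the data here */
        }
      else
        bare_lf++;
    }
  return bare_lf > 0 ? EOL_UNIX : EOL_UNDECIDED;
}
```

Under this rule a byte string such as `\x7f ELF \0 \r\n \x01 \n \x02 \r\n \xff \r\n`, standing in for an executable, is classified as DOS: the third CRLF pair ends the scan, and the bare LF seen earlier does not change the verdict.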
From handa@etl.go.jp Wed Aug 27 01:25:00 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "17:24:33" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "68" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA05431 for ; Wed, 27 Aug 1997 01:24:59 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA04228; Wed, 27 Aug 1997 17:23:22 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19038; Wed, 27 Aug 1997 17:23:21 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA04887; Wed, 27 Aug 1997 17:24:33 +0900 Message-Id: <199708270824.RAA04887@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 10:46:08 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 17:24:33 +0900 Eli Zaretskii writes: >> Richard asked me to implement >> my suggestion, and also asked me to turn off >> enable-multibyte-characters when Emacs detects a file is raw-text. > I thought setting coding-category-binary to raw-text is enough. Your > two-line test was all I needed to read msdos.c correctly. So why do you > also need to turn enable-multibyte-characters off? By turning enable-multibyte-characters off, you can avoid seeing some garbage characters when some part of the buffer contents matches Emacs' internal format incidentally, can avoid incorrect cursor moving in such a case. >> I've just done it. > Could you please send me the diffs so I could test this? Thanks. I'll attach the current diff at the tail.
Please note that it also contains patches not related to the current discussion. I'll update FSF's code as soon as we reach some agreement. >> I don't think it's worth implementing such kind of heuristics, because >> there's anyway a case that we can't detect correctly. > Then how would you suggest to solve the case of a true binary file (say, > an executable program) that happens to have 3 or more CRLF pairs in it? > As far as I understand, Emacs will convert the CRLF pairs on input and > add a CR to any LF on output, which is disastrous in such cases. The 100% safe ways are: o set the default value of enable-multibyte-characters to nil, o or register the target file name in file-coding-system-alist, o or visit the file by C-x RET c no-conversion RET FILENAME. >> Or, do you have any idea on detecting EOL format without making Emacs >> much slower? > The idea is to not give up checking the file after you've seen the first 3 > CRLF pairs, but look into the file some more. I didn't think about this > enough to have a working solution. I wanted first to be sure that people > agree that this is the way to go. But generally, I don't think this would > make the input much slower than it is already, if the heuristic is > implemented in C (inside decode_coding or thereabouts). The problem with the current file-reading mechanism is that it doesn't read the whole text at once, instead it does: 1) reads one bunch 2) detects coding 3) if coding is decided, decodes the bunch just read, goto 5) 4) goto 1) 5) reads the remaining bunches while decoding them with the decided coding. So, currently, if the detecting routine at step 2 can't decide EOL format, it inserts the text as is in a buffer. I understand that we had better change this mechanism. But, it requires another big change in the current code, and I'm afraid it will delay shipping Emacs much more. And, I believe the patch I attached will save most cases.
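The five steps Handa lists can be modelled roughly as follows. This is a sketch based on one reading of the message, not Emacs's file-reading code: chunks are NUL-terminated strings standing in for the bunches read from disk, detect_chunk, decode_chunk and read_chunks are hypothetical helpers, and only EOL decoding is modelled. Chunks read before the EOL format is decided are copied as is, mirroring the "inserts the text as is" behaviour described in the message.

```c
#include <stddef.h>
#include <string.h>

enum eol_kind { EOL_UNKNOWN, EOL_LF, EOL_CRLF };

/* Hypothetical per-chunk detector: look at the first newline in the
   chunk; CRLF means DOS, a bare LF means Unix, none means undecided. */
static enum eol_kind
detect_chunk (const char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] == '\n')
      return (i > 0 && p[i - 1] == '\r') ? EOL_CRLF : EOL_LF;
  return EOL_UNKNOWN;
}

/* Copy N bytes from SRC to DST, dropping a CR that directly precedes an
   LF when EOL is EOL_CRLF.  Returns the number of bytes written.  A CRLF
   pair split across two chunks is not handled in this sketch. */
static size_t
decode_chunk (char *dst, const char *src, size_t n, enum eol_kind eol)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (eol == EOL_CRLF && src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
        continue;
      dst[out++] = src[i];
    }
  return out;
}

/* Steps 1-5: read a chunk and try to detect; once the EOL format is
   decided, decode this chunk and every later one with it.  DST must
   have room for the whole input plus a terminating NUL. */
size_t
read_chunks (char *dst, const char *const chunks[], size_t nchunks,
             enum eol_kind *decided)
{
  size_t out = 0;
  *decided = EOL_UNKNOWN;
  for (size_t c = 0; c < nchunks; c++)
    {
      size_t n = strlen (chunks[c]);                  /* step 1 */
      if (*decided == EOL_UNKNOWN)
        *decided = detect_chunk (chunks[c], n);       /* step 2 */
      if (*decided == EOL_UNKNOWN)
        {                                             /* step 4: keep raw */
          memcpy (dst + out, chunks[c], n);
          out += n;
        }
      else                                            /* steps 3 and 5 */
        out += decode_chunk (dst + out, chunks[c], n, *decided);
    }
  dst[out] = '\0';
  return out;
}
```

If no chunk ever contains a newline, nothing is decided and the whole input passes through unchanged. A real implementation would also have to carry state between bunches so that a CR at the end of one bunch and an LF at the start of the next are treated as one pair.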
Considering the trade off between making code-detection not that slow and making code-detection more intelligent, I think the former is important as long as we can't have a 100% correct code-detection. --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Wed Aug 27 01:27:11 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "17:27:58" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06114 for ; Wed, 27 Aug 1997 01:27:10 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA04510; Wed, 27 Aug 1997 17:26:48 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19240; Wed, 27 Aug 1997 17:26:47 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA04900; Wed, 27 Aug 1997 17:27:58 +0900 Message-Id: <199708270827.RAA04900@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 10:54:00 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 17:27:58 +0900 Eli Zaretskii writes: >> But, now I set coding-category-binary to raw-text. So, in the context >> of coding-category-binary, `binary' refers only to the text part, and EOL >> is automatically detected. >> >> So, your mail makes me think that we had better: >> o treat `coding-category-binary' as truly binary even for EOL format, >> o set it to `no-conversion', >> o make a new category `coding-category-raw-text', >> o and set it to raw-text.
> This might be an OK solution, but I'm afraid I don't understand how would > Emacs distinguish between these two coding categories (binary and > raw-text)? Only by consistency of EOL format. If consistent, it's raw-text, if not, it's no-conversion. > Let's take msdos.c and emacs.exe as two examples. The former is a text > file where EOLs should be decoded, the latter is a binary file where EOLs > should NOT be converted. > Assuming that emacs.exe has at least 3 CRLF pairs in it, how would Emacs > know which conversion to apply in each of these two cases? No way. We can assume any rare cases which make any code-detection mechanisms fail. --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Wed Aug 27 01:29:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "17:29:18" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "166" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06187 for ; Wed, 27 Aug 1997 01:29:01 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA04680; Wed, 27 Aug 1997 17:28:08 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19303; Wed, 27 Aug 1997 17:28:07 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA04906; Wed, 27 Aug 1997 17:29:18 +0900 Message-Id: <199708270829.RAA04906@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 10:55:45 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 17:29:18 +0900 Oops, sorry, I forgot to attach the patch. Here it is. 
--- Ken'ichi HANDA handa@etl.go.jp ------------------------------------------------------------ [uuencoded attachment all.diff.gz omitted] From eliz@is.elta.co.il Wed Aug 27 02:13:22 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "12:12:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "20" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA07268 for ; Wed, 27 Aug 1997 02:13:21 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id MAA08303; Wed, 27 Aug 1997 12:12:03 +0300 X-Sender: eliz@is In-Reply-To: <199708270824.RAA04887@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN;
charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 12:12:01 +0300 (IDT) On Wed, 27 Aug 1997, Kenichi Handa wrote: > By turning enable-multibyte-characters off, you can avoid seeing some > garbage characters when some part of the buffer contents matches > Emacs' internal format incidentally, can avoid incorrect cursor moving > in such a case. I see. In that case, I agree that it would be better to have the modeline still show the coding system in the case where Emacs sees unknown binary characters in the file. ?t and ?b (for text and binary files, respectively) are good enough for me. > And, I believe the patch I attached will save most cases. Considering > the trade off between making code-detection not that slow and making > code-detection more intelligent, I think the former is important as > long as we can't have a 100% correct code-detection. I agree. Any heuristic should be based on user experience, and we cannot have that unless Emacs is released ;-).
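Handa's answer to how raw-text and no-conversion could be told apart was "only by consistency of EOL format". Here is a minimal sketch of that rule, assuming it means counting CRLF-terminated and bare-LF-terminated lines over the whole buffer: consistent line ends give a raw-text variant, and mixed ones give no-conversion. The enum names and classify_by_eol_consistency are hypothetical. Unlike the threshold rule, this scans every byte, which is the extra cost Handa is reluctant to pay.

```c
#include <stddef.h>

/* Hypothetical result categories for this sketch. */
enum eol_category
{
  CAT_RAW_TEXT,        /* no line ends seen: EOL format undecided */
  CAT_RAW_TEXT_UNIX,   /* only bare LFs */
  CAT_RAW_TEXT_DOS,    /* only CRLF pairs */
  CAT_NO_CONVERSION    /* both kinds: inconsistent, treat as binary */
};

enum eol_category
classify_by_eol_consistency (const unsigned char *buf, size_t len)
{
  size_t crlf = 0, bare_lf = 0;

  /* Unlike a threshold rule, look at every newline in the buffer. */
  for (size_t i = 0; i < len; i++)
    if (buf[i] == '\n')
      {
        if (i > 0 && buf[i - 1] == '\r')
          crlf++;
        else
          bare_lf++;
      }

  if (crlf > 0 && bare_lf > 0)
    return CAT_NO_CONVERSION;
  if (crlf > 0)
    return CAT_RAW_TEXT_DOS;
  if (bare_lf > 0)
    return CAT_RAW_TEXT_UNIX;
  return CAT_RAW_TEXT;
}
```

A binary in which every newline byte happens to follow a CR still comes out as raw-text-dos. That is the case Handa says no detection scheme can catch, and the case RMS says needs an explicit "this is a binary file" from the user.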
From handa@etl.go.jp Wed Aug 27 04:08:38 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "20:08:58" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA09013 for ; Wed, 27 Aug 1997 04:08:37 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id UAA11196; Wed, 27 Aug 1997 20:07:48 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA26086; Wed, 27 Aug 1997 20:07:47 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id UAA05054; Wed, 27 Aug 1997 20:08:58 +0900 Message-Id: <199708271108.UAA05054@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 12:12:01 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 20:08:58 +0900 Eli Zaretskii writes: > On Wed, 27 Aug 1997, Kenichi Handa wrote: >> By turning enable-multibyte-characters off, you can avoid seeing some >> garbage characters when some part of the buffer contents matches >> Emacs' internal format incidentally, can avoid incorrect cursor moving >> in such a case. > I see. In that case, I agree that it would be better to have the modeline > still show the coding system in the case where Emacs sees unknown binary > characters in the file. ?t and ?b (for text and binary files, > respectively) are good enough for me. The remaining matter is how to show the state of enable-multibyte-characters in the mode line.
Now, two letters (`-' and coding system mnemonic) before the EOL indicator (`:', `\', or `/') mean enable-multibyte-characters is t and one character `-' means enable-multibyte-characters is nil. If we just show `t' or `b', it's hard for users to know the status of enable-multibyte-characters. My idea is to turn the first letter `-' to `='. And, I prefer `=' to `b' because mnemonic letter of `no-conversion' is `='. Another change I want to do is to change mnemonic letter of `emacs-mule' from `=' to `M', then `=' always tells that the buffer contents are binary code. Richard, what do you think? May I change the current code as above? --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Wed Aug 27 04:17:15 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "14:16:30" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id EAA09124 for ; Wed, 27 Aug 1997 04:17:13 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id OAA08460; Wed, 27 Aug 1997 14:16:31 +0300 X-Sender: eliz@is In-Reply-To: <199708271108.UAA05054@etlken.etl.go.jp> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Kenichi Handa cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 14:16:30 +0300 (IDT) On Wed, 27 Aug 1997, Kenichi Handa wrote: > My > idea is to turn the first letter `-' to `='. And, I prefer `=' to `b' > because mnemonic letter of `no-conversion' is `='. Seems OK to me. > Another change I want to do is to change mnemonic letter of > `emacs-mule' from `=' to `M', then `=' always tells that the buffer > contents are binary code.
It bothers me for some time that emacs-mule and no-conversion have both the same mnemonic, so the change is welcome. However, I would suggest to leave emacs-mule be `=' and invent a new letter (`b'? `B'?) for the binary case, since `=' means ``the usual case'', which is emacs-mule. The binary case is the exception. From handa@etl.go.jp Wed Aug 27 05:14:07 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "21:09:41" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "22" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA10528 for ; Wed, 27 Aug 1997 05:14:06 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id VAA13952; Wed, 27 Aug 1997 21:08:29 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA28850; Wed, 27 Aug 1997 21:08:29 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id VAA05105; Wed, 27 Aug 1997 21:09:41 +0900 Message-Id: <199708271209.VAA05105@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 14:16:30 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 21:09:41 +0900 Eli Zaretskii writes: >> Another change I want to do is to change mnemonic letter of >> `emacs-mule' from `=' to `M', then `=' always tells that the buffer >> contents are binary code. > It bothers me for some time that emacs-mule and no-conversion have both > the same mnemonic, so the change is welcome. However, I would suggest to > leave emacs-mule be `=' and invent a new letter (`b'? `B'?) 
for the > binary case, since `=' means ``the usual case'', which is emacs-mule. > The binary case is the exception. Unfortunately `B' is already used by `chinese-big5'. And I want to keep the letter `b' for a coding system which we may support in the future. And I think `=' is a good mnemonic for `no-conversion' because the files' external and internal (to Emacs) codings are `equal'. In addition, all the other frequently used coding systems have non-symbol mnemonics. So, using the symbol `=' for the exception (no-conversion) seems reasonable. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Wed Aug 27 09:23:57 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "12:25:17" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21124 for ; Wed, 27 Aug 1997 09:23:56 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29840; Wed, 27 Aug 1997 12:25:17 -0400 Message-Id: <199708271625.MAA29840@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 09:55:17 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 12:25:17 -0400 Handa suggested to change that default and assign to it the (new) coding system to be called raw-text that would still do EOL conversions. Will this solve the problem? NO! It is impossible to solve the problem unless enable-multibyte-characters is set to nil.
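The mode-line scheme Handa proposes in this exchange can be summarised in code. The sketch below is only one reading of the proposal: with multibyte on, the indicator is `-' plus the coding-system mnemonic plus the EOL indicator; with multibyte off, it is `=' plus the EOL indicator. The mnemonic table holds only the letters mentioned in the thread (`=' for no-conversion, the proposed `M' for emacs-mule, `B' for chinese-big5), and the function names are hypothetical. The thread does not say which of `:', `\' or `/' goes with which EOL type, so the caller passes the indicator in.

```c
#include <stddef.h>
#include <string.h>

/* Mnemonics mentioned in the thread; `M' for emacs-mule is the
   proposed letter, not necessarily what Emacs ended up using. */
static const struct { const char *name; char mnemonic; } mnemonics[] = {
  { "no-conversion", '=' },
  { "emacs-mule",    'M' },
  { "chinese-big5",  'B' },
};

/* Returns '?' for coding systems not listed in this sketch. */
char
proposed_mnemonic (const char *coding_system)
{
  for (size_t i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++)
    if (strcmp (mnemonics[i].name, coding_system) == 0)
      return mnemonics[i].mnemonic;
  return '?';
}

/* Build the indicator into OUT (at most 3 characters plus NUL):
     multibyte on:  '-' MNEMONIC EOL_INDICATOR
     multibyte off: '=' EOL_INDICATOR  */
void
format_coding_indicator (char out[4], int multibyte, char mnemonic,
                         char eol_indicator)
{
  size_t i = 0;
  if (multibyte)
    {
      out[i++] = '-';
      out[i++] = mnemonic;
    }
  else
    out[i++] = '=';     /* the mnemonic is not shown when multibyte is off */
  out[i++] = eol_indicator;
  out[i] = '\0';
}
```

In this scheme a leading `=' always signals a unibyte buffer, and an `=' in the mnemonic slot always signals no-conversion. This matches Handa's aim that `=' should always mean the buffer holds binary code.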
From rms@gnu.ai.mit.edu Wed Aug 27 09:37:08 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "12:38:39" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "22" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21924 for ; Wed, 27 Aug 1997 09:37:07 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29926; Wed, 27 Aug 1997 12:38:39 -0400 Message-Id: <199708271638.MAA29926@psilocin.gnu.ai.mit.edu> In-reply-to: <199708270721.AAA16471@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> <199708270721.AAA16471@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: eliz@is.elta.co.il, handa@etl.go.jp, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 12:38:39 -0400 This discussion about no-conversion has made me rethink part of find-buffer-file-type-coding-system. For files that are specified to be "binary" in file-name-buffer-file-type-alist or untranslated-filesystem-list, the no-conversion coding system is used. If a file is binary, no-conversion is right; but in addition, enable-multibyte-characters should be turned off, so that no sequence of bytes gets misinterpreted as a multibyte character. This does not seem correct since a file could be on an "untranslated" filesystem and still need a coding system (the untranslated only refers to EOL). That is true. Files which are on an untranslated file system, whose individual names do not imply binary files, are not really "binary". The best thing to do with them is this: if the file exists, read it normally; if it does not exist, use undecided-unix as the coding system for creating it. Can you implement that?
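RMS's rule for files that are named as binary, or that live on an untranslated file system, can be written as a small decision function. This is a hypothetical sketch: plan_visit, enum visit_coding and struct visit_plan are invented names, VISIT_AUTODETECT stands for "read it normally", and the default of multibyte on assumes enable-multibyte-characters defaults to t.

```c
#include <stdbool.h>

enum visit_coding
{
  VISIT_AUTODETECT,      /* "read it normally": usual detection */
  VISIT_NO_CONVERSION,   /* truly binary: no code or EOL conversion */
  VISIT_UNDECIDED_UNIX   /* new file on an untranslated file system */
};

struct visit_plan
{
  enum visit_coding coding;
  bool multibyte;        /* value for enable-multibyte-characters */
};

struct visit_plan
plan_visit (bool name_is_binary, bool on_untranslated_fs, bool file_exists)
{
  struct visit_plan plan = { VISIT_AUTODETECT, true };

  if (name_is_binary)
    {
      /* Binary by name: no-conversion, and multibyte off so no byte
         sequence is misread as a multibyte character. */
      plan.coding = VISIT_NO_CONVERSION;
      plan.multibyte = false;
    }
  else if (on_untranslated_fs && !file_exists)
    /* Not really binary; only the EOL convention of a new file matters. */
    plan.coding = VISIT_UNDECIDED_UNIX;

  return plan;
}
```

A name that implies a binary file takes precedence over the file-system check. This follows RMS's point that only files whose names do not imply binary are "not really binary".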
From rms@gnu.ai.mit.edu Wed Aug 27 09:43:57 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "12:45:22" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA22345 for ; Wed, 27 Aug 1997 09:43:55 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29949; Wed, 27 Aug 1997 12:45:22 -0400 Message-Id: <199708271645.MAA29949@psilocin.gnu.ai.mit.edu> In-reply-to: <199708270733.QAA04792@etlken.etl.go.jp> (message from Kenichi Handa on Wed, 27 Aug 1997 16:33:26 +0900) References: <199708270733.QAA04792@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 12:45:22 -0400 > If it will, then the only problem that remains is how do we make sure > that a truly binary file that happens to have a few CRLF pairs > doesn't get detected as raw-text-dos. We cannot do that. For true binary files, the user has to say somehow that "this is a binary file".
From eliz@is.elta.co.il Wed Aug 27 09:45:26 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "19:44:35" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "9" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA22400 for ; Wed, 27 Aug 1997 09:45:22 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id TAA09240; Wed, 27 Aug 1997 19:44:36 +0300 X-Sender: eliz@is In-Reply-To: <199708271638.MAA29926@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 19:44:35 +0300 (IDT) On Wed, 27 Aug 1997, Richard Stallman wrote: > If a file is binary, no-conversion is right; but in addition, > enable-multibyte-characters should be turned off, so that no sequence > of bytes gets misinterpreted as a multibyte character. I thought that no-conversion already prevents interpretation of multibyte sequences in the file, since it does I/O verbatim. What am I missing? 
From rms@gnu.ai.mit.edu Wed Aug 27 13:19:51 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Wed" "27" "August" "1997" "16:21:09" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708272021.QAA31689@psilocin.gnu.ai.mit.edu>" "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA09533 for ; Wed, 27 Aug 1997 13:19:50 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA31689; Wed, 27 Aug 1997 16:21:09 -0400 Message-Id: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> In-reply-to: <199708270827.RAA04900@etlken.etl.go.jp> (message from Kenichi Handa on Wed, 27 Aug 1997 17:27:58 +0900) References: <199708270827.RAA04900@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 16:21:09 -0400 > This might be an OK solution, but I'm afraid I don't understand how would > Emacs distinguish between these two coding categories (binary and > raw-text)? Only by consistency of EOL format. If consistent, it's raw-text, if not, it's no-conversion. I think it is a mistake to try to distinguish this automatically. It cannot be done right, so let's NOT TRY. Instead, we should simply tell users that they must specify explicitly which files are true binary files, one way or another. Handa, please forget about trying to do this, and install the other changes now. 
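The consistency test Handa proposes (one EOL style throughout means raw-text with that EOL type; mixed EOLs mean no-conversion) amounts to classifying a file's line ends. A minimal C sketch, assuming the hypothetical names `classify_eol` and `enum eol_kind`; it is not the detection code Emacs itself uses:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a file's line ends. */
enum eol_kind { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Classify the line ends in BUF[0..LEN).  Returns EOL_NONE when there are
   no line ends, a single kind when every line end agrees, and EOL_MIXED
   when more than one style occurs. */
static enum eol_kind
classify_eol (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t lf = 0, crlf = 0, cr = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] == '\r')
        {
          if (i + 1 < len && buf[i + 1] == '\n')
            {
              crlf++;
              i++;              /* skip the LF half of the CRLF pair */
            }
          else
            cr++;               /* bare CR (Mac style) */
        }
      else if (buf[i] == '\n')
        lf++;                   /* bare LF (Unix style) */
    }
  int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
  if (kinds == 0)
    return EOL_NONE;
  if (kinds > 1)
    return EOL_MIXED;
  return lf ? EOL_LF : crlf ? EOL_CRLF : EOL_CR;
}
```

Under Handa's proposal, EOL_LF, EOL_CRLF and EOL_CR would correspond to raw-text-unix, raw-text-dos and raw-text-mac, and EOL_MIXED to no-conversion; as RMS's reply makes clear, he decided against using such a test to tell true binary files apart.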
From rms@gnu.ai.mit.edu Wed Aug 27 14:15:33 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "17:16:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA14163 for ; Wed, 27 Aug 1997 14:15:32 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA32099; Wed, 27 Aug 1997 17:16:46 -0400 Message-Id: <199708272116.RAA32099@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 19:44:35 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 17:16:46 -0400 I thought that no-conversion already prevents interpretation of multibyte sequences in the file, since it does I/O verbatim. You're lumping together two entirely different issues. no-conversion means that the bytes are not translated when they are read in. What they mean in the buffer is another matter! But perhaps no-conversion SHOULD turn off enable-multibyte-characters. Handa, what do you think? 
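RMS's distinction between reading bytes verbatim and interpreting them in the buffer can be seen by counting the characters in one byte sequence two ways. The C sketch below uses UTF-8 purely as a stand-in multibyte encoding; Emacs 20's internal multibyte form (emacs-mule) used different byte ranges, but the point is the same. no-conversion delivers identical bytes either way, and only the buffer's multibyte setting decides whether they form one character or several. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* With multibyte interpretation off, every byte is one character. */
static size_t
count_chars_unibyte (const unsigned char *buf, size_t len)
{
  (void) buf;
  return len;
}

/* With multibyte interpretation on, a lead byte and its continuation
   bytes collapse into one character.  UTF-8 is only a stand-in here,
   and malformed sequences are not handled specially. */
static size_t
count_chars_utf8 (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t chars = 0;
  for (size_t i = 0; i < len; i++)
    if ((buf[i] & 0xC0) != 0x80)   /* not a continuation byte */
      chars++;
  return chars;
}
```

For example, the bytes 0x41 0xC3 0xA9 0x42 count as four characters in a unibyte buffer but as three under the UTF-8 interpretation, although the bytes read from the file are the same.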
From handa@etl.go.jp Wed Aug 27 17:59:04 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "09:59:59" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "31" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA00905 for ; Wed, 27 Aug 1997 17:59:03 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id JAA00494; Thu, 28 Aug 1997 09:58:49 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA22179; Thu, 28 Aug 1997 09:58:48 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id JAA05760; Thu, 28 Aug 1997 09:59:59 +0900 Message-Id: <199708280059.JAA05760@etlken.etl.go.jp> In-reply-to: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Wed, 27 Aug 1997 16:21:09 -0400) References: <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 09:59:59 +0900 Richard Stallman writes: >> This might be an OK solution, but I'm afraid I don't understand how would >> Emacs distinguish between these two coding categories (binary and >> raw-text)? > Only by consistency of EOL format. If consistent, it's raw-text, if > not, it's no-conversion. > I think it is a mistake to try to distinguish this automatically. > It cannot be done right, so let's NOT TRY. > Instead, we should simply tell users that they must specify explicitly > which files are true binary files, one way or another. I agree. 
But we still have to define Emacs' behaviour when it encounters a file that has random 8-bit codes in the text but a consistent EOL format, or one that has an inconsistent EOL format. What should Emacs do in these cases? I think using raw-text-XXX in the former case and no-conversion in the latter case is reasonable. > Handa, please forget about trying to do this, and install the other > changes now. What do you mean by "this"? o introducing coding-category-raw-text? o implementing some more intelligent EOL detector? o all the changes about handling raw-text? --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Wed Aug 27 18:13:06 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "10:14:04" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "12" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA01601 for ; Wed, 27 Aug 1997 18:13:05 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA01489; Thu, 28 Aug 1997 10:12:54 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA23642; Thu, 28 Aug 1997 10:12:53 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA05777; Thu, 28 Aug 1997 10:14:04 +0900 Message-Id: <199708280114.KAA05777@etlken.etl.go.jp> In-reply-to: <199708272116.RAA32099@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Wed, 27 Aug 1997 17:16:46 -0400) References: <199708272116.RAA32099@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 10:14:04 +0900 Richard Stallman writes: > But perhaps no-conversion SHOULD turn off
enable-multibyte-characters. > Handa, what do you think? I agree, and your mail just reminded me that you actually asked for that kind of change long ago. I'll do this change. --- Ken'ichi HANDA handa@etl.go.jp From handa@etl.go.jp Wed Aug 27 18:48:37 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "10:49:29" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "22" "Re: Terminal coding systems on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA03311 for ; Wed, 27 Aug 1997 18:48:36 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id KAA04465; Thu, 28 Aug 1997 10:48:20 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA26471; Thu, 28 Aug 1997 10:48:19 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id KAA05857; Thu, 28 Aug 1997 10:49:29 +0900 Message-Id: <199708280149.KAA05857@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Wed, 27 Aug 1997 17:10:50 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu Subject: Re: Terminal coding systems on DOS_NT Date: Thu, 28 Aug 1997 10:49:29 +0900 Eli Zaretskii writes: > mule-cmds.el disables terminal and input coding systems in the Mule menus > when window-system is non-nil. I think this should be enabled for > MS-DOS. I'm not sure about NT, but it probably should be enabled there > also. Geoff? > Btw, why are these coding systems irrelevant for X-Windows? The terminal coding system is used only when Emacs is running on some terminal. The keyboard coding system is for accepting multibyte characters sent from the terminal (perhaps via some input method embedded in the terminal).
For instance, kterm (a Japanized xterm) can have the Umm input method, which sends iso-2022-jp or euc-japan to a program running under kterm; cxterm (a Chinese xterm) sends big5 or euc-china, and hanterm sends euc-korea. So neither of them has any meaning when (at least) the X Window System is being used. But I don't know about DOS and NT. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Wed Aug 27 18:53:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "21:54:34" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "4" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA03433 for ; Wed, 27 Aug 1997 18:53:02 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id VAA00564; Wed, 27 Aug 1997 21:54:34 -0400 Message-Id: <199708280154.VAA00564@psilocin.gnu.ai.mit.edu> In-reply-to: <199708272054.NAA17158@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199708272054.NAA17158@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk Subject: Re: get-file-buffer and find-buffer-visiting Date: Wed, 27 Aug 1997 21:54:34 -0400 Since DOS_NT is case insensitive, does it make sense to change get-file-buffer to ignore case? I think so, if there is no other difficulty.
From rms@gnu.ai.mit.edu Wed Aug 27 20:19:30 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Wed" "27" "August" "1997" "23:20:49" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "16" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA06035 for ; Wed, 27 Aug 1997 20:19:29 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA01157; Wed, 27 Aug 1997 23:20:49 -0400 Message-Id: <199708280320.XAA01157@psilocin.gnu.ai.mit.edu> In-reply-to: <199708271108.UAA05054@etlken.etl.go.jp> (message from Kenichi Handa on Wed, 27 Aug 1997 20:08:58 +0900) References: <199708271108.UAA05054@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Wed, 27 Aug 1997 23:20:49 -0400 My idea is to turn the first letter `-' to `='. This change would be ok with me, but it isn't what the manual says, and changes in this area of the code have tended to introduce bugs. So please don't change this. Another change I want to do is to change mnemonic letter of `emacs-mule' from `=' to `M', then `=' always tells that the buffer contents are binary code. The manual is already printed and says it is =, so don't change this. Please don't make any change in this area of Emacs, and work on the other issues which are more important. 
From rms@gnu.ai.mit.edu Wed Aug 27 23:18:48 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "02:16:53" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA12200 for ; Wed, 27 Aug 1997 23:18:47 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA01806; Thu, 28 Aug 1997 02:16:53 -0400 Message-Id: <199708280616.CAA01806@psilocin.gnu.ai.mit.edu> In-reply-to: <199708280059.JAA05760@etlken.etl.go.jp> (message from Kenichi Handa on Thu, 28 Aug 1997 09:59:59 +0900) References: <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> <199708280059.JAA05760@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 02:16:53 -0400 Please implement the raw-text coding system as we have already described it. Please DO NOT try to distinguish heuristically "real binary" files from "raw-text files". I agree. But we still have to define Emacs' behaviour when it encounters a file that has random 8-bit codes in the text but a consistent EOL format, or one that has an inconsistent EOL format. What should Emacs do in these cases? Distinguish raw-text-unix and raw-text-dos and raw-text-mac just the same way as you do for most other coding systems. This issue is NOT IMPORTANT! Stop spending time on it! Implement raw-text in the natural way, as I have explained it, and STOP PAYING ATTENTION TO IT and MOVE ON TO SOMETHING ELSE.
From handa@etl.go.jp Thu Aug 28 00:17:42 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "16:18:28" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "23" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA13937 for ; Thu, 28 Aug 1997 00:17:41 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id QAA23158; Thu, 28 Aug 1997 16:17:19 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA13875; Thu, 28 Aug 1997 16:17:18 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id QAA06591; Thu, 28 Aug 1997 16:18:28 +0900 Message-Id: <199708280718.QAA06591@etlken.etl.go.jp> In-reply-to: <199708280616.CAA01806@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Thu, 28 Aug 1997 02:16:53 -0400) References: <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> <199708280059.JAA05760@etlken.etl.go.jp> <199708280616.CAA01806@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 16:18:28 +0900 Richard Stallman writes: > Please implement the raw-text coding system as we have already > described it. Please DO NOT try to distinguish heuristically "real > binary" files from "raw-text files". > I agree. But we still have to define Emacs' behaviour when it > encounters a file that has random 8-bit codes in the text but a > consistent EOL format, or one that has an inconsistent EOL format. What > should Emacs do in these cases?
> Distinguish raw-text-unix and raw-text-dos and raw-text-mac > just the same way as you do for most other coding systems. > This issue is NOT IMPORTANT! Stop spending time on it! > Implement raw-text in the natural way, as I have explained it, > and STOP PAYING ATTENTION TO IT and MOVE ON TO SOMETHING ELSE. OK, then I've already done what is necessary. I'll update FSF's code soon. --- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Thu Aug 28 02:06:03 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "12:05:25" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "40" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA18163 for ; Thu, 28 Aug 1997 02:06:01 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id MAA10627; Thu, 28 Aug 1997 12:05:26 +0300 X-Sender: eliz@is In-Reply-To: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 12:05:25 +0300 (IDT) On Wed, 27 Aug 1997, Richard Stallman wrote: > Only by consistency of EOL format. If consistent, it's raw-text, if > not, it's no-conversion. > > I think it is a mistake to try to distinguish this automatically. > It cannot be done right, so let's NOT TRY. There's nothing wrong IMHO with a mechanism that does detect binary files most of the time, even if it doesn't work in all cases. I have added such capabilities in various DOS ports of GNU tools (e.g., see the DJGPP port of Grep) and never heard any complaints. The diffs that Handa has sent me seem to implement this consistency test already, and don't seem too resource-consuming.
I agree that further refinement of the binary file detection could be delayed until more user experience is available, but I don't think it can be dismissed altogether. > Instead, we should simply tell users that they must specify explicitly > which files are true binary files, one way or another. I believe this will prove to be a nuisance. It was enough of a nuisance in the DOS_NT world that file-name-buffer-file-type-alist was introduced so that frequently used binary files would be recognized automatically. This solution is IMHO not good enough in the presence of coding systems, since e.g. *.c and even *.text files could include strings encoded in non-English languages. But I believe that with small changes in the coding-detection code we could make Emacs recognize most of the binary files. I don't think users will like the requirement to have, in effect, two different ways of visiting files. Unix users have never before distinguished between binary files and the other kind, and I think they will want to keep it that way. I'm afraid that if we neglect to take some reasonable care of this issue, it might become the single most important reason for users to setq enable-multibyte-characters nil.
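Eli's position is that a heuristic which is right most of the time is still useful. One common heuristic, assumed here for illustration and not a description of the DJGPP Grep port or of anything Handa implemented, treats a file as binary if a NUL byte appears near its start. `looks_binary` and `SNIFF_BYTES` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_BYTES 4096   /* hypothetical size of the sample examined */

/* Guess that BUF[0..LEN) is binary if its first SNIFF_BYTES bytes
   contain a NUL.  A guess only: it can be wrong in both directions. */
static bool
looks_binary (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t n = len < SNIFF_BYTES ? len : SNIFF_BYTES;
  return n > 0 && memchr (buf, '\0', n) != NULL;
}
```

A plain text file with CRLF line ends passes as text, while an executable header such as "MZ" followed by a NUL is flagged. Text in an encoding that contains NUL bytes (UTF-16, for instance) would be misclassified, which illustrates RMS's objection that such a test cannot be right in all cases.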
From eliz@is.elta.co.il Thu Aug 28 02:24:59 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "12:24:51" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA18542 for ; Thu, 28 Aug 1997 02:24:57 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id MAA10739; Thu, 28 Aug 1997 12:24:52 +0300 X-Sender: eliz@is In-Reply-To: <199708272054.NAA17158@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: get-file-buffer and find-buffer-visiting Date: Thu, 28 Aug 1997 12:24:51 +0300 (IDT) On Wed, 27 Aug 1997, Geoff Voelker wrote: > I have a question about get-file-buffer and find-buffer-visiting. A > user has encountered a situation where get-file-buffer is invoked in > different situations with the same filename, except that the filename > differs in case in the different situations. get-file-buffer only > returns the associated buffer for the situation where the case > matches, and find-buffer-visiting returns the buffer independent of > case. > > Since DOS_NT is case insensitive, does it make sense to change > get-file-buffer to ignore case? Similar problems have popped up before. My impression from the discussions back then is that it boils down to this: should we consider file names which only differ in letter-case as the *same* file name, or as *different* names that refer to the same file? Emacs currently supports the former interpretation. get-file-buffer is documented to require an exact match of the file name, and find-buffer-visiting is documented to test for other buffers that might visit the same file, perhaps under different names.
If we want to interpret file names case-insensitively, I would suggest introducing a special function for filename comparison that on DOS_NT (and VMS?) will be case-insensitive, and change all the places where file names are compared with string-equal to use this new function instead. Places which use string-match will then need to ignore case as well. I'm afraid such a change would be a lot of work. However, changing a single function to ignore the case will make Emacs inconsistent in its treatment of file names on DOS_NT. If we think Emacs should be case-insensitive on DOS_NT, it should do that consistently. From eliz@is.elta.co.il Thu Aug 28 06:59:26 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Thu" "28" "August" "1997" "16:58:53" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "141" "\"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA26767 for ; Thu, 28 Aug 1997 06:59:24 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id QAA11433; Thu, 28 Aug 1997 16:58:54 +0300 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: Geoff Voelker , Andrew Innes , Kenichi Handa Subject: "Binary" I/O and subprocesses on DOS_NT Date: Thu, 28 Aug 1997 16:58:53 +0300 (IDT) I believe I've found a bug in call-process-region. To reproduce, call hexl-find-file on a DOS-style text file (with CRLF EOLs), change it a bit, then save it: the file is written with Unix-style EOLs. I think this is because call-process-region was incorrectly setting the coding systems for writing the region that serves as input to the process and for reading process output. First, if binary-process-input is nil, that means the input to process is text, so setting the coding-system-for-write to nil (no conversion) is the opposite of what should be done. 
Also, call-process-region was setting the coding system for reading the process output using binary-process-input (instead of binary-process-output). Actually, this latter part seems entirely unnecessary, since call-process does it itself, and does it correctly. So for now, I just ifdef'ed that part away; see the patch below. I didn't install this change, because I would like you all to look at it carefully, in case I made some error. But please read my other message about this before you look at the patch. The whole issue is complicated, and I think it's a good idea that somebody else looks at what I've done. Here's the patch:

1997-08-28 +03 Eli Zaretskii

    * callproc.c (Fcall_process): Set EOL conversion type to LF when
    binary-process-output is non-nil.
    (Fcall_process_region): binary-process-XXXput only determines EOL
    conversion; if it is nil, convert LF <-> CRLF. Don't bind
    coding-system-for-read, it is done in Fcall_process.

diff -c src/callproc.c~0 src/callproc.c
*** src/callproc.c~0 Sun Aug 24 00:19:34 1997
--- src/callproc.c Thu Aug 28 15:44:20 1997
***************
*** 296,303 ****
        }
      setup_coding_system (Fcheck_coding_system (val), &process_coding);
  #ifdef MSDOS
!     /* On MSDOS, if the user did not ask for binary,
!        treat it as "text" which means doing CRLF conversion. */
      /* FIXME: this probably should be moved into the guts
         of `Ffind_operation_coding_system' for the case of `call-process'. */
      if (NILP (Vbinary_process_output))
--- 296,311 ----
        }
      setup_coding_system (Fcheck_coding_system (val), &process_coding);
  #ifdef MSDOS
!     /* On MSDOS, if the user did not ask for binary, treat it as
!        "text" which means doing CRLF conversion. Otherwise, leave
!        the EOLs alone.
!
!        Note that ``binary'' here only means whether EOLs should or
!        should not be converted, since that's what Vbinary_process_XXXput
!        meant in the days before the coding systems were introduced.
!
!        For other conversions, the caller should set coding-system
!        variables explicitly, or rely on auto-detection. */
      /* FIXME: this probably should be moved into the guts
         of `Ffind_operation_coding_system' for the case of `call-process'. */
      if (NILP (Vbinary_process_output))
***************
*** 307,312 ****
--- 315,322 ----
          /* FIXME: should we set type to undecided? */
          process_coding.type = coding_type_emacs_mule;
        }
+     else
+       process_coding.eol_type = CODING_EOL_LF;
  #endif
    }
  }
***************
*** 801,813 ****
    start = args[0];
    end = args[1];
    /* Decide coding-system of the contents of the temporary file. */
  #ifdef DOS_NT
!   specbind (Qbuffer_file_type, Vbinary_process_input);
!   if (NILP (Vbinary_process_input))
!     val = Qnil;
!   else
  #endif
-     {
        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (NILP (current_buffer->enable_multibyte_characters))
--- 811,822 ----
    start = args[0];
    end = args[1];
    /* Decide coding-system of the contents of the temporary file. */
+   {
  #ifdef DOS_NT
!     /* This is to cause find-buffer-file-type-coding-system (see
!        dos-w32.el) to choose correct EOL translation for write-region. */
!     specbind (Qbuffer_file_type, Vbinary_process_input);
  #endif
      if (!NILP (Vcoding_system_for_write))
        val = Vcoding_system_for_write;
      else if (NILP (current_buffer->enable_multibyte_characters))
***************
*** 825,834 ****
--- 834,860 ----
      else
        val = Qnil;
    }
+ #ifdef DOS_NT
+   /* binary-process-input tells whether the buffer needs to be
+      written with EOL conversions, but it doesn't say anything
+      about the rest of text encoding.
+
+      Don't let binary-process-input determine the EOL conversion if the
+      coding system was set explicitly and it specified EOL handling. */
+   if (NILP (val)
+       || VECTORP (Fget (val, Qeol_type))
+       || NILP (Vcoding_system_for_write))
+     {
+       Fput (val, Qeol_type,
+             make_number (NILP (Vbinary_process_input) ? 1 : 0));
+     }
+ #endif
  }
  specbind (intern ("coding-system-for-write"), val);
  Fwrite_region (start, end, filename_string, Qnil, Qlambda, Qnil);
+ /* This is done by Fcall_process. */
+ #if 0
  #ifdef DOS_NT
  if (NILP (Vbinary_process_input))
    val = Qnil;
***************
*** 853,858 ****
--- 879,885 ----
      }
  }
  specbind (intern ("coding-system-for-read"), val);
+ #endif
  record_unwind_protect (delete_temp_file, filename_string);

From eliz@is.elta.co.il Thu Aug 28 07:20:23 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Thu" "28" "August" "1997" "17:19:31" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "" "28" "\"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA28112 for ; Thu, 28 Aug 1997 07:20:21 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id RAA11546; Thu, 28 Aug 1997 17:19:31 +0300 X-Sender: eliz@is Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: Geoff Voelker , Andrew Innes , Kenichi Handa Subject: "Binary" I/O and subprocesses on DOS_NT Date: Thu, 28 Aug 1997 17:19:31 +0300 (IDT) Here are my thoughts about this. First, the word ``binary'' is loaded, and it got in the way when I worked on this. binary-process-XXXput being non-nil doesn't really mean that data should be read or written with no conversions; it just means that EOLs should not be converted. I can imagine cases where the rest of the text should be encoded or decoded even though the EOLs should be left alone. Therefore, it is IMHO incorrect to set the coding system to nil when binary I/O is specified. We should only set the eol-type property. The patch that I sent to you also avoids setting the EOL conversion if the coding system was specified explicitly, to let the callers override the value of binary-process-XXXput, if they need to do so. Geoff, I think that the code which sets the coding system in dos-w32.el should also be revised, so that it doesn't fall into this trap of ``binary'' files.
The patterns for names of binary files in file-name-buffer-file-type-alist designate true binary files which should be read with no conversions at all, but the untranslated filesystems only specify the EOL conversion. As far as I can see, the current code doesn't make that distinction. In particular, when buffer-file-type is non-nil, it does NOT mean the coding system for write should be no-conversion, as dos-w32 sets it now. And btw, why is the coding system for ASCII buffers (such as C source) set to undecided? Shouldn't it be emacs-mule? From rms@gnu.ai.mit.edu Thu Aug 28 09:19:31 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "12:20:51" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA03868 for ; Thu, 28 Aug 1997 09:19:30 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA04533; Thu, 28 Aug 1997 12:20:51 -0400 Message-Id: <199708281620.MAA04533@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Thu, 28 Aug 1997 12:05:25 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Thu, 28 Aug 1997 12:20:51 -0400 There's nothing wrong IMHO with a mechanism that does detect binary files most of the time, even if it doesn't work in all cases. If it delays the Emacs 20 release even one day, that is something very wrong with it. Please drop the subject so that Handa will go back to fixing what he needs to fix. I believe this will prove to be a nuisance. It was enough of a nuisance in the DOS_NT world that file-name-buffer-file-type-alist was introduced so that frequently used binary files would be recognized automatically. Too bad. Nothing can be done.
From eliz@is.elta.co.il Fri Aug 29 03:14:41 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Fri" "29" "August" "1997" "13:14:28" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA26830 for ; Fri, 29 Aug 1997 03:14:40 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id NAA13247; Fri, 29 Aug 1997 13:14:28 +0300 X-Sender: eliz@is In-Reply-To: <199708290418.VAA17387@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Fri, 29 Aug 1997 13:14:28 +0300 (IDT) On Thu, 28 Aug 1997, Geoff Voelker wrote: > Eli, I can't reproduce this. I did > > M-x hexl-find-file /tmp/text > changed some characters > C-x C-s > y > > then, in a command prompt, "od -a /tmp/text". the lines still ended > with CRLFs. Maybe because hexl is called differently on NT? At least one of the places I patched (in call-process) is MSDOS-only, and there are numerous other ``ifdef MSDOS'' there. But anyway, please look at the patches for call-process-region (which are DOS_NT) and tell me whether they seem to be correct. Maybe you could come up with your own test case when you look at the present code of call-process-region (for starters, it uses Vbinary_process_input instead of Vbinary_process_output in the fragment that I ifdef'ed away).
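Eli's proposal in this thread has two parts. First, binary-process-input and binary-process-output should only force the EOL type and leave the text encoding alone. Second, an explicitly bound coding system should override the flag. The sketch below illustrates that rule in C. It assumes a hypothetical two-field `struct coding`, which is not Emacs's real coding-system structure. It also assumes the caller passes the flag for the correct direction: the input flag when writing to the process, the output flag when reading from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; Emacs's real coding structures differ.  */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF };

struct coding
{
  const char *name;   /* text conversion, e.g. "iso-latin-1" */
  enum eol_type eol;  /* end-of-line conversion only */
};

/* Choose the coding for one direction of subprocess I/O.
   DFLT        - the coding that would be used otherwise.
   EXPLICIT_CS - non-NULL if the caller bound coding-system-for-read or
                 coding-system-for-write explicitly.
   BINARY      - binary-process-input when writing to the process,
                 binary-process-output when reading from it.  */
struct coding
choose_process_coding (struct coding dflt, const struct coding *explicit_cs,
                       bool binary)
{
  /* An explicit choice by the caller overrides the binary flag.  */
  if (explicit_cs != NULL)
    return *explicit_cs;

  assert (dflt.name != NULL);
  /* "Binary" only means "leave EOLs alone": keep the text conversion
     and force LF-only (unconverted) line ends.  */
  if (binary)
    dflt.eol = EOL_LF;
  return dflt;
}
```

Because the flag is passed per direction, using the input flag on the read side, as Eli reports the current call-process-region code does, would show up here as the wrong argument at the call site.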
From eliz@is.elta.co.il Fri Aug 29 03:15:14 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Fri" "29" "August" "1997" "13:15:07" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "16" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA26840 for ; Fri, 29 Aug 1997 03:15:12 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id NAA13253; Fri, 29 Aug 1997 13:15:07 +0300 X-Sender: eliz@is In-Reply-To: <199708290427.VAA26349@joker.cs.washington.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Geoff Voelker cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Fri, 29 Aug 1997 13:15:07 +0300 (IDT) On Thu, 28 Aug 1997, Geoff Voelker wrote: > > And btw, why is the coding system for ASCII buffers (such as C source) > > set to undecided? Shouldn't it be emacs-mule? > > I don't think that we can assume that it is ASCII. No, I'm talking about what decode_coding returns when the file is read in. It returns undecided if only ASCII characters are seen in the buffer. If that is because the user can add non-ASCII characters after that, then the coding system for writing should be decided by looking at the buffer contents when it is saved, but I don't see that this is actually done. 
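Eli describes decode_coding reporting undecided when only ASCII has been seen. The minimal C sketch below shows a whole-buffer check of that kind over raw, unconverted bytes. It reports two things: whether the buffer is pure ASCII, and which EOL convention it uses. The names are hypothetical and this is not Emacs's detection code. Choosing among non-ASCII encodings is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type, not Emacs's detect_coding interface.  */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_MIXED };

struct detection
{
  bool ascii_only;    /* no byte >= 0x80: text coding stays "undecided" */
  enum eol_type eol;  /* EOL convention seen in the whole buffer */
};

/* Classify raw (unconverted) file contents in one pass.  */
struct detection
detect_whole_buffer (const unsigned char *buf, size_t len)
{
  struct detection d = { true, EOL_UNDECIDED };
  size_t crlf = 0, bare_lf = 0;

  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] >= 0x80)
        d.ascii_only = false;
      if (buf[i] == '\n')
        {
          if (i > 0 && buf[i - 1] == '\r')
            crlf++;
          else
            bare_lf++;
        }
    }

  if (crlf > 0 && bare_lf == 0)
    d.eol = EOL_CRLF;
  else if (bare_lf > 0 && crlf == 0)
    d.eol = EOL_LF;
  else if (crlf > 0 && bare_lf > 0)
    d.eol = EOL_MIXED;   /* inconsistent: safest not to convert */
  return d;              /* no newline at all: EOL stays undecided */
}
```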
From handa@etl.go.jp Fri Aug 29 03:29:18 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Fri" "29" "August" "1997" "19:29:34" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "20" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA27014 for ; Fri, 29 Aug 1997 03:29:12 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id TAA07998; Fri, 29 Aug 1997 19:28:27 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA24216; Fri, 29 Aug 1997 19:28:26 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id TAA11118; Fri, 29 Aug 1997 19:29:34 +0900 Message-Id: <199708291029.TAA11118@etlken.etl.go.jp> In-reply-to: (message from Eli Zaretskii on Fri, 29 Aug 1997 13:15:07 +0300 (IDT)) References: From: Kenichi Handa To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Fri, 29 Aug 1997 19:29:34 +0900 Eli Zaretskii writes: > No, I'm talking about what decode_coding returns when the file is read > in. It returns undecided if only ASCII characters are seen in the > buffer. > If that is because the user can add non-ASCII characters after that, > then the coding system for writing should be decided by looking at the > buffer contents when it is saved, but I don't see that this is > actually done. When one inserts a new file in that buffer, and that new file is encoded in, for instance, iso-latin-1, then buffer-file-coding-system is changed to iso-latin-1. But, if buffer-file-coding-system is emacs-mule before inserting that new file, it doesn't change even after the insertion. 
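Handa's answer can be modelled as a small rule. It assumes coding systems are represented by name strings, a simplification of Emacs's symbols, and the function name is hypothetical. An "undecided" buffer adopts the coding of a newly inserted file. A buffer with a definite coding keeps it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the rule Handa describes: names stand in for
   Emacs coding-system symbols.  */
const char *
coding_after_insert (const char *buffer_coding, const char *inserted_coding)
{
  assert (buffer_coding != NULL && inserted_coding != NULL);

  /* "undecided" means only ASCII has been seen so far, so nothing is
     committed yet: adopt the coding detected for the inserted file.  */
  if (strcmp (buffer_coding, "undecided") == 0)
    return inserted_coding;

  /* A definite coding such as "emacs-mule" is left unchanged.  */
  return buffer_coding;
}
```

Read this way, keeping undecided for ASCII-only files, rather than emacs-mule as Eli suggested, appears to be what lets a later insertion settle the coding.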
--- Ken'ichi HANDA handa@etl.go.jp From rms@priam.CS.Berkeley.EDU Fri Aug 29 23:58:06 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Thu" "28" "August" "1997" "17:08:22" "-0400" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "8" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil] nil) Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA18927 for ; Fri, 29 Aug 1997 23:58:05 -0700 Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id XAA13297; Fri, 29 Aug 1997 23:57:58 -0700 (PDT) Message-Id: <199708300657.XAA13297@priam.CS.Berkeley.EDU> In-reply-to: (message from Eli Zaretskii on Thu, 28 Aug 1997 12:24:51 +0300 (IDT)) References: From: "Richard M. Stallman" To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@priam.CS.Berkeley.EDU Subject: Re: get-file-buffer and find-buffer-visiting Date: Thu, 28 Aug 1997 17:08:22 -0400 I'm afraid such a change would be a lot of work. However, changing a single function to ignore the case will make Emacs inconsistent in its treatment of file names on DOS_NT. This convinces me that no change should be made now in this aspect of Emacs. Thanks. From rms@priam.CS.Berkeley.EDU Sat Aug 30 12:55:23 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sat" "30" "August" "1997" "12:55:04" "-0700" "Richard M. 
Stallman" "rms@priam.cs.berkeley.edu" nil "10" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil] nil) Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA05888 for ; Sat, 30 Aug 1997 12:55:23 -0700 Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id MAA13636; Sat, 30 Aug 1997 12:55:04 -0700 (PDT) Message-Id: <199708301955.MAA13636@priam.CS.Berkeley.EDU> In-reply-to: <199708251239.VAA02414@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 25 Aug 1997 21:39:16 +0900) Reply-to: rms@priam.CS.Berkeley.EDU References: <199708251239.VAA02414@etlken.etl.go.jp> From: "Richard M. Stallman" To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Sat, 30 Aug 1997 12:55:04 -0700 (PDT) Hmm, perhaps, we must now give up detecting a coding system of a file in an incremental manner as being done now, but have to read the whole file with no conversion, detect a coding system by running sophisticated Emacs Lisp code on the whole buffer, then decode the whole buffer at once. What worries me in this, is that this method could be used for insert-file-contents, but cannot be used for a synchronous subprocess. I am not sure that it is a good idea to use different methods for subprocesses and files. From rms@priam.CS.Berkeley.EDU Sun Aug 31 01:09:08 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Fri" "29" "August" "1997" "14:19:59" "-0400" "Richard M. 
Stallman" "rms@priam.cs.berkeley.edu" nil "11" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA23093 for ; Sun, 31 Aug 1997 01:09:08 -0700 Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id BAA14078; Sun, 31 Aug 1997 01:08:17 -0700 (PDT) Message-Id: <199708310808.BAA14078@priam.CS.Berkeley.EDU> In-reply-to: (message from Eli Zaretskii on Thu, 28 Aug 1997 17:19:31 +0300 (IDT)) From: "Richard M. Stallman" To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Fri, 29 Aug 1997 14:19:59 -0400 Therefore, it is IMHO incorrect to set the coding system to nil when binary I/O is specified. We should only set the eol-type property. I agree. The patch that I sent to you also avoids setting the EOL conversion if the coding system was specified explicitly, to let the callers override the value of binary-process-XXXput, if they need to do so. I agree in principle. Implementing this in a fully satisfactory way may be hard. From rms@priam.CS.Berkeley.EDU Sun Aug 31 01:14:45 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Fri" "29" "August" "1997" "14:21:43" "-0400" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "106" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA23092 for ; Sun, 31 Aug 1997 01:09:08 -0700 Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id BAA14081; Sun, 31 Aug 1997 01:08:18 -0700 (PDT) Message-Id: <199708310808.BAA14081@priam.CS.Berkeley.EDU> In-reply-to: (message from Eli Zaretskii on Thu, 28 Aug 1997 17:19:31 +0300 (IDT)) From: "Richard M.
Stallman" To: eliz@is.elta.co.il CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Fri, 29 Aug 1997 14:21:43 -0400 Geoff, I think that the code which sets the coding system in dos-w32.el should also be revised, so that it doesn't fall into this trap of ``binary'' files. The patterns for names of binary files in file-name-buffer-file-type-alist designate true binary files which should be read with no conversions at all, but the untranslated filesystems only specify the EOL conversion. As far as I can see, the current code doesn't make that distinction. I've made changes (below) that I think solve these problems. Can you please test them? (These are in the .97 pretest too.) In particular, when buffer-file-type is non-nil, it does NOT mean the coding system for write should be no-conversion, as dos-w32 sets it now. Please note that that code is used only when buffer-file-coding-system is nil, which means in a buffer that is not file-visiting. As far as I can see, no-conversion is the right choice for that case.
*** dos-w32.el 1997/08/17 01:49:50 1.10
--- dos-w32.el 1997/08/29 18:17:38
***************
*** 72,89 ****
          (setq alist (cdr alist)))
      found)))

  (defun find-buffer-file-type (filename)
!   ;; First check if file is on an untranslated filesystem, then on the alist.
!   (if (untranslated-file-p filename)
!       t ; for binary
!     (let ((match (find-buffer-file-type-match filename))
!           (code))
!       (if (not match)
!           default-buffer-file-type
!         (setq code (cdr match))
!         (cond ((memq code '(nil t)) code)
!               ((and (symbolp code) (fboundp code))
!                (funcall code filename)))))))

  (setq-default buffer-file-coding-system 'undecided-dos)

--- 72,87 ----
          (setq alist (cdr alist)))
      found)))

+ ;; Don't check for untranslated file systems here.
  (defun find-buffer-file-type (filename)
!   (let ((match (find-buffer-file-type-match filename))
!         (code))
!     (if (not match)
!         default-buffer-file-type
!       (setq code (cdr match))
!       (cond ((memq code '(nil t)) code)
!             ((and (symbolp code) (fboundp code))
!              (funcall code filename))))))

  (setq-default buffer-file-coding-system 'undecided-dos)

***************
*** 123,142 ****
    (let ((op (nth 0 command))
          (target)
          (binary nil) (text nil)
!         (undecided nil))
      (cond ((eq op 'insert-file-contents)
             (setq target (nth 1 command))
!            (if (untranslated-file-p target)
!                (if (file-exists-p target)
!                    (setq undecided t)
!                  (setq binary t))
!              (setq binary (find-buffer-file-type target))
!              (unless binary
!                (if (find-buffer-file-type-match target)
!                    (setq text t)
!                  (setq undecided (file-exists-p target)))))
             (cond (binary '(no-conversion . no-conversion))
                   (text '(undecided-dos . undecided-dos))
                   (undecided '(undecided . undecided))
                   (t '(undecided-dos . undecided-dos))))
            ((eq op 'write-region)
--- 121,145 ----
    (let ((op (nth 0 command))
          (target)
          (binary nil) (text nil)
!         (undecided nil) (undecided-unix nil))
      (cond ((eq op 'insert-file-contents)
             (setq target (nth 1 command))
!            ;; First check for a file name that indicates
!            ;; it is truly binary.
!            (setq binary (find-buffer-file-type target))
!            (cond (binary)
!                  ;; Next check for files that MUST use DOS eol conversion.
!                  ((find-buffer-file-type-match target)
!                   (setq text t))
!                  ;; For any other existing file, decide based on contents.
!                  ((file-exists-p target)
!                   (setq undecided t))
!                  ;; Next check for a non-DOS file system.
!                  ((untranslated-file-p target)
!                   (setq undecided-unix t)))
             (cond (binary '(no-conversion . no-conversion))
                   (text '(undecided-dos . undecided-dos))
+                  (undecided-unix '(undecided-unix . undecided-unix))
                   (undecided '(undecided . undecided))
                   (t '(undecided-dos . undecided-dos))))
            ((eq op 'write-region)
From rms@gnu.ai.mit.edu Sun Aug 31 12:39:25 1997 X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] [nil "Sun" "31" "August" "1997" "15:40:56" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708311940.PAA15457@psilocin.gnu.ai.mit.edu>" "7" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA06278 for ; Sun, 31 Aug 1997 12:39:24 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA15457; Sun, 31 Aug 1997 15:40:56 -0400 Message-Id: <199708311940.PAA15457@psilocin.gnu.ai.mit.edu> In-reply-to: <199708290418.VAA17387@joker.cs.washington.edu> (voelker@cs.washington.edu) References: <199708290418.VAA17387@joker.cs.washington.edu> From: Richard Stallman To: voelker@cs.washington.edu CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk, handa@etl.go.jp Subject: Re: "Binary" I/O and subprocesses on DOS_NT Date: Sun, 31 Aug 1997 15:40:56 -0400 > I believe I've found a bug in call-process-region. To reproduce, call > hexl-find-file on a DOS-style text file (with CRLF EOLs), change it a > bit, then save it: the file is written with Unix-style EOLs. Eli, I can't reproduce this. I did Can you try this in the new pretest?
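The insert-file-contents branch of the dos-w32.el patch above is an ordered series of tests, so it can be restated as a small decision function. The C sketch below mirrors only that ordering. The struct fields are hypothetical stand-ins for the Lisp predicates, and the enum values stand in for the coding-system pairs the cond returns.

```c
#include <assert.h>  /* for quick sanity checks of the sketch */
#include <stdbool.h>

/* Stand-ins for the coding systems the patched cond returns.  */
enum coding_choice
{
  CS_NO_CONVERSION,   /* '(no-conversion . no-conversion)   */
  CS_UNDECIDED_DOS,   /* '(undecided-dos . undecided-dos)   */
  CS_UNDECIDED,       /* '(undecided . undecided)           */
  CS_UNDECIDED_UNIX   /* '(undecided-unix . undecided-unix) */
};

/* Hypothetical inputs: results of the Lisp predicates in dos-w32.el.  */
struct file_facts
{
  bool binary_by_name;    /* find-buffer-file-type returned non-nil */
  bool in_alist;          /* find-buffer-file-type-match found an entry */
  bool exists;            /* file-exists-p */
  bool untranslated_fs;   /* untranslated-file-p */
};

/* Same order of tests as the new cond for insert-file-contents.  */
enum coding_choice
coding_for_insert (struct file_facts f)
{
  if (f.binary_by_name)
    return CS_NO_CONVERSION;     /* truly binary: no conversion at all */
  if (f.in_alist)
    return CS_UNDECIDED_DOS;     /* listed as text: must use DOS EOLs */
  if (f.exists)
    return CS_UNDECIDED;         /* existing file: decide from contents */
  if (f.untranslated_fs)
    return CS_UNDECIDED_UNIX;    /* new file on a non-DOS file system */
  return CS_UNDECIDED_DOS;       /* new file on a DOS file system */
}
```

Note the ordering. After the patch, an existing file on an untranslated file system is decided from its contents. The untranslated-file-p test only matters for files that do not exist yet.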
From handa@etl.go.jp Mon Sep 1 01:11:37 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 1" "September" "1997" "17:12:04" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "27" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA24451 for ; Mon, 1 Sep 1997 01:11:35 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id RAA02811; Mon, 1 Sep 1997 17:10:53 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA28502; Mon, 1 Sep 1997 17:10:53 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id RAA15939; Mon, 1 Sep 1997 17:12:04 +0900 Message-Id: <199709010812.RAA15939@etlken.etl.go.jp> In-reply-to: <199708301955.MAA13636@priam.CS.Berkeley.EDU> (rms@priam.CS.Berkeley.EDU) References: <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU> From: Kenichi Handa To: rms@priam.CS.Berkeley.EDU CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 1 Sep 1997 17:12:04 +0900 "Richard M. Stallman" writes: > Hmm, perhaps, we must now give up detecting a coding system of a file > in an incremental manner as being done now, but have to read the whole > file with no conversion, detect a coding system by running > sophisticated Emacs Lisp code on the whole buffer, then decode the > whole buffer at once. > What worries me in this, is that this method could be used for > insert-file-contents, but cannot be used for a synchronous subprocess. > I am not sure that it is a good idea to use different methods for > subprocesses and files. ?? A synchronous subprocess has no problem, we can read the whole output into a buffer, and then process it. 
The problem is with an asynchronous subprocess because we must detect a coding system on the fly. But, I think it is enough to give just a bunch of data Emacs receives from the subprocess to the sophisticated Emacs Lisp code which I mentioned above. And, even in file-reading (insert-file-contents), if BEG and END are specified, that Emacs Lisp code will detect a coding system only from the part of a file. --- Ken'ichi HANDA handa@etl.go.jp From rms@gnu.ai.mit.edu Mon Sep 1 10:05:47 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Mon" " 1" "September" "1997" "13:07:20" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA06618 for ; Mon, 1 Sep 1997 10:05:46 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA04706; Mon, 1 Sep 1997 13:07:20 -0400 Message-Id: <199709011707.NAA04706@psilocin.gnu.ai.mit.edu> In-reply-to: <199709010812.RAA15939@etlken.etl.go.jp> (message from Kenichi Handa on Mon, 1 Sep 1997 17:12:04 +0900) References: <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU> <199709010812.RAA15939@etlken.etl.go.jp> From: Richard Stallman To: handa@etl.go.jp CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Mon, 1 Sep 1997 13:07:20 -0400 > What worries me in this, is that this method could be used for > insert-file-contents, but cannot be used for a synchronous subprocess. ?? A synchronous subprocess has no problem, we can read the whole output into a buffer, and then process it. Right, I meant to say asynchronous. The problem is with an asynchronous subprocess because we must detect a coding system on the fly. 
But, I think it is enough to give just a bunch of data Emacs receives from the subprocess to the sophisticated Emacs Lisp code which I mentioned above. I am not sure what "enough" means. It would do something, it would choose a coding system, but it would do so in an inconsistent way. In effect, what I am saying is this: if this method is acceptable for an asynchronous subprocess, doesn't that mean it is also acceptable for a file? From handa@etl.go.jp Mon Sep 1 17:50:35 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Tue" " 2" "September" "1997" "09:50:57" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "27" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil] nil) Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA19068 for ; Mon, 1 Sep 1997 17:50:34 -0700 Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP id JAA27257; Tue, 2 Sep 1997 09:49:48 +0900 (JST) Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA02661; Tue, 2 Sep 1997 09:49:47 +0900 (JST) Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) id JAA16888; Tue, 2 Sep 1997 09:50:57 +0900 Message-Id: <199709020050.JAA16888@etlken.etl.go.jp> In-reply-to: <199709011707.NAA04706@psilocin.gnu.ai.mit.edu> (message from Richard Stallman on Mon, 1 Sep 1997 13:07:20 -0400) References: <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU> <199709010812.RAA15939@etlken.etl.go.jp> <199709011707.NAA04706@psilocin.gnu.ai.mit.edu> From: Kenichi Handa To: rms@gnu.ai.mit.edu CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk Subject: Re: Coding system issues (3) Date: Tue, 2 Sep 1997 09:50:57 +0900 Richard Stallman writes: > The problem is with an asynchronous subprocess because we must detect > a coding 
system on the fly. But, I think it is enough to give just a > bunch of data Emacs receives from the subprocess to the sophisticated > Emacs Lisp code which I mentioned above. > I am not sure what "enough" means. It would do something, it would > choose a coding system, but it would do so in an inconsistent way. > In effect, what I am saying is this: if this method is acceptable for > an asynchronous subprocess, doesn't that mean it is also acceptable > for a file? For an asynchronous subprocess whose communication is hidden from users, such as ispell and nntp, we have to specify a coding system explicitly anyway. It is dangerous to rely on automatic code detection. But for shell (for instance), users can see the output from the subprocess, and they can easily tell that something is wrong when Emacs fails to find the correct coding automatically. And they can change the coding system interactively with C-x RET p. The sophisticated Emacs Lisp code mentioned above won't be that accurate when it is given only a little data (perhaps a single line), but I think that is acceptable for subprocesses.
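For an asynchronous process, the decision has to be made from whatever chunk has arrived so far. That is the trade-off Handa accepts. The sketch below shows a minimal streaming EOL detector along those lines, using a hypothetical per-process state struct, not Emacs's process code. It commits at the first LF it sees and never revisits that choice. Per Handa, the user can still override a wrong guess with C-x RET p.

```c
#include <assert.h>
#include <stddef.h>

enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF };

/* Hypothetical per-process detection state.  */
struct eol_detector
{
  enum eol_type decided;  /* sticky once set */
  int prev_was_cr;        /* previous byte (possibly in the last chunk) was CR */
};

/* Feed one chunk of process output and return the decision so far.  */
enum eol_type
eol_feed (struct eol_detector *d, const unsigned char *chunk, size_t len)
{
  assert (d != NULL && (chunk != NULL || len == 0));
  for (size_t i = 0; i < len && d->decided == EOL_UNDECIDED; i++)
    {
      if (chunk[i] == '\n')
        d->decided = d->prev_was_cr ? EOL_CRLF : EOL_LF;
      d->prev_was_cr = (chunk[i] == '\r');
    }
  return d->decided;
}
```

Carrying prev_was_cr across calls handles a CR-LF pair that is split between two reads. Deciding from a single line is exactly the "less accurate with little data" case discussed above.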
--- Ken'ichi HANDA handa@etl.go.jp From eliz@is.elta.co.il Sun Sep 14 07:12:32 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "14" "September" "1997" "17:12:07" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "19" "Re: EOL conversion in call-process-region" "^From:" nil nil "9" nil nil nil nil] nil) Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA11931 for ; Sun, 14 Sep 1997 07:12:31 -0700 Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) id RAA20127; Sun, 14 Sep 1997 17:12:08 +0300 X-Sender: eliz@is In-Reply-To: <199709111953.PAA04064@psilocin.gnu.ai.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII From: Eli Zaretskii To: Richard Stallman cc: handa@etl.go.jp, voelker@cs.washington.edu Subject: Re: EOL conversion in call-process-region Date: Sun, 14 Sep 1997 17:12:07 +0300 (IDT) On Thu, 11 Sep 1997, Richard Stallman wrote: > Suppose you read a DOS file and then use M-| to do something to the > text. Chances are you don't want that to be affected by what the > file's EOL conversion was. That is exactly what I am not sure about. Won't people in such cases expect to get the same behavior as if the region was cut out of the original file, e.g. by a sed script? At least when the region is the entire buffer, they probably would. Passing the region without the original EOLs breaks this. It is true that in many cases, particularly when the buffer contains text in the native format, things will generally work both ways (DOS programs that work on text usually drop the CR characters when they read the file). But reading non-text files, or reading DOS-style text files on Unix, causes M-| to behave differently than if the file were submitted to the invoked program outside Emacs. 
From rms@gnu.ai.mit.edu Sun Sep 14 10:04:06 1997 X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] [nil "Sun" "14" "September" "1997" "13:05:54" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "17" "Re: EOL conversion in call-process-region" "^From:" nil nil "9" nil nil nil nil] nil) Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA15021 for ; Sun, 14 Sep 1997 10:04:05 -0700 Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA08358; Sun, 14 Sep 1997 13:05:54 -0400 Message-Id: <199709141705.NAA08358@psilocin.gnu.ai.mit.edu> In-reply-to: (message from Eli Zaretskii on Sun, 14 Sep 1997 17:12:07 +0300 (IDT)) References: From: Richard Stallman To: eliz@is.elta.co.il CC: handa@etl.go.jp, voelker@cs.washington.edu Subject: Re: EOL conversion in call-process-region Date: Sun, 14 Sep 1997 13:05:54 -0400 > Suppose you read a DOS file and then use M-| to do something to the > text. Chances are you don't want that to be affected by what the > file's EOL conversion was. That is exactly what I am not sure about. Won't people in such cases expect to get the same behavior as if the region was cut out of the original file, e.g. by a sed script? If you are using M-| to do something to the text as you see it, you won't care what it looked like in the file. If you are using M-| as a substitute for doing something with the file itself, then you would care. Basically it seems that neither choice is perfect, so I might as well not change the one we have.
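The two M-| behaviours weighed above differ in one respect: whether the buffer's LF line ends are turned back into the file's CR-LF before the region is piped to the program. The sketch below illustrates that difference. It assumes the region is already plain bytes with LF line ends, and `encode_region_eol` is a hypothetical helper, not the callproc.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Copy region text SRC (LF line ends, as held in the buffer) into DST
   for sending to a subprocess.  If RESTORE_CRLF is nonzero, each LF
   becomes CR-LF, reproducing a DOS file's on-disk form (the behaviour
   Eli asks about).  Otherwise the text is sent exactly as seen in the
   buffer (the choice RMS argues for keeping).  DST must hold at least
   2 * LEN bytes.  Returns the number of bytes written.  */
size_t
encode_region_eol (const char *src, size_t len, char *dst, int restore_crlf)
{
  size_t out = 0;

  assert (src != NULL || len == 0);
  assert (dst != NULL);
  for (size_t i = 0; i < len; i++)
    {
      if (restore_crlf && src[i] == '\n')
        dst[out++] = '\r';
      dst[out++] = src[i];
    }
  return out;
}
```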