From owner-ntemacs-users@cs.washington.edu Wed Feb 17 11:35:19 1999
Message-Id: <199902171909.TAA28859@gpo.cam.harlequin.co.uk>
In-reply-to: <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk> (F.J.Wright@qmw.ac.uk)
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de>
 <199902161547.HAA28357@june.cs.washington.edu>
 <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk>
X-FAQ: http://www.cs.washington.edu/homes/voelker/ntemacs.html
From: Andrew Innes <andrewi@harlequin.co.uk>
To: F.J.Wright@qmw.ac.uk
CC: mike.fabian@it-mannesmann.de, Rolf.Sandau@de.bosch.com,
 ntemacs-users@cs.washington.edu, cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Wed, 17 Feb 1999 19:09:28 GMT

[added cygwin@sourceware.cygnus.com]

On Tue, 16 Feb 1999 17:42:20 +0000, "Dr Francis J. Wright" said:
>OK. Putting the pieces together, this works and appears to do what you
>want:
>
>bash-2.02$ hi=HO; emacs -batch --eval "(message \\\"$hi\\\")"
>HO
>
>But that leaves the question: why does it work?
>
>bash-2.02$ set -x
>bash-2.02$ emacs -batch --eval "(message \\\"hi\\\")"
>+ emacs -batch --eval '(message \"hi\")'
>hi
>
>Hence, this is equivalent to my previous suggestion after variable
>interpolation. But I agree with you, Mike, that so many \s should not
>be necessary.
>
>Could it be that NTEmacs is parsing its command line based on an
>assumption that is wrong when the shell is bash? It's probably using
>libraries that assume the shell is COMMAND or CMD, which have different
>quoting rules. Hence, when using bash it is necessary to quote in a way
>that makes no sense from a UNIX/bash perspective.

That's pretty much right on the nose (except that command.com/cmd.exe don't really have quoting rules; they are too dumb for that). This is the old "Microsoft vs Cygnus" quoting rules problem, but in reverse this time.

The basic problem is that Windows applications normally rely on the C library startup code to construct the argv[] array (list of command line arguments) by parsing the command line.

(On DOS/Windows, the command line is passed as a single string and it is entirely up to the application how it interprets that string. On Unix, applications receive a list of argument strings exactly as provided by the parent. The C libraries for Windows compilers provide startup code to reconstruct the list of argument strings to emulate the Unix environment.)

This technique of the startup code parsing the command line to construct the argument list is perfectly reasonable, but Cygnus put a fly in the ointment by using slightly incompatible rules for parsing the command line.
The basic rule is the same for both: arguments are separated by white space (which is discarded), so quotes must be put around arguments that are intended to contain white space. The rules diverge when a quote character itself appears in an argument (an embedded quote) and must be escaped so it isn't misconstrued as the end of the argument.

Now Emacs was made aware of the two quoting rules back in 19.34.6 days, to solve the problem of constructing the command line for subprocesses started from Emacs, so that the subprocess will "see" the list of arguments that Emacs intends even if there are embedded quotes.

(Aside: At the same time, I added some magic so that Emacs would detect automatically which rules to use by looking at the application executable, specifically to check whether it imports cygwin.dll. That has worked well, except that the magic broke with newer releases of the Cygnus library when the dll name changed. The next version of Emacs will have better magic which works with all releases of the cygwin library, and will hopefully continue to work with any future releases.)

However, we are now seeing the same problem occurring, this time on the Cygnus side. The Cygnus port of bash will be applying the normal shell quoting rules to parse the command line typed by the user (or entered in the shell script), to construct the list of arguments to pass to Emacs. However, when bash invokes spawn() or exec() or some similar library function to actually invoke Emacs, it has to flatten the argument list into a single string. Clearly, the library function that does that is assuming the subprocess will use the Cygnus quoting rules to reconstruct the list of arguments. That fails when an argument contains an embedded quote and the application doesn't use the Cygnus rules, which is the situation here.

Note that this is a problem with bash that applies when it invokes any application not compiled with the cygwin library, not just Emacs.
I see two possible solutions to this general problem:

 1. Change the cygwin spawn/exec/whatever library functions to use the Microsoft rules for escaping embedded quotes when running non-cygwin applications (I believe they already detect when they are spawning non-cygwin applications; if not, the method Emacs uses could be reused for this).

 2. Change the cygwin quoting rules to match the Microsoft ones. This would apply to spawn/exec and the startup code, and would cause some breakage when mixing with applications compiled with old versions of cygwin.

Since cygwin-compiled applications tend to be recompiled when new releases of the library come out, option (2) might actually be viable, and would be the preferred solution since it would maximise the interoperability between applications. But even option (1) would be a major improvement.

AndrewI

PS. There is a certain amount of irony in all this: the Microsoft startup code looks like it was intended to support escaping embedded quotes by doubling them (as Cygnus does), but the parsing code contains a bug which prevents this from working. If not for this bug, the problem with bash invoking non-cygwin applications wouldn't arise.

From owner-ntemacs-users@cs.washington.edu Wed Feb 17 13:51:47 1999
Message-ID: <19990217162423.A13997@cygnus.com>
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de>
 <199902161547.HAA28357@june.cs.washington.edu>
 <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk>
 <199902171909.TAA28859@gpo.cam.harlequin.co.uk>
In-Reply-To: <199902171909.TAA28859@gpo.cam.harlequin.co.uk>; from Andrew Innes on Wed, Feb 17, 1999 at 07:09:28PM +0000
From: Christopher Faylor <cgf@cygnus.com>
To: Andrew Innes <andrewi@harlequin.co.uk>, F.J.Wright@qmw.ac.uk
Cc: mike.fabian@it-mannesmann.de, Rolf.Sandau@de.bosch.com,
 ntemacs-users@cs.washington.edu, cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Wed, 17 Feb 1999 16:24:23 -0500

On Wed, Feb 17, 1999 at 07:09:28PM +0000, Andrew Innes wrote:
>However, we are now seeing the same problem occurring, this time on the
>Cygnus side. The Cygnus port of bash will be applying the normal shell
>quoting rules to parse the command line typed by the user (or entered in
>the shell script), to construct the list of arguments to pass to Emacs.
>However, when bash invokes spawn() or exec() or some similar library
>function to actually invoke Emacs, it has to flatten the argument list
>into a single string. Clearly, the library function that does that is
>assuming the subprocess will use the Cygnus quoting rules to reconstruct
>the list of arguments. That fails when an argument contains an embedded
>quote and the application doesn't use the Cygnus rules, which is the
>situation here.

As far as I know, the method used to "quote a quote" in cygwin is the same as what is used in Visual C's libraries. Here's a small program that I just wrote to test this:

#include <stdio.h>

int main(int argc, char **argv)
{
    int i;
    for (i = 0; i < argc; i++)
        printf("arg %d: /%s/\n", i, argv[i]);
    return 0;
}

And, here's the result:

c:\tmp>echoarg a b """"
arg 0: /echoarg/
arg 1: /a/
arg 2: /b/
arg 3: /"/

>Note that this is a problem with bash that applies when it invokes any
>application not compiled with the cygwin library, not just Emacs.
>
>I see two possible solutions to this general problem:
>
> 1. Change the cygwin spawn/exec/whatever library functions to use the
>    Microsoft rules for escaping embedded quotes when running non-cygwin
>    applications (I believe they already detect when they are spawning
>    non-cygwin applications; if not, the method Emacs uses could be
>    reused for this).

Cygwin does not know when it is running a non-cygwin application. If it did, we wouldn't go through this quoting mess at all.

If Emacs is detecting this somehow, I'd love to hear how they do it. I've wanted to put more smarts into spawn for some time.

> 2. Change the cygwin quoting rules to match the Microsoft ones. This
>    would apply to spawn/exec and the startup code, and would cause some
>    breakage when mixing with applications compiled with old versions of
>    cygwin.

See above. As far as I can tell, cygwin is already compliant with Microsoft's rules. That was the intent in this whole scheme, actually.

>PS. There is a certain amount of irony in all this: the Microsoft
>startup code looks like it was intended to support escaping embedded
>quotes by doubling them (as Cygnus does), but the parsing code contains
>a bug which prevents this from working. If not for this bug, the
>problem with bash invoking non-cygwin applications wouldn't arise.

I'm not sure why you're seeing this and I'm not, but with my version of MSVC 5.0 this seems to be working correctly.

cgf

From owner-ntemacs-users@cs.washington.edu Thu Feb 18 05:47:24 1999
Message-Id: <199902181322.NAA15032@gpo.cam.harlequin.co.uk>
In-reply-to: <19990217162423.A13997@cygnus.com> (message from Christopher Faylor on Wed, 17 Feb 1999 16:24:23 -0500)
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de>
 <199902161547.HAA28357@june.cs.washington.edu>
 <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk>
 <199902171909.TAA28859@gpo.cam.harlequin.co.uk>
 <19990217162423.A13997@cygnus.com>
From: Andrew Innes <andrewi@harlequin.co.uk>
To: cgf@cygnus.com
CC: F.J.Wright@qmw.ac.uk, mike.fabian@it-mannesmann.de, Rolf.Sandau@de.bosch.com,
 ntemacs-users@cs.washington.edu, cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Thu, 18 Feb 1999 13:22:07 GMT

On Wed, 17 Feb 1999 16:24:23 -0500, Christopher Faylor said:
>On Wed, Feb 17, 1999 at 07:09:28PM +0000, Andrew Innes wrote:
>>However, we are now seeing the same problem occurring, this time on the
>>Cygnus side. The Cygnus port of bash will be applying the normal shell
>>quoting rules to parse the command line typed by the user (or entered in
>>the shell script), to construct the list of arguments to pass to Emacs.
>>However, when bash invokes spawn() or exec() or some similar library
>>function to actually invoke Emacs, it has to flatten the argument list
>>into a single string. Clearly, the library function that does that is
>>assuming the subprocess will use the Cygnus quoting rules to reconstruct
>>the list of arguments. That fails when an argument contains an embedded
>>quote and the application doesn't use the Cygnus rules, which is the
>>situation here.
>
>As far as I know, the method used to "quote a quote" in cygwin is the
>same as what is used in Visual C's libraries. Here's a small program
>that I just wrote to test this:
>
>#include <stdio.h>
>
>int main(int argc, char **argv)
>{
>    int i;
>    for (i = 0; i < argc; i++)
>        printf("arg %d: /%s/\n", i, argv[i]);
>    return 0;
>}
>
>And, here's the result:
>
>c:\tmp>echoarg a b """"
>arg 0: /echoarg/
>arg 1: /a/
>arg 2: /b/
>arg 3: /"/

This example doesn't show up the difference, because the MSVC startup code _does_ handle repeated quotes, but not in quite the same way (see crt/src/stdargv.c in the MSVC library source for the gory details). Here is a more revealing example:

d:\users\andrewi>echoarg "test a" "test ""b""" "test ""c"" d"
arg 0: /echoarg/
arg 1: /test a/
arg 2: /test "b"/
arg 3: /test "c/
arg 4: /d/

Note that arg 2 comes out as expected (fortuitously, it turns out), but arg 3 is split into two args by the MSVC code (dropping a quote in the process), whereas the Cygwin code keeps it as one. The reason is that MSVC sometimes treats a doubled quote as the end of the argument.
To escape an embedded quote reliably (at least in the absence of preceding backslashes), you have to triple it like so:

d:\users\andrewi>echoarg "test a" "test """b"""" "test """c""" d"
arg 0: /echoargs/
arg 1: /test a/
arg 2: /test "b"/
arg 3: /test "c" d/

In fairness, this might not be a bug in the MSVC code, but a deliberate feature. It enables the following, slightly strange, method of constructing arguments with whitespace:

d:\users\andrewi>echoarg "a and b "together
arg 0: /echoargs/
arg 1: /a and b together/

I can imagine that someone requested this behaviour, as a way to enable DOS batch files to do things they couldn't otherwise easily do.

Anyway, the upshot of this mess is that the only really reliable way to escape an embedded quote is to put a backslash before it (and double all literal backslashes immediately preceding the embedded quote). This is what I refer to as the Microsoft quoting rule.

>>Note that this is a problem with bash that applies when it invokes any
>>application not compiled with the cygwin library, not just Emacs.
>>
>>I see two possible solutions to this general problem:
>>
>> 1. Change the cygwin spawn/exec/whatever library functions to use the
>>    Microsoft rules for escaping embedded quotes when running non-cygwin
>>    applications (I believe they already detect when they are spawning
>>    non-cygwin applications; if not, the method Emacs uses could be
>>    reused for this).
>
>Cygwin does not know when it is running a non-cygwin application. If it
>did we wouldn't go through this quoting mess at all.
>
>If Emacs is detecting this somehow, I'd love to hear how they do it. I've
>wanted to put more smarts into spawn for some time.

In NT-Emacs, we examine the header of an executable, and if it is in PE format, we walk the import table to see whether it implicitly links to "cygwin.dll". (In the next release, I just check whether there is a dll whose name starts with "cygwin".)

>> 2. Change the cygwin quoting rules to match the Microsoft ones. This
>>    would apply to spawn/exec and the startup code, and would cause some
>>    breakage when mixing with applications compiled with old versions of
>>    cygwin.
>
>See above. As far as I can tell, cygwin is already compliant with Microsoft's
>rules. That was the intent in this whole scheme, actually.

The Microsoft rules are unfortunately more complicated than they seem, as shown above. I believe the simplest way to reliably escape embedded quotes for MSVC-compiled programs is to use a backslash, which is what NT-Emacs does.

AndrewI