Inconsistent command-line parsing in case of UTF-8 quoted arguments

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Wed Oct 7 02:20:04 GMT 2020


On 2020-10-06 15:36, Jérôme Froissart wrote:
> Thanks for your replies.
> This issue only happens when a program is run from cmd.exe, not from a
> Cygwin bash shell.
> This is important for me, since I discovered this bug in a project
> that must be run from the Windows graphical shell (i.e. there is no
> sensible way to run it through Cygwin and Bash).
> 
>> Please show us the output from "uname -a" and "locale" run from the bash prompt.
> 
>> Please provide the results of "locale" command right before running your test
>> binary.
> Here are the more detailed steps to reproduce the issue (along with
> answers to your requests about `uname`, `locale`, etc.).
> (I mostly reproduced what billziss-gh had done before, so I do not take
> all the credit :D)
> 
> Here is an example C file

> I have built it with gcc from Cygwin
>     $ gcc -o binary example.c
> 
> Running it from the same Cygwin bash prompt works as expected
>     $ uname -a
>     CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
>     # (XPS is my Windows machine name)
> 
>     $ locale
>     LANG=fr_FR.UTF-8
>     LC_CTYPE="fr_FR.UTF-8"
>     LC_NUMERIC="fr_FR.UTF-8"
>     LC_TIME="fr_FR.UTF-8"
>     LC_COLLATE="fr_FR.UTF-8"
>     LC_MONETARY="fr_FR.UTF-8"
>     LC_MESSAGES="fr_FR.UTF-8"
>     LC_ALL=
> 
>     $ which gcc
>     /usr/bin/gcc
> 
>     # The following runs as expected
>     $ ./binary.exe "foo bar" "Jérôme"
>     C="C:\Users\Public\binary.exe"
>     0=./binary
>     1=foo bar
>     2=Jérôme
> 
> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>     C=binary.exe  "foo bar" "J□r□me"
>     0=binary
>     1=foo bar
>     2="Jérôme"
> 
> This behaviour is not expected and is quite inconsistent with what
> happened through Bash.
> Besides the "strange squares" that appear on the first line, and the
> extra space after binary.exe, I especially did not expect "Jérôme" to
> remain quoted as a second argument.
> 
> Sorry for the delay in my answer. I hope this is now clear; please ask
> me for more examples or investigation if you need them.
> Thanks for your help.

Create a new Command Prompt shortcut, or change your current one, to run:

	%windir%\system32\cmd.exe /u

"/U Causes the output of internal commands to a pipe or file to be Unicode"

and add "chcp 65001" to switch the console code page to UTF-8:

	%windir%\system32\cmd.exe /u /k chcp 65001

or set the registry value

	HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\AutoRun

or

	HKEY_CURRENT_USER\Software\Microsoft\Command Processor\AutoRun

to the command

	@chcp 65001 > nul

e.g.

	> reg add "HKEY_CURRENT_USER\Software\Microsoft\Command Processor" ^
		/v AutoRun /d "@chcp 65001 > nul" /f

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]

