PURPOSE of /CoreOS/sed/Regression/sed-reports-syntax-errors-with-some-multibyte
Description: Test for sed reports syntax errors with some multibyte
Author: Marek Polacek <mpolacek@redhat.com>
Bug summary: sed reports syntax errors with some multibyte characters

Description:

Description of problem:

Using a multibyte character whose encoding ends with the byte 0x5c (backslash) can cause sed to report syntax errors.


Version-Release number of selected component (if applicable): sed-4.1.5-5


How reproducible: Always


Steps to Reproduce:
1. Start with your shell in a UTF-8 locale, e.g. en_US.UTF-8 (you can probably do this in a different locale, but it definitely works if you start in a UTF-8 locale).

2. Run the following commands to construct a sed script:

U2010=$(echo -ne '\x20\x10' | iconv -f ucs-2be)
echo "echo '$U2010' | sed 's/$U2010/hyphen/g'" | iconv -t gbk > /tmp/script

3. Run the shell script in a locale that uses the gbk character set:

LC_ALL=zh_CN.gbk sh /tmp/script 2>&1 | iconv -f gbk

Actual results:
The script reports an error (in English: "sed: -e expression #1, char 13: unterminated `s' command"):

sed:-e 表达式 #1,字符 13:unterminated `s' command

Expected results:

The single word "hyphen"
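
For contrast, the same substitution produces the expected output when the hyphen's encoding contains no 0x5c byte. The sketch below is an illustration, not part of the original report. It assumes a POSIX shell, writes the UTF-8 bytes of U+2010 (0xe2 0x80 0x90) with octal printf escapes, and runs sed in the C locale so that no extra locale has to be installed. The variable names are hypothetical.

```shell
# UTF-8 encoding of U+2010 (HYPHEN): bytes 0xe2 0x80 0x90, as octal escapes.
# (Assumption for illustration: variable names are not from the report.)
hyphen_utf8=$(printf '\342\200\220')

# None of these bytes is 0x5c, so the "/" delimiters are parsed as intended
# and the substitution goes through.
result=$(printf '%s\n' "$hyphen_utf8" | LC_ALL=C sed "s/${hyphen_utf8}/hyphen/g")
echo "$result"    # prints: hyphen
```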


Additional info:

The error arises because the character U+2010 (HYPHEN) is encoded as \xa9\x5c in the gbk encoding. Sed treats the trailing "\x5c" as a backslash escaping the following character, which in this case is the "/" that was meant to terminate the pattern. That "/" therefore no longer terminates the pattern, the s command is left incomplete, and we get a syntax error.
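
The byte-level view can be sketched without a gbk locale installed. The snippet below is an illustration, not part of the original report. It assumes a POSIX shell with od and tr available, writes the two gbk bytes of U+2010 with octal printf escapes (\251\134 is \xa9\x5c), dumps the resulting sed expression as hex, and runs it through sed in the C locale, where every byte is its own character and the trailing 0x5c is a genuine backslash. The variable names are hypothetical.

```shell
# gbk encoding of U+2010 (HYPHEN): bytes 0xa9 0x5c, written as octal escapes.
hyphen_gbk=$(printf '\251\134')

# The sed expression from the report, as raw bytes.
sed_expr="s/${hyphen_gbk}/hyphen/g"

# Dump the bytes: the 5c right before the second 2f ("/") is an ASCII backslash.
# The expression is 13 bytes long, consistent with "char 13" in the error.
expr_hex=$(printf '%s' "$sed_expr" | od -An -tx1 | tr -d ' \n')
echo "$expr_hex"    # prints: 732fa95c2f68797068656e2f67

# In the C locale sed works byte by byte, so 0x5c escapes the "/" and the
# s command is left unterminated; sed exits with a non-zero status.
if printf '%s\n' "$hyphen_gbk" | LC_ALL=C sed -e "$sed_expr" >/dev/null 2>&1; then
    sed_status=ok
else
    sed_status=error
fi
echo "sed: $sed_status"
```

A sed that handles the multibyte locale correctly would treat the two bytes as one character when run under a gbk locale; the C-locale run above only shows what a byte-oriented parser does with the same input.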

Of course, this is just one character in one encoding; there are likely to be many others. I have another example for SJIS (U+8868), but SJIS isn't a good encoding to use for reporting bugs :-).
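
One way to look for further candidates is to check whether the last byte of a character's encoding is 0x5c. The helper below is a hypothetical sketch, not part of the original report; the function names ends_in_backslash and report are made up for illustration. It is applied to the two examples mentioned above, gbk U+2010 (0xa9 0x5c) and SJIS U+8868 (0x95 0x5c), and to the UTF-8 form of U+2010 as a negative case.

```shell
# Succeeds (exit status 0) if the last byte of "$1" is 0x5c (ASCII backslash).
ends_in_backslash() {
    last_byte=$(printf '%s' "$1" | od -An -tx1 | tr -s ' ' '\n' | tail -n 1)
    [ "$last_byte" = "5c" ]
}

# Print a one-line verdict for a labelled byte string.
report() {
    if ends_in_backslash "$2"; then
        echo "$1: last byte is 0x5c"
    else
        echo "$1: last byte is not 0x5c"
    fi
}

# Byte strings are written with octal escapes so the example does not depend
# on iconv or on any particular locale being installed.
report "gbk U+2010"   "$(printf '\251\134')"       # 0xa9 0x5c
report "SJIS U+8868"  "$(printf '\225\134')"       # 0x95 0x5c
report "UTF-8 U+2010" "$(printf '\342\200\220')"   # 0xe2 0x80 0x90
```

The check only flags a trailing 0x5c byte; whether a given sed build mishandles such a character still has to be confirmed by running it in the matching locale.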