Mirror of https://github.com/ruby/ruby.git, synced 2022-11-09 12:17:21 -05:00
fix range check for Hangul jamo trailers in Unicode normalization
* lib/unicode_normalize/normalize.rb: Fix the range check for trailing
  Hangul jamo characters in Unicode normalization. Unlike leading and
  vowel jamos, where LBASE and VBASE are actual characters, a value
  equal to TBASE expresses the absence of a trailing jamo, so the
  trailer index must be strictly greater than 0.
  The fix is technically correct, but there was no observable bug because
  the regular expressions in lib/unicode_normalize/tables.rb
  already exclude jamos equal to TBASE from normalization processing.
* test/test_unicode_normalize.rb: Add preventive test
  test_no_trailing_jamo, based on
  d134809cd3
  in case a regression is ever introduced.
This closes issue #14934; thanks to MaLin (Lin Ma) for reporting.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
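The reasoning in the message above can be sketched with the Hangul composition arithmetic from the Unicode Standard. This is a minimal illustration, not the code in normalize.rb: the module name HangulSketch and the method compose are hypothetical, and the sketch returns only the composed code point rather than a rebuilt string. The constants are the standard Hangul values.

```ruby
# Hypothetical sketch of Hangul syllable composition; not the library code.
module HangulSketch
  SBASE  = 0xAC00 # first precomposed syllable
  LBASE  = 0x1100 # first leading jamo (a real character)
  VBASE  = 0x1161 # first vowel jamo (a real character)
  TBASE  = 0x11A7 # one below the first trailing jamo; index 0 = "no trailer"
  LCOUNT = 19
  VCOUNT = 21
  TCOUNT = 28

  module_function

  # Returns the composed syllable code point, or nil when the lead/vowel
  # pair is out of range. A trailer is folded in only for indices
  # 1...TCOUNT, mirroring the strict `0 <` check in this commit.
  def compose(lead_cp, vowel_cp, trail_cp = nil)
    lead  = lead_cp - LBASE
    vowel = vowel_cp - VBASE
    return nil unless (0...LCOUNT).cover?(lead) && (0...VCOUNT).cover?(vowel)

    syllable = SBASE + (lead * VCOUNT + vowel) * TCOUNT
    trail = trail_cp ? trail_cp - TBASE : 0
    syllable += trail if trail > 0 && trail < TCOUNT
    syllable
  end
end

puts format("U+%04X", HangulSketch.compose(0x1100, 0x1175, 0x11A7))
```

With the code points used in the new test, U+1100 U+1175 compose to U+AE30, and a following U+11A7 (equal to TBASE) or U+11C3 (index 28, out of range) leaves that syllable unchanged.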
parent 9eb6304aa9
commit a7acec6750
2 changed files with 8 additions and 1 deletions
@@ -70,7 +70,7 @@ module UnicodeNormalize # :nodoc:
     if length>1 and 0 <= (lead =string[0].ord-LBASE) and lead < LCOUNT and
                     0 <= (vowel=string[1].ord-VBASE) and vowel < VCOUNT
       lead_vowel = SBASE + (lead * VCOUNT + vowel) * TCOUNT
-      if length>2 and 0 <= (trail=string[2].ord-TBASE) and trail < TCOUNT
+      if length>2 and 0 < (trail=string[2].ord-TBASE) and trail < TCOUNT
         (lead_vowel + trail).chr(Encoding::UTF_8) + string[3..-1]
       else
         lead_vowel.chr(Encoding::UTF_8) + string[2..-1]
@@ -167,6 +167,13 @@ class TestUnicodeNormalize
     assert_equal "\u1100\u1161\u11A8", "\uAC00\u11A8".unicode_normalize(:nfd)
   end
 
+  # preventive tests for (non-)bug #14934
+  def test_no_trailing_jamo
+    assert_equal "\u1100\u1176\u11a8", "\u1100\u1176\u11a8".unicode_normalize(:nfc)
+    assert_equal "\uae30\u11a7", "\u1100\u1175\u11a7".unicode_normalize(:nfc)
+    assert_equal "\uae30\u11c3", "\u1100\u1175\u11c3".unicode_normalize(:nfc)
+  end
+
   def test_hangul_plus_accents
     assert_equal "\uAC00\u0323\u0300", "\uAC00\u0300\u0323".unicode_normalize(:nfc)
     assert_equal "\uAC00\u0323\u0300", "\u1100\u1161\u0300\u0323".unicode_normalize(:nfc)