update to Unicode Version 12.1.0 (beta)

Unicode Version 12.1.0 adds one single character, U+32FF SQUARE ERA NAME REIWA,
for the new Japanese era starting on May 1st. 12.1.0 will be finalized only
on May 7th, so we go with the beta version because further changes in the data
we need are highly unlikely, and we want to make sure Ruby is ready for the
new era.

* common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

* enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h,
  enc/unicode/12.1.0/name2ctype.h: add directory and generated data files
  for new version

* lib/unicode_normalize/tables.rb: update for new character

* test/ruby/test_regexp.rb: add test for character property age=12.1

* test/test_unicode_normalize.rb: add test for NFKC decomposition of
  new character

This (mostly) completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
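
The behavior this commit enables can be checked with a short script. This is a minimal sketch, not part of the commit: it assumes a Ruby built with the 12.1.0 (or later) Unicode data, and the variable names are illustrative only.

```ruby
# Sketch only: assumes Ruby's Unicode tables are at version 12.1.0 or later.
reiwa = "\u32FF" # U+32FF SQUARE ERA NAME REIWA

# NFKC replaces the square character with its two-kanji compatibility
# decomposition, U+4EE4 U+548C.
decomposed = reiwa.unicode_normalize(:nfkc)
puts decomposed == "\u4EE4\u548C"

# The decomposition is compatibility-only, so NFC leaves the character as-is.
puts reiwa.unicode_normalize(:nfc) == reiwa

# Built with Regexp.new so that an older Ruby raises RegexpError at run time
# instead of failing to parse the whole file.
age_12_0 = Regexp.new('\A\p{age=12.0}\z')
age_12_1 = Regexp.new('\A\p{age=12.1}\z')

# The character first appears in 12.1, so only the second pattern matches.
puts age_12_0.match?(reiwa)
puts age_12_1.match?(reiwa)
```

On a Ruby that still carries the 12.0.0 tables, the `age=12.1` pattern is expected to raise an error for the unknown property value, and NFKC leaves the character unchanged.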
2022-11-09 12:17:21 -05:00 · 2019-04-05 00:58:51 +00:00 · 2019-04-05 00:58:51 +00:00 · 7fe64d17d3
commit 7fe64d17d3
parent c8d60fddc0
6 changed files with 49249 additions and 4 deletions
--- a/common.mk
+++ b/common.mk
@@ -15,9 +15,9 @@ mflags = $(MFLAGS)
 gnumake_recursive =
 enable_shared = $(ENABLE_SHARED:no=)

-UNICODE_VERSION = 12.0.0
+UNICODE_VERSION = 12.1.0
 UNICODE_EMOJI_VERSION = 12.0
-UNICODE_BETA = NO
+UNICODE_BETA = YES

 ### set the following environment variable or uncomment the line if
 ### the Unicode data files should be updated completely on every update ('make up',...).
--- a/enc/unicode/12.1.0/casefold.h
+++ b/enc/unicode/12.1.0/casefold.h
--- a/enc/unicode/12.1.0/name2ctype.h
+++ b/enc/unicode/12.1.0/name2ctype.h
--- a/lib/unicode_normalize/tables.rb
+++ b/lib/unicode_normalize/tables.rb
@@ -1388,8 +1388,7 @@ module UnicodeNormalize  # :nodoc:
    "\u3200-\u321E" \
    "\u3220-\u3247" \
    "\u3250-\u327E" \
-    "\u3280-\u32FE" \
-    "\u3300-\u33FF" \
+    "\u3280-\u33FF" \
    "\uA69C\uA69D" \
    "\uA770" \
    "\uA7F8\uA7F9" \
@@ -5493,6 +5492,7 @@ module UnicodeNormalize  # :nodoc:
    "\u32FC"=>"\u30F0",
    "\u32FD"=>"\u30F1",
    "\u32FE"=>"\u30F2",
+    "\u32FF"=>"\u4EE4\u548C",
    "\u3300"=>"\u30A2\u30D1\u30FC\u30C8",
    "\u3301"=>"\u30A2\u30EB\u30D5\u30A1",
    "\u3302"=>"\u30A2\u30F3\u30DA\u30A2",
--- a/test/ruby/test_regexp.rb
+++ b/test/ruby/test_regexp.rb
@@ -1075,6 +1075,9 @@ class TestRegexp < Test::Unit::TestCase
    assert_no_match(/^\p{age=3.0}$/u, "\u2754")
    assert_no_match(/^\p{age=2.0}$/u, "\u2754")
    assert_no_match(/^\p{age=1.1}$/u, "\u2754")
+
+    assert_no_match(/^\p{age=12.0}$/u, "\u32FF")
+    assert_match(/^\p{age=12.1}$/u, "\u32FF")
  end

  MatchData_A = eval("class MatchData_\u{3042} < MatchData; self; end")
--- a/test/test_unicode_normalize.rb
+++ b/test/test_unicode_normalize.rb
@@ -187,6 +187,10 @@ class TestUnicodeNormalize
    assert_raise(Encoding::CompatibilityError) { "abc".force_encoding('ISO-8859-1').unicode_normalized? }
  end

+  def test_reiwa
+    assert_equal "\u4EE4\u548C", "\u32FF".unicode_normalize(:nfkc)
+  end
+
  def test_us_ascii
    ascii_string = 'abc'.encode('US-ASCII')