akr
d77ddf33ae
add tests for sub/gsub with hash.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15535 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 03:51:34 +00:00
akr
1783b7aacc
typo fix.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15534 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 03:43:11 +00:00
akr
a74c11cd4a
* re.c (re_warn): defined to restore warnings for /[a-c-e]/, etc.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15532 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 02:52:10 +00:00
akr
583a4b1774
* re.c (rb_reg_regsub): don't repeat repl twice with
...
"X".sub!(/./, sprintf("\\%c", 255)).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15527 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 15:35:09 +00:00
akr
b8fd2fabbe
* re.c (rb_reg_prepare_re): add enable_warning parameter.
...
(rb_reg_adjust_startpos): disable warning by rb_reg_prepare_re.
(rb_reg_search): follow rb_reg_prepare_re parameter change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15524 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 12:54:17 +00:00
akr
0f4199fb56
* re.c (rb_reg_quote): return US-ASCII string consistently.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15515 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 02:00:05 +00:00
akr
71c5e48598
* include/ruby/re.h (struct rmatch_offset): new struct for character
...
offsets.
(struct rmatch): new struct.
(struct RMatch): reference struct rmatch.
(RMATCH_REGS): new macro.
* re.c (match_alloc): initialize struct rmatch.
(pair_byte_cmp): new function.
(update_char_offset): update character offsets.
(match_init_copy): copy regexp and character offsets.
(match_sublen): removed.
(match_offset): use update_char_offset.
(match_begin): ditto.
(match_end): ditto.
(rb_reg_search): make character offset updated flag false.
(match_size): use RMATCH_REGS.
(match_backref_number): ditto.
(rb_reg_nth_defined): ditto.
(rb_reg_nth_match): ditto.
(rb_reg_match_pre): ditto.
(rb_reg_match_post): ditto.
(rb_reg_match_last): ditto.
(match_array): ditto.
(match_aref): ditto.
(match_values_at): ditto.
(match_inspect): ditto.
* string.c (rb_str_subpat_set): use RMATCH_REGS.
(rb_str_sub_bang): ditto.
(str_gsub): ditto.
(rb_str_split_m): ditto.
(scan_once): ditto.
* gc.c (obj_free): free character offsets.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-16 20:08:35 +00:00
akr
60fa63b819
* re.c (match_inspect): avoid SEGV with MatchData.allocate.inspect.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-16 11:13:47 +00:00
nobu
17fb1248af
* re.c (rb_reg_quote): set US-ASCII for ASCII-only string.
...
[ruby-dev:33785]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-15 01:35:56 +00:00
akr
ec4756f633
* re.c (rb_reg_preprocess_dregexp): use non-preprocessed regexp source
...
for result.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15465 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-14 03:34:12 +00:00
akr
d5c8ad5359
* insns.def (toregexp): generate a regexp from strings instead of one
...
string.
* re.c (rb_reg_new_ary): defined for toregexp. It concatenates
strings after each string is preprocessed.
* compile.c (compile_dstr_fragments): split from compile_dstr.
(compile_dstr): call compile_dstr_fragments.
(compile_dregx): defined for dynamic regexp.
(iseq_compile_each): use compile_dregx for dynamic regexp.
[ruby-dev:33400]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-29 08:03:51 +00:00
naruse
3c6969ec11
* string.c, parse.y, re.c: use rb_ascii8bit_encoding.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15292 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-28 09:03:09 +00:00
akr
fc208c1bd5
* include/ruby/oniguruma.h: precise mbclen API redesigned to avoid
...
inline functions.
(onigenc_mbclen_charfound): removed.
(onigenc_mbclen_needmore): removed.
(onigenc_mbclen_recover): removed.
(ONIGENC_MBCLEN_CHARFOUND): removed.
(ONIGENC_MBCLEN_CHARFOUND_P): defined.
(ONIGENC_MBCLEN_CHARFOUND_LEN): defined.
(ONIGENC_MBCLEN_INVALID): removed.
(ONIGENC_MBCLEN_INVALID_P): defined.
(ONIGENC_MBCLEN_NEEDMORE): removed.
(ONIGENC_MBCLEN_NEEDMORE_P): defined.
(ONIGENC_MBCLEN_NEEDMORE_LEN): defined.
(ONIGENC_MBC_ENC_LEN): use onigenc_mbclen_approximate.
* regenc.c (onigenc_mbclen_approximate): defined.
* include/ruby/encoding.h (MBCLEN_CHARFOUND): removed.
(MBCLEN_INVALID): removed.
(MBCLEN_NEEDMORE): removed.
(MBCLEN_CHARFOUND_P): defined.
(MBCLEN_INVALID_P): defined.
(MBCLEN_NEEDMORE_P): defined.
(MBCLEN_CHARFOUND_LEN): defined.
(MBCLEN_NEEDMORE_LEN): defined.
* encoding.c: use new API.
* re.c: ditto.
* string.c: ditto.
* parse.y: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15280 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 14:27:07 +00:00
naruse
f3fe101d55
* re.c (rb_reg_source): set encoding as regexp encoding.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15265 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 07:26:51 +00:00
akr
b9c18bdcdd
* re.c (rb_reg_preprocess): force fixed encoding when ASCII
...
incompatible source string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15260 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-26 21:01:52 +00:00
akr
1e41069754
* include/ruby/intern.h (rb_str_buf_cat_ascii): declared.
...
* string.c (rb_str_buf_cat_ascii): defined.
* re.c (rb_reg_s_union): use rb_str_buf_cat_ascii to support ASCII
incompatible encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-25 07:35:27 +00:00
usa
b1257d4d20
* re.c (rb_reg_fixed_encoding_p): no need to treat ASCII-8BIT specially.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 09:15:03 +00:00
usa
fbe52683e6
* re.c (rb_reg_initialize): 7bit clean regexp should be US-ASCII.
...
[ruby-dev:33346]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 07:56:12 +00:00
akr
3766eac339
* re.c (rb_reg_prepare_re): fix SEGV by
...
/a/ =~ "aa".force_encoding("utf-16be").
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15178 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-23 04:40:43 +00:00
usa
61fd7dbf6d
* re.c (rb_char_to_option_kcode): Regexp switch `s' should mean
...
Windows-31J, as well as `-Ks'.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15101 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-18 00:44:15 +00:00
nobu
a0029e3adc
* re.c (rb_char_to_option_kcode): fixed typo.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-17 12:48:23 +00:00
matz
d9ff499bf3
* re.c (rb_char_to_option_kcode): use rb_enc_find_index() instead
...
of using fixed index value.
* enc/Makefile.in (encsrcdir): make US-ASCII built-in.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 13:49:29 +00:00
akr
a31e2da12c
* re.c (rb_reg_prepare_re): initialize error message buffer.
...
(rb_reg_search): ditto.
(rb_reg_check_preprocess): ditto.
(rb_reg_new_str): ditto.
(rb_enc_reg_new): ditto.
(rb_reg_compile): ditto.
(rb_reg_initialize_m): ditto.
(rb_reg_s_union_m): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 04:51:10 +00:00
akr
238c59842c
* re.c (rb_reg_preprocess): fix fixed_enc condition.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 04:55:26 +00:00
akr
063beac343
* encoding.c (rb_enc_internal_get_index): extracted from
...
rb_enc_get_index.
(rb_enc_internal_set_index): extracted from rb_enc_associate_index.
* include/ruby/encoding.h (ENCODING_SET): work over ENCODING_INLINE_MAX.
(ENCODING_GET): ditto.
(ENCODING_IS_ASCII8BIT): defined.
(ENCODING_CODERANGE_SET): defined.
* re.c (rb_reg_fixed_encoding_p): use ENCODING_IS_ASCII8BIT.
* string.c (rb_enc_str_buf_cat): use ENCODING_IS_ASCII8BIT.
* parse.y (reg_fragment_setenc_gen): use ENCODING_IS_ASCII8BIT.
* marshal.c (has_ivars): use ENCODING_IS_ASCII8BIT.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 02:49:01 +00:00
akr
f38cc001a7
* re.c (rb_reg_initialize_str): forbid raw non ASCII character
...
for ASCII-8BIT regexp in non ASCII-8BIT script.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14911 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 12:15:48 +00:00
akr
8987b97ca9
* include/ruby/encoding.h (rb_enc_str_buf_cat): declared.
...
* string.c (coderange_scan): extracted from rb_enc_str_coderange.
(rb_enc_str_coderange): use coderange_scan.
(rb_str_shared_replace): copy encoding and coderange.
(rb_enc_str_buf_cat): new function for linear complexity string
accumulation with encoding.
(rb_str_sub_bang): don't conflict substituted part and replacement.
(str_gsub): use rb_enc_str_buf_cat.
(rb_str_clear): clear coderange.
* re.c (rb_reg_regsub): use rb_enc_str_buf_cat.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14910 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 09:25:09 +00:00
akr
da42c102c1
* re.c (rb_reg_initialize_str): /\x80/n is not an error even if script
...
encoding is EUC-JP.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14899 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-05 16:39:38 +00:00
nobu
8638ee26e7
* include/ruby/intern.h, re.c (rb_reg_new): keep interface same as
...
1.8. [ruby-core:14583]
* include/ruby/intern.h, re.c (rb_reg_new_str): renamed, and defines
HAVE_RB_REG_NEW_STR macro to tell if it is available.
* include/ruby/encoding.h (rb_enc_reg_new): added.
* insns.def (toregexp), marshal.c (r_object0): use rb_reg_new_str().
* re.c (rb_reg_regcomp, rb_reg_s_union): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14884 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 16:30:33 +00:00
akr
f780cdec75
* re.c (rb_reg_prepare_re): check string encoding. Oniguruma doesn't
...
support invalid encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14880 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 05:01:58 +00:00
akr
7d98c90ef2
unused variable removed.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 03:13:53 +00:00
matz
22e7258275
* re.c (rb_reg_search): avoid inner loop for reverse search.
...
* regexec.c: unset USE_MATCH_RANGE_MUST_BE_INSIDE_OF_SPECIFIED_RANGE
which has been turned on since Oniguruma 5.9.1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14878 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 01:24:12 +00:00
akr
52f9c1d2e1
* re.c (rb_reg_search): iterate onig_match for reverse mode.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14876 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-03 17:48:06 +00:00
akr
e21907e0f8
fix typos.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14810 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-31 05:52:59 +00:00
nobu
5ee7f4b0b5
* re.c (rb_reg_regsub): returns the given string itself if nothing
...
changed.
* string.c (rb_str_sub_bang): keeps code-range where possible.
* string.c (str_gsub): adjusts code-range. [ruby-core:14566]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14782 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-29 13:44:32 +00:00
akr
fd640aec82
* re.c (rb_reg_s_union): show encodings in error message.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14734 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-27 07:38:23 +00:00
akr
b910bb7761
* re.c (rb_reg_prepare_re): show regexp encoding in the error message.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14597 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-24 09:38:20 +00:00
akr
5b809a28f8
* include/ruby/encoding.h, encoding.c, re.c, io.c, parse.y, numeric.c,
...
ruby.c, transcode.c: rename rb_ascii_encoding to
rb_ascii8bit_encoding. rb_ascii_encoding is ambiguous with
ASCII-8BIT and US-ASCII.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 23:47:18 +00:00
akr
fa3d06c738
refine error message.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14475 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 07:14:07 +00:00
matz
d7cc14d436
* encoding.c (rb_ascii_encoding): renamed from previous
...
rb_default_encoding().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14443 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 18:55:30 +00:00
matz
b36c642a85
* re.c (rb_reg_prepare_re): stop ENCODING_NONE warning if the
...
encoding of the str is ASCII-8BIT.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 18:21:41 +00:00
akr
b82a05989e
* re.c (ARG_ENCODING_NONE): defined for /.../n option.
...
(REG_ENCODING_NONE): ditto.
(rb_char_to_option_kcode): return ARG_ENCODING_NONE for n.
(rb_reg_prepare_re): warn /ascii/n =~ "non-ascii".
(rb_reg_initialize): set REG_ENCODING_NONE from ARG_ENCODING_NONE.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14438 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 16:39:36 +00:00
akr
e667720bd4
* re.c (append_utf8): use rb_utf8_encoding() instead of
...
rb_enc_find("utf-8").
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14412 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 07:07:21 +00:00
matz
668bd7d992
* test/ruby/test_system.rb (TestSystem::valid_syntax): apply
...
ASCII-8BIT encoding explicitly.
* re.c (rb_reg_prepare_re): add encoding name in the message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14402 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 05:03:14 +00:00
akr
59dca19910
* re.c: change "character encodings differ" error messages.
...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14401 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 04:54:54 +00:00
matz
77629d2cbe
* string.c (rb_str_rindex_m): too much adjustment.
...
* re.c (reg_match_pos): pos adjustment should be based on
characters.
* test/ruby/test_m17n.rb (TestM17N::test_str_insert): test updated
to check negative offset behavior.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-19 17:02:29 +00:00
nobu
474a88f041
* re.c (rb_reg_regsub): should set checked encoding.
...
* string.c (rb_str_sub_bang): applied r14212 too.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-19 12:42:19 +00:00
akr
2d01290cfd
* parse.y (arg tMATCH arg): call reg_named_capture_assign_gen if regexp
...
literal is used.
(reg_named_capture_assign_gen): assign the result of named capture
into local variables.
[ruby-dev:32588]
* re.c: document the assignment by named captures.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14297 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-18 11:26:24 +00:00
matz
ebfcc5d933
* re.c (rb_reg_initialize): raise error if non-Unicode fixed
...
encoding option is specified for regexp literals with \u{}
escapes.
* string.c (rb_str_squeeze_bang): should squeeze multibyte
characters as well.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14275 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-17 16:06:21 +00:00
matz
d6a70c4bb7
* string.c (scan_once): need no encoding compatibility check.
...
it's done inside of rb_reg_search().
* string.c (rb_str_split_m): ditto.
* re.c (rb_reg_regsub): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-17 09:44:06 +00:00