mirror of https://github.com/ruby/ruby.git
synced 2022-11-09 12:17:21 -05:00

* doc/re.rdoc: Document difference between match and =~, options with
  Regexp.new and global variables. Patch by Sylvain Daubert. [Ruby 1.9 - Bug #5709]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@33977 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

parent 52654367f6
commit 3e204989c1

2 changed files with 75 additions and 2 deletions

@@ -1,3 +1,9 @@
Thu Dec  8 07:20:15 2011  Eric Hodel  <drbrain@segment7.net>

	* doc/re.rdoc:  Document difference between match and =~, options with
	  Regexp.new and global variables.  Patch by Sylvain Daubert.
	  [Ruby 1.9 - Bug #5709]

Thu Dec  8 06:53:10 2011  Eric Hodel  <drbrain@segment7.net>

	* doc/re.rdoc:  Fix example code to match documentation.  Patch by

71 doc/re.rdoc

@@ -24,6 +24,32 @@ string matches itself.
Specifically, <tt>/st/</tt> requires that the string contains the letter
_s_ followed by the letter _t_, so it matches _haystack_, also.

== <tt>=~</tt> and Regexp#match

Pattern matching may be achieved by using the <tt>=~</tt> operator or the
Regexp#match method.

=== <tt>=~</tt> operator

<tt>=~</tt> is Ruby's basic pattern-matching operator.  When one operand is a
regular expression and the other is a string, the regular expression is
matched against the string (the operator is defined equivalently by both
Regexp and String).  If a match is found, the operator returns the index of
the first match in the string; otherwise it returns +nil+.

    /hay/ =~ 'haystack'   #=> 0
    /a/   =~ 'haystack'   #=> 1
    /u/   =~ 'haystack'   #=> nil
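
Both Regexp and String define <tt>=~</tt>, so the result does not depend on which operand comes first. The minimal sketch below illustrates this. It also uses the returned Integer as an offset into the string. The variable names are arbitrary and exist only for the example.

```ruby
str = 'haystack'

# Regexp#=~ and String#=~ give the same answer whichever side the Regexp is on.
p(/a/ =~ str)   # prints 1
p(str =~ /a/)   # prints 1
p(/u/ =~ str)   # prints nil

# The Integer result is the index of the first match, so it can be used
# to slice the string.
idx = /st/ =~ str
p idx           # prints 3
p str[idx, 2]   # prints "st"
```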

When the <tt>=~</tt> operator is used with a String and a Regexp, the
<tt>$~</tt> global variable is set after a successful match.  <tt>$~</tt>
holds a MatchData object.  Regexp.last_match is equivalent to <tt>$~</tt>.
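
The following is a short sketch of the <tt>$~</tt> behavior described above. It also shows that an unsuccessful match resets <tt>$~</tt> to +nil+. The names +pos+, +md+ and +after_failure+ are illustrative only.

```ruby
pos = /s(t)/ =~ 'haystack'
md  = $~                    # the MatchData of the match just performed

p pos                       # prints 3
p md                        # prints #<MatchData "st" 1:"t">
p Regexp.last_match[0]      # prints "st"
p Regexp.last_match(1)      # prints "t" (the first capture group)
p md.begin(0) == pos        # prints true: same offset as returned by =~

# A failed match sets $~ (and therefore Regexp.last_match) back to nil.
/u/ =~ 'haystack'
after_failure = $~
p after_failure             # prints nil
```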

=== Regexp#match method

The #match method returns a MatchData object:

    /st/.match('haystack')   #=> #<MatchData "st">
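
The sketch below contrasts #match with <tt>=~</tt> by showing what the returned MatchData offers. It also uses the optional second argument to Regexp#match, which gives the position where the search starts. The variable names are only for illustration.

```ruby
md = /st/.match('haystack')
p md             # prints #<MatchData "st">
p md[0]          # prints "st"
p md.pre_match   # prints "hay"
p md.post_match  # prints "ack"
p md.begin(0)    # prints 3

# No match: #match returns nil rather than a MatchData.
p(/u/.match('haystack'))   # prints nil

# The optional second argument is the index at which the search starts.
later = /a/.match('haystack', 2)
p later.begin(0)           # prints 5 (the 'a' at index 1 is skipped)
```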

== Metacharacters and Escapes

The following are <i>metacharacters</i> <tt>(</tt>, <tt>)</tt>,

@@ -111,7 +137,7 @@ matches any character in the Unicode _Nd_ category.
* <tt>/[[:print:]]/</tt> - Like [:graph:], but includes the space character
* <tt>/[[:punct:]]/</tt> - Punctuation character
* <tt>/[[:space:]]/</tt> - Whitespace character (<tt>[:blank:]</tt>, newline,
  carriage return, etc.)
* <tt>/[[:upper:]]/</tt> - Uppercase alphabetical
* <tt>/[[:xdigit:]]/</tt> - Digit allowed in a hexadecimal number (i.e.,
  0-9a-fA-F)
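
Here are a few illustrative uses of these bracket expressions with String#scan. Each one is written inside an outer character class, as in <tt>/[[:punct:]]/</tt>. The sample strings are arbitrary.

```ruby
p 'Hello, World!'.scan(/[[:punct:]]/)   # prints [",", "!"]
p 'Hello, World!'.scan(/[[:upper:]]/)   # prints ["H", "W"]
p "tab\there\nnew".scan(/[[:space:]]/)  # prints ["\t", "\n"]
p '0x1F'.scan(/[[:xdigit:]]/)           # prints ["0", "1", "F"]

# [[:print:]] includes the space character, so this matches the whole string.
p('a b' =~ /\A[[:print:]]+\z/)          # prints 0
```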

@@ -169,7 +195,7 @@ jeopardises the overall match.
Parentheses can be used for <i>capturing</i>. The text enclosed by the
<i>n</i><sup>th</sup> group of parentheses can be subsequently referred to
with <i>n</i>. Within a pattern use the <i>backreference</i>
<tt>\n</tt>; outside of the pattern use
<tt>MatchData[</tt><i>n</i><tt>]</tt>.

    # 'at' is captured by the first group of parentheses, then referred to

@@ -473,6 +499,13 @@ expression enclosed by the parentheses.
    /a(?i:b)c/.match('aBc') #=> #<MatchData "aBc">
    /a(?i:b)c/.match('abc') #=> #<MatchData "abc">

Options may also be used with <tt>Regexp.new</tt>:

    Regexp.new("abc", Regexp::IGNORECASE)                     #=> /abc/i
    Regexp.new("abc", Regexp::MULTILINE)                      #=> /abc/m
    Regexp.new("abc # Comment", Regexp::EXTENDED)             #=> /abc # Comment/x
    Regexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE) #=> /abc/mi
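
The option constants are Integer bit flags. They can be combined with <tt>|</tt>, as above, and tested on an existing regexp through Regexp#options. The sketch below shows the effect of each option. The patterns and subject strings are arbitrary examples.

```ruby
re = Regexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE)
p re                                        # prints /abc/mi
p((re.options & Regexp::IGNORECASE) != 0)   # prints true
p((re.options & Regexp::EXTENDED) != 0)     # prints false

# IGNORECASE: letters match regardless of case.
p(re =~ 'xxABC')                            # prints 2

# MULTILINE: . also matches a newline.
p(Regexp.new("a.c") =~ "a\nc")                     # prints nil
p(Regexp.new("a.c", Regexp::MULTILINE) =~ "a\nc")  # prints 0

# EXTENDED: unescaped whitespace and # comments in the pattern are ignored.
p(Regexp.new("a b c # letters", Regexp::EXTENDED) =~ 'abc')  # prints 0
```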

== Free-Spacing Mode and Comments

As mentioned above, the <tt>x</tt> option enables <i>free-spacing</i>

@@ -525,6 +558,40 @@ regexp's encoding can be explicitly fixed by supplying
       #=> Encoding::CompatibilityError: incompatible encoding regexp match
            (ISO-8859-1 regexp with UTF-8 string)

== Special global variables

Pattern matching sets some global variables:

* <tt>$~</tt> is equivalent to Regexp.last_match;
* <tt>$&</tt> contains the complete matched text;
* <tt>$`</tt> contains the string before the match;
* <tt>$'</tt> contains the string after the match;
* <tt>$1</tt>, <tt>$2</tt> and so on contain the text matched by the first,
  second, etc. capture group;
* <tt>$+</tt> contains the last capture group.

Example:

    m = /s(\w{2}).*(c)/.match('haystack') #=> #<MatchData "stac" 1:"ta" 2:"c">
    $~                                    #=> #<MatchData "stac" 1:"ta" 2:"c">
    Regexp.last_match                     #=> #<MatchData "stac" 1:"ta" 2:"c">

    $&      #=> "stac"
            # same as m[0]
    $`      #=> "hay"
            # same as m.pre_match
    $'      #=> "k"
            # same as m.post_match
    $1      #=> "ta"
            # same as m[1]
    $2      #=> "c"
            # same as m[2]
    $3      #=> nil
            # no third group in pattern
    $+      #=> "c"
            # same as m[-1]

These global variables are thread-local and method-local.
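
The sketch below illustrates the method-local part of that statement. A match performed inside another method does not change the caller's <tt>$~</tt> or <tt>$1</tt>. The helper +find_digits+ is hypothetical and was written only for this example.

```ruby
# Hypothetical helper: returns the first run of digits in +str+, or nil.
def find_digits(str)
  /(\d+)/ =~ str   # sets $~ (and $1) for this method's frame only
  $1
end

/(hay)/ =~ 'haystack'
before = $1                      # "hay"
digits = find_digits('abc 123')  # "123"
after  = $1                      # still "hay": the caller's $~ was not changed

p before, digits, after
```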

== Performance

Certain pathological combinations of constructs can lead to abysmally bad