mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00

* doc/regexp.rdoc: [DOC] Replace paragraphs in verbatim sections with
  plain paragraphs to improve readability as ri and HTML.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
drbrain 2013-09-17 03:56:32 +00:00
parent 3ee01c2980
commit 4afabb5a88
2 changed files with 86 additions and 51 deletions

View file

@@ -1,3 +1,8 @@
+Tue Sep 17 12:55:58 2013 Eric Hodel <drbrain@segment7.net>
+* doc/regexp.rdoc: [DOC] Replace paragraphs in verbatim sections with
+plain paragraphs to improve readability as ri and HTML.
 Mon Sep 16 07:32:35 2013 Tadayoshi Funaba <tadf@dotrb.org>
 * complex.c: removed meaningless lines.

View file

@@ -16,9 +16,12 @@ example:
 If a string contains the pattern it is said to <i>match</i>. A literal
 string matches itself.
-# 'haystack' does not contain the pattern 'needle', so doesn't match.
+Here 'haystack' does not contain the pattern 'needle', so it doesn't match:
 /needle/.match('haystack') #=> nil
-# 'haystack' does contain the pattern 'hay', so it matches
+Here 'haystack' contains the pattern 'hay', so it matches:
 /hay/.match('haystack') #=> #<MatchData "hay">
 Specifically, <tt>/st/</tt> requires that the string contains the letter
@@ -50,7 +53,7 @@ object. Regexp.last_match is equivalent to <tt>$~</tt>.
 === Regexp#match method
-#match method return a MatchData object :
+The #match method returns a MatchData object:
 /st/.match('haystack') #=> #<MatchData "st">
@@ -108,7 +111,9 @@ operator which performs set intersection on its arguments. The two can be
 combined as follows:
 /[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z))
-# This is equivalent to:
+This is equivalent to:
 /[abh-w]/
 The following metacharacters also behave like character classes:
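The equivalence claimed in the hunk above can be checked by hand. The sketch below is illustrative only and not part of the commit; the variable names are made up for the example, and it tests only the lowercase ASCII letters.

```ruby
# Illustrative check, not part of the patch: compare which lowercase ASCII
# letters each character class accepts.
intersected = /[a-w&&[^c-g]z]/
simplified  = /[abh-w]/

letters = ('a'..'z').to_a
by_intersection = letters.select { |c| intersected.match(c) }
by_simplified   = letters.select { |c| simplified.match(c) }

# Both classes should accept 'a', 'b' and 'h' through 'w'.
same_letters = (by_intersection == by_simplified)
puts same_letters
```

Ruby may print a "duplicated range" warning for the first pattern when run with warnings enabled, because <tt>z</tt> is already covered by <tt>[^c-g]</tt>.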
@@ -173,8 +178,9 @@ to occur. Such metacharacters are called <i>quantifiers</i>.
 * <tt>{</tt><i>n</i><tt>,</tt><i>m</i><tt>}</tt> - At least <i>n</i> and
 at most <i>m</i> times
-# At least one uppercase character ('H'), at least one lowercase
-# character ('e'), two 'l' characters, then one 'o'
+At least one uppercase character ('H'), at least one lowercase character
+('e'), two 'l' characters, then one 'o':
 "Hello".match(/[[:upper:]]+[[:lower:]]+l{2}o/) #=> #<MatchData "Hello">
 Repetition is <i>greedy</i> by default: as many occurrences as possible
@@ -183,9 +189,10 @@ contrast, <i>lazy</i> matching makes the minimal amount of matches
 necessary for overall success. A greedy metacharacter can be made lazy by
 following it with <tt>?</tt>.
-# Both patterns below match the string. The first uses a greedy
-# quantifier so '.+' matches '<a><b>'; the second uses a lazy
-# quantifier so '.+?' matches '<a>'.
+Both patterns below match the string. The first uses a greedy quantifier so
+'.+' matches '<a><b>'; the second uses a lazy quantifier so '.+?' matches
+'<a>':
 /<.+>/.match("<a><b>") #=> #<MatchData "<a><b>">
 /<.+?>/.match("<a><b>") #=> #<MatchData "<a>">
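As a small sketch of why the distinction in the hunk above matters (not part of the commit; the variable names are illustrative), the lazy form is the one that splits the string into individual tags when scanning:

```ruby
# Illustrative only: greedy versus lazy quantifiers on the same input.
text = "<a><b>"

greedy = text[/<.+>/]    # '.+' takes as much as it can
lazy   = text[/<.+?>/]   # '.+?' stops at the first possible '>'

# Scanning with the lazy pattern yields each tag separately.
tags = text.scan(/<.+?>/)

puts greedy
puts lazy
puts tags.inspect
```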
@@ -202,12 +209,15 @@ with <i>n</i>. Within a pattern use the <i>backreference</i>
 <tt>\n</tt>; outside of the pattern use
 <tt>MatchData[</tt><i>n</i><tt>]</tt>.
-# 'at' is captured by the first group of parentheses, then referred to
-# later with \1
+'at' is captured by the first group of parentheses, then referred to later
+with <tt>\1</tt>:
 /[csh](..) [csh]\1 in/.match("The cat sat in the hat")
 #=> #<MatchData "cat sat in" 1:"at">
-# Regexp#match returns a MatchData object which makes the captured
-# text available with its #[] method.
+Regexp#match returns a MatchData object which makes the captured text
+available with its #[] method:
 /[csh](..) [csh]\1 in/.match("The cat sat in the hat")[1] #=> 'at'
 Capture groups can be referred to by name when defined with the
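A minimal sketch, not part of the commit, of reading the pieces of the MatchData object from the hunk above; the local variable names are illustrative.

```ruby
# Illustrative only: inspect the MatchData returned by the documented pattern.
m = /[csh](..) [csh]\1 in/.match("The cat sat in the hat")

whole  = m[0]          # the entire matched text
first  = m[1]          # text captured by the first group
before = m.pre_match   # text before the match
after  = m.post_match  # text after the match

puts [before, whole, first, after].inspect
```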
@@ -239,11 +249,13 @@ also assigned to local variables with corresponding names.
 Parentheses also <i>group</i> the terms they enclose, allowing them to be
 quantified as one <i>atomic</i> whole.
-# The pattern below matches a vowel followed by 2 word characters:
-# 'aen'
+The pattern below matches a vowel followed by 2 word characters:
 /[aeiou]\w{2}/.match("Caenorhabditis elegans") #=> #<MatchData "aen">
-# Whereas the following pattern matches a vowel followed by a word
-# character, twice, i.e. <tt>[aeiou]\w[aeiou]\w</tt>: 'enor'.
+Whereas the following pattern matches a vowel followed by a word character,
+twice, i.e. <tt>[aeiou]\w[aeiou]\w</tt>: 'enor'.
 /([aeiou]\w){2}/.match("Caenorhabditis elegans")
 #=> #<MatchData "enor" 1:"or">
@@ -252,13 +264,16 @@ capturing. That is, it combines the terms it contains into an atomic whole
 without creating a backreference. This benefits performance at the slight
 expense of readability.
-# The group of parentheses captures 'n' and the second 'ti'. The
-# second group is referred to later with the backreference \2
+The first group of parentheses captures 'n' and the second 'ti'. The second
+group is referred to later with the backreference <tt>\2</tt>:
 /I(n)ves(ti)ga\2ons/.match("Investigations")
 #=> #<MatchData "Investigations" 1:"n" 2:"ti">
-# The first group of parentheses is now made non-capturing with '?:',
-# so it still matches 'n', but doesn't create the backreference. Thus,
-# the backreference \1 now refers to 'ti'.
+The first group of parentheses is now made non-capturing with '?:', so it
+still matches 'n', but doesn't create the backreference. Thus, the
+backreference <tt>\1</tt> now refers to 'ti'.
 /I(?:n)ves(ti)ga\1ons/.match("Investigations")
 #=> #<MatchData "Investigations" 1:"ti">
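To see how the group numbering shifts in the hunk above, one can compare the captures of the two patterns. This is a hedged sketch, not part of the commit, with illustrative variable names.

```ruby
# Illustrative only: a non-capturing group does not occupy a group number.
capturing     = /I(n)ves(ti)ga\2ons/.match("Investigations")
non_capturing = /I(?:n)ves(ti)ga\1ons/.match("Investigations")

both_groups = capturing.captures      # two groups: 'n' and 'ti'
one_group   = non_capturing.captures  # only 'ti' remains, now as group 1

puts both_groups.inspect
puts one_group.inspect
```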
@@ -273,14 +288,16 @@ way <i>pat</i> is treated as a non-divisible whole. Atomic grouping is
 typically used to optimise patterns so as to prevent the regular
 expression engine from backtracking needlessly.
-# The <tt>"</tt> in the pattern below matches the first character of
-# the string, then <tt>.*</tt> matches <i>Quote"</i>. This causes the
-# overall match to fail, so the text matched by <tt>.*</tt> is
-# backtracked by one position, which leaves the final character of the
-# string available to match <tt>"</tt>
+The <tt>"</tt> in the pattern below matches the first character of the string,
+then <tt>.*</tt> matches <i>Quote"</i>. This causes the overall match to fail,
+so the text matched by <tt>.*</tt> is backtracked by one position, which
+leaves the final character of the string available to match <tt>"</tt>
 /".*"/.match('"Quote"') #=> #<MatchData "\"Quote\"">
-# If <tt>.*</tt> is grouped atomically, it refuses to backtrack
-# <i>Quote"</i>, even though this means that the overall match fails
+If <tt>.*</tt> is grouped atomically, it refuses to backtrack <i>Quote"</i>,
+even though this means that the overall match fails
 /"(?>.*)"/.match('"Quote"') #=> nil
 == Subexpression Calls
@@ -290,9 +307,10 @@ subexpression named _name_, which can be a group name or number, again.
 This differs from backreferences in that it re-executes the group rather
 than simply trying to re-match the same text.
-# Matches a <i>(</i> character and assigns it to the <tt>paren</tt>
-# group, tries to call that the <tt>paren</tt> sub-expression again
-# but fails, then matches a literal <i>)</i>.
+This pattern matches a <i>(</i> character and assigns it to the <tt>paren</tt>
+group, tries to call that the <tt>paren</tt> sub-expression again but fails,
+then matches a literal <i>)</i>:
 /\A(?<paren>\(\g<paren>*\))*\z/ =~ '()'
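A sketch of how the subexpression call in the hunk above behaves on nested input. It is not part of the commit; <tt>balanced?</tt> is a hypothetical helper name introduced only for this example.

```ruby
# Illustrative only: the documented pattern, which re-executes the 'paren'
# group recursively, accepts strings of balanced parentheses.
BALANCED = /\A(?<paren>\(\g<paren>*\))*\z/

# Hypothetical helper: true when the whole string is balanced parentheses.
def balanced?(text)
  !BALANCED.match(text).nil?
end

results = ['()', '(())', '()()', '(()', ')('].map { |s| [s, balanced?(s)] }
results.each { |s, ok| puts "#{s.inspect}: #{ok}" }
```

A backreference could not do this: it would try to re-match the same text that the group matched earlier rather than running the group's pattern again.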
@@ -426,15 +444,17 @@ following scripts are supported: <i>Arabic</i>, <i>Armenian</i>,
 <i>Tamil</i>, <i>Telugu</i>, <i>Thaana</i>, <i>Thai</i>, <i>Tibetan</i>,
 <i>Tifinagh</i>, <i>Ugaritic</i>, <i>Vai</i>, and <i>Yi</i>.
-# Unicode codepoint U+06E9 is named "ARABIC PLACE OF SAJDAH" and
-# belongs to the Arabic script.
+Unicode codepoint U+06E9 is named "ARABIC PLACE OF SAJDAH" and belongs to the
+Arabic script:
 /\p{Arabic}/.match("\u06E9") #=> #<MatchData "\u06E9">
 All character properties can be inverted by prefixing their name with a
 caret (<tt>^</tt>).
-# Letter 'A' is not in the Unicode Ll (Letter; Lowercase) category, so
-# this match succeeds
+Letter 'A' is not in the Unicode Ll (Letter; Lowercase) category, so this
+match succeeds:
 /\p{^Ll}/.match("A") #=> #<MatchData "A">
 == Anchors
@@ -465,22 +485,30 @@ characters, <i>anchoring</i> the match to a specific position.
 assertion: ensures that the preceding characters do not match
 <i>pat</i>, but doesn't include those characters in the matched text
-# If a pattern isn't anchored it can begin at any point in the string
+If a pattern isn't anchored it can begin at any point in the string:
 /real/.match("surrealist") #=> #<MatchData "real">
-# Anchoring the pattern to the beginning of the string forces the
-# match to start there. 'real' doesn't occur at the beginning of the
-# string, so now the match fails
+Anchoring the pattern to the beginning of the string forces the match to start
+there. 'real' doesn't occur at the beginning of the string, so now the match
+fails:
 /\Areal/.match("surrealist") #=> nil
-# The match below fails because although 'Demand' contains 'and', the
-pattern does not occur at a word boundary.
+The match below fails because although 'Demand' contains 'and', the pattern
+does not occur at a word boundary.
 /\band/.match("Demand")
-# Whereas in the following example 'and' has been anchored to a
-# non-word boundary so instead of matching the first 'and' it matches
-# from the fourth letter of 'demand' instead
+Whereas in the following example 'and' has been anchored to a non-word
+boundary so instead of matching the first 'and' it matches from the fourth
+letter of 'demand' instead:
 /\Band.+/.match("Supply and demand curve") #=> #<MatchData "and curve">
-# The pattern below uses positive lookahead and positive lookbehind to
-# match text appearing in <b></b> tags without including the tags in the
-# match
+The pattern below uses positive lookahead and positive lookbehind to match
+text appearing in <b></b> tags without including the tags in the match:
 /(?<=<b>)\w+(?=<\/b>)/.match("Fortune favours the <b>bold</b>")
 #=> #<MatchData "bold">
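For comparison with the lookaround example in the hunk above, here is a hedged sketch (not part of the commit, variable names illustrative) of the same search with and without the zero-width assertions:

```ruby
# Illustrative only: lookbehind and lookahead assert context without
# consuming it, so the tags stay out of the matched text.
html = "Fortune favours the <b>bold</b>"

inner_only = html[/(?<=<b>)\w+(?=<\/b>)/]  # assertions: tags not included
with_tags  = html[/<b>\w+<\/b>/]           # plain pattern: tags included

puts inner_only
puts with_tags
```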
@@ -518,7 +546,8 @@ octothorpe (<tt>#</tt>) character introduces a comment until the end of
 the line. This allows the components of the pattern to be organised in a
 potentially more readable fashion.
-# A contrived pattern to match a number with optional decimal places
+A contrived pattern to match a number with optional decimal places:
 float_pat = /\A
 [[:digit:]]+ # 1 or more digits before the decimal point
 (\. # Decimal point
@@ -634,8 +663,9 @@ backtracking:
 A similar case is typified by the following example, which takes
 approximately 60 seconds to execute for me:
-# Match a string of 29 <i>a</i>s against a pattern of 29 optional
-# <i>a</i>s followed by 29 mandatory <i>a</i>s.
+Match a string of 29 <i>a</i>s against a pattern of 29 optional <i>a</i>s
+followed by 29 mandatory <i>a</i>s:
 Regexp.new('a?' * 29 + 'a' * 29) =~ 'a' * 29
 The 29 optional <i>a</i>s match the string, but this prevents the 29