mirror of
https://github.com/ruby/ruby.git
synced 2022-11-09 12:17:21 -05:00
* lib/racc: Merge Racc documentation downstream, add grammar ref file
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@39050 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
parent
f11ff34d8f
commit
941ea3713f
3 changed files with 420 additions and 13 deletions
|
@ -1,3 +1,7 @@
|
|||
Tue Feb 5 09:55:00 2013 Zachary Scott <zachary@zacharyscott.net>
|
||||
|
||||
* lib/racc: Merge Racc documentation downstream, add grammar ref file
|
||||
|
||||
Tue Feb 5 08:03:00 2013 Zachary Scott <zachary@zacharyscott.net>
|
||||
|
||||
* lib/irb.rb, lib/irb/ext/save-history.rb: Add documentation on how to
|
||||
|
|
|
@ -18,10 +18,164 @@ unless defined?(::ParseError)
|
|||
ParseError = Racc::ParseError
|
||||
end
|
||||
|
||||
# Racc is a LALR(1) parser generator.
|
||||
# It is written in Ruby itself, and generates Ruby programs.
|
||||
#
|
||||
# == Command-line Reference
|
||||
#
|
||||
# racc [-o<var>filename</var>] [--output-file=<var>filename</var>]
|
||||
# [-e<var>rubypath</var>] [--executable=<var>rubypath</var>]
|
||||
# [-v] [--verbose]
|
||||
# [-O<var>filename</var>] [--log-file=<var>filename</var>]
|
||||
# [-g] [--debug]
|
||||
# [-E] [--embedded]
|
||||
# [-l] [--no-line-convert]
|
||||
# [-c] [--line-convert-all]
|
||||
# [-a] [--no-omit-actions]
|
||||
# [-C] [--check-only]
|
||||
# [-S] [--output-status]
|
||||
# [--version] [--copyright] [--help] <var>grammarfile</var>
|
||||
#
|
||||
# [+filename+]
|
||||
# Racc grammar file. Any extension is permitted.
|
||||
# [-o+outfile+, --output-file=+outfile+]
|
||||
# A filename for output. Default is <+filename+>.tab.rb.
|
||||
# [-O+filename+, --log-file=+filename+]
|
||||
# Place logging output in file +filename+.
|
||||
# Default log file name is <+filename+>.output.
|
||||
# [-e+rubypath+, --executable=+rubypath+]
|
||||
# Output an executable file (mode 755), where +rubypath+ is the path to the Ruby interpreter.
|
||||
# [-v, --verbose]
|
||||
# Verbose mode. Creates a +filename+.output file, like yacc's y.output file.
|
||||
# [-g, --debug]
|
||||
# Add debug code to the parser class. To display debugging information,
|
||||
# use this '-g' option and set @yydebug to true in the parser class.
|
||||
# [-E, --embedded]
|
||||
# Output a parser that doesn't need the runtime files (racc/parser.rb).
|
||||
# [-C, --check-only]
|
||||
# Check the syntax of the racc grammar file and quit.
|
||||
# [-S, --output-status]
|
||||
# Print status messages from time to time while compiling.
|
||||
# [-l, --no-line-convert]
|
||||
# Turn off line number conversion.
|
||||
# [-c, --line-convert-all]
|
||||
# Convert line numbers of actions, inner, header and footer.
|
||||
# [-a, --no-omit-actions]
|
||||
# Call all actions, even if an action is empty.
|
||||
# [--version]
|
||||
# Print Racc version and quit.
|
||||
# [--copyright]
|
||||
# Print copyright and quit.
|
||||
# [--help]
|
||||
# Print usage and quit.
|
||||
#
|
||||
# == Generating Parser Using Racc
|
||||
#
|
||||
# To compile a Racc grammar file, simply type:
|
||||
#
|
||||
# $ racc parse.y
|
||||
#
|
||||
# This creates the Ruby script file "parse.tab.rb". The -o option can change the output filename.
|
||||
#
|
||||
# == Writing A Racc Grammar File
|
||||
#
|
||||
# If you want your own parser, you have to write a grammar file.
|
||||
# A grammar file contains the name of your parser class, grammar for the parser,
|
||||
# user code, and anything else.
|
||||
# Knowledge of yacc is helpful when writing a grammar file.
|
||||
# Even if you have not used yacc before, Racc is not too difficult to learn.
|
||||
#
|
||||
# Here's an example Racc grammar file.
|
||||
#
|
||||
# class Calcparser
|
||||
# rule
|
||||
# target: exp { print val[0] }
|
||||
#
|
||||
# exp: exp '+' exp
|
||||
# | exp '*' exp
|
||||
# | '(' exp ')'
|
||||
# | NUMBER
|
||||
# end
|
||||
#
|
||||
# Racc grammar files resemble yacc files.
|
||||
# But (of course), this is Ruby code.
|
||||
# yacc's $$ is the 'result', $1, $2... are held in
|
||||
# an array called 'val', and $-1, $-2... is an array called '_values'.
|
||||
#
|
||||
# See the {Grammar File Reference}[rdoc-ref:lib/racc/rdoc/grammar.en.rdoc] for
|
||||
# more information on grammar files.
|
||||
#
|
||||
# == Parser
|
||||
#
|
||||
# Then you must prepare the parse entry method. There are two types of
|
||||
# parse methods in Racc, Racc::Parser#do_parse and Racc::Parser#yyparse.
|
||||
#
|
||||
# Racc::Parser#do_parse is simple.
|
||||
#
|
||||
# It corresponds to yacc's yyparse(), and Racc::Parser#next_token corresponds to yylex().
|
||||
# Racc::Parser#next_token must return an array like [TOKENSYMBOL, ITS_VALUE].
|
||||
# EOF is [false, false].
|
||||
# TOKENSYMBOL is a Ruby symbol (taken from String#intern) by default.
|
||||
# If you want to change this, see the grammar reference.
|
||||
#
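The token format described above can be illustrated without a generated parser. The following is a minimal sketch, not part of Racc: the +tokenize+ helper and the +TokenQueue+ class are hypothetical names, and the token set (NUMBER plus single-character operators) is assumed from the calculator grammar shown earlier.

```ruby
require 'strscan'

# Hypothetical helper: turns a source string into the
# [TOKENSYMBOL, ITS_VALUE] pairs that next_token is expected to return.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                            # ignore whitespace
    elsif (digits = scanner.scan(/\d+/))
      tokens << [:NUMBER, digits.to_i] # bare token name maps to a Symbol
    else
      char = scanner.getch
      tokens << [char, char]          # quoted tokens such as '+' stay Strings
    end
  end
  tokens << [false, false]            # end-of-input marker
  tokens
end

# Hypothetical stand-in for the part of a parser class that feeds do_parse.
class TokenQueue
  def initialize(source)
    @tokens = tokenize(source)
  end

  # Returns one token per call, ending with [false, false].
  def next_token
    @tokens.shift
  end
end
```

In a real parser class, a +next_token+ method of this shape would normally live in the "inner" user code block, next to a method that calls do_parse.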
|
||||
# Racc::Parser#yyparse is a little more complicated, but useful.
|
||||
# It does not use Racc::Parser#next_token; instead, it gets tokens from any iterator.
|
||||
#
|
||||
# For example, <code>yyparse(obj, :scan)</code> calls
|
||||
# +obj#scan+, and you can return tokens by yielding them from +obj#scan+.
|
||||
#
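The iterator contract can be sketched as follows. This is only an illustration under stated assumptions: +WordScanner+ is a hypothetical class, its whitespace-splitting tokenizer is deliberately naive, and no Racc runtime is loaded here.

```ruby
# Hypothetical scanner object: its #scan method yields one
# [TOKEN_SYMBOL, VALUE] pair per token instead of returning them.
class WordScanner
  def initialize(text)
    @text = text
  end

  def scan
    @text.split.each do |word|
      if word.match?(/\A\d+\z/)
        yield [:NUMBER, word.to_i]
      else
        yield [word, word]
      end
    end
    yield [false, false] # end-of-input marker
  end
end

# Without a parser, the yielded tokens can be inspected through an Enumerator.
yielded = WordScanner.new('1 + 2').enum_for(:scan).to_a
```

Inside a generated parser, an object like this would be passed as <code>yyparse(scanner, :scan)</code>.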
|
||||
# == Debugging
|
||||
#
|
||||
# When debugging, the "-v" and "-g" options are helpful.
|
||||
#
|
||||
# "-v" creates a verbose log file (.output).
|
||||
# "-g" creates a "Verbose Parser".
|
||||
# A Verbose Parser prints its internal status while parsing.
|
||||
# But it's _not_ automatic.
|
||||
# You must use the -g option and set +@yydebug+ to +true+ in order to get output.
|
||||
# The -g option alone only creates the verbose parser.
|
||||
#
|
||||
# === Racc reported syntax error.
|
||||
#
|
||||
# Are there too many "end" keywords?
|
||||
# The grammar of racc files changed in v0.10.
|
||||
#
|
||||
# Racc does not use the '%' mark, while yacc uses a large number of '%' marks.
|
||||
#
|
||||
# === Racc reported "XXXX conflicts".
|
||||
#
|
||||
# Try "racc -v xxxx.y".
|
||||
# This produces racc's internal log file, xxxx.output.
|
||||
#
|
||||
# === Generated parsers do not work correctly
|
||||
#
|
||||
# Try "racc -g xxxx.y".
|
||||
# This command makes racc generate a "debugging parser".
|
||||
# Then set @yydebug=true in your parser.
|
||||
# It produces a working log of your parser.
|
||||
#
|
||||
# == Re-distributing Racc runtime
|
||||
#
|
||||
# A parser created by Racc requires the Racc runtime module,
|
||||
# racc/parser.rb.
|
||||
#
|
||||
# Ruby 1.8.x comes with the Racc runtime module, so
|
||||
# you need NOT distribute Racc runtime files.
|
||||
#
|
||||
# You may still want to include the Racc runtime module with your parser.
|
||||
# This can be done by using the '-E' option:
|
||||
#
|
||||
# $ racc -E -omyparser.rb myparser.y
|
||||
#
|
||||
# This command creates myparser.rb, which `includes' the Racc runtime.
|
||||
# All you need to do is distribute your parser file (myparser.rb).
|
||||
#
|
||||
# Note: parser.rb is LGPL, but your parser is not.
|
||||
# Your own parser is completely yours.
|
||||
module Racc
|
||||
|
||||
unless defined?(Racc_No_Extentions)
|
||||
Racc_No_Extentions = false
|
||||
Racc_No_Extentions = false # :nodoc:
|
||||
end
|
||||
|
||||
class Parser
|
||||
|
@ -42,11 +196,11 @@ module Racc
|
|||
raise LoadError, 'selecting ruby version of racc runtime core'
|
||||
end
|
||||
|
||||
Racc_Main_Parsing_Routine = :_racc_do_parse_c
|
||||
Racc_YY_Parse_Method = :_racc_yyparse_c
|
||||
Racc_Runtime_Core_Version = Racc_Runtime_Core_Version_C
|
||||
Racc_Runtime_Core_Revision = Racc_Runtime_Core_Revision_C
|
||||
Racc_Runtime_Type = 'c'
|
||||
Racc_Main_Parsing_Routine = :_racc_do_parse_c # :nodoc:
|
||||
Racc_YY_Parse_Method = :_racc_yyparse_c # :nodoc:
|
||||
Racc_Runtime_Core_Version = Racc_Runtime_Core_Version_C # :nodoc:
|
||||
Racc_Runtime_Core_Revision = Racc_Runtime_Core_Revision_C # :nodoc:
|
||||
Racc_Runtime_Type = 'c' # :nodoc:
|
||||
rescue LoadError
|
||||
Racc_Main_Parsing_Routine = :_racc_do_parse_rb
|
||||
Racc_YY_Parse_Method = :_racc_yyparse_rb
|
||||
|
@ -55,12 +209,10 @@ module Racc
|
|||
Racc_Runtime_Type = 'ruby'
|
||||
end
|
||||
|
||||
def Parser.racc_runtime_type
|
||||
def Parser.racc_runtime_type # :nodoc:
|
||||
Racc_Runtime_Type
|
||||
end
|
||||
|
||||
private
|
||||
|
||||
def _racc_setup
|
||||
@yydebug = false unless self.class::Racc_debug_parser
|
||||
@yydebug = false unless defined?(@yydebug)
|
||||
|
@ -97,6 +249,14 @@ module Racc
|
|||
end
|
||||
}
|
||||
|
||||
# The method to fetch next token.
|
||||
# If you use #do_parse method, you must implement #next_token.
|
||||
#
|
||||
# The format of return value is [TOKEN_SYMBOL, VALUE].
|
||||
# TOKEN_SYMBOL is represented by a Ruby symbol by default, e.g. :IDENT
|
||||
# for 'IDENT'. ";" (String) for ';'.
|
||||
#
|
||||
# The final symbol (End of file) must be false.
|
||||
def next_token
|
||||
raise NotImplementedError, "#{self.class}\#next_token is not defined"
|
||||
end
|
||||
|
@ -343,27 +503,43 @@ module Racc
|
|||
goto_default[k1]
|
||||
end
|
||||
|
||||
# This method is called when a parse error is found.
|
||||
#
|
||||
# ERROR_TOKEN_ID is the internal ID of the token that caused the error.
|
||||
# You can get the string representation of this ID by calling
|
||||
# #token_to_str.
|
||||
#
|
||||
# ERROR_VALUE is the value of the error token.
|
||||
#
|
||||
# value_stack is a stack of symbol values.
|
||||
# DO NOT MODIFY this object.
|
||||
#
|
||||
# This method raises ParseError by default.
|
||||
#
|
||||
# If this method returns, the parser enters "error recovering mode".
|
||||
def on_error(t, val, vstack)
|
||||
raise ParseError, sprintf("\nparse error on value %s (%s)",
|
||||
val.inspect, token_to_str(t) || '?')
|
||||
end
|
||||
|
||||
# Enter error recovering mode.
|
||||
# This method does not call #on_error.
|
||||
def yyerror
|
||||
throw :racc_jump, 1
|
||||
end
|
||||
|
||||
# Exit parser.
|
||||
# Return value is Symbol_Value_Stack[0].
|
||||
def yyaccept
|
||||
throw :racc_jump, 2
|
||||
end
|
||||
|
||||
# Leave error recovering mode.
|
||||
def yyerrok
|
||||
@racc_error_status = 0
|
||||
end
|
||||
|
||||
#
|
||||
# for debugging output
|
||||
#
|
||||
|
||||
# For debugging output
|
||||
def racc_read_token(t, tok, val)
|
||||
@racc_debug_out.print 'read '
|
||||
@racc_debug_out.print tok.inspect, '(', racc_token2str(t), ') '
|
||||
|
@ -430,6 +606,7 @@ module Racc
|
|||
raise "[Racc Bug] can't convert token #{tok} to string"
|
||||
end
|
||||
|
||||
# Convert the internal ID of a token symbol to a string.
|
||||
def token_to_str(t)
|
||||
self.class::Racc_token_to_s_table[t]
|
||||
end
|
||||
|
|
226
lib/racc/rdoc/grammar.en.rdoc
Normal file
|
@ -0,0 +1,226 @@
|
|||
= Racc Grammar File Reference
|
||||
|
||||
== Global Structure
|
||||
|
||||
== Class Block and User Code Block
|
||||
|
||||
There are two blocks at the top level.
|
||||
One is the 'class' block, the other is the 'user code' block. The 'user code' block MUST
|
||||
be placed after the 'class' block.
|
||||
|
||||
== Comment
|
||||
|
||||
You can insert comments almost anywhere. Two comment styles can be used,
|
||||
Ruby style (#.....) and C style (/*......*/).
|
||||
|
||||
== Class Block
|
||||
|
||||
The class block is formed like this:
|
||||
|
||||
class CLASS_NAME
|
||||
[precedence table]
|
||||
[token declarations]
|
||||
[expected number of S/R conflicts]
|
||||
[options]
|
||||
[semantic value conversion]
|
||||
[start rule]
|
||||
rule
|
||||
GRAMMARS
|
||||
|
||||
CLASS_NAME is the name of the parser class.
|
||||
This is the name of the generated parser class.
|
||||
|
||||
If CLASS_NAME includes '::', Racc outputs a module clause.
|
||||
For example, writing "class M::C" produces the code below:
|
||||
|
||||
module M
|
||||
class C
|
||||
:
|
||||
:
|
||||
end
|
||||
end
|
||||
|
||||
== Grammar Block
|
||||
|
||||
The grammar block describes the grammar
|
||||
understood by the parser. The syntax is:
|
||||
|
||||
(token): (token) (token) (token).... (action)
|
||||
|
||||
(token): (token) (token) (token).... (action)
|
||||
| (token) (token) (token).... (action)
|
||||
| (token) (token) (token).... (action)
|
||||
|
||||
(action) is an action which is executed when its (token)s are found.
|
||||
(action) is a Ruby code block surrounded by braces:
|
||||
|
||||
{ print val[0]
|
||||
puts val[1] }
|
||||
|
||||
Note that you cannot use '%' strings, here documents, or '%r' regexps in an action.
|
||||
|
||||
Actions can be omitted.
|
||||
When an action is omitted, '' (an empty string) is used.
|
||||
|
||||
The return value of an action is the value of the left-hand side ($$).
|
||||
It is the value of result, or the value returned by a "return" statement.
|
||||
|
||||
Here is an example of a whole grammar block.
|
||||
|
||||
rule
|
||||
goal: definition rules source { result = val }
|
||||
|
||||
definition: /* none */ { result = [] }
|
||||
| definition startdesig { result[0] = val[1] }
|
||||
| definition
|
||||
precrule # this line continues from the line above
|
||||
{
|
||||
result[1] = val[1]
|
||||
}
|
||||
|
||||
startdesig: START TOKEN
|
||||
|
||||
You can use the following special local variables in an action.
|
||||
|
||||
* result ($$)
|
||||
|
||||
The value of the left-hand side (lhs). The default value is val[0].
|
||||
|
||||
* val ($1,$2,$3...)
|
||||
|
||||
An array of the values of the right-hand side (rhs).
|
||||
|
||||
* _values (...$-2,$-1,$0)
|
||||
|
||||
A stack of values.
|
||||
DO NOT MODIFY this stack unless you know what you are doing.
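
How these variables reach an action can be pictured as a plain Ruby method. This is a rough sketch, assuming the default +result_var+ option; the method names +reduce_add+ and +reduce_default+ are hypothetical and are not the names Racc generates.

```ruby
# Sketch of an action such as:
#   exp: exp '+' exp { result = val[0] + val[2] }
# val holds the right-hand side values, _values is the value stack,
# and result starts out as val[0].
def reduce_add(val, _values, result)
  result = val[0] + val[2]
  result
end

# Sketch of an omitted action: the default result (val[0]) passes through.
def reduce_default(val, _values, result)
  result
end

val = [1, '+', 2]
sum = reduce_add(val, [], val[0])
unchanged = reduce_default([7], [], 7)
```

Here +sum+ is the new value of the left-hand side, and +unchanged+ shows the default of val[0] when an action assigns nothing.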
|
||||
|
||||
== Operator Precedence
|
||||
|
||||
This feature is equivalent to '%prec' in yacc.
|
||||
To designate this block:
|
||||
|
||||
prechigh
|
||||
nonassoc '++'
|
||||
left '*' '/'
|
||||
left '+' '-'
|
||||
right '='
|
||||
preclow
|
||||
|
||||
`right' is yacc's %right, `left' is yacc's %left.
|
||||
|
||||
`=' + (symbol) means yacc's %prec:
|
||||
|
||||
prechigh
|
||||
nonassoc UMINUS
|
||||
left '*' '/'
|
||||
left '+' '-'
|
||||
preclow
|
||||
|
||||
rule
|
||||
exp: exp '*' exp
|
||||
| exp '-' exp
|
||||
| '-' exp =UMINUS # equivalent to "%prec UMINUS"
|
||||
:
|
||||
:
|
||||
|
||||
== expect
|
||||
|
||||
Racc has bison's "expect" directive.
|
||||
|
||||
# Example
|
||||
|
||||
class MyParser
|
||||
rule
|
||||
expect 3
|
||||
:
|
||||
:
|
||||
|
||||
This directive declares the "expected" number of shift/reduce conflicts.
|
||||
If the "expected" number is equal to the actual number of conflicts,
|
||||
racc does not print a conflict warning message.
|
||||
|
||||
== Declaring Tokens
|
||||
|
||||
By declaring tokens, you can avoid many meaningless bugs.
|
||||
If a declared token does not exist, or an existing token is not declared,
|
||||
Racc outputs warnings. The declaration syntax is:
|
||||
|
||||
token TOKEN_NAME AND_IS_THIS
|
||||
ALSO_THIS_IS AGAIN_AND_AGAIN THIS_IS_LAST
|
||||
|
||||
== Options
|
||||
|
||||
You can write options for the racc command in your racc file.
|
||||
|
||||
options OPTION OPTION ...
|
||||
|
||||
Options are:
|
||||
|
||||
* omit_action_call
|
||||
|
||||
Omit calls to empty actions, or not.
|
||||
|
||||
* result_var
|
||||
|
||||
Use, or do not use, the local variable "result".
|
||||
|
||||
You can use the 'no_' prefix to invert an option's meaning.
|
||||
|
||||
== Converting Token Symbol
|
||||
|
||||
By default, token symbols are:
|
||||
|
||||
* naked token string in racc file (TOK, XFILE, this_is_token, ...)
|
||||
--> symbol (:TOK, :XFILE, :this_is_token, ...)
|
||||
* quoted string (':', '.', '(', ...)
|
||||
--> same string (':', '.', '(', ...)
|
||||
|
||||
You can change this default with a "convert" block.
|
||||
Here is an example:
|
||||
|
||||
convert
|
||||
PLUS 'PlusClass' # We use PlusClass for symbol of `PLUS'
|
||||
MIN 'MinusClass' # We use MinusClass for symbol of `MIN'
|
||||
end
|
||||
|
||||
Almost any Ruby value can be used as a token symbol,
|
||||
except 'false' and 'nil'. These cause unexpected parse errors.
|
||||
|
||||
If you want to use a String as a token symbol, special care is required.
|
||||
For example:
|
||||
|
||||
convert
|
||||
class '"cls"' # in code, "cls"
|
||||
PLUS '"plus\n"' # in code, "plus\n"
|
||||
MIN "\"minus#{val}\"" # in code, \"minus#{val}\"
|
||||
end
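
The reason for the extra quoting can be sketched in plain Ruby. The snippet below is an illustration only, assuming that the text on the right of each "convert" entry ends up as Ruby source in the generated parser; the +conversions+ hash and the use of +eval+ are stand-ins for that step, not Racc's actual mechanism.

```ruby
class PlusClass; end

# Right-hand sides of a hypothetical "convert" block, kept as source text.
conversions = {
  'PLUS'  => 'PlusClass', # evaluates to the class object
  'class' => '"cls"'      # inner quotes make it evaluate to a String
}

# Evaluating each piece of source text gives the token symbol the
# tokenizer would have to return for that token.
token_symbols = conversions.transform_values { |source| eval(source) }
```

Without the inner double quotes, 'cls' would be read as a bare identifier rather than the String "cls".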
|
||||
|
||||
== Start Rule
|
||||
|
||||
Equivalent to '%start' in yacc. This changes the start rule.
|
||||
|
||||
start real_target
|
||||
|
||||
I think this statement will rarely, if ever, be used.
|
||||
|
||||
== User Code Block
|
||||
|
||||
A "User Code Block" is Ruby source code that is copied to the output.
|
||||
There are three user code blocks: "header", "inner" and "footer".
|
||||
|
||||
The format of user code is like this:
|
||||
|
||||
---- header
|
||||
ruby statement
|
||||
ruby statement
|
||||
ruby statement
|
||||
|
||||
---- inner
|
||||
ruby statement
|
||||
:
|
||||
:
|
||||
|
||||
If four '-' characters appear at the start of a line,
|
||||
racc treats it as the beginning of a user code block.
|
||||
The name of a user code block must be one word.
|