mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00

* lib/racc: Merge Racc documentation downstream, add grammar ref file

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@39050 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
zzak 2013-02-05 00:56:11 +00:00
parent f11ff34d8f
commit 941ea3713f
3 changed files with 420 additions and 13 deletions


@ -1,3 +1,7 @@
Tue Feb 5 09:55:00 2013 Zachary Scott <zachary@zacharyscott.net>
* lib/racc: Merge Racc documentation downstream, add grammar ref file
Tue Feb 5 08:03:00 2013 Zachary Scott <zachary@zacharyscott.net>
* lib/irb.rb, lib/irb/ext/save-history.rb: Add documentation on how to


@ -18,10 +18,164 @@ unless defined?(::ParseError)
ParseError = Racc::ParseError
end
# Racc is a LALR(1) parser generator.
# It is written in Ruby itself, and generates Ruby programs.
#
# == Command-line Reference
#
# racc [-o<var>filename</var>] [--output-file=<var>filename</var>]
# [-e<var>rubypath</var>] [--executable=<var>rubypath</var>]
# [-v] [--verbose]
# [-O<var>filename</var>] [--log-file=<var>filename</var>]
# [-g] [--debug]
# [-E] [--embedded]
# [-l] [--no-line-convert]
# [-c] [--line-convert-all]
# [-a] [--no-omit-actions]
# [-C] [--check-only]
# [-S] [--output-status]
# [--version] [--copyright] [--help] <var>grammarfile</var>
#
# [+filename+]
# Racc grammar file. Any extension is permitted.
# [-o+outfile+, --output-file=+outfile+]
# Output file name. The default is <+filename+>.tab.rb.
# [-O+filename+, --log-file=+filename+]
# Place logging output in file +filename+.
# The default log file name is <+filename+>.output.
# [-e+rubypath+, --executable=+rubypath+]
# Output an executable file (mode 755), where +rubypath+ is the path to the Ruby interpreter.
# [-v, --verbose]
# Verbose mode. Create a +filename+.output file, like yacc's y.output file.
# [-g, --debug]
# Add debug code to the parser class. To display debugging information,
# use the '-g' option and set @yydebug to true in the parser class.
# [-E, --embedded]
# Output a parser that doesn't need the runtime files (racc/parser.rb).
# [-C, --check-only]
# Check the syntax of the racc grammar file and quit.
# [-S, --output-status]
# Print progress messages from time to time while compiling.
# [-l, --no-line-convert]
# Turn off line number conversion.
# [-c, --line-convert-all]
# Convert line numbers of actions, inner, header and footer.
# [-a, --no-omit-actions]
# Call all actions, even if an action is empty.
# [--version]
# Print the Racc version and quit.
# [--copyright]
# Print the copyright notice and quit.
# [--help]
# Print usage and quit.
#
# == Generating Parser Using Racc
#
# To compile a Racc grammar file, simply type:
#
# $ racc parse.y
#
# This creates a Ruby script file, "parse.tab.rb". The -o option changes the output file name.
#
# == Writing A Racc Grammar File
#
# To build your own parser, you write a grammar file.
# A grammar file contains the name of your parser class, the grammar rules for the parser,
# user code, and so on.
# Knowledge of yacc helps when writing a grammar file, but even if you
# have never used yacc, Racc is not too difficult to learn.
#
# Here's an example Racc grammar file.
#
# class Calcparser
# rule
# target: exp { print val[0] }
#
# exp: exp '+' exp
# | exp '*' exp
# | '(' exp ')'
# | NUMBER
# end
#
# Racc grammar files resemble yacc files.
# But (of course), the actions are Ruby code.
# yacc's $$ is +result+, yacc's $1, $2, ... are the elements of the array +val+
# (val[0] is $1), and the value stack (yacc's ..., $-1, $0) is the array +_values+.
#
# See the {Grammar File Reference}[rdoc-ref:lib/racc/rdoc/grammar.en.rdoc] for
# more information on grammar files.
#
# == Parser
#
# Next, you must prepare a parse entry method. Racc provides two kinds of
# parse methods: Racc::Parser#do_parse and Racc::Parser#yyparse.
#
# Racc::Parser#do_parse is simple.
#
# It corresponds to yacc's yyparse(), and Racc::Parser#next_token corresponds to yylex().
# #next_token must return an array like [TOKENSYMBOL, ITS_VALUE].
# At end of input (EOF), it must return [false, false].
# TOKENSYMBOL is a Ruby symbol (taken from String#intern) by default.
# If you want to change this, see the grammar reference.
#
# Racc::Parser#yyparse is a little more complicated, but useful.
# Instead of calling Racc::Parser#next_token, it gets tokens from any iterator.
#
# For example, <code>yyparse(obj, :scan)</code> calls +obj#scan+,
# and +obj#scan+ supplies the tokens by yielding them.
#
# == Debugging
#
# When debugging, the "-v" and "-g" options are helpful.
#
# "-v" creates a verbose log file (.output).
# "-g" creates a "Verbose Parser".
# A Verbose Parser prints its internal state while parsing.
# But this is _not_ automatic.
# You must use the -g option and set +@yydebug+ to +true+ in order to get output.
# The -g option alone only creates the verbose parser.
#
# === Racc reported a syntax error.
#
# Are there too many "end"s?
# The grammar of Racc files changed in v0.10.
#
# Racc does not use '%' marks, while yacc uses a large number of them.
#
# === Racc reported "XXXX conflicts".
#
# Try "racc -v xxxx.y".
# This makes racc produce its internal log file, xxxx.output.
#
# === Generated parser does not work correctly
#
# Try "racc -g xxxx.y".
# This command makes racc generate a "debugging parser".
# Then set @yydebug=true in your parser.
# The parser will then print a log of its work.
#
# == Re-distributing Racc runtime
#
# A parser created by Racc requires the Racc runtime module,
# racc/parser.rb.
#
# Ruby 1.8.x comes with the Racc runtime module,
# so you need NOT distribute the Racc runtime files.
#
# If you want to include the Racc runtime module with your parser,
# use the '-E' option:
#
# $ racc -E -omyparser.rb myparser.y
#
# This command creates myparser.rb, which `includes' the Racc runtime.
# All you need to do is distribute your parser file (myparser.rb).
#
# Note: parser.rb is LGPL, but your parser is not.
# Your own parser is completely yours.
module Racc
unless defined?(Racc_No_Extentions)
Racc_No_Extentions = false
Racc_No_Extentions = false # :nodoc:
end
class Parser
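The iterator protocol that the documentation above describes for Racc::Parser#yyparse can be sketched in plain Ruby. This is a minimal sketch under assumptions: the class `ArithScanner` and its token set are hypothetical, quoted grammar tokens such as '+' are passed as Strings (the default conversion described in the grammar reference), and a false token symbol marks the end of input. The last lines only collect the yielded pairs so the block runs without Racc; with a generated parser, the scanner would be handed over as `yyparse(scanner, :scan)`.

```ruby
require 'strscan'

# Hypothetical scanner for simple arithmetic. A Racc parser would call
# #scan via yyparse(scanner, :scan) and read each yielded [symbol, value].
class ArithScanner
  def initialize(source)
    @source = source
  end

  def scan
    ss = StringScanner.new(@source)
    until ss.eos?
      if ss.skip(/\s+/)
        next                                  # ignore whitespace
      elsif (digits = ss.scan(/\d+/))
        yield [:NUMBER, digits.to_i]          # naked token NUMBER -> :NUMBER
      elsif (op = ss.scan(%r{[-+*/()]}))
        yield [op, op]                        # quoted token '+' -> "+"
      else
        raise ArgumentError, "unexpected character #{ss.peek(1).inspect}"
      end
    end
    yield [false, false]                      # false token symbol = end of input
  end
end

# Collect the pairs instead of parsing, so the sketch runs without Racc.
tokens = []
ArithScanner.new('2 * (3 + 4)').scan { |sym, val| tokens << [sym, val] }
p tokens
```

Yielding from the scanner keeps lexing lazy: the scanner never builds a full token list, which is the main practical difference from the queue-based #next_token style.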
@ -42,11 +196,11 @@ module Racc
raise LoadError, 'selecting ruby version of racc runtime core'
end
Racc_Main_Parsing_Routine = :_racc_do_parse_c
Racc_YY_Parse_Method = :_racc_yyparse_c
Racc_Runtime_Core_Version = Racc_Runtime_Core_Version_C
Racc_Runtime_Core_Revision = Racc_Runtime_Core_Revision_C
Racc_Runtime_Type = 'c'
Racc_Main_Parsing_Routine = :_racc_do_parse_c # :nodoc:
Racc_YY_Parse_Method = :_racc_yyparse_c # :nodoc:
Racc_Runtime_Core_Version = Racc_Runtime_Core_Version_C # :nodoc:
Racc_Runtime_Core_Revision = Racc_Runtime_Core_Revision_C # :nodoc:
Racc_Runtime_Type = 'c' # :nodoc:
rescue LoadError
Racc_Main_Parsing_Routine = :_racc_do_parse_rb
Racc_YY_Parse_Method = :_racc_yyparse_rb
@ -55,12 +209,10 @@ module Racc
Racc_Runtime_Type = 'ruby'
end
def Parser.racc_runtime_type
def Parser.racc_runtime_type # :nodoc:
Racc_Runtime_Type
end
private
def _racc_setup
@yydebug = false unless self.class::Racc_debug_parser
@yydebug = false unless defined?(@yydebug)
@ -97,6 +249,14 @@ module Racc
end
}
# The method to fetch the next token.
# If you use the #do_parse method, you must implement #next_token.
#
# The return value must have the form [TOKEN_SYMBOL, VALUE].
# +TOKEN_SYMBOL+ is a Ruby Symbol by default, e.g. :IDENT
# for 'IDENT', and a String for quoted tokens, e.g. ";" for ';'.
#
# At end of input, TOKEN_SYMBOL must be false.
def next_token
raise NotImplementedError, "#{self.class}\#next_token is not defined"
end
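A common way to satisfy the #next_token contract is to tokenize the whole input first and shift tokens off a queue. In the sketch below, `QueueLexerDemo`, its regex tokenizer, and the private `drain` loop are hypothetical stand-ins: in a class generated by Racc, `parse` would call #do_parse instead, which pulls tokens through #next_token until it reaches the false end-of-input symbol.

```ruby
# Hypothetical stand-in for a Racc-generated parser class. In real code the
# class would inherit from Racc::Parser and #parse would end with do_parse.
class QueueLexerDemo
  def parse(str)
    @q = []
    str.scan(/\d+|\S/) do |tok|
      @q << (tok.match?(/\A\d+\z/) ? [:NUMBER, tok.to_i] : [tok, tok])
    end
    @q << [false, false]   # end of input
    drain
  end

  # Same contract as Racc::Parser#next_token: return [TOKEN_SYMBOL, VALUE].
  def next_token
    @q.shift
  end

  private

  # Plays the parser's role: pull tokens until the end-of-input symbol.
  def drain
    seen = []
    loop do
      sym, val = next_token
      break unless sym     # false marks end of input
      seen << [sym, val]
    end
    seen
  end
end

p QueueLexerDemo.new.parse('12 + 3')  # [[:NUMBER, 12], ["+", "+"], [:NUMBER, 3]]
```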
@ -343,27 +503,43 @@ module Racc
goto_default[k1]
end
# This method is called when a parse error is found.
#
# ERROR_TOKEN_ID (+t+) is the internal ID of the token that caused the error.
# You can get the string representation of this ID by calling
# #token_to_str.
#
# ERROR_VALUE (+val+) is the value of the error token.
#
# value_stack (+vstack+) is the stack of symbol values.
# DO NOT MODIFY this object.
#
# This method raises ParseError by default.
#
# If this method returns, the parser enters "error recovering mode".
def on_error(t, val, vstack)
raise ParseError, sprintf("\nparse error on value %s (%s)",
val.inspect, token_to_str(t) || '?')
end
# Enter error recovering mode.
# This method does not call #on_error.
def yyerror
throw :racc_jump, 1
end
# Exit the parser.
# The return value is Symbol_Value_Stack[0].
def yyaccept
throw :racc_jump, 2
end
# Leave error recovering mode.
def yyerrok
@racc_error_status = 0
end
#
# for debugging output
#
# For debugging output
def racc_read_token(t, tok, val)
@racc_debug_out.print 'read '
@racc_debug_out.print tok.inspect, '(', racc_token2str(t), ') '
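#yyerror and #yyaccept both leave the current action with `throw :racc_jump, CODE`, using 1 to request error recovery and 2 to request acceptance. The sketch below shows only that catch/throw mechanism in plain Ruby; the helper `run_action` and the symbols it returns are hypothetical and are not Racc's internal code.

```ruby
# Hypothetical driver: runs an action block and reports how it ended.
# Racc's runtime catches :racc_jump around actions in a similar spirit.
def run_action
  code = catch(:racc_jump) do
    yield
    false                              # the action finished normally
  end
  case code
  when false then :continue
  when 1     then :error_recovery      # requested by yyerror
  when 2     then :accept              # requested by yyaccept
  else raise ArgumentError, "unknown jump code #{code.inspect}"
  end
end

p run_action { 1 + 1 }                 # normal action
p run_action { throw :racc_jump, 1 }   # what yyerror does
p run_action { throw :racc_jump, 2 }   # what yyaccept does
```

Because `throw` unwinds the stack up to the matching `catch`, an action can abandon its remaining code from any depth of nested calls, which is why these methods do not return normally.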
@ -430,6 +606,7 @@ module Racc
raise "[Racc Bug] can't convert token #{tok} to string"
end
# Convert the internal ID of a token symbol to its string representation.
def token_to_str(t)
self.class::Racc_token_to_s_table[t]
end
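A parser can replace the default #on_error to produce its own error messages, using #token_to_str to name the offending token. In this sketch the module `FriendlyParseErrors`, its `Error` class, and the message format are hypothetical. `StubParser` supplies a fixed `token_to_str` table so the block runs without Racc; in a real parser, including the module in the generated subclass of Racc::Parser places it ahead of Racc::Parser in the ancestor chain, so its #on_error takes precedence.

```ruby
# Hypothetical mix-in for a Racc-generated parser that replaces the default
# error message with one naming the offending token.
module FriendlyParseErrors
  class Error < StandardError; end

  # Same parameters as Racc::Parser#on_error: internal token ID, token value,
  # and the value stack (read-only; this method does not touch it).
  def on_error(token_id, value, _value_stack)
    name = token_to_str(token_id) || '?'
    raise Error, "unexpected #{name} (#{value.inspect})"
  end
end

# Stub standing in for a generated parser; real parsers inherit token_to_str.
class StubParser
  include FriendlyParseErrors

  TOKEN_NAMES = { 2 => 'NUMBER', 3 => '"+"' }.freeze

  def token_to_str(token_id)
    TOKEN_NAMES[token_id]
  end
end

def error_message(parser, token_id, value)
  parser.on_error(token_id, value, [])
rescue FriendlyParseErrors::Error => e
  e.message
end

puts error_message(StubParser.new, 2, 42)   # unexpected NUMBER (42)
puts error_message(StubParser.new, 99, nil) # unexpected ? (nil)
```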


@ -0,0 +1,226 @@
= Racc Grammar File Reference
== Global Structure
== Class Block and User Code Block
There are two blocks at the top level:
the 'class' block and the 'user code' block. The 'user code' block MUST
be placed after the 'class' block.
== Comment
You can insert comments almost anywhere. Two comment styles can be used:
Ruby style (#.....) and C style (/*......*/).
== Class Block
The class block is formed like this:

  class CLASS_NAME
    [precedence table]
    [token declarations]
    [expected number of S/R conflicts]
    [options]
    [semantic value conversion]
    [start rule]
  rule
    GRAMMARS

CLASS_NAME is the name of the parser class that Racc generates.
If CLASS_NAME includes '::', Racc also outputs the module clause.
For example, writing "class M::C" generates the code below:

  module M
    class C
      :
      :
    end
  end

== Grammar Block
The grammar block describes the grammar
that the parser understands. The syntax is:

  (token): (token) (token) (token).... (action)

  (token): (token) (token) (token).... (action)
         | (token) (token) (token).... (action)
         | (token) (token) (token).... (action)

(action) is executed when its (token)s are found.
(action) is Ruby code surrounded by braces:

  { print val[0]
    puts val[1] }

Note that you cannot use '%' strings, here documents, or '%r' regexps in actions.

Actions can be omitted.
When an action is omitted, '' (an empty action) is used.
The return value of an action becomes the value of the left-hand side ($$):
either the value of +result+, or the value returned by a "return" statement.

Here is an example of a whole grammar block.

  rule
    goal: definition ruls source { result = val }

    definition: /* none */   { result = [] }
      | definition startdesig  { result[0] = val[1] }
      | definition
               precrule   # this line continues from the line above
               {
                 result[1] = val[1]
               }

    startdesig: START TOKEN

You can use the following special local variables in actions.

* result ($$)
  The value of the left-hand side (lhs). The default value is val[0].
* val ($1,$2,$3...)
  An array of the values of the right-hand side (rhs).
* _values (...$-2,$-1,$0)
  A stack of values.
  DO NOT MODIFY this stack unless you know what you are doing.
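The relation between `val` and `result` can be sketched with ordinary Ruby methods. The methods `reduce_add` and `reduce_paren` below are hypothetical stand-ins for the actions `{ result = val[0] + val[2] }` and `{ result = val[1] }`; they are not the code Racc generates, but they follow the rules above: `val` holds the right-hand-side values in order, and `result` starts out as val[0].

```ruby
# Hypothetical stand-in for:  exp: exp '+' exp { result = val[0] + val[2] }
# val holds the right-hand-side values ($1, $2, $3); result starts as val[0].
def reduce_add(val, _values)
  result = val[0]            # default value of the left-hand side ($$)
  result = val[0] + val[2]   # action body; val[1] is the '+' token's value
  result
end

# Hypothetical stand-in for:  exp: '(' exp ')' { result = val[1] }
def reduce_paren(val, _values)
  result = val[0]            # default would be the '(' token's value
  result = val[1]            # keep the inner expression's value instead
  result
end

p reduce_add([3, '+', 4], [])      # 7
p reduce_paren(['(', 7, ')'], [])  # 7
```

The parenthesis rule shows why the default of val[0] is not always what you want: without an action, the value of `( 7 )` would be the "(" token's value.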
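== Operator Precedence

This corresponds to yacc's precedence declarations (%left, %right, %nonassoc).
To declare precedence, write this block, listing operators from the
highest precedence (prechigh) down to the lowest (preclow):

  prechigh
    nonassoc '++'
    left     '*' '/'
    left     '+' '-'
    right    '='
  preclow
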
`right' is yacc's %right, `left' is yacc's %left, and `nonassoc' is yacc's %nonassoc.

`=' followed by a symbol means yacc's %prec:

  prechigh
    nonassoc UMINUS
    left '*' '/'
    left '+' '-'
  preclow

  rule
    exp: exp '*' exp
       | exp '-' exp
       | '-' exp =UMINUS   # equivalent to "%prec UMINUS"
           :
           :
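To see what a table such as the first one above means for input like `1 + 2 * 3`, here is a small precedence-climbing parser in plain Ruby that uses the same levels and associativities. Note that this is a different technique from Racc's: Racc uses the table to resolve shift/reduce conflicts while building its LALR(1) tables. The `PRECEDENCE` hash and `parse_expr` are hypothetical, and the sketch leaves out `nonassoc '++'` and `=UMINUS`.

```ruby
# Hypothetical table mirroring the prechigh/preclow block (without
# nonassoc '++'): a larger level binds tighter.
PRECEDENCE = {
  '*' => [3, :left],  '/' => [3, :left],
  '+' => [2, :left],  '-' => [2, :left],
  '=' => [1, :right]
}.freeze

# Precedence climbing over a flat token list; returns a nested array tree.
# Consumes +tokens+ (the array is modified in place).
def parse_expr(tokens, min_level = 1)
  lhs = tokens.shift
  while (op = tokens.first) && PRECEDENCE.key?(op)
    level, assoc = PRECEDENCE[op]
    break if level < min_level
    tokens.shift
    # Left-associative: the right operand may only contain tighter operators.
    # Right-associative: it may also contain operators of the same level.
    rhs = parse_expr(tokens, assoc == :left ? level + 1 : level)
    lhs = [op, lhs, rhs]
  end
  lhs
end

p parse_expr(%w[1 + 2 * 3])  # ["+", "1", ["*", "2", "3"]]
p parse_expr(%w[1 - 2 - 3])  # ["-", ["-", "1", "2"], "3"]
p parse_expr(%w[a = b = 1])  # ["=", "a", ["=", "b", "1"]]
```

The three outputs show the groupings that `left '*' '/'` above `left '+' '-'`, and `right '='`, are meant to produce.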
== expect
Racc has bison's "expect" directive.

  # Example

  class MyParser
    expect 3
  rule
      :
      :

This directive declares the expected number of shift/reduce conflicts.
If the expected number equals the actual number of conflicts,
racc does not print a conflict warning message.
== Declaring Tokens
By declaring tokens, you can avoid many trivial bugs.
If a declared token does not exist, or an existing token is not declared,
Racc outputs warnings. The declaration syntax is:

  token TOKEN_NAME AND_IS_THIS
        ALSO_THIS_IS AGAIN_AND_AGAIN THIS_IS_LAST

== Options
You can write options for the racc command in your grammar file:

  options OPTION OPTION ...

Options are:

* omit_action_call
  Omit calls to empty actions.
* result_var
  Use the local variable "result".

You can use the 'no_' prefix to invert an option's meaning.
== Converting Token Symbol
By default, token symbols are:

* naked token strings in the racc file (TOK, XFILE, this_is_token, ...)
  --> symbols (:TOK, :XFILE, :this_is_token, ...)
* quoted strings (':', '.', '(', ...)
  --> the same strings (':', '.', '(', ...)

You can change this default with a "convert" block.
Here is an example:

  convert
    PLUS 'PlusClass'      # We use PlusClass as the symbol for `PLUS'
    MIN  'MinusClass'     # We use MinusClass as the symbol for `MIN'
  end

Almost any Ruby value can be used as a token symbol,
except 'false' and 'nil', which cause unexpected parse errors.

If you want to use a String as a token symbol, special care is required.
For example:

  convert
    class '"cls"'            # in code, "cls"
    PLUS '"plus\n"'          # in code, "plus\n"
    MIN  "\"minus#{val}\""   # in code, \"minus#{val}\"
  end
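With the first convert block above, the lexer has to hand the parser the converted values (PlusClass, MinusClass) instead of the default symbols :PLUS and :MIN. A minimal sketch, assuming PlusClass and MinusClass are empty classes defined in your own code and that the lexer returns [TOKEN_SYMBOL, VALUE] pairs ending with [false, false]; the `lex` helper and both lookup tables are hypothetical.

```ruby
# Hypothetical classes named in the convert block (PLUS -> PlusClass, MIN -> MinusClass).
class PlusClass; end
class MinusClass; end

# Token symbols a lexer would emit without and with that convert block.
DEFAULT_SYMBOLS   = { '+' => :PLUS,     '-' => :MIN }.freeze
CONVERTED_SYMBOLS = { '+' => PlusClass, '-' => MinusClass }.freeze

# Returns [TOKEN_SYMBOL, VALUE] pairs; NUMBER is assumed to be a naked token
# without a convert entry, so it stays a Symbol.
def lex(src, symbols)
  tokens = src.scan(/\d+|[+-]/).map do |tok|
    tok.match?(/\A\d+\z/) ? [:NUMBER, tok.to_i] : [symbols.fetch(tok), tok]
  end
  tokens << [false, false] # false is reserved for end of input
end

p lex('1 + 2 - 3', DEFAULT_SYMBOLS)
p lex('1 + 2 - 3', CONVERTED_SYMBOLS)
```

The end-of-input marker is one reason false cannot serve as an ordinary token symbol.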
== Start Rule
This is '%start' in yacc. It changes the start rule:

  start real_target

This statement will not be used forever, I think.
== User Code Block
A "User Code Block" is Ruby source code that is copied to the output.
There are three user code blocks: "header", "inner", and "footer".
The format of a user code block is:

  ---- header
    ruby statement
    ruby statement
    ruby statement

  ---- inner
    ruby statement
       :
       :

If a line begins with four '-' characters,
racc treats it as the beginning of a user code block.
The name of a user code block must be one word.