.\" README.EXT -  -*- Text -*- created at: Mon Aug  7 16:45:54 JST 1995

This document explains how to make extension libraries for Ruby.

1. Basic knowledge

In C, variables have types and data do not have types.  In contrast,
Ruby variables do not have a static type, and data themselves have
types, so data will need to be converted between the languages.

Data in Ruby are represented by the C type `VALUE'.  Each VALUE data
has its data-type.

To retrieve C data from a VALUE, you need to:

 (1) Identify the VALUE's data type
 (2) Convert the VALUE into C data

Converting to the wrong data type may cause serious problems.


1.1 Data-types

The Ruby interpreter has the following data types:

	T_NIL		nil
	T_OBJECT	ordinary object
	T_CLASS		class
	T_MODULE	module
	T_FLOAT		floating point number
	T_STRING	string
	T_REGEXP	regular expression
	T_ARRAY		array
	T_HASH		associative array
	T_STRUCT	(Ruby) structure
	T_BIGNUM	multi precision integer
	T_FIXNUM	Fixnum(31bit or 63bit integer)
	T_COMPLEX       complex number
	T_RATIONAL      rational number
	T_FILE		IO
	T_TRUE		true
	T_FALSE		false
	T_DATA		data
	T_SYMBOL        symbol

In addition, there are several other types used internally:

	T_ICLASS
	T_MATCH
	T_UNDEF
	T_NODE
	T_ZOMBIE

Most of the types are represented by C structures.

1.2 Check Data Type of the VALUE

The macro TYPE() defined in ruby.h shows the data type of the VALUE.
TYPE() returns the constant number T_XXXX described above.  To handle
data types, your code will look something like this:

  switch (TYPE(obj)) {
    case T_FIXNUM:
      /* process Fixnum */
      break;
    case T_STRING:
      /* process String */
      break;
    case T_ARRAY:
      /* process Array */
      break;
    default:
      /* raise exception */
      rb_raise(rb_eTypeError, "not valid value");
      break;
  }

There is the data-type check function

  void Check_Type(VALUE value, int type)

which raises an exception if the VALUE does not have the type
specified.

There are also faster check macros for fixnums and nil.

  FIXNUM_P(obj)
  NIL_P(obj)

1.3 Convert VALUE into C data

The data for type T_NIL, T_FALSE, T_TRUE are nil, false, true
respectively.  They are singletons for the data type.
The equivalent C constants are: Qnil, Qfalse, Qtrue.
Note that Qfalse is false in C also (i.e. 0), but not Qnil.

The T_FIXNUM data is a 31bit or 63bit length fixed integer.
The size depends on the size of long: if long is 32bit then
T_FIXNUM is 31bit, if long is 64bit then T_FIXNUM is 63bit.
T_FIXNUM can be converted to a C integer by using the
FIX2INT() macro or FIX2LONG().  You have to check that the data is
really a FIXNUM before using them, but they are faster.  FIX2LONG()
never raises exceptions, but FIX2INT() raises RangeError if the
result does not fit in an int.
There are also NUM2INT() and NUM2LONG(), which convert any Ruby
number into a C integer.  These macros include a type check,
so an exception will be raised if the conversion fails.  NUM2DBL()
can be used to retrieve the double float value in the same way.

You can use the macros
StringValue() and StringValuePtr() to get a char* from a VALUE.
StringValue(var) replaces var's value with the result of "var.to_str()".
StringValuePtr(var) does the same replacement and returns the char*
representation of var.  These macros will skip the replacement if var
is a String.  Notice that the macros take only an lvalue as their
argument, to change the value of var in place.

You can also use the macro named StringValueCStr().  This is just
like StringValuePtr(), but always adds a nul character at the end of
the result.  If the result contains a nul character, this macro raises
the ArgumentError exception.
StringValuePtr() doesn't guarantee the existence of a nul at the end
of the result, and the result may contain nul.

Other data types have corresponding C structures, e.g. struct RArray
for T_ARRAY etc. The VALUE of the type which has the corresponding
structure can be cast to retrieve the pointer to the struct.  The
casting macro will be of the form RXXXX for each data type; for
instance, RARRAY(obj).  See "ruby.h".

There are some accessing macros for structure members, for example
`RSTRING_LEN(str)' to get the size of the Ruby String object.  The
allocated region can be accessed by `RSTRING_PTR(str)'.  For arrays,
use `RARRAY_LEN(ary)' and `RARRAY_PTR(ary)' respectively.

Notice: Do not change the value of the structure directly, unless you
are responsible for the result.  This ends up being the cause of
interesting bugs.

1.4 Convert C data into VALUE

To convert C data to Ruby values:

  * FIXNUM

    left shift 1 bit, and turn on LSB.

  * Other pointer values

    cast to VALUE.

You can determine whether a VALUE is a pointer or not by checking its LSB.

Note that Ruby does not allow arbitrary pointer values to be a VALUE.
They should be pointers to the structures which Ruby knows about.  The
known structures are defined in <ruby.h>.

To convert C numbers to Ruby values, use these macros.

  INT2FIX()	for integers within 31bits.
  INT2NUM()	for arbitrary sized integer.

INT2NUM() converts an integer into a Bignum if it is out of the FIXNUM
range, but is a bit slower.
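
The tagging scheme described above (shift left by one bit, turn on the
LSB) can be modelled in plain C without ruby.h.  This is only an
illustration: the names model_value, model_int2fix, model_is_fixnum
and model_fix2long are hypothetical and not part of the Ruby API.
Real extensions should always use INT2FIX(), FIXNUM_P() and
FIX2LONG().

```c
#include <stdint.h>

/* Hypothetical stand-in for VALUE: an integer as wide as a pointer. */
typedef uintptr_t model_value;

/* Encode: shift left by one bit and turn on the LSB. */
static model_value
model_int2fix(long i)
{
    return ((model_value)i << 1) | 1;
}

/* A tagged integer has its LSB set; aligned pointers do not. */
static int
model_is_fixnum(model_value v)
{
    return (int)(v & 1);
}

/* Decode: drop the tag bit.  Assumes an arithmetic right shift for
 * signed integers, which is implementation-defined in C but is what
 * mainstream compilers provide. */
static long
model_fix2long(model_value v)
{
    return (long)((intptr_t)v >> 1);
}
```

One bit is spent on the tag, which is why a Fixnum holds one bit less
than a C long.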

1.5 Manipulating Ruby data

As I already mentioned, it is not recommended to modify an object's
internal structure.  To manipulate objects, use the functions supplied
by the Ruby interpreter. Some (not all) of the useful functions are
listed below:

 String functions

  rb_str_new(const char *ptr, long len)

    Creates a new Ruby string.

  rb_str_new2(const char *ptr)
  rb_str_new_cstr(const char *ptr)

    Creates a new Ruby string from a C string.  This is equivalent to
    rb_str_new(ptr, strlen(ptr)).

  rb_tainted_str_new(const char *ptr, long len)

    Creates a new tainted Ruby string.  Strings from external data
    sources should be tainted.

  rb_tainted_str_new2(const char *ptr)
  rb_tainted_str_new_cstr(const char *ptr)

    Creates a new tainted Ruby string from a C string.

  rb_sprintf(const char *format, ...)
  rb_vsprintf(const char *format, va_list ap)

    Creates a new Ruby string with printf(3) format.

  rb_str_cat(VALUE str, const char *ptr, long len)

    Appends len bytes of data from ptr to the Ruby string.

  rb_str_cat2(VALUE str, const char* ptr)

    Appends C string ptr to Ruby string str.  This function is
    equivalent to rb_str_cat(str, ptr, strlen(ptr)).

  rb_str_catf(VALUE str, const char* format, ...)
  rb_str_vcatf(VALUE str, const char* format, va_list ap)

    Appends C string format and successive arguments to Ruby string
    str according to a printf-like format.  These functions are
    equivalent to rb_str_cat2(str, rb_sprintf(format, ...)) and
    rb_str_cat2(str, rb_vsprintf(format, ap)), respectively.

  rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)

    Creates a new Ruby string with the specified encoding.

  rb_usascii_str_new(const char *ptr, long len)
  rb_usascii_str_new_cstr(const char *ptr)

    Creates a new Ruby string with encoding US-ASCII.

  rb_str_resize(VALUE str, long len)

    Resizes Ruby string to len bytes.  If str is not modifiable, this
    function raises an exception.  The length of str must be set in
    advance.  If len is less than the old length, the content beyond
    len bytes is discarded; if len is greater than the old length,
    the bytes beyond the old length are not initialized and will
    contain garbage.  Note that RSTRING_PTR(str) may change by calling
    this function.

  rb_str_set_len(VALUE str, long len)

    Sets the length of Ruby string.  If str is not modifiable, this
    function raises an exception.  This function preserves the content
    up to len bytes, regardless of RSTRING_LEN(str).  len must not exceed
    the capacity of str.

 Array functions

  rb_ary_new()

    Creates an array with no elements.

  rb_ary_new2(long len)

    Creates an array with no elements, allocating internal buffer
    for len elements.

  rb_ary_new3(long n, ...)

    Creates an n-element array from the arguments.

  rb_ary_new4(long n, VALUE *elts)

    Creates an n-element array from a C array.

  rb_ary_to_ary(VALUE obj)

    Converts the object into an array.
    Equivalent to Object#to_ary.

 There are many functions to operate on an array.
 They may dump core if other types are given.

  rb_ary_aref(int argc, VALUE *argv, VALUE ary)

    Equivalent to Array#[].

  rb_ary_entry(VALUE ary, long offset)

    ary[offset]

  rb_ary_subseq(VALUE ary, long beg, long len)

    ary[beg, len]

  rb_ary_push(VALUE ary, VALUE val)
  rb_ary_pop(VALUE ary)
  rb_ary_shift(VALUE ary)
  rb_ary_unshift(VALUE ary, VALUE val)

  rb_ary_cat(VALUE ary, const VALUE *ptr, long len)

    Appends len elements of objects from ptr to the array.

2. Extending Ruby with C

2.1 Adding new features to Ruby

You can add new features (classes, methods, etc.) to the Ruby
interpreter.  Ruby provides APIs for defining the following things:

 * Classes, Modules
 * Methods, Singleton Methods
 * Constants

2.1.1 Class/module definition

To define a class or module, use the functions below:

  VALUE rb_define_class(const char *name, VALUE super)
  VALUE rb_define_module(const char *name)

These functions return the newly created class or module.  You may
want to save this reference into a variable to use later.

To define nested classes or modules, use the functions below:

  VALUE rb_define_class_under(VALUE outer, const char *name, VALUE super)
  VALUE rb_define_module_under(VALUE outer, const char *name)

2.1.2 Method/singleton method definition

To define methods or singleton methods, use these functions:

  void rb_define_method(VALUE klass, const char *name,
		        VALUE (*func)(), int argc)

  void rb_define_singleton_method(VALUE object, const char *name,
			          VALUE (*func)(), int argc)

The `argc' represents the number of the arguments to the C function,
which must be less than 17.  But I doubt you'll need that many.

If `argc' is negative, it specifies the calling sequence, not number of
the arguments.

If argc is -1, the function will be called as:

  VALUE func(int argc, VALUE *argv, VALUE obj)

where argc is the actual number of arguments, argv is the C array of
the arguments, and obj is the receiver.

If argc is -2, the arguments are passed in a Ruby array. The function
will be called like:

  VALUE func(VALUE obj, VALUE args)

where obj is the receiver, and args is the Ruby array containing
actual arguments.

There are some more functions to define methods. One takes an ID
as the name of method to be defined. See 2.2.2 for IDs.

  void rb_define_method_id(VALUE klass, ID name,
                           VALUE (*func)(ANYARGS), int argc)

There are two functions to define private/protected methods:

  void rb_define_private_method(VALUE klass, const char *name,
			        VALUE (*func)(), int argc)
  void rb_define_protected_method(VALUE klass, const char *name,
			          VALUE (*func)(), int argc)

Finally, rb_define_module_function defines module functions,
which are private AND singleton methods of the module.
For example, sqrt is a module function defined in the Math module.
It can be called in the following way:

  Math.sqrt(4)

or

  include Math
  sqrt(4)

To define module functions, use:

  void rb_define_module_function(VALUE module, const char *name,
				 VALUE (*func)(), int argc)

In addition, function-like methods, which are private methods defined
in the Kernel module, can be defined using:

  void rb_define_global_function(const char *name, VALUE (*func)(), int argc)

To define an alias for the method,

  void rb_define_alias(VALUE module, const char* new, const char* old);

To define a reader/writer for an attribute,

  void rb_define_attr(VALUE klass, const char *name, int read, int write)

To define and undefine the `allocate' class method,

  void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));
  void rb_undef_alloc_func(VALUE klass);

func has to take the klass as the argument and return a newly
allocated instance.  This instance should be as empty as possible,
without any expensive (including external) resources.

2.1.3 Constant definition

We have 2 functions to define constants:

  void rb_define_const(VALUE klass, const char *name, VALUE val)
  void rb_define_global_const(const char *name, VALUE val)

The former is to define a constant under specified class/module.  The
latter is to define a global constant.

2.2 Use Ruby features from C

There are several ways to invoke Ruby's features from C code.

2.2.1 Evaluate Ruby Programs in a String

The easiest way to use Ruby's functionality from a C program is to
evaluate the string as Ruby program.  This function will do the job:

  VALUE rb_eval_string(const char *str)

Evaluation is done under the current context, thus current local variables
of the innermost method (which is defined by Ruby) can be accessed.

Note that the evaluation can raise an exception. There is a safer
function:

  VALUE rb_eval_string_protect(const char *str, int *state)

It returns nil when an error occurs.  Moreover, *state is zero if str was
successfully evaluated, or nonzero otherwise.


2.2.2 ID or Symbol

You can invoke methods directly, without parsing the string.  First I
need to explain about ID.  ID is the integer number to represent
Ruby's identifiers such as variable names.  The Ruby data type
corresponding to ID is Symbol.  It can be accessed from Ruby in the
form:

 :Identifier
or
 :"any kind of string"

You can get the ID value from a string within C code by using

  rb_intern(const char *name)
  rb_intern_str(VALUE name)

You can retrieve ID from Ruby object (Symbol or String) given as an
argument by using

  rb_to_id(VALUE symbol)
  rb_check_id(volatile VALUE *name)
  rb_check_id_cstr(const char *name, long len, rb_encoding *enc)

These functions try to convert the argument to a String if it is
neither a Symbol nor a String.  The second function stores the
converted result into *name, and returns 0 if the string is not a
known symbol.  If it returns a non-zero value, *name is always a
Symbol or a String; if it returns 0, *name is a String.
The third function takes a NUL-terminated C string, not a Ruby VALUE.

You can convert C ID to Ruby Symbol by using

  VALUE ID2SYM(ID id)

and to convert Ruby Symbol object to ID, use

  ID SYM2ID(VALUE symbol)

2.2.3 Invoke Ruby method from C

To invoke methods directly, you can use the function below

  VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)

This function invokes a method on the recv, with the method name
specified by the symbol mid.

2.2.4 Accessing the variables and constants

You can access class variables and instance variables using access
functions.  Also, global variables can be shared between both
environments.  There's no way to access Ruby's local variables.

The functions to access/modify instance variables are below:

  VALUE rb_ivar_get(VALUE obj, ID id)
  VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)

id must be the symbol, which can be retrieved by rb_intern().

To access the constants of the class/module:

  VALUE rb_const_get(VALUE obj, ID id)

See 2.1.3 for defining new constant.

3. Information sharing between Ruby and C

3.1 Ruby constants that can be accessed from C

As stated in section 1.3,
the following Ruby constants can be referred to from C.

  Qtrue
  Qfalse

Boolean values.  Qfalse is false in C also (i.e. 0).

  Qnil

Ruby nil in C scope.

3.2 Global variables shared between C and Ruby

Information can be shared between the two environments using shared global
variables.  To define them, you can use functions listed below:

  void rb_define_variable(const char *name, VALUE *var)

This function defines the variable which is shared by both environments.
The value of the global variable pointed to by `var' can be accessed
through Ruby's global variable named `name'.

You can define read-only (from Ruby, of course) variables using the
function below.

  void rb_define_readonly_variable(const char *name, VALUE *var)

You can define hooked variables.  The accessor functions (getter and
setter) are called on access to the hooked variables.

  void rb_define_hooked_variable(const char *name, VALUE *var,
				 VALUE (*getter)(), void (*setter)())

If you need to supply only a getter or only a setter, just supply 0 for
the hook you don't need.  If both hooks are 0, rb_define_hooked_variable()
works just like rb_define_variable().

The prototypes of the getter and setter functions are as follows:

  VALUE (*getter)(ID id, VALUE *var);
  void (*setter)(VALUE val, ID id, VALUE *var);


You can also define a Ruby global variable without a corresponding C
variable.  The value of the variable is read and written only by hooks.

  void rb_define_virtual_variable(const char *name,
				  VALUE (*getter)(), void (*setter)())

The prototypes of the getter and setter functions are as follows:

  VALUE (*getter)(ID id);
  void (*setter)(VALUE val, ID id);


3.3 Encapsulate C data into a Ruby object

To wrap and objectify a C pointer as a Ruby object (so called
DATA), use Data_Wrap_Struct().

  Data_Wrap_Struct(klass, mark, free, sval)

Data_Wrap_Struct() returns a created DATA object.  The klass argument
is the class for the DATA object.  The mark argument is the function
to mark Ruby objects pointed by this data.  The free argument is the
function to free the pointer allocation.  If this is -1, the pointer
will be just freed.  The functions mark and free will be called from
garbage collector.

These mark / free functions are invoked during GC execution.  No
object allocations are allowed during it, so do not allocate ruby
objects inside them.

You can allocate and wrap the structure in one step.

  Data_Make_Struct(klass, type, mark, free, sval)

This macro returns an allocated Data object, wrapping the pointer to
the structure, which is also allocated.  This macro works like:

  (sval = ALLOC(type), Data_Wrap_Struct(klass, mark, free, sval))

Arguments klass, mark, and free work like their counterparts in
Data_Wrap_Struct().  A pointer to the allocated structure will be
assigned to sval, which should be a pointer of the type specified.

To retrieve the C pointer from the Data object, use the macro
Data_Get_Struct().

  Data_Get_Struct(obj, type, sval)

A pointer to the structure will be assigned to the variable sval.

See the example below for details.

4. Example - Creating dbm extension

OK, here's an example of making an extension library.  This is the
extension to access DBMs.  The full source is included in the ext/
directory in Ruby's source tree.

(1) make the directory

  % mkdir ext/dbm

Make a directory for the extension library under ext directory.

(2) design the library

You need to design the library features, before making it.

(3) write C code.

You need to write C code for your extension library.  If your library
has only one source file, choosing ``LIBRARY.c'' as a file name is
preferred.  On the other hand, in case your library has multiple source
files, avoid choosing ``LIBRARY.c'' for a file name.  It may conflict
with an intermediate file ``LIBRARY.o'' on some platforms.
Note that some functions in mkmf library described below generate
a file ``conftest.c'' for checking with compilation.  You shouldn't
choose ``conftest.c'' as a name of a source file.

Ruby will execute the initializing function named ``Init_LIBRARY'' in
the library.  For example, ``Init_dbm()'' will be executed when loading
the library.

Here's the example of an initializing function.

--
void
Init_dbm(void)
{
    /* define DBM class */
    cDBM = rb_define_class("DBM", rb_cObject);
    /* DBM includes Enumerable module */
    rb_include_module(cDBM, rb_mEnumerable);

    /* DBM has class method open(): arguments are received as C array */
    rb_define_singleton_method(cDBM, "open", fdbm_s_open, -1);

    /* DBM instance method close(): no args */
    rb_define_method(cDBM, "close", fdbm_close, 0);
    /* DBM instance method []: 1 argument */
    rb_define_method(cDBM, "[]", fdbm_fetch, 1);
		:

    /* ID for an instance variable to store DBM data */
    id_dbm = rb_intern("dbm");
}
--

The dbm extension wraps the dbm struct in the C environment using
Data_Make_Struct.

--
struct dbmdata {
    int  di_size;
    DBM *di_dbm;
};


obj = Data_Make_Struct(klass, struct dbmdata, 0, free_dbm, dbmp);
--

This code wraps the dbmdata structure into a Ruby object.  We avoid
wrapping DBM* directly, because we want to cache size information.

To retrieve the dbmdata structure from a Ruby object, we define the
following macro:

--
#define GetDBM(obj, dbmp) {\
    Data_Get_Struct(obj, struct dbmdata, dbmp);\
    if (dbmp->di_dbm == 0) closed_dbm();\
}
--

This sort of complicated macro does the retrieving and close checking for
the DBM.

There are three ways to receive method arguments.  First,
methods with a fixed number of arguments receive arguments like this:

--
static VALUE
fdbm_delete(VALUE obj, VALUE keystr)
{
	:
}
--

The first argument of the C function is the self, the rest are the
arguments to the method.

Second, methods with an arbitrary number of arguments receive
arguments like this:

--
static VALUE
fdbm_s_open(int argc, VALUE *argv, VALUE klass)
{
	:
    if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) {
	mode = 0666;		/* default value */
    }
	:
}
--

The first argument is the number of method arguments, the second
argument is the C array of the method arguments, and the third
argument is the receiver of the method.

You can use the function rb_scan_args() to check and retrieve the
arguments.  The third argument is a string that specifies how to
capture method arguments and assign them to the following VALUE
references.


The following is an example of a method that takes arguments by Ruby's
array:

--
static VALUE
thread_initialize(VALUE thread, VALUE args)
{
	:
}
--

The first argument is the receiver, the second one is the Ruby array
which contains the arguments to the method.

** Notice

GC should know about global variables which refer to Ruby's objects, but
are not exported to the Ruby world.  You need to protect them by

  void rb_global_variable(VALUE *var)

(4) prepare extconf.rb

If the file named extconf.rb exists, it will be executed to generate
Makefile.

extconf.rb is the file for checking compilation conditions etc.  You
need to put

  require 'mkmf'

at the top of the file.  You can use the functions below to check
various conditions.

  have_macro(macro[, headers[, opt]]): check whether macro is defined
  have_library(lib[, func[, headers[, opt]]]): check whether library containing function exists
  find_library(lib[, func, *paths]): find library from paths
  have_func(func[, headers[, opt]]): check whether function exists
  have_var(var[, headers[, opt]]): check whether variable exists
  have_header(header[, preheaders[, opt]]): check whether header file exists
  find_header(header, *paths): find header from paths
  have_framework(fw): check whether framework exists (for MacOS X)
  have_struct_member(type, member[, headers[, opt]]): check whether struct has member
  have_type(type[, headers[, opt]]): check whether type exists
  find_type(type, opt, *headers): check whether type exists in headers
  have_const(const[, headers[, opt]]): check whether constant is defined
  check_sizeof(type[, headers[, opts]]): check size of type
  check_signedness(type[, headers[, opts]]): check signedness of type
  convertible_int(type[, headers[, opts]]): find convertible integer type
  find_executable(bin[, path]): find executable file path
  create_header(header): generate configured header
  create_makefile(target[, target_prefix]): generate Makefile

See MakeMakefile for full documentation of these functions.

The value of the variables below will affect the Makefile.

  $CFLAGS: included in CFLAGS make variable (such as -O)
  $CPPFLAGS: included in CPPFLAGS make variable (such as -I, -D)
  $LDFLAGS: included in LDFLAGS make variable (such as -L)
  $objs: list of object file names

Normally, the object files list is automatically generated by searching
source files, but you must define them explicitly if any sources will
be generated while building.

If a compilation condition is not fulfilled, you should not call
``create_makefile''.  The Makefile will not be generated, and
compilation will not be done.

(5) prepare depend (optional)

If the file named depend exists, Makefile will include that file to
check dependencies.  You can make this file by invoking

  % gcc -MM *.c > depend

It's harmless.  Prepare it.

(6) generate Makefile

Try generating the Makefile by:

  ruby extconf.rb

If the library should be installed under vendor_ruby directory
instead of site_ruby directory, use --vendor option as follows.

  ruby extconf.rb --vendor

You don't need this step if you put the extension library under the ext
directory of the ruby source tree.  In that case, compilation of the
interpreter will do this step for you.

(7) make

Type

  make

to compile your extension.  You don't need this step either if you have
put the extension library under the ext directory of the ruby source tree.

(8) debug

You may need to debug the extension.  Extensions can be linked
statically by adding the directory name in the ext/Setup file so that
you can inspect the extension with the debugger.

(9) done, now you have the extension library

You can do anything you want with your library.  The author of Ruby
will not claim any restrictions on your code depending on the Ruby API.
Feel free to use, modify, distribute or sell your program.

Appendix A. Ruby source files overview

ruby language core

  class.c         : classes and modules
  error.c         : exception classes and exception mechanism
  gc.c            : memory management
  load.c          : library loading
  object.c        : objects
  variable.c      : variables and constants

ruby syntax parser
  parse.y
    -> parse.c    : automatically generated
  keywords        : reserved keywords
    -> lex.c      : automatically generated

ruby evaluator (a.k.a. YARV)
  compile.c
  eval.c
  eval_error.c
  eval_jump.c
  eval_safe.c
  insns.def           : definition of VM instructions
  iseq.c              : implementation of VM::ISeq
  thread.c            : thread management and context switching
  thread_win32.c      : thread implementation
  thread_pthread.c    : ditto
  vm.c
  vm_dump.c
  vm_eval.c
  vm_exec.c
  vm_insnhelper.c
  vm_method.c

  opt_insns_unif.def  : instruction unification
  opt_operand.def     : definitions for optimization

    -> insn*.inc      : automatically generated
    -> opt*.inc       : automatically generated
    -> vm.inc         : automatically generated

regular expression engine (oniguruma)
  regex.c
  regcomp.c
  regenc.c
  regerror.c
  regexec.c
  regparse.c
  regsyntax.c

utility functions

  debug.c       : debug symbols for C debugger
  dln.c         : dynamic loading
  st.c          : general purpose hash table
  strftime.c    : formatting times
  util.c        : misc utilities

ruby interpreter implementation

  dmyext.c
  dmydln.c
  dmyencoding.c
  id.c
  inits.c
  main.c
  ruby.c
  version.c

  gem_prelude.rb
  prelude.rb


class library

  array.c       : Array
  bignum.c      : Bignum
  compar.c      : Comparable
  complex.c     : Complex
  cont.c        : Fiber, Continuation
  dir.c         : Dir
  enum.c        : Enumerable
  enumerator.c  : Enumerator
  file.c        : File
  hash.c        : Hash
  io.c          : IO
  marshal.c     : Marshal
  math.c        : Math
  numeric.c     : Numeric, Integer, Fixnum, Float
  pack.c        : Array#pack, String#unpack
  proc.c        : Binding, Proc
  process.c     : Process
  random.c      : random number
  range.c       : Range
  rational.c    : Rational
  re.c          : Regexp, MatchData
  signal.c      : Signal
  sprintf.c     :
  string.c      : String
  struct.c      : Struct
  time.c        : Time

  defs/known_errors.def  : Errno::* exception classes
    -> known_errors.inc  : automatically generated

multilingualization
  encoding.c    : Encoding
  transcode.c   : Encoding::Converter
  enc/*.c       : encoding classes
  enc/trans/*   : codepoint mapping tables

goruby interpreter implementation

  goruby.c
  golf_prelude.rb     : goruby specific libraries.
    -> golf_prelude.c : automatically generated


Appendix B. Ruby extension API reference

** Types

 VALUE

The type for the Ruby object.  Actual structures are defined in ruby.h,
such as struct RString, etc.  To refer to the values in structures, use
casting macros like RSTRING(obj).

** Variables and constants

 Qnil

const: nil object

 Qtrue

const: true object(default true value)

 Qfalse

const: false object

** C pointer wrapping

 Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval)

Wraps a C pointer into a Ruby object.  If the object has references to
other Ruby objects, they should be marked by using the mark function
during the GC process.  Otherwise, mark should be 0.  When this object
is no longer referenced from anywhere, the pointer will be discarded by
the free function.

 Data_Make_Struct(klass, type, mark, free, sval)

This macro allocates memory using malloc(), assigns it to the variable
sval, and returns the DATA encapsulating the pointer to memory region.

 Data_Get_Struct(data, type, sval)

This macro retrieves the pointer value from DATA, and assigns it to
the variable sval.

** Checking data types

TYPE(value)
FIXNUM_P(value)
NIL_P(value)
void Check_Type(VALUE value, int type)
void Check_SafeStr(VALUE value)

** Data type conversion

FIX2INT(value), INT2FIX(i)
FIX2LONG(value), LONG2FIX(l)
NUM2INT(value), INT2NUM(i)
NUM2UINT(value), UINT2NUM(ui)
NUM2LONG(value), LONG2NUM(l)
NUM2ULONG(value), ULONG2NUM(ul)
NUM2LL(value), LL2NUM(ll)
NUM2ULL(value), ULL2NUM(ull)
NUM2OFFT(value), OFFT2NUM(off)
NUM2SIZET(value), SIZET2NUM(size)
NUM2SSIZET(value), SSIZET2NUM(ssize)
NUM2DBL(value)
rb_float_new(f)
StringValue(value)
StringValuePtr(value)
StringValueCStr(value)
rb_str_new2(s)

** defining class/module

 VALUE rb_define_class(const char *name, VALUE super)

Defines a new Ruby class as a subclass of super.

 VALUE rb_define_class_under(VALUE module, const char *name, VALUE super)

Creates a new Ruby class as a subclass of super, under the module's
namespace.

 VALUE rb_define_module(const char *name)

Defines a new Ruby module.

 VALUE rb_define_module_under(VALUE module, const char *name)

Defines a new Ruby module under the module's namespace.

 void rb_include_module(VALUE klass, VALUE module)

Includes module into class.  If class already includes it, the call is
just ignored.

 void rb_extend_object(VALUE object, VALUE module)

Extend the object with the module's attributes.

** Defining Global Variables

 void rb_define_variable(const char *name, VALUE *var)

Defines a global variable which is shared between C and Ruby.  If name
contains a character which is not allowed to be part of the symbol,
it can't be seen from Ruby programs.

 void rb_define_readonly_variable(const char *name, VALUE *var)

Defines a read-only global variable.  Works just like
rb_define_variable(), except the defined variable is read-only.

 void rb_define_virtual_variable(const char *name,
				 VALUE (*getter)(), void (*setter)())

Defines a virtual variable, whose behavior is defined by a pair of C
functions.  The getter function is called when the variable is
referenced.  The setter function is called when the variable is set to a
value.  The prototypes for the getter/setter functions are:

	VALUE getter(ID id)
	void setter(VALUE val, ID id)

The getter function must return the value for the access.

 void rb_define_hooked_variable(const char *name, VALUE *var,
				VALUE (*getter)(), void (*setter)())

Defines a hooked variable.  It's a virtual variable with a C variable.
The getter is called as

	VALUE getter(ID id, VALUE *var)

returning a new value.  The setter is called as

	void setter(VALUE val, ID id, VALUE *var)

GC requires C global variables which hold Ruby values to be marked.

 void rb_global_variable(VALUE *var)

Tells GC to protect the variable.

** Constant Definition

 void rb_define_const(VALUE klass, const char *name, VALUE val)

Defines a new constant under the class/module.

 void rb_define_global_const(const char *name, VALUE val)

Defines a global constant.  This is just the same as

     rb_define_const(rb_cObject, name, val)

** Method Definition

 void rb_define_method(VALUE klass, const char *name, VALUE (*func)(), int argc)

Defines a method for the class.  func is the function pointer.  argc
is the number of arguments.  If argc is -1, the function will receive
3 arguments: argc, argv, and self.  If argc is -2, the function will
receive 2 arguments, self and args, where args is a Ruby array of
the method arguments.

 void rb_define_private_method(VALUE klass, const char *name, VALUE (*func)(), int argc)

Defines a private method for the class.  Arguments are the same as
for rb_define_method().

 void rb_define_singleton_method(VALUE klass, const char *name, VALUE (*func)(), int argc)

Defines a singleton method.  Arguments are the same as for
rb_define_method().

 int rb_scan_args(int argc, VALUE *argv, const char *fmt, ...)

Retrieves arguments from argc and argv into the given VALUE references
according to the format string.  The format can be described in ABNF
as follows:

--
scan-arg-spec  := param-arg-spec [option-hash-arg-spec] [block-arg-spec]

param-arg-spec := pre-arg-spec [post-arg-spec] / post-arg-spec / pre-opt-post-arg-spec
pre-arg-spec   := num-of-leading-mandatory-args [num-of-optional-args]
post-arg-spec  := sym-for-variable-length-args [num-of-trailing-mandatory-args]
pre-opt-post-arg-spec := num-of-leading-mandatory-args num-of-optional-args num-of-trailing-mandatory-args
option-hash-arg-spec := sym-for-option-hash-arg
block-arg-spec := sym-for-block-arg

num-of-leading-mandatory-args  := DIGIT ; The number of leading
                                        ; mandatory arguments
num-of-optional-args           := DIGIT ; The number of optional
                                        ; arguments
sym-for-variable-length-args   := "*"   ; Indicates that variable
                                        ; length arguments are
                                        ; captured as a ruby array
num-of-trailing-mandatory-args := DIGIT ; The number of trailing
                                        ; mandatory arguments
sym-for-option-hash-arg        := ":"   ; Indicates that an option
                                        ; hash is captured if the last
                                        ; argument is a hash or can be
                                        ; converted to a hash with
                                        ; #to_hash.  When the last
                                        ; argument is nil, it is
                                        ; captured if it is not
                                        ; ambiguous to take it as
                                        ; empty option hash; i.e. '*'
                                        ; is not specified and
                                        ; arguments are given more
                                        ; than sufficient.
sym-for-block-arg              := "&"   ; Indicates that an iterator
                                        ; block should be captured if
                                        ; given
--

For example, "12" means that the method requires at least one
argument, and receives at most three (1+2) arguments.  So, the format
string must be followed by three variable references, which are to be
assigned to captured arguments.  For omitted arguments, variables are
set to Qnil.  NULL can be put in place of a variable reference, which
means the corresponding captured argument(s) should be just dropped.

The number of given arguments, excluding an option hash or iterator
block, is returned.

** Invoking Ruby method

 VALUE rb_funcall(VALUE recv, ID mid, int narg, ...)

Invokes a method.  To retrieve mid from a method name, use rb_intern().

 VALUE rb_funcall2(VALUE recv, ID mid, int argc, VALUE *argv)

Invokes a method, passing arguments by an array of values.

 VALUE rb_eval_string(const char *str)

Compiles and executes the string as a Ruby program.

 ID rb_intern(const char *name)

Returns the ID corresponding to the name.

 const char *rb_id2name(ID id)

Returns the name corresponding to the ID.

 const char *rb_class2name(VALUE klass)

Returns the name of the class.

 int rb_respond_to(VALUE object, ID id)

Returns true if the object responds to the message specified by id.

** Instance Variables

 VALUE rb_iv_get(VALUE obj, const char *name)

Retrieves the value of the instance variable.  If the name is not
prefixed by `@', the variable is inaccessible from Ruby.

 VALUE rb_iv_set(VALUE obj, const char *name, VALUE val)

Sets the value of the instance variable.

** Control Structure

 VALUE rb_block_call(VALUE recv, ID mid, int argc, VALUE * argv,
		     VALUE (*func) (ANYARGS), VALUE data2)

Calls a method on the recv, with the method name specified by the
symbol mid, with argc arguments in argv, supplying func as the
block. When func is called as the block, it will receive the value
from yield as the first argument, and data2 as the second argument.
When yielded with multiple values (in C, rb_yield_values(),
rb_yield_values2() and rb_yield_splat()), data2 is packed as an Array,
whereas yielded values can be gotten via argc/argv of the third/fourth
arguments.

 [OBSOLETE] VALUE rb_iterate(VALUE (*func1)(), void *arg1, VALUE (*func2)(), void *arg2)

Calls the function func1, supplying func2 as the block.  func1 will be
called with the argument arg1.  func2 receives the value from yield as
the first argument, arg2 as the second argument.

When rb_iterate is used in 1.9, func1 has to call some Ruby-level method.
This function is obsolete since 1.9; use rb_block_call instead.

 VALUE rb_yield(VALUE val)

Evaluates the block with value val.

 VALUE rb_rescue(VALUE (*func1)(), VALUE arg1, VALUE (*func2)(), VALUE arg2)

Calls the function func1, with arg1 as the argument.  If an exception
occurs during func1, it calls func2 with arg2 as the argument.  The
return value of rb_rescue() is the return value from func1 if no
exception occurs, from func2 otherwise.

 VALUE rb_ensure(VALUE (*func1)(), VALUE arg1, VALUE (*func2)(), VALUE arg2)

Calls the function func1 with arg1 as the argument, then calls func2
with arg2 when execution of func1 terminates, whether normally or by
an exception.  The return value from rb_ensure() is that of func1 when
no exception occurred.

 VALUE rb_protect(VALUE (*func) (VALUE), VALUE arg, int *state)

Calls the function func with arg as the argument.  If no exception
occurred during func, it returns the result of func and *state is zero.
Otherwise, it returns Qnil and sets *state to nonzero.  If state is
NULL, it is not set in either case.
You have to clear the error info with rb_set_errinfo(Qnil) when
ignoring the caught exception.

 void rb_jump_tag(int state)

Continues the exception caught by rb_protect() and rb_eval_string_protect().
state must be the returned value from those functions.  This function
never returns to the caller.

 void rb_iter_break()

Exits from the current innermost block.  This function never returns to
the caller.

 void rb_iter_break_value(VALUE value)

Exits from the current innermost block with the value.  The block will
return the given argument value.  This function never returns to the
caller.

** Exceptions and Errors

 void rb_warn(const char *fmt, ...)

Prints a warning message according to a printf-like format.

 void rb_warning(const char *fmt, ...)

Prints a warning message according to a printf-like format, if
$VERBOSE is true.

 void rb_raise(rb_eRuntimeError, const char *fmt, ...)

Raises RuntimeError.  The fmt is a format string just like printf().

 void rb_raise(VALUE exception, const char *fmt, ...)

Raises a class exception.  The fmt is a format string just like printf().

 void rb_fatal(const char *fmt, ...)

Raises a fatal error and terminates the interpreter.  No exception
handling will be done for fatal errors, but ensure blocks will be
executed.

 void rb_bug(const char *fmt, ...)

Terminates the interpreter immediately.  This function should be
called only in a situation caused by a bug in the interpreter.
Neither exception handling nor ensure blocks will be executed.

** Initialize and Start the Interpreter

The embedding API functions are below (not needed for extension libraries):

 void ruby_init()

Initializes the interpreter.

 void ruby_options(int argc, char **argv)

Processes command line arguments for the interpreter.

 void ruby_run()

Starts execution of the interpreter.

 void ruby_script(char *name)

Specifies the name of the script ($0).

** Hooks for the Interpreter Events

 void rb_add_event_hook(rb_event_hook_func_t func, rb_event_flag_t events, VALUE data)

Adds a hook function for the specified interpreter events.
events should be the OR'ed value of:

	RUBY_EVENT_LINE
	RUBY_EVENT_CLASS
	RUBY_EVENT_END
	RUBY_EVENT_CALL
	RUBY_EVENT_RETURN
	RUBY_EVENT_C_CALL
	RUBY_EVENT_C_RETURN
	RUBY_EVENT_RAISE
	RUBY_EVENT_ALL

The definition of rb_event_hook_func_t is below:

 typedef void (*rb_event_hook_func_t)(rb_event_flag_t event, VALUE data,
				      VALUE self, ID id, VALUE klass)

The third argument `data' to rb_add_event_hook() is passed to the hook
function as the second argument, which was the pointer to the current
NODE in 1.8.  See RB_EVENT_HOOKS_HAVE_CALLBACK_DATA below.

 int rb_remove_event_hook(rb_event_hook_func_t func)

Removes the specified hook function.

** Macros for the Compatibilities

Some macros to check API compatibilities are available by default.

 NORETURN_STYLE_NEW

Means that the NORETURN macro is function-style instead of a prefix.

 HAVE_RB_DEFINE_ALLOC_FUNC

Means that the function rb_define_alloc_func() is provided, which
means the allocation framework is used.  This is the same as the
result of have_func("rb_define_alloc_func", "ruby.h").

 HAVE_RB_REG_NEW_STR

Means that the function rb_reg_new_str() is provided, which creates a
Regexp object from a String object.  This is the same as the result of
have_func("rb_reg_new_str", "ruby.h").

 HAVE_RB_IO_T

Means that type rb_io_t is provided.

 USE_SYMBOL_AS_METHOD_NAME

Means that Symbols will be returned as method names, e.g.,
Module#methods, #singleton_methods and so on.

 HAVE_RUBY_*_H

Defined in ruby.h and means the corresponding header is available.
For instance, when HAVE_RUBY_ST_H is defined you should use ruby/st.h,
not mere st.h.

 RB_EVENT_HOOKS_HAVE_CALLBACK_DATA

Means that rb_add_event_hook() takes the third argument `data', to be
passed to the given event hook function.

/*
 * Local variables:
 * fill-column: 70
 * end:
 */