mirror of https://github.com/ruby/ruby.git (synced 2022-11-09 12:17:21 -05:00)
* doc/marshal.rdoc: Add description of Marshal format.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@41075 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
parent ed318080ad
commit 0454c0a281
2 changed files with 317 additions and 0 deletions

@@ -1,3 +1,7 @@
Wed Jun 5 06:35:15 2013 Eric Hodel <drbrain@segment7.net>

	* doc/marshal.rdoc: Add description of Marshal format.

Wed Jun 5 01:16:09 2013 Benoit Daloze <eregontp@gmail.com>

	* array.c (Array#+): fix documentation example.

313 doc/marshal.rdoc Normal file
@@ -0,0 +1,313 @@

= Marshal Format

The Marshal format is used to serialize Ruby objects. The format can store
arbitrary objects through three user-defined extension mechanisms.

For documentation on using Marshal to serialize and deserialize objects, see
the Marshal module.

This document calls a serialized set of objects a stream. The Ruby
implementation can load a set of objects from a String, an IO or an object
that implements the +getbyte+ and +read+ methods.

== Stream Format

The first two bytes of the stream contain the major and minor version. Each is
stored as a single byte holding the version number itself, not an ASCII digit.
The version implemented in Ruby is 4.8 (stored as "\x04\x08") and is supported
by Ruby 1.8.0 and newer.

Different major versions of the Marshal format are not compatible and cannot
be understood by other major versions. Older minor versions of the format can
be understood by implementations of a newer minor version: format 4.7 can be
loaded by a 4.8 implementation, but format 4.8 cannot be loaded by a 4.7
implementation.

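The version rule above can be sketched in Ruby. <code>loadable_version?</code> is a hypothetical helper written for this example, not part of the Marshal API. <code>Marshal::MAJOR_VERSION</code> and <code>Marshal::MINOR_VERSION</code> are the constants Ruby defines for the version it writes.

```ruby
# Hypothetical helper: decide whether a reader implementing major.minor
# should accept a stream, based only on its two leading version bytes.
def loadable_version?(stream, major: Marshal::MAJOR_VERSION, minor: Marshal::MINOR_VERSION)
  stream_major, stream_minor = stream.byteslice(0, 2).bytes
  # Major versions must match exactly; an older or equal minor version is readable.
  stream_major == major && stream_minor <= minor
end

data = Marshal.dump("hello")
p data.byteslice(0, 2).bytes     # => [4, 8]
p loadable_version?(data)         # => true
p loadable_version?("\x04\x07".b) # => true  (older minor version)
p loadable_version?("\x04\x09".b) # => false (newer minor version)
p loadable_version?("\x03\x08".b) # => false (different major version)
```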
Following the version bytes is a stream describing the serialized object. The
stream contains nested objects (the same as a Ruby object) but objects in the
stream do not necessarily have a direct mapping to the Ruby object model.

Each object in the stream is described by a byte indicating its type, followed
by zero or more bytes describing the object. When "object" is mentioned below
it means any of the types below that defines a Ruby object.

=== true, false, nil

These objects are each one byte long. "T" represents +true+, "F" represents
+false+ and "0" represents +nil+.

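A quick sketch that dumps the three values and maps the single type byte back to its value. +SINGLETON_TYPES+ is a name chosen for this example only.

```ruby
# Type byte => value, for the three one-byte objects.
SINGLETON_TYPES = { "T".ord => true, "F".ord => false, "0".ord => nil }.freeze

[true, false, nil].each do |value|
  stream = Marshal.dump(value)
  # Two version bytes, then exactly one type byte.
  decoded = SINGLETON_TYPES.fetch(stream.getbyte(2))
  puts "#{value.inspect}: #{stream.inspect} (#{stream.bytesize} bytes) -> #{decoded.inspect}"
end
```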
=== Fixnum and long

"i" represents a signed 32 bit value using a packed format. One through five
bytes follow the type. The value loaded will always be a Fixnum. On 32 bit
platforms (where the precision of a Fixnum is less than 32 bits) loading large
values will cause overflow on CRuby.

The fixnum type is used to represent both Ruby Fixnum objects and the sizes of
marshaled arrays, hashes, instance variables and other types. In the
following sections "long" will mean the format described below, which supports
full 32 bit precision.

The first byte has the following special values:

"\x00"::
  The value of the integer is 0. No bytes follow.

"\x01"::
  The total size of the integer is two bytes. The following byte is a
  positive integer in the range of 0 through 255. Only values between 123
  and 255 should be represented this way to save bytes.

"\xff"::
  The total size of the integer is two bytes. The following byte is a
  negative integer in the range of -1 through -256.

"\x02"::
  The total size of the integer is three bytes. The following two bytes are a
  positive little-endian integer.

"\xfe"::
  The total size of the integer is three bytes. The following two bytes are a
  negative little-endian integer.

"\x03"::
  The total size of the integer is four bytes. The following three bytes are
  a positive little-endian integer.

"\xfd"::
  The total size of the integer is four bytes. The following three bytes are a
  negative little-endian integer.

"\x04"::
  The total size of the integer is five bytes. The following four bytes are a
  positive little-endian integer. For compatibility with 32 bit Ruby, only
  Fixnums less than 1073741824 should be represented this way. For sizes of
  stream objects full precision may be used.

"\xfc"::
  The total size of the integer is five bytes. The following four bytes are a
  negative little-endian integer. For compatibility with 32 bit Ruby, only
  Fixnums greater than or equal to -1073741824 should be represented this way.
  For sizes of stream objects full precision may be used.

For the negative forms the following bytes hold the low-order bytes of the
value's two's-complement representation; for example -129 is stored as
"\xff\x7f".

Otherwise the first byte is a sign-extended eight-bit value with an offset. If
the value is positive the value is determined by subtracting 5 from the value.
If the value is negative the value is determined by adding 5 to the value.

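The rules above can be turned into a small decoder. This is a sketch, not CRuby's implementation. <code>read_long</code> is a hypothetical name. It takes an Array of byte values (0..255) and a starting position, and it returns the decoded value together with the position after it. It assumes, as CRuby does, that the negative multi-byte forms store the low-order bytes of the two's-complement value.

```ruby
# Sketch of a "long" decoder. Returns [value, next_position].
def read_long(bytes, pos)
  c = bytes.fetch(pos)
  c -= 256 if c > 127 # the first byte is signed
  pos += 1

  return [0, pos] if c.zero?
  return [c - 5, pos] if c > 4  # small positive value, offset by 5
  return [c + 5, pos] if c < -4 # small negative value, offset by 5

  n = c.abs # 1..4 little-endian bytes follow
  value = 0
  n.times { |i| value |= bytes.fetch(pos + i) << (8 * i) }
  # Negative forms hold two's-complement bytes; restore the sign.
  value -= 1 << (8 * n) if c < 0
  [value, pos + n]
end

# Decode dumped Integers: skip "\x04\x08" and the "i" type byte.
[0, 1, 122, 123, 255, 256, -1, -123, -124, -256, 65_536, -1_073_741_824].each do |n|
  bytes = Marshal.dump(n).bytes
  value, _next_pos = read_long(bytes, 3)
  puts "#{n}: #{bytes.drop(3).inspect} -> #{value}"
end
```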
There are multiple representations for many values. CRuby always outputs the
shortest representation possible.

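The encoding direction, which always picks the shortest form, might look like this. <code>write_long</code> is a hypothetical helper that returns an Array of byte values. It accepts the full 32 bit range and leaves the narrower Fixnum range to the caller, since sizes may use full precision.

```ruby
# Sketch of an encoder emitting the shortest "long" form as byte values.
def write_long(x)
  raise RangeError, "#{x} does not fit in 32 bits" unless x.between?(-2**31, 2**31 - 1)
  return [0] if x.zero?
  return [x + 5] if x.between?(1, 122)
  return [(x - 5) & 0xff] if x.between?(-123, -1)

  out = []
  loop do
    out << (x & 0xff) # low byte (two's complement for negative values)
    x >>= 8           # arithmetic shift keeps the sign
    return [out.size] + out if x.zero?
    return [(-out.size) & 0xff] + out if x == -1
  end
end

p write_long(5)    # => [10]
p write_long(300)  # => [2, 44, 1]
p write_long(-300) # => [254, 212, 254]
```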
=== Symbols and Byte Sequence

":" represents a real symbol. A real symbol contains the data needed to
define the symbol for the rest of the stream, as future occurrences in the
stream will instead be references (a symbol link) to this one. The reference
is a zero-indexed 32 bit value (so the first occurrence of <code>:hello</code>
is 0).

Following the type byte is a byte sequence, which consists of a long
indicating the number of bytes in the sequence followed by that many bytes of
data. Byte sequences have no encoding.

For example, the following stream contains the Symbol <code>:hello</code>:

  "\x04\x08:\x0ahello"

";" represents a Symbol link which references a previously defined Symbol.
Following the type byte is a long containing the index in the lookup table for
the linked (referenced) Symbol.

For example, the following stream contains <code>[:hello, :hello]</code>:

  "\x04\b[\a:\nhello;\x00"

When a "symbol" is referenced below it may be either a real symbol or a
symbol link.

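The symbol table can be illustrated with a deliberately tiny reader. +SymbolStreamReader+ is hypothetical. It handles only Arrays, real symbols and symbol links, and only the one-byte forms of a long; anything else raises.

```ruby
# Minimal sketch of a reader that keeps the zero-indexed symbol table.
class SymbolStreamReader
  def initialize(stream)
    @bytes = stream.bytes
    @pos = 2      # skip the "\x04\x08" version bytes
    @symbols = [] # index => Symbol, in order of first appearance
  end

  def read
    case next_byte.chr
    when "[" then Array.new(read_small_long) { read }
    when ":" then read_real_symbol
    when ";" then @symbols.fetch(read_small_long)
    else raise ArgumentError, "type not handled by this sketch"
    end
  end

  private

  def read_real_symbol
    length = read_small_long
    name = @bytes[@pos, length].pack("C*").force_encoding(Encoding::UTF_8)
    @pos += length
    symbol = name.to_sym
    @symbols << symbol # later ";" links refer to this index
    symbol
  end

  # Only 0 and the one-byte positive form (1..122) are supported here.
  def read_small_long
    c = next_byte
    return 0 if c.zero?
    raise ArgumentError, "long form not handled by this sketch" unless c.between?(6, 127)
    c - 5
  end

  def next_byte
    byte = @bytes.fetch(@pos)
    @pos += 1
    byte
  end
end

p SymbolStreamReader.new(Marshal.dump([:hello, :hello])).read # => [:hello, :hello]
p SymbolStreamReader.new(Marshal.dump([:a, :b, :a, :b])).read # => [:a, :b, :a, :b]
```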
=== Object References

Separate from but similar to symbol references, the stream contains only one
copy of each object (as determined by #object_id) for all objects except
true, false, nil, Fixnums and Symbols (which are stored separately as
described above). Each such object is assigned a zero-indexed 32 bit value
when it is first written, and that value is reused when the object is
encountered again. (The first object in the stream, usually the top-level
object, has an index of 0.)

"@" represents an object link. Following the type byte is a long giving the
index of the object.

For example, the following stream contains an Array of the object
<code>"hello"</code> twice:

  "\004\b[\a\"\nhello@\006"

Here the Array has index 0 and the String has index 1, so the link "@\006" (a
long with the value 1) refers back to the String.

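Links are keyed on object identity rather than equality. This sketch shows that by dumping one String twice and then two equal but distinct Strings. Binary strings are used so the output matches the example above, because other encodings add an instance variable.

```ruby
shared = "hello".b
same_twice = Marshal.dump([shared, shared])
two_copies = Marshal.dump(["hello".b, "hello".b])

p same_twice # => "\x04\b[\a\"\nhello@\x06"
p two_copies # => "\x04\b[\a\"\nhello\"\nhello"

# Loading restores the sharing: both elements are the same object again.
loaded = Marshal.load(same_twice)
p loaded[0].equal?(loaded[1]) # => true
```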
=== Instance Variables

"I" indicates that instance variables follow the next object. An object
follows the type byte. Following the object is a long indicating the number
of instance variables for the object. Following the length is a set of
name-value pairs. The names are symbols while the values are objects. The
symbols must be instance variable names (<code>:@name</code>).

An Object ("o" type, described below) uses the same format for its instance
variables as described here.

For a String and Regexp (described below) a special instance variable
<code>:E</code> is used to indicate the Encoding.

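As a sketch of the "I" layout, here is a binary String with one instance variable set on it. A binary String is chosen so that no encoding variable is added. The stream contains "I", the String, a count of 1, and then the <code>:@note</code> name and its value.

```ruby
s = "hi".b
s.instance_variable_set(:@note, 1)

data = Marshal.dump(s)
p data                                      # => "\x04\bI\"\ahi\x06:\n@notei\x06"
p Marshal.load(data).instance_variable_get(:@note) # => 1
```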
=== Extended

"e" indicates that the next object is extended by a module. Following the
type byte is a symbol that contains the name of the module the object is
extended by. Following the module name is the extended object.

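Dumping an object extended by a module shows the "e" entry, with the module name as a symbol, written ahead of the object's own "o" entry. +Comparable+ is used only because it is a convenient built-in module.

```ruby
obj = Object.new.extend(Comparable)
data = Marshal.dump(obj)
p data # => "\x04\be:\x0FComparableo:\vObject\x00"

copy = Marshal.load(data)
p copy.is_a?(Comparable) # => true
```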
=== Array

"[" represents an Array. Following the type byte is a long indicating the
number of objects in the array. The given number of objects follow the
length.

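For example, an Array holding an Integer, a nested Array and +nil+ dumps as follows. The element count 3 appears as the one-byte long "\b" (3 + 5 = 8).

```ruby
data = Marshal.dump([1, [true], nil])
p data                # => "\x04\b[\bi\x06[\x06T0"
p data.getbyte(3) - 5 # => 3 (element count, one-byte long form)
```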
=== Bignum

"l" represents a Bignum which is composed of three parts:

sign::
  A single byte containing "+" for a positive value or "-" for a negative
  value.
length::
  A long indicating the number of bytes of Bignum data that follow, divided by
  two. Multiply the length by two to determine the number of bytes of data
  that follow.
data::
  Bytes of Bignum data representing the absolute value of the number, least
  significant byte first.

The following Ruby code will reconstruct the Bignum value from the sign byte
and an array of data bytes:

  # Example input: the sign and data bytes of 2**64.
  sign  = "+"
  bytes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

  result = 0
  bytes.each_with_index do |byte, exp|
    result += (byte * 2 ** (exp * 8))
  end
  result = -result if sign == "-"

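A decoder for a stream whose top-level object is a Bignum can be sketched as follows. <code>decode_bignum_stream</code> is hypothetical. It assumes the length uses the one-byte long form, which holds for Bignums of up to 244 data bytes.

```ruby
# Sketch: decode a stream whose top-level object is a Bignum ("l").
def decode_bignum_stream(stream)
  bytes = stream.bytes
  raise ArgumentError, "not a Bignum" unless bytes[2] == "l".ord
  sign = bytes[3].chr # "+" or "-"
  raise ArgumentError, "long form not handled" unless bytes[4].between?(6, 127)
  data_size = (bytes[4] - 5) * 2 # the length counts 16 bit units
  data = bytes[5, data_size]     # least significant byte first
  magnitude = data.each_with_index.sum { |byte, exp| byte << (8 * exp) }
  sign == "-" ? -magnitude : magnitude
end

[2**64, -(2**70), 2**100 + 12_345].each do |n|
  puts "#{n}: round-trips=#{decode_bignum_stream(Marshal.dump(n)) == n}"
end
```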
=== Class and Module

"c" represents a Class object, "m" represents a Module and "M" represents
either a class or module (an old-style format kept for compatibility). No
class or module content is included; this type is only a reference. Following
the type byte is a byte sequence which is used to look up an existing class or
module.

Instance variables are not allowed on a class or module.

If no class or module exists an exception should be raised.

For "c" and "m" types, the loaded object must be a class or module,
respectively.

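The following sketch shows that classes and modules are written as a plain byte sequence, with no ":" symbol entry, and that loading a reference to an unknown class fails. +NoSuchClassXyz+ is a made-up name chosen so the lookup fails. CRuby raises ArgumentError in this case.

```ruby
p Marshal.dump(String)     # => "\x04\bc\vString"
p Marshal.dump(Comparable) # => "\x04\bm\x0FComparable"

# Hand-built stream: "c", a one-byte long length, then the class name.
name = "NoSuchClassXyz"
stream = "\x04\x08c".b + (name.bytesize + 5).chr + name.b
begin
  Marshal.load(stream)
rescue ArgumentError => e
  puts "load failed: #{e.message}"
end
```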
=== Data

"d" represents a Data object. (Data objects are wrapped pointers from Ruby
extensions.) Following the type byte is a symbol indicating the class for the
Data object and an object that contains the state of the Data object.

To dump a Data object Ruby calls +_dump_data+. To load a Data object Ruby
calls +_load_data+ with the state of the object on a newly allocated instance.

=== Float

"f" represents a Float object. Following the type byte is a byte sequence
containing the float value. The following values are special:

"inf"::
  Positive infinity

"-inf"::
  Negative infinity

"nan"::
  Not a Number

Otherwise the byte sequence contains the value of a C double written as
decimal text (loadable by strtod(3)). Older minor versions of Marshal also
stored extra mantissa bits to ensure portability across platforms but 4.8 does
not include these. See [ruby-talk:69518] for some explanation.

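Because the payload is decimal text, decoding it needs little more than a string-to-float conversion. <code>decode_float_payload</code> is hypothetical. The sketch assumes the payload fits a one-byte length, which holds for the short decimal forms CRuby writes.

```ruby
# Sketch: decode the byte sequence of an "f" entry.
def decode_float_payload(text)
  case text
  when "inf"  then Float::INFINITY
  when "-inf" then -Float::INFINITY
  when "nan"  then Float::NAN
  else Float(text) # decimal text, as strtod(3) would read it
  end
end

[1.5, 0.1, -2.5e-300, 1e100, Float::INFINITY].each do |x|
  payload = Marshal.dump(x).byteslice(4..-1) # skip "\x04\x08", "f" and a one-byte length
  puts "#{x}: #{payload.inspect} -> #{decode_float_payload(payload)}"
end
```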
=== Hash and Hash with Default Value

"{" represents a Hash object while "}" represents a Hash with a default value
set (<code>Hash.new 0</code>). Following the type byte is a long indicating
the number of key-value pairs in the Hash, the size. Twice that many objects
follow the size, alternating between keys and values.

For a Hash with a default value, the default value follows all the pairs.

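Dumping a plain Hash and one created with <code>Hash.new 0</code> shows the two type bytes and the default value written after the pairs.

```ruby
plain = { 1 => 2 }
with_default = Hash.new(0)
with_default[1] = 2

p Marshal.dump(plain)        # => "\x04\b{\x06i\x06i\a"
p Marshal.dump(with_default) # => "\x04\b}\x06i\x06i\ai\x00"
p Marshal.load(Marshal.dump(with_default))[:missing] # => 0
```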
=== Module and Old Module

See Class and Module above.

=== Object

"o" represents an object that doesn't have any other special form (such as
a user-defined or built-in format). Following the type byte is a symbol
containing the class name of the object. Following the class name is a long
indicating the number of instance variable names and values for the object.
Twice that many objects follow the size, as name-value pairs.

The keys in the pairs must be symbols containing instance variable names.

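For example, here is an instance of a small class with two instance variables. +Point+ is defined only for this sketch.

```ruby
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  attr_reader :x, :y
end

data = Marshal.dump(Point.new(1, 2))
p data # => "\x04\bo:\nPoint\a:\a@xi\x06:\a@yi\a"
```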
=== Regular Expression

"/" represents a regular expression. Following the type byte is a byte
sequence containing the regular expression source. Following the source is a
byte containing the regular expression options (case-insensitive, etc.) as a
signed 8-bit value.

Regular expressions can have an encoding attached through instance variables
(see above). If no encoding is attached, escapes for the following regexp
specials not present in Ruby 1.8 must be removed: g-m, o-q, u, y, E, F, H-L,
N-V, X, Y.

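Here is a sketch using <code>/ab/i</code>. The option byte is 1 (Regexp::IGNORECASE). This ASCII-only literal has the US-ASCII encoding, so the whole entry is wrapped in "I" with <code>:E</code> set to false.

```ruby
data = Marshal.dump(/ab/i)
p data                        # => "\x04\bI/\aab\x01\x06:\x06EF"
p Regexp::IGNORECASE          # => 1
p Marshal.load(data) == /ab/i # => true
```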
=== String

'"' represents a String. Following the type byte is a byte sequence
containing the string content. When dumped from Ruby 1.9 or newer, an
encoding instance variable (<code>:E</code>, see above) should be included
unless the encoding is binary.

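The sketch below contrasts a binary String, which has no encoding variable, with UTF-8 and US-ASCII Strings. CRuby records those two as <code>:E</code> true and false respectively. Other encodings are stored under an <code>:encoding</code> instance variable holding the encoding name, so the Shift_JIS line only checks that the encoding survives a round trip.

```ruby
p Marshal.dump("hi".b)                                       # => "\x04\b\"\ahi"
p Marshal.dump(String.new("hi", encoding: Encoding::UTF_8))  # => "\x04\bI\"\ahi\x06:\x06ET"
p Marshal.dump(String.new("hi", encoding: Encoding::US_ASCII)) # => "\x04\bI\"\ahi\x06:\x06EF"

sjis = Marshal.load(Marshal.dump(String.new("hi", encoding: Encoding::Shift_JIS)))
p sjis.encoding == Encoding::Shift_JIS # => true
```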
=== Struct

"S" represents a Struct. Following the type byte is a symbol containing the
name of the struct. Following the name is a long indicating the number of
members in the struct. Twice that many objects follow the member count: each
member is a pair containing the member's symbol and an object for the value of
that member.

If the struct name does not match a Struct subclass in the running Ruby, an
exception should be raised.

If there is a mismatch between the struct in the currently running Ruby and
the member count in the marshaled struct, an exception should be raised.

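This sketch dumps a Struct with two members and then loads a hand-built stream whose member count does not match. +Pair+ is defined only for this example. CRuby raises TypeError for the mismatch.

```ruby
Pair = Struct.new(:a, :b)

p Marshal.dump(Pair.new(1, 2)) # => "\x04\bS:\tPair\a:\x06ai\x06:\x06bi\a"

# A stream claiming Pair has three members is rejected.
bad = "\x04\bS:\tPair\b:\x06ai\x06:\x06bi\a:\x06ci\b".b
begin
  Marshal.load(bad)
rescue TypeError => e
  puts "load failed: #{e.message}"
end
```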
=== User Class

"C" represents a subclass of a String, Regexp, Array or Hash. Following the
type byte is a symbol containing the name of the subclass. Following the name
is the wrapped object.

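For example, here is an instance of an Array subclass. +MyArray+ is defined only for this sketch.

```ruby
class MyArray < Array; end

data = Marshal.dump(MyArray[1])
p data                     # => "\x04\bC:\fMyArray[\x06i\x06"
p Marshal.load(data).class # => MyArray
```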
=== User Defined

"u" represents an object with a user-defined serialization format using the
+_dump+ instance method and +_load+ class method. Following the type byte is
a symbol containing the class name. Following the class name is a byte
sequence containing the user-defined representation of the object.

The class method +_load+ is called on the class with a string created from the
byte sequence.

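Here is a sketch of the +_dump+/+_load+ pair. +Temperature+ is hypothetical, and its +_dump+ returns a binary String so that no encoding has to be recorded. The check looks for the "u" entry inside the stream rather than comparing the whole stream, so it relies only on the layout described above.

```ruby
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Instance method: return the user-defined representation as a String.
  def _dump(_level)
    degrees.to_s.b
  end

  # Class method: rebuild an instance from that String.
  def self._load(string)
    new(Integer(string))
  end
end

data = Marshal.dump(Temperature.new(21))
p data.include?("u:\x10Temperature\a21".b) # => true
p Marshal.load(data).degrees                # => 21
```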
=== User Marshal

"U" represents an object with a user-defined serialization format using the
+marshal_dump+ and +marshal_load+ instance methods. Following the type byte
is a symbol containing the class name. Following the class name is an object
containing the data.

Upon loading, a new instance must be allocated and +marshal_load+ must be
called on the instance with the data.
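Here is a sketch of the +marshal_dump+/+marshal_load+ pair. +Vec+ is hypothetical and uses a two-element Array as its data object. The check looks for the "U" entry followed by that Array inside the stream.

```ruby
class Vec
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # The returned object is written after the class name.
  def marshal_dump
    [x, y]
  end

  # Called on a newly allocated instance with the loaded data object.
  def marshal_load(data)
    @x, @y = data
  end
end

data = Marshal.dump(Vec.new(3, 4))
p data.include?("U:\bVec[\ai\bi\t".b) # => true
loaded = Marshal.load(data)
p [loaded.x, loaded.y]                # => [3, 4]
```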