mirror of
https://github.com/ruby/ruby.git
synced 2022-11-09 12:17:21 -05:00
314 lines
12 KiB
Text
314 lines
12 KiB
Text
|
= Marshal Format
|
||
|
|
||
|
The Marshal format is used to serialize ruby objects. The format can store
|
||
|
arbitrary objects through three user-defined extension mechanisms.
|
||
|
|
||
|
For documentation on using Marshal to serialize and deserialize objects, see
|
||
|
the Marshal module.
|
||
|
|
||
|
This document calls a serialized set of objects a stream. The Ruby
|
||
|
implementation can load a set of objects from a String, an IO or an object
|
||
|
that implements a +getc+ method.
|
||
|
|
||
|
== Stream Format
|
||
|
|
||
|
The first two bytes of the stream contain the major and minor version, each as
|
||
|
a single byte encoding a digit. The version implemented in Ruby is 4.8
|
||
|
(stored as "\x04\x08") and is supported by ruby 1.8.0 and newer.
|
||
|
|
||
|
Different major versions of the Marshal format are not compatible and cannot
|
||
|
be understood by other major versions. Lesser minor versions of the format
|
||
|
can be understood by newer minor versions. Format 4.7 can be loaded by a 4.8
|
||
|
implementation but format 4.8 cannot be loaded by a 4.7 implementation.
|
||
|
|
||
|
Following the version bytes is a stream describing the serialized object. The
|
||
|
stream contains nested objects (the same as a Ruby object) but objects in the
|
||
|
stream do not necessarily have a direct mapping to the Ruby object model.
|
||
|
|
||
|
Each object in the stream is described by a byte indicating its type followed
|
||
|
by one or more bytes describing the object. When "object" is mentioned below
|
||
|
it means any of the types below that defines a Ruby object.
|
||
|
|
||
|
=== true, false, nil
|
||
|
|
||
|
These objects are each one byte long. "T" is represents +true+, "F"
|
||
|
represents +false+ and "0" represents +nil+.
|
||
|
|
||
|
=== Fixnum and long
|
||
|
|
||
|
"i" represents a signed 32 bit value using a packed format. One through five
|
||
|
bytes follows the type. The value loaded will always be a Fixnum. On
|
||
|
32 bit platforms (where the precision of a Fixnum is less than 32 bits)
|
||
|
loading large values will cause overflow on CRuby.
|
||
|
|
||
|
The fixnum type is used to represent both ruby Fixnum objects and the sizes of
|
||
|
marshaled arrays, hashes, instance variables and other types. In the
|
||
|
following sections "long" will mean the format described below, which supports
|
||
|
full 32 bit precision.
|
||
|
|
||
|
The first byte has the following special values:
|
||
|
|
||
|
"\x00"::
|
||
|
The value of the integer is 0. No bytes follow.
|
||
|
|
||
|
"\x01"::
|
||
|
The total size of the integer is two bytes. The following byte is a
|
||
|
positive integer in the range of 0 through 255. Only values between 123
|
||
|
and 255 should be represented this way to save bytes.
|
||
|
|
||
|
"\xff"::
|
||
|
The total size of the integer is two bytes. The following byte is a
|
||
|
negative integer in the range of -1 through -256.
|
||
|
|
||
|
"\x02"::
|
||
|
The total size of the integer is three bytes. The following two bytes are a
|
||
|
positive little-endian integer.
|
||
|
|
||
|
"\xfe"::
|
||
|
The total size of the integer is three bytes. The following two bytes are a
|
||
|
negative little-endian integer.
|
||
|
|
||
|
"\x03"::
|
||
|
The total size of the integer is four bytes. The following three bytes are
|
||
|
a positive little-endian integer.
|
||
|
|
||
|
"\xfd"::
|
||
|
The total size of the integer is two bytes. The following three bytes are a
|
||
|
negative little-endian integer.
|
||
|
|
||
|
"\x04"::
|
||
|
The total size of the integer is five bytes. The following four bytes are a
|
||
|
positive little-endian integer. For compatibility with 32 bit ruby,
|
||
|
only Fixnums less than 1073741824 should be represented this way. For sizes
|
||
|
of stream objects full precision may be used.
|
||
|
|
||
|
"\xfc"::
|
||
|
The total size of the integer is two bytes. The following four bytes are a
|
||
|
negative little-endian integer. For compatibility with 32 bit ruby,
|
||
|
only Fixnums greater than -10737341824 should be represented this way. For
|
||
|
sizes of stream objects full precision may be used.
|
||
|
|
||
|
Otherwise the first byte is a sign-extended eight-bit value with an offset.
|
||
|
If the value is positive the value is determined by subtracting 5 from the
|
||
|
value. If the value is negative the value is determined by adding 5 to the
|
||
|
value.
|
||
|
|
||
|
There are multiple representations for many values. CRuby always outputs the
|
||
|
shortest representation possible.
|
||
|
|
||
|
=== Symbols and Byte Sequence
|
||
|
|
||
|
":" represents a real symbol. A real symbol contains the data needed to
|
||
|
define the symbol for the rest of the stream as future occurrences in the
|
||
|
stream will instead be references (a symbol link) to this one. The reference
|
||
|
is a zero-indexed 32 bit value (so the first occurrence of <code>:hello</code>
|
||
|
is 0).
|
||
|
|
||
|
Following the type byte is byte sequence which consists of a long indicating
|
||
|
the number of bytes in the sequence followed by that many bytes of data. Byte
|
||
|
sequences have no encoding.
|
||
|
|
||
|
For example, the following stream contains the Symbol <code>:hello</code>:
|
||
|
|
||
|
"\x04\x08:\x0ahello"
|
||
|
|
||
|
";" represents a Symbol link which references a previously defined Symbol.
|
||
|
Following the type byte is a long containing the index in the lookup table for
|
||
|
the linked (referenced) Symbol.
|
||
|
|
||
|
For example, the following stream contains <code>[:hello, :hello]</code>:
|
||
|
|
||
|
"\x04\b[\a:\nhello;\x00"
|
||
|
|
||
|
When a "symbol" is referenced below it may be either a real symbol or a
|
||
|
symbol link.
|
||
|
|
||
|
=== Object References
|
||
|
|
||
|
Separate from but similar to symbol references, the stream contains only one
|
||
|
copy of each object (as determined by #object_id) for all objects except
|
||
|
true, false, nil, Fixnums and Symbols (which are stored separately as
|
||
|
described above) a one-indexed 32 bit value will be stored and reused when the
|
||
|
object is encountered again. (The first object has an index of 1).
|
||
|
|
||
|
"@" represents an object link. Following the type byte is a long giving the
|
||
|
index of the object.
|
||
|
|
||
|
For example, the following stream contains an Array of the object
|
||
|
<code>"hello"</code> twice:
|
||
|
|
||
|
"\004\b[\a\"\nhello@\006"
|
||
|
|
||
|
=== Instance Variables
|
||
|
|
||
|
"I" indicates that instance variables follow the next object. An object
|
||
|
follows the type byte. Following the object is a length indicating the number
|
||
|
of instance variables for the object. Following the length is a set of
|
||
|
name-value pairs. The names are symbols while the values are objects. The
|
||
|
symbols must be instance variable names (<code>:@name</code>).
|
||
|
|
||
|
An Object ("o" type, described below) uses the same format for its instance
|
||
|
variables as described here.
|
||
|
|
||
|
For a String and Regexp (described below) a special instance variable
|
||
|
<code>:E</code> is used to indicate the Encoding.
|
||
|
|
||
|
=== Extended
|
||
|
|
||
|
"e" indicates that the next object is extended by a module. An object follows
|
||
|
the type byte. Following the object is a symbol that contains the name of the
|
||
|
module the object is extended by.
|
||
|
|
||
|
=== Array
|
||
|
|
||
|
"[" represents an Array. Following the type byte is a long indicating the
|
||
|
number of objects in the array. The given number of objects follow the
|
||
|
length.
|
||
|
|
||
|
=== Bignum
|
||
|
|
||
|
"l" represents a Bignum which is composed of three parts:
|
||
|
|
||
|
sign::
|
||
|
A single byte containing "+" for a positive value or "-" for a negative
|
||
|
value.
|
||
|
length::
|
||
|
A long indicating the number of bytes of Bignum data follows, divided by
|
||
|
two. Multiply the length by two to determine the number of bytes of data
|
||
|
that follow.
|
||
|
data::
|
||
|
Bytes of Bignum data representing the number.
|
||
|
|
||
|
The following ruby code will reconstruct the Bignum value from an array of
|
||
|
bytes:
|
||
|
|
||
|
result = 0
|
||
|
|
||
|
bytes.each_with_index do |byte, exp|
|
||
|
result += (byte * 2 ** (exp * 8))
|
||
|
end
|
||
|
|
||
|
=== Class and Module
|
||
|
|
||
|
"c" represents a Class object, "m" represents a Module and "M" represents
|
||
|
either a class or module (this is an old-style for compatibility). No class
|
||
|
or module content is included, this type is only a reference. Following the
|
||
|
type byte is a byte sequence which is used to look up an existing class or
|
||
|
module, respectively.
|
||
|
|
||
|
Instance variables are not allowed on a class or module.
|
||
|
|
||
|
If no class or module exists an exception should be raised.
|
||
|
|
||
|
For "c" and "m" types, the loaded object must be a class or module,
|
||
|
respectively.
|
||
|
|
||
|
=== Data
|
||
|
|
||
|
"d" represents a Data object. (Data objects are wrapped pointers from ruby
|
||
|
extensions.) Following the type byte is a symbol indicating the class for the
|
||
|
Data object and an object that contains the state of the Data object.
|
||
|
|
||
|
To dump a Data object Ruby calls _dump_data. To load a Data object Ruby calls
|
||
|
_load_data with the state of the object on a newly allocated instance.
|
||
|
|
||
|
=== Float
|
||
|
|
||
|
"f" represents a Float object. Following the type byte is a byte sequence
|
||
|
containing the float value. The following values are special:
|
||
|
|
||
|
"inf"::
|
||
|
Positive infinity
|
||
|
|
||
|
"-inf"::
|
||
|
Negative infinity
|
||
|
|
||
|
"nan"::
|
||
|
Not a Number
|
||
|
|
||
|
Otherwise the byte sequence contains a C double (loadable by strtod(3)).
|
||
|
Older minor versions of Marshal also stored extra mantissa bits to ensure
|
||
|
portability across platforms but 4.8 does not include these. See
|
||
|
[ruby-talk:69518] for some explanation.
|
||
|
|
||
|
=== Hash and Hash with Default Value
|
||
|
|
||
|
"{" represents a Hash object while "}" represents a Hash with a default value
|
||
|
set (<code>Hash.new 0</code>). Following the type byte is a long indicating
|
||
|
the number of key-value pairs in the Hash, the size. Double the given number
|
||
|
of objects follow the size.
|
||
|
|
||
|
For a Hash with a default value, the default value follows all the pairs.
|
||
|
|
||
|
=== Module and Old Module
|
||
|
|
||
|
=== Object
|
||
|
|
||
|
"o" represents an object that doesn't have any other special form (such as
|
||
|
a user-defined or built-in format). Following the type byte is a symbol
|
||
|
containing the class name of the object. Following the class name is a long
|
||
|
indicating the number of instance variable names and values for the object.
|
||
|
Double the given number of pairs of objects follow the size.
|
||
|
|
||
|
The keys in the pairs must be symbols containing instance variable names.
|
||
|
|
||
|
=== Regular Expression
|
||
|
|
||
|
"/" represents a regular expression. Following the type byte is a byte
|
||
|
sequence containing the regular expression source. Following the type byte is
|
||
|
a byte containing the regular expression options (case-insensitive, etc.) as a
|
||
|
signed 8-bit value.
|
||
|
|
||
|
Regular expressions can have an encoding attached through instance variables
|
||
|
(see above). If no encoding is attached escapes for the following regexp
|
||
|
specials not present in ruby 1.8 must be removed: g-m, o-q, u, y, E, F, H-L,
|
||
|
N-V, X, Y.
|
||
|
|
||
|
=== String
|
||
|
|
||
|
'"' represents a String. Following the type byte is a byte sequence
|
||
|
containing the string content. When dumped from ruby 1.9 an encoding instance
|
||
|
variable (<code>:E</code> see above) should be included unless the encoding is
|
||
|
binary.
|
||
|
|
||
|
=== Struct
|
||
|
|
||
|
"S" represents a Struct. Following the type byte is a symbol containing the
|
||
|
name of the struct. Following the name is a long indicating the number of
|
||
|
members in the struct. Double the number of objects follow the member count.
|
||
|
Each member is a pair containing the member's symbol and an object for the
|
||
|
value of that member.
|
||
|
|
||
|
If the struct name does not match a Struct subclass in the running ruby an
|
||
|
exception should be raised.
|
||
|
|
||
|
If there is a mismatch between the struct in the currently running ruby and
|
||
|
the member count in the marshaled struct an exception should be raised.
|
||
|
|
||
|
=== User Class
|
||
|
|
||
|
"C" represents a subclass of a String, Regexp, Array or Hash. Following the
|
||
|
type byte is a symbol containing the name of the subclass. Following the name
|
||
|
is the wrapped object.
|
||
|
|
||
|
=== User Defined
|
||
|
|
||
|
"u" represents an object with a user-defined serialization format using the
|
||
|
+_dump+ instance method and +_load+ class method. Following the type byte is
|
||
|
a symbol containing the class name. Following the class name is a byte
|
||
|
sequence containing the user-defined representation of the object.
|
||
|
|
||
|
The class method +_load+ is called on the class with a string created from the
|
||
|
byte-sequence.
|
||
|
|
||
|
=== User Marshal
|
||
|
|
||
|
"U" represents an object with a user-defined serialization format using the
|
||
|
+marshal_dump+ and +marshal_load+ instance methods. Following the type byte
|
||
|
is a symbol containing the class name. Following the class name is an object
|
||
|
containing the data.
|
||
|
|
||
|
Upon loading a new instance must be allocated and +marshal_load+ must be
|
||
|
called on the instance with the data.
|
||
|
|