mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00
ruby--ruby/doc/packed_data.rdoc
Burdette Lamar 8d20632df8
[DOC] Packed data (#6520)
New page for packed data
2022-10-15 10:53:08 -05:00

== Packed Data
Certain Ruby core methods deal with packing and unpacking data:
- \Method Array#pack:
Formats each element in array +self+ into a binary string;
returns that string.
- \Method String#unpack:
Extracts data from string +self+,
forming objects that become the elements of a new array;
returns that array.
- \Method String#unpack1:
Does the same, but returns only the first extracted object.
Each of these methods accepts a string +template+,
consisting of zero or more _directive_ characters,
each followed by zero or more _modifier_ characters.
Examples (directive <tt>'C'</tt> specifies 'unsigned character'):
[65].pack('C') # => "A" # One element, one directive.
[65, 66].pack('CC') # => "AB" # Two elements, two directives.
[65, 66].pack('C') # => "A" # Extra element is ignored.
[65].pack('') # => "" # No directives.
[65].pack('CC') # Extra directive raises ArgumentError.
'A'.unpack('C') # => [65] # One character, one directive.
'AB'.unpack('CC') # => [65, 66] # Two characters, two directives.
'AB'.unpack('C') # => [65] # Extra character is ignored.
'A'.unpack('CC') # => [65, nil] # Extra directive generates nil.
'AB'.unpack('') # => [] # No directives.
The string +template+ may contain any mixture of valid directives
(directive <tt>'c'</tt> specifies 'signed character'):
[65, -1].pack('cC') # => "A\xFF"
"A\xFF".unpack('cC') # => [65, 255]
The string +template+ may contain whitespace (which is ignored)
and comments, each of which begins with character <tt>'#'</tt>
and continues up to and including the next newline:
[0,1].pack(" C #foo \n C ") # => "\x00\x01"
"\0\1".unpack(" C #foo \n C ") # => [0, 1]
Any directive may be followed by either of these modifiers:
- <tt>'*'</tt> - The directive is to be applied as many times as needed:
[65, 66].pack('C*') # => "AB"
'AB'.unpack('C*') # => [65, 66]
- Integer +count+ - The directive is to be applied +count+ times:
[65, 66].pack('C2') # => "AB"
[65, 66].pack('C3') # Raises ArgumentError.
'AB'.unpack('C2') # => [65, 66]
'AB'.unpack('C3') # => [65, 66, nil]
Note: Directives in <tt>%w[A a Z m]</tt> use +count+ differently;
see {String Directives}[rdoc-ref:packed_data.rdoc@String+Directives].
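As a minimal sketch of how a template maps onto array elements, the following packs a small record with a mixed template and unpacks it again. The field names (+version+, +length+, +flag+) are hypothetical and chosen only for illustration; directive <tt>'n'</tt> (16-bit big-endian) is described under the 16-bit integer directives.

```ruby
# Hypothetical record: version (1 byte), length (16-bit big-endian), flag (1 byte).
version, length, flag = 65, 513, 66

packed = [version, length, flag].pack('CnC')
packed.bytes                  # => [65, 2, 1, 66]

# The same template reverses the operation.
unpacked = packed.unpack('CnC')
unpacked                      # => [65, 513, 66]

# A count repeats a directive: 'C2n' consumes two 8-bit values, then one 16-bit value.
counted = [1, 2, 772].pack('C2n')
counted.bytes                 # => [1, 2, 3, 4]
```

Comparing +bytes+ rather than the strings themselves avoids surprises from string encodings when the packed data is not plain ASCII.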
=== Packing \Method
\Method Array#pack accepts optional keyword argument
+buffer+ that specifies the target string (instead of a new string):
[65, 66].pack('C*', buffer: 'foo') # => "fooAB"
The method can accept a block:
# Packed string is passed to the block.
[65, 66].pack('C*') {|s| p s } # => "AB"
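The +buffer+ keyword can be used to build up one string across several calls. This is a sketch only; the variable names are illustrative, and <tt>String.new</tt> is used so that the buffer is mutable even when string literals are frozen.

```ruby
# Start with a mutable buffer.
buffer = String.new('foo')

# Each call appends its packed bytes to the same string.
result = [65, 66].pack('C*', buffer: buffer)
[67].pack('C', buffer: buffer)

buffer                 # => "fooABC"
result.equal?(buffer)  # => true (the buffer itself is returned)
```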
=== Unpacking Methods
Methods String#unpack and String#unpack1 each accept
an optional keyword argument +offset+ that specifies an offset
into the string:
'ABC'.unpack('C*', offset: 1) # => [66, 67]
'ABC'.unpack1('C*', offset: 1) # => 66
Both methods can accept a block:
# Each unpacked object is passed to the block.
ret = []
"ABCD".unpack("C*") {|c| ret << c }
ret # => [65, 66, 67, 68]
# The single unpacked object is passed to the block.
'AB'.unpack1('C*') {|ele| p ele } # => 65
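One possible use of +offset+ is reading fixed-size records without slicing the string first. The sketch below assumes a Ruby version that supports the +offset+ keyword; +record_size+ and the sample data are made up for the example.

```ruby
# Four 16-bit big-endian values, two bytes each.
data = [1, 2, 3, 4].pack('n*')
record_size = 2

# Read one value at each record boundary.
values = (0...data.bytesize).step(record_size).map do |offset|
  data.unpack1('n', offset: offset)
end
values # => [1, 2, 3, 4]
```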
=== \Integer Directives
Each integer directive specifies the packing or unpacking
for one element in the input or output array.
==== 8-Bit \Integer Directives
- <tt>'c'</tt> - 8-bit signed integer
(like C <tt>signed char</tt>):
[0, 1, 255].pack('c*') # => "\x00\x01\xFF"
s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF"
s.unpack('c*') # => [0, 1, -1]
- <tt>'C'</tt> - 8-bit unsigned integer
(like C <tt>unsigned char</tt>):
[0, 1, 255].pack('C*') # => "\x00\x01\xFF"
s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF"
s.unpack('C*') # => [0, 1, 255]
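The difference between the two 8-bit directives shows up only on unpacking. A short sketch, using a single byte with all bits set:

```ruby
byte = "\xFF".b                   # One byte with all bits set.

signed   = byte.unpack1('c')      # => -1   (signed interpretation)
unsigned = byte.unpack1('C')      # => 255  (unsigned interpretation)

# Packing -1 and 255 yields the same byte with either directive.
same = [-1].pack('c') == [255].pack('C') # => true
```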
==== 16-Bit \Integer Directives
- <tt>'s'</tt> - 16-bit signed integer, native-endian
(like C <tt>int16_t</tt>):
[513, -514].pack('s*') # => "\x01\x02\xFE\xFD"
s = [513, 65022].pack('s*') # => "\x01\x02\xFE\xFD"
s.unpack('s*') # => [513, -514]
- <tt>'S'</tt> - 16-bit unsigned integer, native-endian
(like C <tt>uint16_t</tt>):
[513, -514].pack('S*') # => "\x01\x02\xFE\xFD"
s = [513, 65022].pack('S*') # => "\x01\x02\xFE\xFD"
s.unpack('S*') # => [513, 65022]
- <tt>'n'</tt> - 16-bit network integer, big-endian:
s = [0, 1, -1, 32767, -32768, 65535].pack('n*')
# => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF"
s.unpack('n*')
# => [0, 1, 65535, 32767, 32768, 65535]
- <tt>'v'</tt> - 16-bit VAX integer, little-endian:
s = [0, 1, -1, 32767, -32768, 65535].pack('v*')
# => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF"
s.unpack('v*')
# => [0, 1, 65535, 32767, 32768, 65535]
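The byte order of the 16-bit directives can be seen by inspecting the bytes directly. In this sketch, +host_order+ is an illustrative name, and the native result depends on the machine running the code.

```ruby
value = 0x0102

big    = [value].pack('n').bytes  # => [1, 2]  (most significant byte first)
little = [value].pack('v').bytes  # => [2, 1]  (least significant byte first)

# Native-endian 'S' matches whichever order the host uses.
native = [value].pack('S').bytes
host_order = native == little ? :little : :big
```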
==== 32-Bit \Integer Directives
- <tt>'l'</tt> - 32-bit signed integer, native-endian
(like C <tt>int32_t</tt>):
s = [67305985, -50462977].pack('l*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('l*')
# => [67305985, -50462977]
- <tt>'L'</tt> - 32-bit unsigned integer, native-endian
(like C <tt>uint32_t</tt>):
s = [67305985, 4244504319].pack('L*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('L*')
# => [67305985, 4244504319]
- <tt>'N'</tt> - 32-bit network integer, big-endian:
s = [0,1,-1].pack('N*')
# => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF"
s.unpack('N*')
# => [0, 1, 4294967295]
- <tt>'V'</tt> - 32-bit VAX integer, little-endian:
s = [0,1,-1].pack('V*')
# => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF"
s.unpack('V*')
# => [0, 1, 4294967295]
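A common use of <tt>'N'</tt> is a length prefix in front of variable-length data. The framing below is a hypothetical format invented for this sketch, not part of any particular protocol; directive <tt>'a'</tt> is described under the string directives.

```ruby
payload = 'hello'

# Frame: 4-byte big-endian length, then the payload bytes.
frame = [payload.bytesize, payload].pack('Na*')
frame.bytes.first(4)       # => [0, 0, 0, 5]

# Read the length first, then take exactly that many bytes.
length = frame.unpack1('N')
body   = frame.byteslice(4, length)
body                       # => "hello"
```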
==== 64-Bit \Integer Directives
- <tt>'q'</tt> - 64-bit signed integer, native-endian
(like C <tt>int64_t</tt>):
s = [578437695752307201, -506097522914230529].pack('q*')
# => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
s.unpack('q*')
# => [578437695752307201, -506097522914230529]
- <tt>'Q'</tt> - 64-bit unsigned integer, native-endian
(like C <tt>uint64_t</tt>):
s = [578437695752307201, 17940646550795321087].pack('Q*')
# => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
s.unpack('Q*')
# => [578437695752307201, 17940646550795321087]
==== Platform-Dependent \Integer Directives
- <tt>'i'</tt> - Platform-dependent width signed integer,
native-endian (like C <tt>int</tt>):
s = [67305985, -50462977].pack('i*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('i*')
# => [67305985, -50462977]
- <tt>'I'</tt> - Platform-dependent width unsigned integer,
native-endian (like C <tt>unsigned int</tt>):
s = [67305985, -50462977].pack('I*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('I*')
# => [67305985, 4244504319]
==== Pointer Directives
- <tt>'j'</tt> - 64-bit pointer-width signed integer,
native-endian (like C <tt>intptr_t</tt>):
s = [67305985, -50462977].pack('j*')
# => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF"
s.unpack('j*')
# => [67305985, -50462977]
- <tt>'J'</tt> - 64-bit pointer-width unsigned integer,
native-endian (like C <tt>uintptr_t</tt>):
s = [67305985, 4244504319].pack('J*')
# => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00"
s.unpack('J*')
# => [67305985, 4244504319]
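Because the widths of the platform-dependent and pointer directives vary, it can be safer to measure them than to assume them. A sketch (the hash names are illustrative):

```ruby
# Fixed-width directives occupy the same number of bytes everywhere.
fixed = %w[c s l q].to_h { |d| [d, [0].pack(d).bytesize] }
fixed.values   # => [1, 2, 4, 8]

# Platform-dependent directives: the results depend on the host,
# so no expected values are shown here.
native = %w[i j].to_h { |d| [d, [0].pack(d).bytesize] }
native.values
```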
==== Other \Integer Directives
- <tt>'U'</tt> - UTF-8 character:
s = [4194304].pack('U*')
# => "\xF8\x90\x80\x80\x80"
s.unpack('U*')
# => [4194304]
- <tt>'w'</tt> - BER-encoded integer
(see {BER encoding}[https://en.wikipedia.org/wiki/X.690#BER_encoding]):
s = [1073741823].pack('w*')
# => "\x83\xFF\xFF\xFF\x7F"
s.unpack('w*')
# => [1073741823]
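Both of these directives produce a variable number of bytes per integer. The following sketch works through one small value for each; the values were chosen only to keep the arithmetic easy to follow.

```ruby
# 'U' writes the UTF-8 encoding of a code point (U+20AC, the euro sign).
euro = [0x20AC].pack('U')
euro.bytes                 # => [226, 130, 172]
euro.unpack('U*')          # => [8364]

# 'w' writes 7 bits per byte, setting the high bit on every byte but the last.
# 300 = 2 * 128 + 44, so the bytes are (0x80 | 2) and 44.
ber = [300].pack('w')
ber.bytes                  # => [130, 44]
ber.unpack1('w')           # => 300
```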
==== Modifiers for \Integer Directives
For directives in
<tt>'i'</tt>,
<tt>'I'</tt>,
<tt>'s'</tt>,
<tt>'S'</tt>,
<tt>'l'</tt>,
<tt>'L'</tt>,
<tt>'q'</tt>,
<tt>'Q'</tt>,
<tt>'j'</tt>, and
<tt>'J'</tt>,
these modifiers may be suffixed:
- <tt>'!'</tt> or <tt>'_'</tt> - Underlying platform's native size.
- <tt>'>'</tt> - Big-endian.
- <tt>'<'</tt> - Little-endian.
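The endianness modifiers make a native-endian directive behave the same on every platform. A brief sketch:

```ruby
be16 = [1].pack('s>').bytes   # => [0, 1]        (16-bit, big-endian)
le16 = [1].pack('s<').bytes   # => [1, 0]        (16-bit, little-endian)
be32 = [1].pack('l>').bytes   # => [0, 0, 0, 1]  (32-bit, big-endian)

# With an explicit byte order, unsigned directives match the fixed-order ones.
same_as_n = [0x01020304].pack('L>') == [0x01020304].pack('N') # => true
same_as_v = [0x0102].pack('S<') == [0x0102].pack('v')         # => true

# Signedness is still honored on unpacking.
signed   = "\xFF\xFE".b.unpack1('s>') # => -2
unsigned = "\xFF\xFE".b.unpack1('S>') # => 65534
```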
=== \Float Directives
Each float directive specifies the packing or unpacking
for one element in the input or output array.
==== Single-Precision \Float Directives
- <tt>'F'</tt> or <tt>'f'</tt> - Native format:
s = [3.0].pack('F') # => "\x00\x00@@"
s.unpack('F') # => [3.0]
- <tt>'e'</tt> - Little-endian:
s = [3.0].pack('e') # => "\x00\x00@@"
s.unpack('e') # => [3.0]
- <tt>'g'</tt> - Big-endian:
s = [3.0].pack('g') # => "@@\x00\x00"
s.unpack('g') # => [3.0]
==== Double-Precision \Float Directives
- <tt>'D'</tt> or <tt>'d'</tt> - Native format:
s = [3.0].pack('D') # => "\x00\x00\x00\x00\x00\x00\b@"
s.unpack('D') # => [3.0]
- <tt>'E'</tt> - Little-endian:
s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@"
s.unpack('E') # => [3.0]
- <tt>'G'</tt> - Big-endian:
s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00"
s.unpack('G') # => [3.0]
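Single precision stores fewer bits than a Ruby \Float, so a round trip may change the value. The sketch below uses the little-endian directives so that the results do not depend on the platform.

```ruby
# Single precision keeps roughly 7 significant decimal digits.
single = [0.1].pack('e').unpack1('e')
single == 0.1              # => false
(single - 0.1).abs < 1e-7  # => true

# Double precision round-trips a Ruby Float exactly.
double = [0.1].pack('E').unpack1('E')
double == 0.1              # => true

# Encoded sizes in bytes.
[0.1].pack('e').bytesize   # => 4
[0.1].pack('E').bytesize   # => 8

# Big-endian single precision: 3.0 is 0x40400000.
[3.0].pack('g').bytes      # => [64, 64, 0, 0]
```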
The value for a float directive may be infinity or not-a-number:
inf = 1.0/0.0 # => Infinity
[inf].pack('f') # => "\x00\x00\x80\x7F"
"\x00\x00\x80\x7F".unpack('f') # => [Infinity]
nan = inf/inf # => NaN
[nan].pack('f') # => "\x00\x00\xC0\x7F"
"\x00\x00\xC0\x7F".unpack('f') # => [NaN]
=== \String Directives
Each string directive specifies the packing or unpacking
for one string in the input or output array.
==== Binary \String Directives
- <tt>'A'</tt> - Arbitrary binary string (space padded; count is width);
+nil+ is treated as the empty string:
['foo'].pack('A') # => "f"
['foo'].pack('A*') # => "foo"
['foo'].pack('A2') # => "fo"
['foo'].pack('A4') # => "foo "
[nil].pack('A') # => " "
[nil].pack('A*') # => ""
[nil].pack('A2') # => "  "
[nil].pack('A4') # => "    "
"foo\0".unpack('A') # => ["f"]
"foo\0".unpack('A4') # => ["foo"]
"foo\0bar".unpack('A10') # => ["foo\x00bar"] # Reads past "\0".
"foo ".unpack('A') # => ["f"]
"foo ".unpack('A4') # => ["foo"]
"foo".unpack('A4') # => ["foo"]
russian = "\u{442 435 441 442}" # => "тест"
russian.size # => 4
russian.bytesize # => 8
[russian].pack('A') # => "\xD1"
[russian].pack('A*') # => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"
russian.unpack('A') # => ["\xD1"]
russian.unpack('A2') # => ["\xD1\x82"]
russian.unpack('A4') # => ["\xD1\x82\xD0\xB5"]
russian.unpack('A*') # => ["\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"]
- <tt>'a'</tt> - Arbitrary binary string (null padded; count is width):
["foo"].pack('a') # => "f"
["foo"].pack('a*') # => "foo"
["foo"].pack('a2') # => "fo"
["foo\0"].pack('a4') # => "foo\x00"
[nil].pack('a') # => "\x00"
[nil].pack('a*') # => ""
[nil].pack('a2') # => "\x00\x00"
[nil].pack('a4') # => "\x00\x00\x00\x00"
"foo\0".unpack('a') # => ["f"]
"foo\0".unpack('a4') # => ["foo\x00"]
"foo ".unpack('a4') # => ["foo "]
"foo".unpack('a4') # => ["foo"]
"foo\0bar".unpack('a4') # => ["foo\x00"] # Reads past "\0".
- <tt>'Z'</tt> - Same as <tt>'a'</tt>,
except that null is added or ignored with <tt>'*'</tt>:
["foo"].pack('Z*') # => "foo\x00"
[nil].pack('Z*') # => "\x00"
"foo\0".unpack('Z*') # => ["foo"]
"foo".unpack('Z*') # => ["foo"]
"foo\0bar".unpack('Z*') # => ["foo"] # Does not read past "\0".
==== Bit \String Directives
- <tt>'B'</tt> - Bit string (most significant bit first):
['11111111' + '00000000'].pack('B*') # => "\xFF\x00"
['10000000' + '01000000'].pack('B*') # => "\x80@"
['1'].pack('B0') # => ""
['1'].pack('B1') # => "\x80"
['1'].pack('B2') # => "\x80\x00"
['1'].pack('B3') # => "\x80\x00"
['1'].pack('B4') # => "\x80\x00\x00"
['1'].pack('B5') # => "\x80\x00\x00"
['1'].pack('B6') # => "\x80\x00\x00\x00"
"\xff\x00".unpack("B*") # => ["1111111100000000"]
"\x01\x02".unpack("B*") # => ["0000000100000010"]
"".unpack("B0") # => [""]
"\x80".unpack("B1") # => ["1"]
"\x80".unpack("B2") # => ["10"]
"\x80".unpack("B3") # => ["100"]
- <tt>'b'</tt> - Bit string (least significant bit first):
['11111111' + '00000000'].pack('b*') # => "\xFF\x00"
['10000000' + '01000000'].pack('b*') # => "\x01\x02"
['1'].pack('b0') # => ""
['1'].pack('b1') # => "\x01"
['1'].pack('b2') # => "\x01\x00"
['1'].pack('b3') # => "\x01\x00"
['1'].pack('b4') # => "\x01\x00\x00"
['1'].pack('b5') # => "\x01\x00\x00"
['1'].pack('b6') # => "\x01\x00\x00\x00"
"\xff\x00".unpack("b*") # => ["1111111100000000"]
"\x01\x02".unpack("b*") # => ["1000000001000000"]
"".unpack("b0") # => [""]
"\x01".unpack("b1") # => ["1"]
"\x01".unpack("b2") # => ["10"]
"\x01".unpack("b3") # => ["100"]
==== Hex \String Directives
- <tt>'H'</tt> - Hex string (high nibble first):
['10ef'].pack('H*') # => "\x10\xEF"
['10ef'].pack('H0') # => ""
['10ef'].pack('H3') # => "\x10\xE0"
['10ef'].pack('H5') # => "\x10\xEF\x00"
['fff'].pack('H3') # => "\xFF\xF0"
['fff'].pack('H4') # => "\xFF\xF0"
['fff'].pack('H5') # => "\xFF\xF0\x00"
['fff'].pack('H6') # => "\xFF\xF0\x00"
['fff'].pack('H7') # => "\xFF\xF0\x00\x00"
['fff'].pack('H8') # => "\xFF\xF0\x00\x00"
"\x10\xef".unpack('H*') # => ["10ef"]
"\x10\xef".unpack('H0') # => [""]
"\x10\xef".unpack('H1') # => ["1"]
"\x10\xef".unpack('H2') # => ["10"]
"\x10\xef".unpack('H3') # => ["10e"]
"\x10\xef".unpack('H4') # => ["10ef"]
"\x10\xef".unpack('H5') # => ["10ef"]
- <tt>'h'</tt> - Hex string (low nibble first):
['10ef'].pack('h*') # => "\x01\xFE"
['10ef'].pack('h0') # => ""
['10ef'].pack('h3') # => "\x01\x0E"
['10ef'].pack('h5') # => "\x01\xFE\x00"
['fff'].pack('h3') # => "\xFF\x0F"
['fff'].pack('h4') # => "\xFF\x0F"
['fff'].pack('h5') # => "\xFF\x0F\x00"
['fff'].pack('h6') # => "\xFF\x0F\x00"
['fff'].pack('h7') # => "\xFF\x0F\x00\x00"
['fff'].pack('h8') # => "\xFF\x0F\x00\x00"
"\x01\xfe".unpack('h*') # => ["10ef"]
"\x01\xfe".unpack('h0') # => [""]
"\x01\xfe".unpack('h1') # => ["1"]
"\x01\xfe".unpack('h2') # => ["10"]
"\x01\xfe".unpack('h3') # => ["10e"]
"\x01\xfe".unpack('h4') # => ["10ef"]
"\x01\xfe".unpack('h5') # => ["10ef"]
==== Pointer \String Directives
- <tt>'P'</tt> - Pointer to a structure (fixed-length string):
s = ['abc'].pack('P') # => "\xE0O\x7F\xE5\xA1\x01\x00\x00"
s.unpack('P*') # => ["abc"]
".".unpack("P") # => []
("\0" * 8).unpack("P") # => [nil]
[nil].pack("P") # => "\x00\x00\x00\x00\x00\x00\x00\x00"
- <tt>'p'</tt> - Pointer to a null-terminated string:
s = ['abc'].pack('p') # => "(\xE4u\xE5\xA1\x01\x00\x00"
s.unpack('p*') # => ["abc"]
".".unpack("p") # => []
("\0" * 8).unpack("p") # => [nil]
[nil].pack("p") # => "\x00\x00\x00\x00\x00\x00\x00\x00"
==== Other \String Directives
- <tt>'M'</tt> - Quoted printable, MIME encoding;
text mode, but input must use LF, and output uses LF;
(see {RFC 2045}[https://www.ietf.org/rfc/rfc2045.txt]):
["a b c\td \ne"].pack('M') # => "a b c\td =\n\ne=\n"
["\0"].pack('M') # => "=00=\n"
["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n" # => true
("a"*73+"=\na=\n").unpack('M') == ["a"*74] # => true
(("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023] # => true
"a b c\td =\n\ne=\n".unpack('M') # => ["a b c\td \ne"]
"=00=\n".unpack('M') # => ["\x00"]
"pre=31=32=33after".unpack('M') # => ["pre123after"]
"pre=\nafter".unpack('M') # => ["preafter"]
"pre=\r\nafter".unpack('M') # => ["preafter"]
"pre=".unpack('M') # => ["pre="]
"pre=\r".unpack('M') # => ["pre=\r"]
"pre=hoge".unpack('M') # => ["pre=hoge"]
"pre==31after".unpack('M') # => ["pre==31after"]
"pre===31after".unpack('M') # => ["pre===31after"]
- <tt>'m'</tt> - Base64 encoded string;
count specifies input bytes between each newline,
rounded down to nearest multiple of 3;
if count is zero, no newlines are added;
(see {RFC 4648}[https://www.ietf.org/rfc/rfc4648.txt]):
[""].pack('m') # => ""
["\0"].pack('m') # => "AA==\n"
["\0\0"].pack('m') # => "AAA=\n"
["\0\0\0"].pack('m') # => "AAAA\n"
["\377"].pack('m') # => "/w==\n"
["\377\377"].pack('m') # => "//8=\n"
["\377\377\377"].pack('m') # => "////\n"
"".unpack('m') # => [""]
"AA==\n".unpack('m') # => ["\x00"]
"AAA=\n".unpack('m') # => ["\x00\x00"]
"AAAA\n".unpack('m') # => ["\x00\x00\x00"]
"/w==\n".unpack('m') # => ["\xFF"]
"//8=\n".unpack('m') # => ["\xFF\xFF"]
"////\n".unpack('m') # => ["\xFF\xFF\xFF"]
"A\n".unpack('m') # => [""]
"AA\n".unpack('m') # => ["\x00"]
"AA=\n".unpack('m') # => ["\x00"]
"AAA\n".unpack('m') # => ["\x00\x00"]
[""].pack('m0') # => ""
["\0"].pack('m0') # => "AA=="
["\0\0"].pack('m0') # => "AAA="
["\0\0\0"].pack('m0') # => "AAAA"
["\377"].pack('m0') # => "/w=="
["\377\377"].pack('m0') # => "//8="
["\377\377\377"].pack('m0') # => "////"
"".unpack('m0') # => [""]
"AA==".unpack('m0') # => ["\x00"]
"AAA=".unpack('m0') # => ["\x00\x00"]
"AAAA".unpack('m0') # => ["\x00\x00\x00"]
"/w==".unpack('m0') # => ["\xFF"]
"//8=".unpack('m0') # => ["\xFF\xFF"]
"////".unpack('m0') # => ["\xFF\xFF\xFF"]
- <tt>'u'</tt> - UU-encoded string:
[""].pack('u') # => ""
["a"].pack('u') # => "!80``\n"
["abc"].pack('u') # => "#86)C\n"
"!80``\n".unpack('u') # => ["a"]
"#86)C\n".unpack('u') # => ["abc"]
=== Offset Directives
- <tt>'@'</tt> - Begin packing at the given byte offset;
for packing, null fill if necessary:
[1, 2].pack("C@0C") # => "\x02"
[1, 2].pack("C@1C") # => "\x01\x02"
[1, 2].pack("C@5C") # => "\x01\x00\x00\x00\x00\x02"
"\x01\x00\x00\x02".unpack("C@3C") # => [1, 2]
"\x00".unpack("@1C") # => [nil]
- <tt>'X'</tt> - Back up a byte:
[0, 1, 2].pack("CCXC") # => "\x00\x02"
[0, 1, 2].pack("CCX2C") # => "\x02"
"\x00\x02".unpack("CCXC") # => [0, 2, 2]
=== Null Byte Directive
- <tt>'x'</tt> - Null byte:
[].pack("x0") # => ""
[].pack("x") # => "\x00"
[].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x02".unpack("CxC") # => [0, 2]