Mirror of https://github.com/ruby/ruby.git, synced 2022-11-09 12:17:21 -05:00
[DOC] Enhancements for encoding.rdoc (#5578)
Adds sections:

- String Encoding
- Symbol and Regexp Encodings
- Filesystem Encoding
- Locale Encoding
- IO Encodings
- External Encoding
- Internal Encoding
- Script Encoding
- Transcoding
- Transcoding a String
parent fc7e42a473
commit c19a631c99
Notes: git 2022-02-25 05:11:10 +09:00
Merged-By: BurdetteLamar <BurdetteLamar@Yahoo.com>
1 changed file with 169 additions and 1 deletion
@@ -132,7 +132,175 @@ returns the \Encoding of the concatenated string, or +nil+ if incompatible:
  s1 = "\xa1\xa1".force_encoding('euc-jp') # => "\x{A1A1}"
  Encoding.compatible?(s0, s1)             # => nil

==== \Encoding Options
=== \String \Encoding

A Ruby String object has an encoding that is an instance of class \Encoding.
The encoding may be retrieved by method String#encoding.

The default encoding for a string literal is the script encoding
(see Encoding@Script+encoding):

  's'.encoding # => #<Encoding:UTF-8>

The default encoding for a string created with method String.new is:

- For a \String object argument, the encoding of that string.
- For a string literal, the script encoding (see Encoding@Script+encoding).

In either case, any encoding may be specified:

  s = String.new(encoding: 'UTF-8')             # => ""
  s.encoding                                    # => #<Encoding:UTF-8>
  s = String.new('foo', encoding: 'ASCII-8BIT') # => "foo"
  s.encoding                                    # => #<Encoding:ASCII-8BIT>

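As a minimal sketch of the String.new defaults described above
(the variable names are illustrative and not part of the original text),
a copy made with String.new keeps the encoding of its argument
unless an encoding is given.
Note also that String.new called with no arguments at all
returns an empty string whose encoding is ASCII-8BIT:

```ruby
# Illustrative names only; a sketch, not a definitive reference.
utf16 = 'foo'.encode('UTF-16LE')

copy = String.new(utf16)             # takes the encoding of its argument
copy_encoding = copy.encoding        # Encoding::UTF_16LE

# The encoding: keyword relabels the bytes; it does not convert them.
forced = String.new(utf16, encoding: 'ASCII-8BIT')
forced_encoding = forced.encoding    # Encoding::ASCII_8BIT
same_bytes = (forced.bytes == utf16.bytes)

empty_encoding = String.new.encoding # Encoding::ASCII_8BIT
```

The +encoding+ keyword here behaves like String#force_encoding:
the bytes stay the same and only the label changes.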
The encoding for a string may be changed:

  s = "R\xC3\xA9sum\xC3\xA9"     # => "Résumé"
  s.encoding                     # => #<Encoding:UTF-8>
  s.force_encoding('ISO-8859-1') # => "R\xC3\xA9sum\xC3\xA9"
  s.encoding                     # => #<Encoding:ISO-8859-1>

Changing the assigned encoding does not alter the content of the string;
it changes only the way the content is to be interpreted:

  s                         # => "R\xC3\xA9sum\xC3\xA9"
  s.force_encoding('UTF-8') # => "Résumé"

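One way to check that String#force_encoding leaves the content alone
is to compare the bytes before and after, as in this small sketch
(variable names are illustrative).
Only the character count changes, because the same eight bytes
are interpreted as six UTF-8 characters or as eight ISO-8859-1 characters:

```ruby
# The literal is labeled UTF-8 explicitly so the sketch does not depend
# on the script encoding.
s = "R\xC3\xA9sum\xC3\xA9".dup.force_encoding('UTF-8')
bytes_before = s.bytes
utf8_length = s.length                       # 6 characters

s.force_encoding('ISO-8859-1')
latin1_length = s.length                     # 8 characters, one per byte
bytes_unchanged = (s.bytes == bytes_before)  # true

s.force_encoding('UTF-8')                    # back to the original interpretation
```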
The actual content of a string may also be altered;
see {Transcoding a String}[#label-Transcoding+a+String].

Here are a couple of useful query methods:

  s = "abc".force_encoding("UTF-8")         # => "abc"
  s.ascii_only?                             # => true
  s = "abc\u{6666}".force_encoding("UTF-8") # => "abc晦"
  s.ascii_only?                             # => false

  s = "\xc2\xa1".force_encoding("UTF-8") # => "¡"
  s.valid_encoding?                      # => true
  s = "\xc2".force_encoding("UTF-8")     # => "\xC2"
  s.valid_encoding?                      # => false

=== \Symbol and \Regexp Encodings

The string stored in a Symbol or Regexp object also has an encoding;
the encoding may be retrieved by method Symbol#encoding or Regexp#encoding.

The default encoding for these, however, is:

- US-ASCII, if all characters are US-ASCII.
- The script encoding, otherwise (see Encoding@Script+encoding).

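The two cases above can be sketched as follows.
The example assumes a UTF-8 script encoding and writes the non-ASCII characters
as <tt>\u</tt> escapes so that it does not depend on how the source file is saved;
the variable names are illustrative:

```ruby
ascii_symbol = :foo.encoding                 # Encoding::US_ASCII
wide_symbol  = :"r\u00e9sum\u00e9".encoding  # Encoding::UTF_8

ascii_regexp = /foo/.encoding                # Encoding::US_ASCII
wide_regexp  = /r\u00e9sum\u00e9/.encoding   # Encoding::UTF_8

# For contrast: an all-ASCII string literal still gets the script encoding.
string_matches_script = ('foo'.encoding == __ENCODING__)
```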
=== Filesystem \Encoding

The filesystem encoding is the default \Encoding for a string from the filesystem:

  Encoding.find("filesystem") # => #<Encoding:UTF-8>

=== Locale \Encoding

The locale encoding is the default encoding for a string from the environment,
other than from the filesystem:

  Encoding.find('locale') # => #<Encoding:IBM437>

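Both values depend on the platform and on the environment Ruby runs in,
so the following sketch shows only how to inspect them and gives no expected output.
The temporary directory and the file name <tt>example.txt</tt> are hypothetical;
the last lines look at the encoding of a file name returned by Dir.children,
which is normally the filesystem encoding
(assuming Encoding.default_internal has not been set):

```ruby
require 'tmpdir'

filesystem = Encoding.find('filesystem')
locale     = Encoding.find('locale')

# Inspect the encoding of a name that actually came from the filesystem.
entry_encoding = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'example.txt'), '')
  Dir.children(dir).first.encoding
end
```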
=== \IO Encodings

An IO object (an input/output stream), and by inheritance a File object,
has at least one, and sometimes two, encodings:

- Its _external_ _encoding_ identifies the encoding of the stream.
- Its _internal_ _encoding_, if not +nil+, specifies the encoding
  to be used for the string constructed from the stream.

==== External \Encoding

Bytes read from the stream are decoded into characters via the external encoding;
by default (that is, if the internal encoding is +nil+),
those characters become a string whose encoding is set to the external encoding.

The default external encoding is:

- UTF-8 for a text stream.
- ASCII-8BIT for a binary stream.

  f = File.open('t.rus', 'rb')
  f.external_encoding # => #<Encoding:ASCII-8BIT>

The external encoding may be set by the open option +external_encoding+:

  f = File.open('t.txt', external_encoding: 'ASCII-8BIT')
  f.external_encoding # => #<Encoding:ASCII-8BIT>

The external encoding may also be set by method #set_encoding:

  f = File.open('t.txt')
  f.set_encoding('ASCII-8BIT')
  f.external_encoding # => #<Encoding:ASCII-8BIT>

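To see the effect on the string that is read, here is a minimal sketch
that writes ISO-8859-1 bytes to a temporary file and reads them back
with a matching external encoding.
The file name and variable names are illustrative,
and the sketch assumes Encoding.default_internal is +nil+ (the default),
so no conversion takes place:

```ruby
require 'tmpdir'

latin1_bytes = "R\xE9sum\xE9".b            # "Résumé" as raw ISO-8859-1 bytes

text = Dir.mktmpdir do |dir|
  path = File.join(dir, 't.txt')           # hypothetical file name
  File.binwrite(path, latin1_bytes)
  File.open(path, external_encoding: 'ISO-8859-1') do |f|
    f.read                                 # labeled with the external encoding
  end
end

text_encoding = text.encoding                       # Encoding::ISO_8859_1
text_valid = text.valid_encoding?                   # true
bytes_kept = (text.bytes == latin1_bytes.bytes)     # true: bytes are unchanged
```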
==== Internal \Encoding

If not +nil+, the internal encoding specifies that the characters read
from the stream are to be converted to characters in the internal encoding;
those characters become a string whose encoding is set to the internal encoding.

The default internal encoding is +nil+ (no conversion).
The internal encoding may be set by the open option +internal_encoding+:

  f = File.open('t.txt', internal_encoding: 'ASCII-8BIT')
  f.internal_encoding # => #<Encoding:ASCII-8BIT>

The internal encoding may also be set by method #set_encoding:

  f = File.open('t.txt')
  f.set_encoding('UTF-8', 'ASCII-8BIT')
  f.internal_encoding # => #<Encoding:ASCII-8BIT>

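The conversion can be observed with a small sketch:
the same ISO-8859-1 bytes are written to a temporary file,
then read with external encoding ISO-8859-1 and internal encoding UTF-8.
The file name and variable names are illustrative only:

```ruby
require 'tmpdir'

latin1_bytes = "R\xE9sum\xE9".b            # 6 bytes of ISO-8859-1 text

text = Dir.mktmpdir do |dir|
  path = File.join(dir, 't.txt')           # hypothetical file name
  File.binwrite(path, latin1_bytes)
  File.open(path, external_encoding: 'ISO-8859-1',
                  internal_encoding: 'UTF-8') do |f|
    f.read                                 # converted while reading
  end
end

text_encoding = text.encoding              # Encoding::UTF_8
text_bytesize = text.bytesize              # 8: each "é" is now two bytes
```

Unlike the case where the internal encoding is +nil+,
the bytes of the resulting string differ from the bytes stored in the file.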
=== Script \Encoding

A Ruby script has a script encoding, which may be retrieved by:

  __ENCODING__ # => #<Encoding:UTF-8>

The default script encoding is UTF-8;
a Ruby source file may set its script encoding with a magic comment
on the first line of the file (or second line, if there is a shebang on the first).
The comment must contain the word +coding+ or +encoding+,
followed by a colon, space and the Encoding name or alias:

  # encoding: ISO-8859-1
  __ENCODING__ # => #<Encoding:ISO-8859-1>

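Because the magic comment must sit at the top of its own source file,
one way to try it out is to generate a small file and load it, as sketched here.
The file name <tt>latin1_script.rb</tt> and the global variables are hypothetical,
chosen only so the loaded file can report back what it saw:

```ruby
require 'tmpdir'

# Source of the generated file; the magic comment is its first line.
script = <<~'RUBY'
  # encoding: ISO-8859-1
  $script_encoding  = __ENCODING__
  $literal_encoding = 's'.encoding
RUBY

Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1_script.rb')  # hypothetical file name
  File.write(path, script)
  load path
end

$script_encoding   # Encoding::ISO_8859_1
$literal_encoding  # Encoding::ISO_8859_1: literals follow the script encoding
```

The script encoding is a property of each source file,
so the file that performs the +load+ keeps its own script encoding.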
=== Transcoding

_Transcoding_ is the process of revising the content of a string or stream
by changing its encoding.

==== Transcoding a \String

Each of these methods transcodes a string:

String#encode :: Transcodes a string into a new string
  according to a given destination encoding,
  a given or default source encoding, and encoding options.

String#encode! :: Like String#encode,
  but transcodes the string in place.

String#scrub :: Transcodes a string into a new string
  by replacing invalid byte sequences
  with a given or default replacement string.

String#scrub! :: Like String#scrub, but transcodes the string in place.

String#unicode_normalize :: Transcodes a string into a new string
  according to Unicode normalization.

String#unicode_normalize! :: Like String#unicode_normalize,
  but transcodes the string in place.

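A brief sketch of the three non-destructive methods follows;
the sample strings and variable names are illustrative.
Unlike String#force_encoding, each call returns a new string
whose content differs from that of the receiver:

```ruby
# String#encode: convert the bytes to another encoding.
utf8 = "R\u00e9sum\u00e9"                    # "Résumé" in UTF-8 (8 bytes)
latin1 = utf8.encode('ISO-8859-1')
latin1_bytes = latin1.bytes                  # [82, 233, 115, 117, 109, 233]

# String#scrub: replace invalid byte sequences.
broken = "abc\xC2xyz".dup.force_encoding('UTF-8')  # \xC2 starts an incomplete sequence
scrubbed = broken.scrub('?')                 # "abc?xyz"

# String#unicode_normalize: NFC by default.
decomposed = "e\u0301"                       # "e" plus a combining acute accent
composed = decomposed.unicode_normalize      # "\u00e9", a single character
```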
=== \Encoding Options

A number of methods in the Ruby core accept keyword arguments as encoding options.