mirror of https://github.com/rest-client/rest-client.git synced 2022-11-09 13:49:40 -05:00
rest-client--rest-client/lib/restclient/utils.rb
Andy Brody de03c9d4d1 Use URI.get_encoding to look up encodings.
Use the (undocumented) URI.get_encoding method introduced in Ruby 2.1 to
look up encodings by the aliases specified in HTML5. This means that the
behavior will differ slightly between versions of Ruby, but the
encodings selected are largely compatible.

For example, the HTML5 specification treats `ISO-8859-1` as an alias for
`Windows-1252`, whereas Ruby versions < 2.1 will use `ISO-8859-1` as is.
The two encodings are largely compatible, and the alias exists because
some servers return `charset=ISO-8859-1` when they are actually using
`Windows-1252`.

Other aliases that differ include `shift_jis` (rendered as
`Windows-31J`) and `euc-jp` (rendered as `CP51932`).
2015-11-16 15:23:08 -08:00
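
The difference described in the commit message can be inspected with a short script. This is only a sketch: `URI.get_encoding` is undocumented, so the call is guarded and may yield `nil` on interpreters that do not provide it, and the `results` hash is a name introduced here for illustration.

```ruby
require 'uri'

# Labels taken from the commit message above.
labels = ['ISO-8859-1', 'shift_jis', 'euc-jp']

# Maps each label to [Encoding.find result, URI.get_encoding result].
results = labels.map do |label|
  plain = Encoding.find(label)
  # Assumption: URI.get_encoding may be missing, so check before calling it.
  html5 = URI.respond_to?(:get_encoding) ? URI.get_encoding(label) : nil
  [label, [plain, html5]]
end.to_h

results.each do |label, (plain, html5)|
  puts "#{label}: Encoding.find=#{plain}, URI.get_encoding=#{html5.inspect}"
end
```

Where `URI.get_encoding` is available, the second column should show the HTML5 alias targets named in the commit message; otherwise it prints `nil`.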


module RestClient
  # Various utility methods
  module Utils
    # Return encoding from an HTTP header hash.
    #
    # We use the RFC 7231 specification and do not impose a default encoding on
    # text. This differs from the older RFC 2616 behavior, which specifies
    # using ISO-8859-1 for text/* content types without a charset.
    #
    # Strings will effectively end up using `Encoding.default_external` when
    # this method returns nil.
    #
    # @param headers [Hash]
    #
    # @return [String, nil] encoding Return the string encoding or nil if no
    #   header is found.
    #
    def self.get_encoding_from_headers(headers)
      type_header = headers[:content_type]
      return nil unless type_header

      _content_type, params = cgi_parse_header(type_header)

      if params.include?('charset')
        return params.fetch('charset').gsub(/(\A["']*)|(["']*\z)/, '')
      end

      nil
    end

    # Return the Encoding for a String encoding name.
    #
    # In ruby 2.1+ use URI.get_encoding() in order to support the encoding
    # names and aliases specified by HTML5. Otherwise call Encoding.find().
    #
    # Note that the HTML5 specification indicates that certain valid encodings
    # be treated as other similar encodings. For example, `ISO-8859-1` is
    # rendered as `Windows-1252` even though it differs in certain control
    # characters.
    #
    # @param [String] name A string encoding name, such as "utf-8"
    # @return [Encoding, nil]
    #
    # @see Encoding.find
    # @see URI.get_encoding
    # @see https://encoding.spec.whatwg.org/#concept-encoding-get
    #
    def self.find_encoding(name)
      if URI.respond_to?(:get_encoding)
        return URI.get_encoding(name)
      end

      begin
        Encoding.find(name)
      rescue ArgumentError => e
        raise unless e.message.include?('unknown encoding name')
        nil
      end
    end
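
    # Parse semicolon-separated, potentially quoted header string iteratively.
    #
    # @private
    #
    def self._cgi_parseparam(s)
      # Without a block, return an Enumerator over the same segments.
      return enum_for(__method__, s) unless block_given?

      while s[0] == ';'
        s = s[1..-1]
        ends = s.index(';')

        # A ';' preceded by an odd number of unescaped double quotes lies
        # inside a quoted string, so keep looking for the next one.
        while ends && ends > 0 \
            && (s[0...ends].count('"') -
                s[0...ends].scan('\"').count) % 2 != 0
          ends = s.index(';', ends + 1)
        end

        # No further separator: the segment runs to the end of the string.
        if ends.nil?
          ends = s.length
        end

        f = s[0...ends]
        yield f.strip
        s = s[ends..-1]
      end
      nil
    end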

    # Parse a Content-type like header.
    #
    # Return the main content-type and a hash of options.
    #
    # This method was ported directly from Python's cgi.parse_header(). It
    # probably doesn't read or perform particularly well in Ruby.
    # https://github.com/python/cpython/blob/3.4/Lib/cgi.py#L301-L331
    #
    # @param [String] line
    # @return [Array(String, Hash)]
    #
    def self.cgi_parse_header(line)
      parts = _cgi_parseparam(';' + line)
      # The first segment is the main content type.
      key = parts.next
      pdict = {}

      begin
        while (p = parts.next)
          i = p.index('=')
          if i
            name = p[0...i].strip.downcase
            value = p[i+1..-1].strip
            # Unwrap a quoted value and undo its backslash escapes.
            if value.length >= 2 && value[0] == '"' && value[-1] == '"'
              value = value[1...-1]
              value = value.gsub('\\\\', '\\').gsub('\\"', '"')
            end
            pdict[name] = value
          end
        end
      rescue StopIteration
        # The enumerator is exhausted: every parameter has been read.
      end

      [key, pdict]
    end
  end
end
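
The charset extraction and encoding lookup above could be combined roughly as follows. This is a self-contained sketch rather than the library's own code: `charset_from_content_type` and `lookup_encoding` are hypothetical stand-ins introduced here, the first ignores quoted semicolons (which `cgi_parse_header` handles), and the second mirrors only the `Encoding.find` fallback branch while swallowing every `ArgumentError`.

```ruby
# Hypothetical helper, not part of RestClient: a simplified stand-in for
# get_encoding_from_headers that does not handle quoted semicolons.
def charset_from_content_type(type_header)
  return nil unless type_header

  # Skip the media type, then look for a charset=... parameter.
  type_header.split(';').drop(1).each do |param|
    name, value = param.split('=', 2)
    next unless value && name.strip.downcase == 'charset'

    # Same quote-stripping regex as get_encoding_from_headers.
    return value.strip.gsub(/(\A["']*)|(["']*\z)/, '')
  end
  nil
end

# Hypothetical helper: returns nil for a missing or unknown name.
def lookup_encoding(name)
  return nil if name.nil?

  Encoding.find(name)
rescue ArgumentError
  nil
end

charset = charset_from_content_type('text/html; charset="ISO-8859-1"')
encoding = lookup_encoding(charset)

# Raw bytes as they might arrive from the network (0xE9 is "é" in Latin-1).
body = "caf\xE9".b
text = encoding ? body.force_encoding(encoding) : body

puts charset
puts text.encode('UTF-8')
```

When no charset is present, the sketch leaves the body untouched, in line with the RFC 7231 behavior described in the comments of `get_encoding_from_headers`.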