mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00
ruby--ruby/lib/rexml/encoding.rb
ser fa4bfa6af5 Merged from REXML main repository:
Fixes ticket:68.
  NOTE that this involves an API change!  Entity declarations in the doctype
  now generate events that carry two, not one, arguments.

Implements ticket:15, using gwrite's suggestion.  This allows Element to be
subclassed.

Two unrelated changes, because Subversion doesn't do block-level commits:

  1) Fixed a typo in the previous change for ticket:15
  2) Fixed namespace handling in XPath and Element.

    ***** Note that this is an API change!!! *****

    Element.namespaces() now returns a hash of namespace mappings which are
    relevant for that node.
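The API change in item 2 can be illustrated under one reading of "relevant for that node": the prefixes declared on an element plus those inherited from its ancestors, with the nearest declaration winning. This is a sketch under that assumption only. `Node` and `namespaces_in_scope` are hypothetical names, not REXML's Element internals.

```ruby
# Hypothetical stand-in for an element: the prefix => URI pairs it declares
# itself, plus a link to its parent (nil at the root).
Node = Struct.new(:declared, :parent)

# Returns the prefix => URI mappings in scope for +node+. Ancestors are
# merged root-first, so a declaration closer to the node overrides one
# with the same prefix further up the tree.
def namespaces_in_scope(node)
  chain = []
  while node
    chain.unshift(node)
    node = node.parent
  end
  chain.each_with_object({}) { |n, scope| scope.merge!(n.declared) }
end

root  = Node.new({ "x" => "urn:x", "a" => "urn:a" }, nil)
child = Node.new({ "a" => "urn:a2", "b" => "urn:b" }, root)

namespaces_in_scope(child)["a"]  # "urn:a2": the child's own declaration wins
namespaces_in_scope(child)["x"]  # "urn:x": inherited from root
```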

Fixes a bug in multiple decodings

The changeset 1230:1231 was bad.  The default behavior is *not* to use the
native REXML encodings, but rather to use ICONV.  I know that this will annoy
some people, but defaulting to the pure Ruby version isn't the correct
solution, and it breaks other encodings, so I've reverted it.

* Fixes ticket:61 (xpath_parser)
* Fixes ticket:63 (UTF-16; UNILE decoding was bad)
* Cleans up some tests, removing opportunities for test corruption
* Improves parsing error messages a little
* Adds the ability to override the encoding detection in Source construction
* Fixes an edge case in Functions::string, where document nodes weren't 
  correctly converted
* Fixes Functions::string() for Element and Document nodes
* Fixes some problems in entity handling
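The UTF-16/UNILE fix and the new override hook both concern how a source's encoding is detected. The logic of `check_encoding` in the file below can be sketched standalone in current Ruby. A big-endian byte-order mark (FE FF) means UTF-16, and a little-endian one (FF FE) means UNILE. Otherwise the `encoding` pseudo-attribute of the XML declaration is used, with UTF-8 as the default. `sniff_encoding` and its `forced:` keyword are hypothetical names, not REXML API.

```ruby
# Sketch of XML encoding detection modeled on REXML's check_encoding.
# `forced:` is a hypothetical stand-in for overriding detection when a
# Source is constructed.
def sniff_encoding(data, forced: nil)
  return forced.upcase if forced
  bytes = data.b  # look at raw bytes, whatever the string's declared encoding
  return "UTF-16" if bytes.start_with?("\xFE\xFF".b)  # big-endian byte-order mark
  return "UNILE"  if bytes.start_with?("\xFF\xFE".b)  # little-endian byte-order mark
  # Group 1: version quote, group 2: encoding quote, group 3: encoding name.
  decl = /\A\s*<\?xml\s+version\s*=\s*(['"]).*?\1\s+encoding\s*=\s*(['"])(.*?)\2/m
  m = decl.match(bytes)
  m ? m[3].upcase : "UTF-8"  # no BOM and no declared encoding: UTF-8
end

sniff_encoding(%(<?xml version="1.0" encoding='iso-8859-1'?><a/>))  # "ISO-8859-1"
sniff_encoding("\xFF\xFE<\x00".b)                                   # "UNILE"
sniff_encoding("<a/>")                                              # "UTF-8"
sniff_encoding("<a/>", forced: "shift_jis")                         # "SHIFT_JIS"
```

Checking the BOM before the declaration matters. A UTF-16 document's declaration is not readable as ASCII until the byte order is known.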

Addresses ticket:66

Fixes ticket:71

Addresses ticket:78
  NOTE that this also fixes what is technically another bug in REXML.  REXML's
  XPath parser used to allow exponential notation in numbers.  The XPath spec
  is specific about what a number is, and scientific notation is not included.
  Therefore, this has been fixed.
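The XPath 1.0 grammar defines `Number ::= Digits ('.' Digits?)? | '.' Digits` with `Digits ::= [0-9]+`, so a number has no exponent part. Below is a minimal Ruby check of a whole token against that production. `XPATH_NUMBER` and `xpath_number?` are illustrative names, not part of REXML's XPath parser.

```ruby
# XPath 1.0 lexical productions:
#   Number ::= Digits ('.' Digits?)? | '.' Digits
#   Digits ::= [0-9]+
# There is no exponent, so "1e3" is not a valid Number token.
XPATH_NUMBER = /\A(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/

def xpath_number?(token)
  XPATH_NUMBER.match?(token)
end

xpath_number?("3.14")  # true
xpath_number?(".5")    # true
xpath_number?("1e3")   # false: XPath has no exponent syntax
```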

Cross-ported a fix for ticket:88 from CVS.

Fixes ticket:80

Documentation cleanup.  Ticket:84

Applied Kou's fix for an un-trac'ed bug.

------------------------------------------------------------------------


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-01-20 03:56:02 +00:00

66 lines
2 KiB
Ruby

# -*- mode: ruby; ruby-indent-level: 2; indent-tabs-mode: t; tab-width: 2 -*- vim: sw=2 ts=2
module REXML
  module Encoding
    # Maps an encoding name to a block; apply calls that block with the
    # target object so it can set up that encoding's conversion methods.
    @encoding_methods = {}
    def self.register(enc, &block)
      @encoding_methods[enc] = block
    end
    def self.apply(obj, enc)
      @encoding_methods[enc][obj]
    end
    def self.encoding_method(enc)
      @encoding_methods[enc]
    end
    # Native, default format is UTF-8, so it is declared here rather than in
    # an encodings/ definition.
    UTF_8 = 'UTF-8'
    UTF_16 = 'UTF-16'
    UNILE = 'UNILE'

    # ID ---> Encoding name
    attr_reader :encoding
    def encoding=( enc )
      old_verbosity = $VERBOSE
      begin
        $VERBOSE = false
        enc = enc.nil? ? nil : enc.upcase
        return false if defined? @encoding and enc == @encoding
        if enc and enc != UTF_8
          @encoding = enc
          # The name is interpolated into a require path below, so accept only
          # word characters and hyphens. \A and \z (not ^ and $) make sure an
          # embedded newline cannot slip through.
          raise ArgumentError, "Bad encoding name #@encoding" unless @encoding =~ /\A[\w-]+\z/
          @encoding.untaint
          begin
            # Prefer the iconv-backed decoder for every non-UTF-8 encoding.
            require 'rexml/encodings/ICONV.rb'
            Encoding.apply(self, "ICONV")
          rescue LoadError, Exception
            # Exception already covers LoadError, so any failure falls back to
            # a pure-Ruby rexml/encodings/<NAME>.rb definition.
            begin
              enc_file = File.join( "rexml", "encodings", "#@encoding.rb" )
              require enc_file
              Encoding.apply(self, @encoding)
            rescue LoadError => err
              puts err.message
              raise ArgumentError, "No decoder found for encoding #@encoding. Please install iconv."
            end
          end
        else
          @encoding = UTF_8
          require 'rexml/encodings/UTF-8.rb'
          Encoding.apply(self, @encoding)
        end
      ensure
        $VERBOSE = old_verbosity
      end
      true
    end
    def check_encoding str
      # We have to recognize UTF-16, LSB UTF-16, and UTF-8
      return UTF_16 if /\A\xfe\xff/n =~ str
      return UNILE if /\A\xff\xfe/n =~ str
      # Otherwise read the encoding pseudo-attribute of the XML declaration:
      # group 1 is the version quote, group 2 the encoding quote, group 3 the name.
      str =~ /^\s*<\?xml\s*version=(['"]).*?\1\s*encoding=(["'])(.*?)\2/um
      return $3.upcase if $3
      return UTF_8
    end
  end
end
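In this file, `Encoding.register` stores one block per encoding name, and `Encoding.apply(obj, enc)` calls that block with the object. Meanwhile, `encoding=` tries the iconv decoder before falling back to a per-encoding file. Below is a sketch of the same shape in current Ruby. `Codecs`, `Source`, `decode`, and the `"GENERIC"` key are hypothetical names, and Ruby's `String#encode` stands in for iconv. Unlike `apply` above, it uses `Hash#fetch`, so a missing registration raises `KeyError` rather than `NoMethodError` on `nil`.

```ruby
# Block registry keyed by encoding name (hypothetical, not REXML's API).
module Codecs
  @registry = {}

  def self.register(name, &block)
    @registry[name] = block
  end

  def self.apply(obj, name)
    @registry.fetch(name).call(obj)
  end

  def self.registered?(name)
    @registry.key?(name)
  end
end

# UTF-8 is native: decoding only tags the bytes as UTF-8.
Codecs.register("UTF-8") do |obj|
  obj.define_singleton_method(:decode) { |str| str.dup.force_encoding("UTF-8") }
end

# Generic transcoder: reinterpret bytes in obj.encoding, convert to UTF-8.
Codecs.register("GENERIC") do |obj|
  obj.define_singleton_method(:decode) do |str|
    str.dup.force_encoding(obj.encoding).encode("UTF-8")
  end
end

class Source
  attr_reader :encoding

  def encoding=(enc)
    enc = enc ? enc.upcase : "UTF-8"
    # Reject anything but word characters and hyphens before using the name.
    raise ArgumentError, "Bad encoding name #{enc}" unless enc.match?(/\A[\w-]+\z/)
    @encoding = enc
    if enc == "UTF-8"
      Codecs.apply(self, "UTF-8")
    elsif Encoding.name_list.any? { |n| n.casecmp?(enc) }
      Codecs.apply(self, "GENERIC")  # generic path first, as REXML tries ICONV first
    elsif Codecs.registered?(enc)
      Codecs.apply(self, enc)        # then a dedicated codec for this name
    else
      raise ArgumentError, "No decoder found for encoding #{enc}"
    end
  end
end

src = Source.new
src.encoding = "iso-8859-1"
src.decode("caf\xE9".b)  # "café": Latin-1 byte 0xE9 transcoded to UTF-8
```

Keeping conversion code in registered blocks lets an object switch encodings by re-running `encoding=`. The caller never needs to know which codec file supplied `decode`.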