mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00

Merged from REXML main repository:

Fixes ticket:68.
  NOTE that this involves an API change!  Entity declarations in the doctype
  now generate events that carry two, not one, arguments.

Implements ticket:15, using gwrite's suggestion.  This allows Element to be
subclassed.

Two unrelated changes, combined because Subversion doesn't do
block-level commits:

  1) Fixed a typo bug in previous change for ticket:15
  2) Fixed namespaces handling in XPath and element.  

    ***** Note that this is an API change!!! *****

    Element.namespaces() now returns a hash of namespace mappings which are
    relevant for that node.
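
    A minimal sketch of the behavior described above, assuming the rexml
    library is installed. The sample document and variable names are made up
    for illustration; the point is only that each element reports the
    mappings declared on itself and on its ancestors.

```ruby
require "rexml/document"

# Hypothetical sample: two prefixes declared on the root, one more on <c>.
xml = '<a xmlns:x="1" xmlns:y="2"><b/><c xmlns:z="3"/></a>'
doc = REXML::Document.new(xml)

# namespaces() returns a Hash of prefix => URI mappings in scope for the node.
b_ns = doc.root.elements["b"].namespaces
c_ns = doc.root.elements["c"].namespaces

puts b_ns.keys.sort.join(",")  # x,y
puts c_ns.keys.sort.join(",")  # x,y,z
puts c_ns["z"]                 # 3
```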

Fixes a bug in multiple decodings

The changeset 1230:1231 was bad.  By default, REXML should use ICONV, *not*
the native REXML encodings.  I know
that this will piss some people off, but defaulting to the pure Ruby version
isn't the correct solution, and it breaks other encodings, so I've reverted it.

* Fixes ticket:61 (xpath_parser)
* Fixes ticket:63 (UTF-16; UNILE decoding was bad)
* Cleans up some tests, removing opportunities for test corruption
* Improves parsing error messages a little
* Adds the ability to override the encoding detection in Source construction
* Fixes an edge case in Functions::string, where document nodes weren't 
  correctly converted
* Fixes Functions::string() for Element and Document nodes
* Fixes some problems in entity handling

Addresses ticket:66

Fixes ticket:71

Addresses ticket:78
  NOTE that this also fixes what is technically another bug in REXML.  REXML's
  XPath parser used to allow exponential notation in numbers.  The XPath spec
  is specific about what a number is, and scientific notation is not included.
  Therefore, this has been fixed.
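
  For reference, the XPath 1.0 grammar defines Number as
  Digits ('.' Digits?)? | '.' Digits.  The sketch below checks a string
  against that production.  It is a standalone illustration, not REXML's
  parser code; XPATH_NUMBER and xpath_number? are hypothetical names.

```ruby
# XPath 1.0: Number ::= Digits ('.' Digits?)? | '.' Digits
# There is no exponent part, so scientific notation is not a Number.
XPATH_NUMBER = /\A(?:\d+(?:\.\d*)?|\.\d+)\z/

# Returns true when the text is a valid XPath 1.0 number literal.
def xpath_number?(text)
  XPATH_NUMBER.match?(text)
end

puts xpath_number?("42")    # true
puts xpath_number?("3.14")  # true
puts xpath_number?(".5")    # true
puts xpath_number?("1e3")   # false
```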

Cross-ported a fix for ticket:88 from CVS.

Fixes ticket:80

Documentation cleanup.  Ticket:84

Applied Kou's fix for an un-trac'ed bug.

------------------------------------------------------------------------


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ser 2007-01-20 03:56:02 +00:00
parent f700c1354f
commit fa4bfa6af5
13 changed files with 142 additions and 83 deletions


@@ -6,7 +6,7 @@ module REXML
     # Generates a Source object
     # @param arg Either a String, or an IO
     # @return a Source, or nil if a bad argument was given
-    def SourceFactory::create_from arg#, slurp=true
+    def SourceFactory::create_from(arg)
       if arg.kind_of? String
         Source.new(arg)
       elsif arg.respond_to? :read and
@@ -35,12 +35,19 @@ module REXML
     # Constructor
     # @param arg must be a String, and should be a valid XML document
-    def initialize(arg)
+    # @param encoding if non-null, sets the encoding of the source to this
+    # value, overriding all encoding detection
+    def initialize(arg, encoding=nil)
       @orig = @buffer = arg
-      self.encoding = check_encoding( @buffer )
+      if encoding
+        self.encoding = encoding
+      else
+        self.encoding = check_encoding( @buffer )
+      end
       @line = 0
     end
     # Inherited from Encoding
     # Overridden to support optimized en/decoding
     def encoding=(enc)
@@ -124,7 +131,7 @@ module REXML
     #attr_reader :block_size
     # block_size has been deprecated
-    def initialize(arg, block_size=500)
+    def initialize(arg, block_size=500, encoding=nil)
       @er_source = @source = arg
       @to_utf = false
       # Determining the encoding is a deceptively difficult issue to resolve.
@@ -134,10 +141,12 @@ module REXML
       # if there is one. If there isn't one, the file MUST be UTF-8, as per
       # the XML spec. If there is one, we can determine the encoding from
       # it.
       @buffer = ""
       str = @source.read( 2 )
-      if /\A(?:\xfe\xff|\xff\xfe)/n =~ str
+      if encoding
+        self.encoding = encoding
+      elsif /\A(?:\xfe\xff|\xff\xfe)/n =~ str
         self.encoding = check_encoding( str )
         @line_break = encode( '>' )
       else
         @line_break = '>'
       end
@@ -159,6 +168,8 @@ module REXML
         str = @source.readline(@line_break)
         str = decode(str) if @to_utf and str
         @buffer << str
+      rescue Iconv::IllegalSequence
+        raise
       rescue
         @source = nil
       end
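
A caller-side sketch of the `encoding` parameter that this change adds to the Source constructor, assuming the rexml library is installed and still exposes `REXML::Source.new(arg, encoding=nil)` as in the diff. The sample strings and variable names are illustrative only. When an encoding is passed it is used as given; otherwise the source falls back to detection, which yields UTF-8 for a document with no BOM and no XML declaration.

```ruby
require "rexml/document"

# Explicit encoding: detection is skipped and the given value is used.
# (.dup hands the library a mutable buffer.)
forced = REXML::Source.new("<a>text</a>".dup, "ISO-8859-1")

# No encoding argument: the source detects the encoding from the input.
detected = REXML::Source.new("<a>text</a>".dup)

puts forced.encoding    # ISO-8859-1
puts detected.encoding  # UTF-8
```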