mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00

Short summary:

This is a version bump to REXML 3.1.4 for Ruby HEAD.  This change log is
  identical to the log for the 1.8 branch.

  It includes numerous bug fixes and is a pretty big patch, but is nonetheless
  a minor revision bump, since the API hasn't changed.

  For more information, see:

    http://www.germane-software.com/projects/rexml/milestone/3.1.4

  For all tickets, see:

    http://www.germane-software.com/projects/rexml/ticket/#

  Where '#' is replaced with the ticket number.

Changelog:

* Fixed the documentation WRT the raw mode of text nodes (ticket #4)
* Fixes roundup ticket #43: substring-after bug.
* Fixed ticket #44, Element#xpath
* Patch submitted by an anonymous donor to allow parsing of Tempfiles.  I was
  hoping that, by now, that whole Source thing would have been changed to use
  duck typing and avoid this sort of ticket... but in the meantime, the patch
  has been applied.
* Fixes ticket:30, XPath default namespace bug.  The fix was provided
  by Lucas Nussbaum.
* Aliases #size to #length, as per zdennis's request.
* Fixes typo from previous commit
* Fixes ticket #32, preceding-sibling fails attempting delete_if on nil nodeset
* Merges a user-contributed patch for ticket #40
* Adds a forgotten-to-commit unit test for ticket #32
* Changes Date, Version, and Copyright to upper case, to avoid conflicts with
  the Date class.  All of the other changes in the altered files are because
  Subversion doesn't allow block-level commits, like it should.  English cased
  Version and Copyright are aliased to the upper case versions, for partial
  backward compatibility.
* Resolves ticket #34, SAX parser change makes it impossible to parse IO feeds.
* Moves parser.source.position() to parser.position()
* Fixes ticket:48, repeated writes munging text content
* Fixes ticket:46, adding methods for accessing notation DTD information.
* Encodes some characters and removes a broken link in the documentation
* Deals with carriage returns after XML declarations
* Improved doctype handling
* Whitespace handling changes
* Applies a patch by David Tardon, which (incidentally) fixes ticket:50
* Closes #26, allowing anything that walks like an IO to be a source.
* Ticket #31 - One unescape too many
  This wasn't really a bug, per se... "value" always returns
  a normalized string, and "value" is the method used to get
  the text() of an element.  However, entities have no meaning
  in CDATA sections, so there's no justification for value
  to be normalizing the content of CData objects.  This behavior
  has therefore been changed.
* Ticket #45 -- Now parses notation declarations in DTDs properly.
* Resolves ticket #49, Document.parse_stream returns ArgumentError
* Adds documentation to clarify how XMLDecl works, to avoid invalid bug reports.
* Addresses ticket #10, fixing the StreamParser API for DTDs.
* Fixes ticket #42, XPath node-set function 'name' fails with relative node
  set parameter
* Good patch by Aaron to fix ticket #53: REXML ignoring unbalanced tags
  at the end of a document.
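
As a rough illustration of the IO-related entries above (tickets #26 and #34, and the Tempfile patch), the sketch below hands an IO-like object to `REXML::Document.new`. It assumes a current REXML release, where `StringIO` is accepted as a source; the XML content and the variable names are made up for the example.

```ruby
require 'rexml/document'
require 'stringio'

# Any object that behaves like an IO can serve as the parse source;
# StringIO stands in here for a file, socket, or Tempfile.
io = StringIO.new("<greeting lang='en'>hello</greeting>")
doc = REXML::Document.new(io)

root_name = doc.root.name   # element name of the root
root_text = doc.root.text   # text content of the root
puts "#{root_name}: #{root_text}"
```

The same call also accepts a plain String, so callers do not need to know which kind of source they were given.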


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10092 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ser 2006-04-15 04:11:04 +00:00
parent 406c1cb485
commit 78d9dd71a6
23 changed files with 1385 additions and 1236 deletions


@@ -101,20 +101,20 @@ module REXML
end end
@unnormalized = nil @unnormalized = nil
@value = @normalized = Text::normalize( @value, doctype ) @normalized = Text::normalize( @value, doctype )
end end
# Returns the UNNORMALIZED value of this attribute. That is, entities # Returns the UNNORMALIZED value of this attribute. That is, entities
# have been expanded to their values # have been expanded to their values
def value def value
@unnormalized if @unnormalized return @unnormalized if @unnormalized
doctype = nil doctype = nil
if @element if @element
doc = @element.document doc = @element.document
doctype = doc.doctype if doc doctype = doc.doctype if doc
end end
@normalized = nil @normalized = nil
@value = @unnormalized = Text::unnormalize( @value, doctype ) @unnormalized = Text::unnormalize( @value, doctype )
end end
# Returns a copy of this attribute # Returns a copy of this attribute
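
The hunk above makes `value` return its cached string early and stops both accessors from overwriting `@value`. A small sketch of the behaviour this is meant to give, assuming a current REXML release and a made-up document: `to_s` yields the normalized (escaped) form and `value` the unnormalized one, regardless of call order.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<link href="search?a=1&amp;b=2"/>')
attr = doc.root.attribute('href')

normalized   = attr.to_s    # entity references kept, as written in the XML
unnormalized = attr.value   # entity references expanded

# Calling the accessors again should give the same answers, because
# neither call is supposed to change the attribute's stored text.
again = [attr.to_s, attr.value]
```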


@@ -35,6 +35,10 @@ module REXML
@string @string
end end
def value
@string
end
# Generates XML output of this object # Generates XML output of this object
# #
# output:: # output::
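
The added `value` method ties in with ticket #31: entity references have no meaning inside a CDATA section, so a CData node returns its characters verbatim. A hedged sketch of the difference from an ordinary text node, assuming a current REXML release; the documents and variable names are invented for the illustration.

```ruby
require 'rexml/document'

cdata_doc = REXML::Document.new("<a><![CDATA[1 &lt; 2]]></a>")
text_doc  = REXML::Document.new("<a>1 &lt; 2</a>")

cdata_node  = cdata_doc.root.children.first   # a REXML::CData
cdata_value = cdata_node.value                # characters kept verbatim
text_value  = text_doc.root.text              # entity reference expanded
```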


@@ -6,55 +6,55 @@ require 'rexml/attlistdecl'
require 'rexml/xmltokens' require 'rexml/xmltokens'
module REXML module REXML
# Represents an XML DOCTYPE declaration; that is, the contents of <!DOCTYPE # Represents an XML DOCTYPE declaration; that is, the contents of <!DOCTYPE
# ... >. DOCTYPES can be used to declare the DTD of a document, as well as # ... >. DOCTYPES can be used to declare the DTD of a document, as well as
# being used to declare entities used in the document. # being used to declare entities used in the document.
class DocType < Parent class DocType < Parent
include XMLTokens include XMLTokens
START = "<!DOCTYPE" START = "<!DOCTYPE"
STOP = ">" STOP = ">"
SYSTEM = "SYSTEM" SYSTEM = "SYSTEM"
PUBLIC = "PUBLIC" PUBLIC = "PUBLIC"
DEFAULT_ENTITIES = { DEFAULT_ENTITIES = {
'gt'=>EntityConst::GT, 'gt'=>EntityConst::GT,
'lt'=>EntityConst::LT, 'lt'=>EntityConst::LT,
'quot'=>EntityConst::QUOT, 'quot'=>EntityConst::QUOT,
"apos"=>EntityConst::APOS "apos"=>EntityConst::APOS
} }
# name is the name of the doctype # name is the name of the doctype
# external_id is the referenced DTD, if given # external_id is the referenced DTD, if given
attr_reader :name, :external_id, :entities, :namespaces attr_reader :name, :external_id, :entities, :namespaces
# Constructor # Constructor
# #
# dt = DocType.new( 'foo', '-//I/Hate/External/IDs' ) # dt = DocType.new( 'foo', '-//I/Hate/External/IDs' )
# # <!DOCTYPE foo '-//I/Hate/External/IDs'> # # <!DOCTYPE foo '-//I/Hate/External/IDs'>
# dt = DocType.new( doctype_to_clone ) # dt = DocType.new( doctype_to_clone )
# # Incomplete. Shallow clone of doctype # # Incomplete. Shallow clone of doctype
# #
# +Note+ that the constructor: # +Note+ that the constructor:
# #
# Doctype.new( Source.new( "<!DOCTYPE foo 'bar'>" ) ) # Doctype.new( Source.new( "<!DOCTYPE foo 'bar'>" ) )
# #
# is _deprecated_. Do not use it. It will probably disappear. # is _deprecated_. Do not use it. It will probably disappear.
def initialize( first, parent=nil ) def initialize( first, parent=nil )
@entities = DEFAULT_ENTITIES @entities = DEFAULT_ENTITIES
@long_name = @uri = nil @long_name = @uri = nil
if first.kind_of? String if first.kind_of? String
super() super()
@name = first @name = first
@external_id = parent @external_id = parent
elsif first.kind_of? DocType elsif first.kind_of? DocType
super( parent ) super( parent )
@name = first.name @name = first.name
@external_id = first.external_id @external_id = first.external_id
elsif first.kind_of? Array elsif first.kind_of? Array
super( parent ) super( parent )
@name = first[0] @name = first[0]
@external_id = first[1] @external_id = first[1]
@long_name = first[2] @long_name = first[2]
@uri = first[3] @uri = first[3]
elsif first.kind_of? Source elsif first.kind_of? Source
super( parent ) super( parent )
parser = Parsers::BaseParser.new( first ) parser = Parsers::BaseParser.new( first )
@@ -64,150 +64,215 @@ module REXML
end end
else else
super() super()
end end
end end
def node_type def node_type
:doctype :doctype
end end
def attributes_of element def attributes_of element
rv = [] rv = []
each do |child| each do |child|
child.each do |key,val| child.each do |key,val|
rv << Attribute.new(key,val) rv << Attribute.new(key,val)
end if child.kind_of? AttlistDecl and child.element_name == element end if child.kind_of? AttlistDecl and child.element_name == element
end end
rv rv
end end
def attribute_of element, attribute def attribute_of element, attribute
att_decl = find do |child| att_decl = find do |child|
child.kind_of? AttlistDecl and child.kind_of? AttlistDecl and
child.element_name == element and child.element_name == element and
child.include? attribute child.include? attribute
end end
return nil unless att_decl return nil unless att_decl
att_decl[attribute] att_decl[attribute]
end end
def clone def clone
DocType.new self DocType.new self
end end
# output:: # output::
# Where to write the string # Where to write the string
# indent:: # indent::
# An integer. If -1, no indenting will be used; otherwise, the # An integer. If -1, no indenting will be used; otherwise, the
# indentation will be this number of spaces, and children will be # indentation will be this number of spaces, and children will be
# indented an additional amount. # indented an additional amount.
# transitive:: # transitive::
# If transitive is true and indent is >= 0, then the output will be # If transitive is true and indent is >= 0, then the output will be
# pretty-printed in such a way that the added whitespace does not affect # pretty-printed in such a way that the added whitespace does not affect
# the absolute *value* of the document -- that is, it leaves the value # the absolute *value* of the document -- that is, it leaves the value
# and number of Text nodes in the document unchanged. # and number of Text nodes in the document unchanged.
# ie_hack:: # ie_hack::
# Internet Explorer is the worst piece of crap to have ever been # Internet Explorer is the worst piece of crap to have ever been
# written, with the possible exception of Windows itself. Since IE is # written, with the possible exception of Windows itself. Since IE is
# unable to parse proper XML, we have to provide a hack to generate XML # unable to parse proper XML, we have to provide a hack to generate XML
# that IE's limited abilities can handle. This hack inserts a space # that IE's limited abilities can handle. This hack inserts a space
# before the /> on empty tags. # before the /> on empty tags.
# #
def write( output, indent=0, transitive=false, ie_hack=false ) def write( output, indent=0, transitive=false, ie_hack=false )
indent( output, indent ) indent( output, indent )
output << START output << START
output << ' ' output << ' '
output << @name output << @name
output << " #@external_id" if @external_id output << " #@external_id" if @external_id
output << " #@long_name" if @long_name output << " #@long_name" if @long_name
output << " #@uri" if @uri output << " #@uri" if @uri
unless @children.empty? unless @children.empty?
next_indent = indent + 1 next_indent = indent + 1
output << ' [' output << ' ['
child = nil # speed child = nil # speed
@children.each { |child| @children.each { |child|
output << "\n" output << "\n"
child.write( output, next_indent ) child.write( output, next_indent )
} }
output << "\n" #output << ' '*next_indent
#output << ' '*next_indent output << "\n]"
output << "]" end
end output << STOP
output << STOP end
end
def context def context
@parent.context @parent.context
end end
def entity( name ) def entity( name )
@entities[name].unnormalized if @entities[name] @entities[name].unnormalized if @entities[name]
end end
def add child def add child
super(child) super(child)
@entities = DEFAULT_ENTITIES.clone if @entities == DEFAULT_ENTITIES @entities = DEFAULT_ENTITIES.clone if @entities == DEFAULT_ENTITIES
@entities[ child.name ] = child if child.kind_of? Entity @entities[ child.name ] = child if child.kind_of? Entity
end end
end
# This method retrieves the public identifier identifying the document's
# DTD.
#
# Method contributed by Henrik Martensson
def public
case @external_id
when "SYSTEM"
nil
when "PUBLIC"
strip_quotes(@long_name)
end
end
# This method retrieves the system identifier identifying the document's DTD
#
# Method contributed by Henrik Martensson
def system
case @external_id
when "SYSTEM"
strip_quotes(@long_name)
when "PUBLIC"
@uri.kind_of?(String) ? strip_quotes(@uri) : nil
end
end
# This method returns a list of notations that have been declared in the
# _internal_ DTD subset. Notations in the external DTD subset are not
# listed.
#
# Method contributed by Henrik Martensson
def notations
children().select {|node| node.kind_of?(REXML::NotationDecl)}
end
# Retrieves a named notation. Only notations declared in the internal
# DTD subset can be retrieved.
#
# Method contributed by Henrik Martensson
def notation(name)
notations.find { |notation_decl|
notation_decl.name == name
}
end
private
# Method contributed by Henrik Martensson
def strip_quotes(quoted_string)
quoted_string =~ /^[\'\"].*[\´\"]$/ ?
quoted_string[1, quoted_string.length-2] :
quoted_string
end
end
# We don't really handle any of these since we're not a validating # We don't really handle any of these since we're not a validating
# parser, so we can be pretty dumb about them. All we need to be able # parser, so we can be pretty dumb about them. All we need to be able
# to do is spew them back out on a write() # to do is spew them back out on a write()
# This is an abstract class. You never use this directly; it serves as a # This is an abstract class. You never use this directly; it serves as a
# parent class for the specific declarations. # parent class for the specific declarations.
class Declaration < Child class Declaration < Child
def initialize src def initialize src
super() super()
@string = src @string = src
end end
def to_s def to_s
@string+'>' @string+'>'
end end
def write( output, indent ) def write( output, indent )
output << (' '*indent) if indent > 0 output << (' '*indent) if indent > 0
output << to_s output << to_s
end end
end end
public public
class ElementDecl < Declaration class ElementDecl < Declaration
def initialize( src ) def initialize( src )
super super
end end
end end
class ExternalEntity < Child class ExternalEntity < Child
def initialize( src ) def initialize( src )
super() super()
@entity = src @entity = src
end end
def to_s def to_s
@entity @entity
end end
def write( output, indent ) def write( output, indent )
output << @entity output << @entity
output << "\n" end
end end
end
class NotationDecl < Child class NotationDecl < Child
def initialize name, middle, rest attr_accessor :public, :system
@name = name def initialize name, middle, pub, sys
@middle = middle super(nil)
@rest = rest @name = name
end @middle = middle
@public = pub
@system = sys
end
def to_s def to_s
"<!NOTATION #@name '#@middle #@rest'>" "<!NOTATION #@name #@middle#{
end @public ? ' ' + public.inspect : ''
}#{
@system ? ' ' +@system.inspect : ''
}>"
end
def write( output, indent=-1 ) def write( output, indent=-1 )
output << (' '*indent) if indent > 0 output << (' '*indent) if indent > 0
output << to_s output << to_s
end end
end
# This method retrieves the name of the notation.
#
# Method contributed by Henrik Martensson
def name
@name
end
end
end end
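
The accessors added for ticket:46 (`public`, `system`, `notations`, `notation`) can be exercised roughly as follows. This is a sketch against a current REXML release, not against the 3.1.4 code in this diff; the identifiers in the DTD are invented for the example.

```ruby
require 'rexml/document'

xml = <<~XML
  <!DOCTYPE root SYSTEM "urn:x-test:sysid1" [
    <!NOTATION n1 SYSTEM "urn:x-test:notation1">
    <!NOTATION n2 SYSTEM "urn:x-test:notation2">
  ]>
  <root/>
XML

doctype = REXML::Document.new(xml).doctype

public_id      = doctype.public                 # no public identifier in a SYSTEM doctype
system_id      = doctype.system                 # the quoted system identifier
notation_names = doctype.notations.map(&:name)  # internal subset only
n2_system      = doctype.notation("n2").system  # lookup by notation name
```

Only notations declared in the internal subset are listed, as the comments in the diff point out; anything declared in an external DTD is not visible through these methods.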


@@ -16,166 +16,178 @@ module REXML
# Document has a single child that can be accessed by root(). # Document has a single child that can be accessed by root().
# Note that if you want to have an XML declaration written for a document # Note that if you want to have an XML declaration written for a document
# you create, you must add one; REXML documents do not write a default # you create, you must add one; REXML documents do not write a default
# declaration for you. See |DECLARATION| and |write|. # declaration for you. See |DECLARATION| and |write|.
class Document < Element class Document < Element
# A convenient default XML declaration. If you want an XML declaration, # A convenient default XML declaration. If you want an XML declaration,
# the easiest way to add one is mydoc << Document::DECLARATION # the easiest way to add one is mydoc << Document::DECLARATION
# +DEPRECATED+ # +DEPRECATED+
# Use: mydoc << XMLDecl.default # Use: mydoc << XMLDecl.default
DECLARATION = XMLDecl.default DECLARATION = XMLDecl.default
# Constructor # Constructor
# @param source if supplied, must be a Document, String, or IO. # @param source if supplied, must be a Document, String, or IO.
# Documents have their context and Element attributes cloned. # Documents have their context and Element attributes cloned.
# Strings are expected to be valid XML documents. IOs are expected # Strings are expected to be valid XML documents. IOs are expected
# to be sources of valid XML documents. # to be sources of valid XML documents.
# @param context if supplied, contains the context of the document; # @param context if supplied, contains the context of the document;
# this should be a Hash. # this should be a Hash.
# NOTE that I'm not sure what the context is for; I cloned it out of # NOTE that I'm not sure what the context is for; I cloned it out of
# the Electric XML API (in which it also seems to do nothing), and it # the Electric XML API (in which it also seems to do nothing), and it
# is now legacy. It may do something, someday... it may disappear. # is now legacy. It may do something, someday... it may disappear.
def initialize( source = nil, context = {} ) def initialize( source = nil, context = {} )
super() super()
@context = context @context = context
return if source.nil? return if source.nil?
if source.kind_of? Document if source.kind_of? Document
@context = source.context @context = source.context
super source super source
else else
build( source ) build( source )
end end
end end
def node_type def node_type
:document :document
end end
# Should be obvious # Should be obvious
def clone def clone
Document.new self Document.new self
end end
# According to the XML spec, a root node has no expanded name # According to the XML spec, a root node has no expanded name
def expanded_name def expanded_name
'' ''
#d = doc_type #d = doc_type
#d ? d.name : "UNDEFINED" #d ? d.name : "UNDEFINED"
end end
alias :name :expanded_name alias :name :expanded_name
# We override this, because XMLDecls and DocTypes must go at the start # We override this, because XMLDecls and DocTypes must go at the start
# of the document # of the document
def add( child ) def add( child )
if child.kind_of? XMLDecl if child.kind_of? XMLDecl
@children.unshift child @children.unshift child
elsif child.kind_of? DocType elsif child.kind_of? DocType
if @children[0].kind_of? XMLDecl # Find first Element or DocType node and insert the decl right
@children[1,0] = child # before it. If there is no such node, just insert the child at the
else # end. If there is a child and it is an DocType, then replace it.
@children.unshift child insert_before_index = 0
@children.find { |x|
insert_before_index += 1
x.kind_of?(Element) || x.kind_of?(DocType)
}
if @children[ insert_before_index ] # Not null = not end of list
if @children[ insert_before_index ].kind_of DocType
@children[ insert_before_index ] = child
else
@children[ index_before_index-1, 0 ] = child
end
else # Insert at end of list
@children[insert_before_index] = child
end end
child.parent = self child.parent = self
else else
rv = super rv = super
raise "attempted adding second root element to document" if @elements.size > 1 raise "attempted adding second root element to document" if @elements.size > 1
rv rv
end end
end end
alias :<< :add alias :<< :add
def add_element(arg=nil, arg2=nil) def add_element(arg=nil, arg2=nil)
rv = super rv = super
raise "attempted adding second root element to document" if @elements.size > 1 raise "attempted adding second root element to document" if @elements.size > 1
rv rv
end end
# @return the root Element of the document, or nil if this document # @return the root Element of the document, or nil if this document
# has no children. # has no children.
def root def root
elements[1] elements[1]
#self #self
#@children.find { |item| item.kind_of? Element } #@children.find { |item| item.kind_of? Element }
end end
# @return the DocType child of the document, if one exists, # @return the DocType child of the document, if one exists,
# and nil otherwise. # and nil otherwise.
def doctype def doctype
@children.find { |item| item.kind_of? DocType } @children.find { |item| item.kind_of? DocType }
end end
# @return the XMLDecl of this document; if no XMLDecl has been # @return the XMLDecl of this document; if no XMLDecl has been
# set, the default declaration is returned. # set, the default declaration is returned.
def xml_decl def xml_decl
rv = @children[0] rv = @children[0]
return rv if rv.kind_of? XMLDecl return rv if rv.kind_of? XMLDecl
rv = @children.unshift(XMLDecl.default)[0] rv = @children.unshift(XMLDecl.default)[0]
end end
# @return the XMLDecl version of this document as a String. # @return the XMLDecl version of this document as a String.
# If no XMLDecl has been set, returns the default version. # If no XMLDecl has been set, returns the default version.
def version def version
xml_decl().version xml_decl().version
end end
# @return the XMLDecl encoding of this document as a String. # @return the XMLDecl encoding of this document as a String.
# If no XMLDecl has been set, returns the default encoding. # If no XMLDecl has been set, returns the default encoding.
def encoding def encoding
xml_decl().encoding xml_decl().encoding
end end
# @return the XMLDecl standalone value of this document as a String. # @return the XMLDecl standalone value of this document as a String.
# If no XMLDecl has been set, returns the default setting. # If no XMLDecl has been set, returns the default setting.
def stand_alone? def stand_alone?
xml_decl().stand_alone? xml_decl().stand_alone?
end end
# Write the XML tree out, optionally with indent. This writes out the # Write the XML tree out, optionally with indent. This writes out the
# entire XML document, including XML declarations, doctype declarations, # entire XML document, including XML declarations, doctype declarations,
# and processing instructions (if any are given). # and processing instructions (if any are given).
# A controversial point is whether Document should always write the XML # A controversial point is whether Document should always write the XML
# declaration (<?xml version='1.0'?>) whether or not one is given by the # declaration (<?xml version='1.0'?>) whether or not one is given by the
# user (or source document). REXML does not write one if one was not # user (or source document). REXML does not write one if one was not
# specified, because it adds unneccessary bandwidth to applications such # specified, because it adds unneccessary bandwidth to applications such
# as XML-RPC. # as XML-RPC.
# #
# #
# output:: # output::
# output an object which supports '<< string'; this is where the # output an object which supports '<< string'; this is where the
# document will be written. # document will be written.
# indent:: # indent::
# An integer. If -1, no indenting will be used; otherwise, the # An integer. If -1, no indenting will be used; otherwise, the
# indentation will be this number of spaces, and children will be # indentation will be this number of spaces, and children will be
# indented an additional amount. Defaults to -1 # indented an additional amount. Defaults to -1
# transitive:: # transitive::
# If transitive is true and indent is >= 0, then the output will be # If transitive is true and indent is >= 0, then the output will be
# pretty-printed in such a way that the added whitespace does not affect # pretty-printed in such a way that the added whitespace does not affect
# the absolute *value* of the document -- that is, it leaves the value # the absolute *value* of the document -- that is, it leaves the value
# and number of Text nodes in the document unchanged. # and number of Text nodes in the document unchanged.
# ie_hack:: # ie_hack::
# Internet Explorer is the worst piece of crap to have ever been # Internet Explorer is the worst piece of crap to have ever been
# written, with the possible exception of Windows itself. Since IE is # written, with the possible exception of Windows itself. Since IE is
# unable to parse proper XML, we have to provide a hack to generate XML # unable to parse proper XML, we have to provide a hack to generate XML
# that IE's limited abilities can handle. This hack inserts a space # that IE's limited abilities can handle. This hack inserts a space
# before the /> on empty tags. Defaults to false # before the /> on empty tags. Defaults to false
def write( output=$stdout, indent_level=-1, transitive=false, ie_hack=false ) def write( output=$stdout, indent=-1, transitive=false, ie_hack=false )
output = Output.new( output, xml_decl.encoding ) if xml_decl.encoding != "UTF-8" && !output.kind_of?(Output) output = Output.new( output, xml_decl.encoding ) if xml_decl.encoding != "UTF-8" && !output.kind_of?(Output)
@children.each { |node| @children.each { |node|
indent( output, indent_level ) if node.node_type == :element indent( output, indent ) if node.node_type == :element
if node.write( output, indent_level, transitive, ie_hack ) if node.write( output, indent, transitive, ie_hack )
output << "\n" unless indent_level<0 or node == @children[-1] output << "\n" unless indent<0 or node == @children[-1]
end end
} }
end end
def Document::parse_stream( source, listener ) def Document::parse_stream( source, listener )
Parsers::StreamParser.new( source, listener ).parse Parsers::StreamParser.new( source, listener ).parse
end end
private private
def build( source ) def build( source )
Parsers::TreeParser.new( source, self ).parse Parsers::TreeParser.new( source, self ).parse
end end
end end
end end
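
The comment in the new `add` branch describes a placement rule for DocType nodes: put the node before the first Element, replace an existing DocType, or append when neither exists. A plain-Ruby sketch of that rule follows; it is independent of REXML, and `Node` and `place_doctype` are hypothetical names used only for this illustration, not part of the patch.

```ruby
# A stand-in for REXML nodes: just a kind and a label.
Node = Struct.new(:kind, :label)

# Place +doctype+ into +children+ following the rule in the comment:
# replace an existing doctype, otherwise insert before the first element,
# otherwise append at the end.
def place_doctype(children, doctype)
  index = children.index { |n| n.kind == :element || n.kind == :doctype }
  if index.nil?
    children << doctype
  elsif children[index].kind == :doctype
    children[index] = doctype
  else
    children.insert(index, doctype)
  end
  children
end

decl = Node.new(:xmldecl, "decl")
root = Node.new(:element, "root")

inserted = place_doctype([decl, root], Node.new(:doctype, "new"))
replaced = place_doctype([decl, Node.new(:doctype, "old"), root],
                         Node.new(:doctype, "new"))
appended = place_doctype([], Node.new(:doctype, "new"))
```

Using `Array#index` with a block returns the position directly, which avoids keeping a separate counter in step with the search.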


@@ -36,8 +36,6 @@ module REXML
# If an Element, the object will be shallowly cloned; name, # If an Element, the object will be shallowly cloned; name,
# attributes, and namespaces will be copied. Children will +not+ be # attributes, and namespaces will be copied. Children will +not+ be
# copied. # copied.
# If a Source, the source will be scanned and parsed for an Element,
# and all child elements will be recursively parsed as well.
# parent:: # parent::
# if supplied, must be a Parent, and will be used as # if supplied, must be a Parent, and will be used as
# the parent of this object. # the parent of this object.
@@ -223,7 +221,7 @@ module REXML
# b.namespace("y") # -> '2' # b.namespace("y") # -> '2'
def namespace(prefix=nil) def namespace(prefix=nil)
if prefix.nil? if prefix.nil?
prefix = self.prefix() prefix = prefix()
end end
if prefix == '' if prefix == ''
prefix = "xmlns" prefix = "xmlns"
@@ -715,7 +713,7 @@ module REXML
private private
def __to_xpath_helper node def __to_xpath_helper node
rv = node.expanded_name rv = node.expanded_name.clone
if node.parent if node.parent
results = node.parent.find_all {|n| results = node.parent.find_all {|n|
n.kind_of?(REXML::Element) and n.expanded_name == node.expanded_name n.kind_of?(REXML::Element) and n.expanded_name == node.expanded_name
@@ -1226,5 +1224,20 @@ module REXML
rv.each{ |attr| attr.remove } rv.each{ |attr| attr.remove }
return rv return rv
end end
# The +get_attribute_ns+ method retrieves a method by its namespace
# and name. Thus it is possible to reliably identify an attribute
# even if an XML processor has changed the prefix.
#
# Method contributed by Henrik Martensson
def get_attribute_ns(namespace, name)
each_attribute() { |attribute|
if name == attribute.name &&
namespace == attribute.namespace()
return attribute
end
}
nil
end
end end
end end
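
`get_attribute_ns` looks an attribute up by namespace URI and local name, so the result does not depend on which prefix a document happens to use. A minimal sketch, assuming a current REXML release where the method is reached through `element.attributes`; the namespace URIs and documents are made up for the example.

```ruby
require 'rexml/document'

# Two documents bind the same namespace URI to different prefixes.
doc_x = REXML::Document.new('<root xmlns:x="urn:example:ns" x:id="42"/>')
doc_y = REXML::Document.new('<root xmlns:y="urn:example:ns" y:id="42"/>')

found = [doc_x, doc_y].map do |doc|
  doc.root.attributes.get_attribute_ns('urn:example:ns', 'id')
end

values  = found.map(&:value)
# A namespace URI that is not bound to the attribute yields nil.
missing = doc_x.root.attributes.get_attribute_ns('urn:other', 'id')
```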


@@ -1,58 +1,64 @@
# -*- mode: ruby; ruby-indent-level: 2; indent-tabs-mode: t; tab-width: 2 -*- vim: sw=2 ts=2 # -*- mode: ruby; ruby-indent-level: 2; indent-tabs-mode: t; tab-width: 2 -*- vim: sw=2 ts=2
module REXML module REXML
module Encoding module Encoding
@encoding_methods = {} @encoding_methods = {}
def self.register(enc, &block) def self.register(enc, &block)
@encoding_methods[enc] = block @encoding_methods[enc] = block
end end
def self.apply(obj, enc) def self.apply(obj, enc)
@encoding_methods[enc][obj] @encoding_methods[enc][obj]
end end
def self.encoding_method(enc) def self.encoding_method(enc)
@encoding_methods[enc] @encoding_methods[enc]
end end
# Native, default format is UTF-8, so it is declared here rather than in # Native, default format is UTF-8, so it is declared here rather than in
# an encodings/ definition. # an encodings/ definition.
UTF_8 = 'UTF-8' UTF_8 = 'UTF-8'
UTF_16 = 'UTF-16' UTF_16 = 'UTF-16'
UNILE = 'UNILE' UNILE = 'UNILE'
# ID ---> Encoding name # ID ---> Encoding name
attr_reader :encoding attr_reader :encoding
def encoding=( enc ) def encoding=( enc )
old_verbosity = $VERBOSE old_verbosity = $VERBOSE
begin begin
$VERBOSE = false $VERBOSE = false
return if defined? @encoding and enc == @encoding return if defined? @encoding and enc == @encoding
if enc if enc and enc != UTF_8
raise ArgumentError, "Bad encoding name #{enc}" unless /\A[\w-]+\z/n =~ enc @encoding = enc.upcase
@encoding = enc.upcase.untaint begin
else require 'rexml/encodings/ICONV.rb'
@encoding = UTF_8 Encoding.apply(self, "ICONV")
end rescue LoadError, Exception => err
err = nil raise ArgumentError, "Bad encoding name #@encoding" unless @encoding =~ /^[\w-]+$/
[@encoding, "ICONV"].each do |enc| @encoding.untaint
begin enc_file = File.join( "rexml", "encodings", "#@encoding.rb" )
require File.join("rexml", "encodings", "#{enc}.rb") begin
return Encoding.apply(self, enc) require enc_file
rescue LoadError, Exception => err Encoding.apply(self, @encoding)
end rescue LoadError
end puts $!.message
puts err.message raise ArgumentError, "No decoder found for encoding #@encoding. Please install iconv."
raise ArgumentError, "No decoder found for encoding #@encoding. Please install iconv." end
ensure end
$VERBOSE = old_verbosity else
end @encoding = UTF_8
end require 'rexml/encodings/UTF-8.rb'
Encoding.apply(self, @encoding)
end
ensure
$VERBOSE = old_verbosity
end
end
def check_encoding str def check_encoding str
# We have to recognize UTF-16, LSB UTF-16, and UTF-8 # We have to recognize UTF-16, LSB UTF-16, and UTF-8
return UTF_16 if str[0] == 254 && str[1] == 255 return UTF_16 if str[0] == 254 && str[1] == 255
return UNILE if str[0] == 255 && str[1] == 254 return UNILE if str[0] == 255 && str[1] == 254
str =~ /^\s*<?xml\s*version=(['"]).*?\2\s*encoding=(["'])(.*?)\2/um str =~ /^\s*<?xml\s*version=(['"]).*?\2\s*encoding=(["'])(.*?)\2/um
return $1.upcase if $1 return $1.upcase if $1
return UTF_8 return UTF_8
end end
end end
end end
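
`check_encoding` looks at byte-order marks first, then at the encoding named in the XML declaration, and falls back to UTF-8. The sketch below restates that order in plain Ruby for current Ruby versions, where strings are indexed by character rather than by byte. `sniff_encoding` is a hypothetical helper name, and its regular expression is written for this example rather than copied from the patch.

```ruby
# Guess the encoding of raw XML bytes: byte-order marks first, then the
# XML declaration, then a UTF-8 default.
def sniff_encoding(raw)
  data = raw.b                                  # work on bytes, not characters
  first, second = data.bytes.first(2)
  return 'UTF-16' if first == 0xFE && second == 0xFF   # big-endian BOM
  return 'UNILE'  if first == 0xFF && second == 0xFE   # little-endian BOM

  match = data.match(/\A\s*<\?xml\s+version=(['"]).*?\1\s+encoding=(['"])(.*?)\2/m)
  match ? match[3].upcase : 'UTF-8'
end

big_endian    = sniff_encoding("\xFE\xFF\x00<")
little_endian = sniff_encoding("\xFF\xFE<\x00")
declared      = sniff_encoding(%q{<?xml version="1.0" encoding="iso-8859-1"?><a/>})
default       = sniff_encoding('<a/>')
```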


@@ -67,11 +67,10 @@ module REXML
if node_set == nil if node_set == nil
yield @@context[:node] if defined? @@context[:node].namespace yield @@context[:node] if defined? @@context[:node].namespace
else else
if node_set.namespace if node_set.respond_to? :each
yield node_set
else
return unless node_set.kind_of? Enumerable
node_set.each { |node| yield node if defined? node.namespace } node_set.each { |node| yield node if defined? node.namespace }
elsif node_set.respond_to? :namespace
yield node_set
end end
end end
end end
@@ -157,12 +156,9 @@ module REXML
# Kouhei fixed this too # Kouhei fixed this too
def Functions::substring_after( string, test ) def Functions::substring_after( string, test )
ruby_string = string(string) ruby_string = string(string)
ruby_index = ruby_string.index(string(test)) test_string = string(test)
if ruby_index.nil? return $1 if ruby_string =~ /#{test}(.*)/
"" ""
else
ruby_string[ ruby_index+1..-1 ]
end
end end
# Take equal portions of Mike Stok and Sean Russell; mix # Take equal portions of Mike Stok and Sean Russell; mix
@@ -339,6 +335,8 @@ module REXML
end end
def Functions::sum( nodes ) def Functions::sum( nodes )
nodes = [nodes] unless nodes.kind_of? Array
nodes.inject(0) { |r,n| r += number(string(n)) }
end end
def Functions::floor( number ) def Functions::floor( number )
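
XPath 1.0 defines `substring-after(a, b)` as the part of `a` that follows the first occurrence of `b`, or the empty string when `b` does not occur in `a`. A plain-Ruby sketch of that rule, independent of REXML; the free-standing method below is written only for this illustration. It skips the whole length of the match, which the removed `ruby_index+1` code did not do for multi-character test strings, and it does not build a regular expression from the test string.

```ruby
# Return the part of +string+ after the first occurrence of +test+,
# or "" when +test+ does not occur.
def substring_after(string, test)
  index = string.index(test)
  return "" if index.nil?
  string[(index + test.length)..-1]
end

after_slash   = substring_after("1999/04/01", "/")
after_prefix  = substring_after("1999/04/01", "19")
after_missing = substring_after("1999/04/01", "-")
after_dot     = substring_after("a.b", ".")   # "." is matched literally
```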


@@ -38,8 +38,8 @@ module REXML
Instruction.new self Instruction.new self
end end
def write writer, indent_level=-1, transitive=false, ie_hack=false def write writer, indent=-1, transitive=false, ie_hack=false
indent(writer, indent_level) indent(writer, indent)
writer << START.sub(/\\/u, '') writer << START.sub(/\\/u, '')
writer << @target writer << @target
writer << ' ' writer << ' '


@@ -1,165 +1,166 @@
require "rexml/child" require "rexml/child"
module REXML module REXML
# A parent has children, and has methods for accessing them. The Parent # A parent has children, and has methods for accessing them. The Parent
# class is never encountered except as the superclass for some other # class is never encountered except as the superclass for some other
# object. # object.
class Parent < Child class Parent < Child
include Enumerable include Enumerable
# Constructor # Constructor
# @param parent if supplied, will be set as the parent of this object # @param parent if supplied, will be set as the parent of this object
def initialize parent=nil def initialize parent=nil
super(parent) super(parent)
@children = [] @children = []
end end
def add( object ) def add( object )
#puts "PARENT GOTS #{size} CHILDREN" #puts "PARENT GOTS #{size} CHILDREN"
object.parent = self object.parent = self
@children << object @children << object
#puts "PARENT NOW GOTS #{size} CHILDREN" #puts "PARENT NOW GOTS #{size} CHILDREN"
object object
end end
alias :push :add alias :push :add
alias :<< :push alias :<< :push
def unshift( object ) def unshift( object )
object.parent = self object.parent = self
@children.unshift object @children.unshift object
end end
def delete( object ) def delete( object )
return unless @children.include? object found = false
@children.delete object @children.delete_if {|c| c.equal?(object) and found = true }
object.parent = nil object.parent = nil if found
end end
def each(&block) def each(&block)
@children.each(&block) @children.each(&block)
end end
def delete_if( &block ) def delete_if( &block )
@children.delete_if(&block) @children.delete_if(&block)
end end
def delete_at( index ) def delete_at( index )
@children.delete_at index @children.delete_at index
end end
def each_index( &block ) def each_index( &block )
@children.each_index(&block) @children.each_index(&block)
end end
# Fetches a child at a given index # Fetches a child at a given index
# @param index the Integer index of the child to fetch # @param index the Integer index of the child to fetch
def []( index ) def []( index )
@children[index] @children[index]
end end
alias :each_child :each alias :each_child :each
# Set an index entry. See Array.[]= # Set an index entry. See Array.[]=
# @param index the index of the element to set # @param index the index of the element to set
# @param opt either the object to set, or an Integer length # @param opt either the object to set, or an Integer length
# @param child if opt is an Integer, this is the child to set # @param child if opt is an Integer, this is the child to set
# @return the parent (self) # @return the parent (self)
def []=( *args ) def []=( *args )
args[-1].parent = self args[-1].parent = self
@children[*args[0..-2]] = args[-1] @children[*args[0..-2]] = args[-1]
end end
# Inserts an child before another child # Inserts an child before another child
# @param child1 this is either an xpath or an Element. If an Element, # @param child1 this is either an xpath or an Element. If an Element,
# child2 will be inserted before child1 in the child list of the parent. # child2 will be inserted before child1 in the child list of the parent.
# If an xpath, child2 will be inserted before the first child to match # If an xpath, child2 will be inserted before the first child to match
# the xpath. # the xpath.
# @param child2 the child to insert # @param child2 the child to insert
# @return the parent (self) # @return the parent (self)
def insert_before( child1, child2 ) def insert_before( child1, child2 )
if child1.kind_of? String if child1.kind_of? String
child1 = XPath.first( self, child1 ) child1 = XPath.first( self, child1 )
child1.parent.insert_before child1, child2 child1.parent.insert_before child1, child2
else else
ind = index(child1) ind = index(child1)
child2.parent.delete(child2) if child2.parent child2.parent.delete(child2) if child2.parent
@children[ind,0] = child2 @children[ind,0] = child2
child2.parent = self child2.parent = self
end end
self self
end end
# Inserts an child after another child # Inserts an child after another child
# @param child1 this is either an xpath or an Element. If an Element, # @param child1 this is either an xpath or an Element. If an Element,
# child2 will be inserted after child1 in the child list of the parent. # child2 will be inserted after child1 in the child list of the parent.
# If an xpath, child2 will be inserted after the first child to match # If an xpath, child2 will be inserted after the first child to match
# the xpath. # the xpath.
# @param child2 the child to insert # @param child2 the child to insert
# @return the parent (self) # @return the parent (self)
def insert_after( child1, child2 ) def insert_after( child1, child2 )
if child1.kind_of? String if child1.kind_of? String
child1 = XPath.first( self, child1 ) child1 = XPath.first( self, child1 )
child1.parent.insert_after child1, child2 child1.parent.insert_after child1, child2
else else
ind = index(child1)+1 ind = index(child1)+1
child2.parent.delete(child2) if child2.parent child2.parent.delete(child2) if child2.parent
@children[ind,0] = child2 @children[ind,0] = child2
child2.parent = self child2.parent = self
end end
self self
end end
def to_a def to_a
@children.dup @children.dup
end end
# Fetches the index of a given child # Fetches the index of a given child
# @param child the child to get the index of # @param child the child to get the index of
# @return the index of the child, or nil if the object is not a child # @return the index of the child, or nil if the object is not a child
# of this parent. # of this parent.
def index( child ) def index( child )
count = -1 count = -1
@children.find { |i| count += 1 ; i.hash == child.hash } @children.find { |i| count += 1 ; i.hash == child.hash }
count count
end end
# @return the number of children of this parent # @return the number of children of this parent
def size def size
@children.size @children.size
end end
# Replaces one child with another, making sure the nodelist is correct alias :length :size
# @param to_replace the child to replace (must be a Child)
# @param replacement the child to insert into the nodelist (must be a # Replaces one child with another, making sure the nodelist is correct
# Child) # @param to_replace the child to replace (must be a Child)
def replace_child( to_replace, replacement ) # @param replacement the child to insert into the nodelist (must be a
ind = @children.index( to_replace ) # Child)
to_replace.parent = nil def replace_child( to_replace, replacement )
@children[ind,0] = replacement @children.map! {|c| c.equal?( to_replace ) ? replacement : c }
replacement.parent = self to_replace.parent = nil
end replacement.parent = self
end
# Deeply clones this object. This creates a complete duplicate of this
# Parent, including all descendants. # Deeply clones this object. This creates a complete duplicate of this
def deep_clone # Parent, including all descendants.
cl = clone() def deep_clone
each do |child| cl = clone()
if child.kind_of? Parent each do |child|
cl << child.deep_clone if child.kind_of? Parent
else cl << child.deep_clone
cl << child.clone else
end cl << child.clone
end end
cl end
end cl
end
alias :children :to_a
alias :children :to_a
def parent?
true def parent?
end true
end end
end
end end

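Note on the `Parent#delete` and `Parent#replace_child` hunks above: both move from value comparison (`include?`, `Array#delete`, `Array#index`) to identity comparison via `equal?`. The difference only shows up when two distinct children compare equal with `==`. The sketch below demonstrates it with plain arrays; the `node_class` struct is a hypothetical stand-in for REXML children, chosen because `Struct` defines `==` by member values.

```ruby
# Hypothetical stand-in for child nodes: two distinct objects that are == .
node_class = Struct.new(:name)
first  = node_class.new("x")
second = node_class.new("x")

# Array#delete compares with ==, so it removes every equal element.
by_value = [first, second]
by_value.delete(first)

# delete_if with equal? removes only the one object that was passed in.
by_identity = [first, second]
by_identity.delete_if { |c| c.equal?(first) }

# The same identity test, used to swap exactly one child for a replacement.
replacement = node_class.new("y")
replaced = [first, second].map { |c| c.equal?(second) ? replacement : c }

puts by_value.size
puts by_identity.size
puts replaced.map(&:name).inspect
```

The old `replace_child` body also used `@children[ind,0] = replacement`, which inserts at `ind` without removing the element already there; the `map!` form in the new code substitutes in place.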

@ -2,103 +2,103 @@ require 'rexml/parseexception'
require 'rexml/source' require 'rexml/source'
module REXML module REXML
module Parsers module Parsers
# = Using the Pull Parser # = Using the Pull Parser
# <em>This API is experimental, and subject to change.</em> # <em>This API is experimental, and subject to change.</em>
# parser = PullParser.new( "<a>text<b att='val'/>txet</a>" ) # parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
# while parser.has_next? # while parser.has_next?
# res = parser.next # res = parser.next
# puts res[1]['att'] if res.start_tag? and res[0] == 'b' # puts res[1]['att'] if res.start_tag? and res[0] == 'b'
# end # end
# See the PullEvent class for information on the content of the results. # See the PullEvent class for information on the content of the results.
# The data is identical to the arguments passed for the various events to # The data is identical to the arguments passed for the various events to
# the StreamListener API. # the StreamListener API.
# #
# Notice that: # Notice that:
# parser = PullParser.new( "<a>BAD DOCUMENT" ) # parser = PullParser.new( "<a>BAD DOCUMENT" )
# while parser.has_next? # while parser.has_next?
# res = parser.next # res = parser.next
# raise res[1] if res.error? # raise res[1] if res.error?
# end # end
# #
# Nat Price gave me some good ideas for the API. # Nat Price gave me some good ideas for the API.
class BaseParser class BaseParser
NCNAME_STR= '[\w:][\-\w\d.]*' NCNAME_STR= '[\w:][\-\w\d.]*'
NAME_STR= "(?:#{NCNAME_STR}:)?#{NCNAME_STR}" NAME_STR= "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
NAMECHAR = '[\-\w\d\.:]' NAMECHAR = '[\-\w\d\.:]'
NAME = "([\\w:]#{NAMECHAR}*)" NAME = "([\\w:]#{NAMECHAR}*)"
NMTOKEN = "(?:#{NAMECHAR})+" NMTOKEN = "(?:#{NAMECHAR})+"
NMTOKENS = "#{NMTOKEN}(\\s+#{NMTOKEN})*" NMTOKENS = "#{NMTOKEN}(\\s+#{NMTOKEN})*"
REFERENCE = "(?:&#{NAME};|&#\\d+;|&#x[0-9a-fA-F]+;)" REFERENCE = "(?:&#{NAME};|&#\\d+;|&#x[0-9a-fA-F]+;)"
REFERENCE_RE = /#{REFERENCE}/ REFERENCE_RE = /#{REFERENCE}/
DOCTYPE_START = /\A\s*<!DOCTYPE\s/um DOCTYPE_START = /\A\s*<!DOCTYPE\s/um
DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um
ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\2/um ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\2/um
COMMENT_START = /\A<!--/u COMMENT_START = /\A<!--/u
COMMENT_PATTERN = /<!--(.*?)-->/um COMMENT_PATTERN = /<!--(.*?)-->/um
CDATA_START = /\A<!\[CDATA\[/u CDATA_START = /\A<!\[CDATA\[/u
CDATA_END = /^\s*\]\s*>/um CDATA_END = /^\s*\]\s*>/um
CDATA_PATTERN = /<!\[CDATA\[(.*?)\]\]>/um CDATA_PATTERN = /<!\[CDATA\[(.*?)\]\]>/um
XMLDECL_START = /\A<\?xml\s/u; XMLDECL_START = /\A<\?xml\s/u;
XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>*/um XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um
INSTRUCTION_START = /\A<\?/u INSTRUCTION_START = /\A<\?/u
INSTRUCTION_PATTERN = /<\?(.*?)(\s+.*?)?\?>/um INSTRUCTION_PATTERN = /<\?(.*?)(\s+.*?)?\?>/um
TAG_MATCH = /^<((?>#{NAME_STR}))\s*((?>\s+#{NAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/um TAG_MATCH = /^<((?>#{NAME_STR}))\s*((?>\s+#{NAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/um
CLOSE_MATCH = /^\s*<\/(#{NAME_STR})\s*>/um CLOSE_MATCH = /^\s*<\/(#{NAME_STR})\s*>/um
VERSION = /\bversion\s*=\s*["'](.*?)['"]/um VERSION = /\bversion\s*=\s*["'](.*?)['"]/um
ENCODING = /\bencoding=["'](.*?)['"]/um ENCODING = /\bencoding=["'](.*?)['"]/um
STANDALONE = /\bstandalone=["'](.*?)['"]/um STANDALONE = /\bstandalone=["'](.*?)['"]/um
ENTITY_START = /^\s*<!ENTITY/ ENTITY_START = /^\s*<!ENTITY/
IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'].*?['"])?(\s+['"].*?["'])?/u IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'].*?['"])?(\s+['"].*?["'])?/u
ELEMENTDECL_START = /^\s*<!ELEMENT/um ELEMENTDECL_START = /^\s*<!ELEMENT/um
ELEMENTDECL_PATTERN = /^\s*(<!ELEMENT.*?)>/um ELEMENTDECL_PATTERN = /^\s*(<!ELEMENT.*?)>/um
SYSTEMENTITY = /^\s*(%.*?;)\s*$/um SYSTEMENTITY = /^\s*(%.*?;)\s*$/um
ENUMERATION = "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)" ENUMERATION = "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
NOTATIONTYPE = "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)" NOTATIONTYPE = "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
ENUMERATEDTYPE = "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))" ENUMERATEDTYPE = "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
ATTTYPE = "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})" ATTTYPE = "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
ATTVALUE = "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')" ATTVALUE = "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
DEFAULTDECL = "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))" DEFAULTDECL = "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
ATTDEF = "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}" ATTDEF = "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
ATTDEF_RE = /#{ATTDEF}/ ATTDEF_RE = /#{ATTDEF}/
ATTLISTDECL_START = /^\s*<!ATTLIST/um ATTLISTDECL_START = /^\s*<!ATTLIST/um
ATTLISTDECL_PATTERN = /^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um ATTLISTDECL_PATTERN = /^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
NOTATIONDECL_START = /^\s*<!NOTATION/um NOTATIONDECL_START = /^\s*<!NOTATION/um
PUBLIC = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+((["']).*?\4)\s*>/um PUBLIC = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
SYSTEM = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+((["']).*?\4)\s*>/um SYSTEM = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
TEXT_PATTERN = /\A([^<]*)/um TEXT_PATTERN = /\A([^<]*)/um
# Entity constants # Entity constants
PUBIDCHAR = "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#" PUBIDCHAR = "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
SYSTEMLITERAL = %Q{((?:"[^"]*")|(?:'[^']*'))} SYSTEMLITERAL = %Q{((?:"[^"]*")|(?:'[^']*'))}
PUBIDLITERAL = %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')} PUBIDLITERAL = %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))" EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
NDATADECL = "\\s+NDATA\\s+#{NAME}" NDATADECL = "\\s+NDATA\\s+#{NAME}"
PEREFERENCE = "%#{NAME};" PEREFERENCE = "%#{NAME};"
ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))} ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})" PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))" ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
PEDECL = "<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>" PEDECL = "<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>" GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
EREFERENCE = /&(?!#{NAME};)/ EREFERENCE = /&(?!#{NAME};)/
DEFAULT_ENTITIES = { DEFAULT_ENTITIES = {
'gt' => [/&gt;/, '&gt;', '>', />/], 'gt' => [/&gt;/, '&gt;', '>', />/],
'lt' => [/&lt;/, '&lt;', '<', /</], 'lt' => [/&lt;/, '&lt;', '<', /</],
'quot' => [/&quot;/, '&quot;', '"', /"/], 'quot' => [/&quot;/, '&quot;', '"', /"/],
"apos" => [/&apos;/, "&apos;", "'", /'/] "apos" => [/&apos;/, "&apos;", "'", /'/]
} }
def initialize( source ) def initialize( source )
self.stream = source self.stream = source
end end
def add_listener( listener ) def add_listener( listener )
if !defined?(@listeners) or !@listeners if !defined?(@listeners) or !@listeners
@ -119,315 +119,320 @@ module REXML
attr_reader :source attr_reader :source
def stream=( source ) def stream=( source )
if source.kind_of? String @source = SourceFactory.create_from( source )
@source = Source.new(source) @closed = nil
elsif source.kind_of? IO @document_status = nil
@source = IOSource.new(source) @tags = []
elsif source.kind_of? Source @stack = []
@source = source @entities = []
elsif defined? StringIO and source.kind_of? StringIO end
@source = IOSource.new(source)
else
raise "#{source.class} is not a valid input stream. It must be \n"+
"either a String, IO, StringIO or Source."
end
@closed = nil
@document_status = nil
@tags = []
@stack = []
@entities = []
end
# Returns true if there are no more events def position
def empty? if @source.respond_to? :position
#puts "@source.empty? = #{@source.empty?}" @source.position
#puts "@stack.empty? = #{@stack.empty?}" else
# FIXME
0
end
end
# Returns true if there are no more events
def empty?
#STDERR.puts "@source.empty? = #{@source.empty?}"
#STDERR.puts "@stack.empty? = #{@stack.empty?}"
return (@source.empty? and @stack.empty?) return (@source.empty? and @stack.empty?)
end end
# Returns true if there are more events. Synonymous with !empty? # Returns true if there are more events. Synonymous with !empty?
def has_next? def has_next?
return !(@source.empty? and @stack.empty?) return !(@source.empty? and @stack.empty?)
end end
# Push an event back on the head of the stream. This method # Push an event back on the head of the stream. This method
# has (theoretically) infinite depth. # has (theoretically) infinite depth.
def unshift token def unshift token
@stack.unshift(token) @stack.unshift(token)
end end
# Peek at the +depth+ event in the stack. The first element on the stack # Peek at the +depth+ event in the stack. The first element on the stack
# is at depth 0. If +depth+ is -1, will parse to the end of the input # is at depth 0. If +depth+ is -1, will parse to the end of the input
# stream and return the last event, which is always :end_document. # stream and return the last event, which is always :end_document.
# Be aware that this causes the stream to be parsed up to the +depth+ # Be aware that this causes the stream to be parsed up to the +depth+
# event, so you can effectively pre-parse the entire document (pull the # event, so you can effectively pre-parse the entire document (pull the
# entire thing into memory) using this method. # entire thing into memory) using this method.
def peek depth=0 def peek depth=0
raise %Q[Illegal argument "#{depth}"] if depth < -1 raise %Q[Illegal argument "#{depth}"] if depth < -1
temp = [] temp = []
if depth == -1 if depth == -1
temp.push(pull()) until empty? temp.push(pull()) until empty?
else else
while @stack.size+temp.size < depth+1 while @stack.size+temp.size < depth+1
temp.push(pull()) temp.push(pull())
end end
end end
@stack += temp if temp.size > 0 @stack += temp if temp.size > 0
@stack[depth] @stack[depth]
end end
# Returns the next event. This is a +PullEvent+ object. # Returns the next event. This is a +PullEvent+ object.
def pull def pull
if @closed if @closed
x, @closed = @closed, nil x, @closed = @closed, nil
return [ :end_element, x ] return [ :end_element, x ]
end end
return [ :end_document ] if empty? return [ :end_document ] if empty?
return @stack.shift if @stack.size > 0 return @stack.shift if @stack.size > 0
@source.read if @source.buffer.size<2 @source.read if @source.buffer.size<2
if @document_status == nil #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
@source.consume( /^\s*/um ) if @document_status == nil
word = @source.match( /(<[^>]*)>/um ) #@source.consume( /^\s*/um )
word = word[1] unless word.nil? word = @source.match( /^((?:\s+)|(?:<[^>]*>))/um )
case word word = word[1] unless word.nil?
when COMMENT_START #STDERR.puts "WORD = #{word.inspect}"
return [ :comment, @source.match( COMMENT_PATTERN, true )[1] ] case word
when XMLDECL_START when COMMENT_START
results = @source.match( XMLDECL_PATTERN, true )[1] return [ :comment, @source.match( COMMENT_PATTERN, true )[1] ]
version = VERSION.match( results ) when XMLDECL_START
version = version[1] unless version.nil? #STDERR.puts "XMLDECL"
encoding = ENCODING.match(results) results = @source.match( XMLDECL_PATTERN, true )[1]
encoding = encoding[1] unless encoding.nil? version = VERSION.match( results )
@source.encoding = encoding version = version[1] unless version.nil?
standalone = STANDALONE.match(results) encoding = ENCODING.match(results)
standalone = standalone[1] unless standalone.nil? encoding = encoding[1] unless encoding.nil?
return [ :xmldecl, version, encoding, standalone] @source.encoding = encoding
when INSTRUCTION_START standalone = STANDALONE.match(results)
return [ :processing_instruction, *@source.match(INSTRUCTION_PATTERN, true)[1,2] ] standalone = standalone[1] unless standalone.nil?
when DOCTYPE_START return [ :xmldecl, version, encoding, standalone ]
md = @source.match( DOCTYPE_PATTERN, true ) when INSTRUCTION_START
identity = md[1] return [ :processing_instruction, *@source.match(INSTRUCTION_PATTERN, true)[1,2] ]
close = md[2] when DOCTYPE_START
identity =~ IDENTITY md = @source.match( DOCTYPE_PATTERN, true )
name = $1 identity = md[1]
raise REXML::ParseException("DOCTYPE is missing a name") if name.nil? close = md[2]
pub_sys = $2.nil? ? nil : $2.strip identity =~ IDENTITY
long_name = $3.nil? ? nil : $3.strip name = $1
uri = $4.nil? ? nil : $4.strip raise REXML::ParseException("DOCTYPE is missing a name") if name.nil?
args = [ :start_doctype, name, pub_sys, long_name, uri ] pub_sys = $2.nil? ? nil : $2.strip
if close == ">" long_name = $3.nil? ? nil : $3.strip
@document_status = :after_doctype uri = $4.nil? ? nil : $4.strip
@source.read if @source.buffer.size<2 args = [ :start_doctype, name, pub_sys, long_name, uri ]
md = @source.match(/^\s*/um, true) if close == ">"
@stack << [ :end_doctype ] @document_status = :after_doctype
else @source.read if @source.buffer.size<2
@document_status = :in_doctype md = @source.match(/^\s*/um, true)
end @stack << [ :end_doctype ]
return args else
else @document_status = :in_doctype
@document_status = :after_doctype end
@source.read if @source.buffer.size<2 return args
md = @source.match(/\s*/um, true) when /^\s+/
end else
end @document_status = :after_doctype
if @document_status == :in_doctype @source.read if @source.buffer.size<2
md = @source.match(/\s*(.*?>)/um) md = @source.match(/\s*/um, true)
case md[1] end
when SYSTEMENTITY end
match = @source.match( SYSTEMENTITY, true )[1] if @document_status == :in_doctype
return [ :externalentity, match ] md = @source.match(/\s*(.*?>)/um)
case md[1]
when SYSTEMENTITY
match = @source.match( SYSTEMENTITY, true )[1]
return [ :externalentity, match ]
when ELEMENTDECL_START when ELEMENTDECL_START
return [ :elementdecl, @source.match( ELEMENTDECL_PATTERN, true )[1] ] return [ :elementdecl, @source.match( ELEMENTDECL_PATTERN, true )[1] ]
when ENTITY_START when ENTITY_START
match = @source.match( ENTITYDECL, true ).to_a.compact match = @source.match( ENTITYDECL, true ).to_a.compact
match[0] = :entitydecl match[0] = :entitydecl
ref = false ref = false
if match[1] == '%' if match[1] == '%'
ref = true ref = true
match.delete_at 1 match.delete_at 1
end end
# Now we have to sort out what kind of entity reference this is # Now we have to sort out what kind of entity reference this is
if match[2] == 'SYSTEM' if match[2] == 'SYSTEM'
# External reference # External reference
match[3] = match[3][1..-2] # PUBID match[3] = match[3][1..-2] # PUBID
match.delete_at(4) if match.size > 4 # Chop out NDATA decl match.delete_at(4) if match.size > 4 # Chop out NDATA decl
# match is [ :entity, name, SYSTEM, pubid(, ndata)? ] # match is [ :entity, name, SYSTEM, pubid(, ndata)? ]
elsif match[2] == 'PUBLIC' elsif match[2] == 'PUBLIC'
# External reference # External reference
match[3] = match[3][1..-2] # PUBID match[3] = match[3][1..-2] # PUBID
match[4] = match[4][1..-2] # HREF match[4] = match[4][1..-2] # HREF
# match is [ :entity, name, PUBLIC, pubid, href ] # match is [ :entity, name, PUBLIC, pubid, href ]
else else
match[2] = match[2][1..-2] match[2] = match[2][1..-2]
match.pop if match.size == 4 match.pop if match.size == 4
# match is [ :entity, name, value ] # match is [ :entity, name, value ]
end end
match << '%' if ref match << '%' if ref
return match return match
when ATTLISTDECL_START when ATTLISTDECL_START
md = @source.match( ATTLISTDECL_PATTERN, true ) md = @source.match( ATTLISTDECL_PATTERN, true )
raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil? raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
element = md[1] element = md[1]
contents = md[0] contents = md[0]
pairs = {} pairs = {}
values = md[0].scan( ATTDEF_RE ) values = md[0].scan( ATTDEF_RE )
values.each do |attdef| values.each do |attdef|
unless attdef[3] == "#IMPLIED" unless attdef[3] == "#IMPLIED"
attdef.compact! attdef.compact!
val = attdef[3] val = attdef[3]
val = attdef[4] if val == "#FIXED " val = attdef[4] if val == "#FIXED "
pairs[attdef[0]] = val pairs[attdef[0]] = val
end end
end end
return [ :attlistdecl, element, pairs, contents ] return [ :attlistdecl, element, pairs, contents ]
when NOTATIONDECL_START when NOTATIONDECL_START
md = nil md = nil
if @source.match( PUBLIC ) if @source.match( PUBLIC )
md = @source.match( PUBLIC, true ) md = @source.match( PUBLIC, true )
elsif @source.match( SYSTEM ) vals = [md[1],md[2],md[4],md[6]]
md = @source.match( SYSTEM, true ) elsif @source.match( SYSTEM )
else md = @source.match( SYSTEM, true )
raise REXML::ParseException.new( "error parsing notation: no matching pattern", @source ) vals = [md[1],md[2],nil,md[4]]
end else
return [ :notationdecl, md[1], md[2], md[3] ] raise REXML::ParseException.new( "error parsing notation: no matching pattern", @source )
when CDATA_END end
@document_status = :after_doctype return [ :notationdecl, *vals ]
@source.match( CDATA_END, true ) when CDATA_END
return [ :end_doctype ] @document_status = :after_doctype
end @source.match( CDATA_END, true )
end return [ :end_doctype ]
begin end
if @source.buffer[0] == ?< end
if @source.buffer[1] == ?/ begin
last_tag = @tags.pop if @source.buffer[0] == ?<
#md = @source.match_to_consume( '>', CLOSE_MATCH) if @source.buffer[1] == ?/
md = @source.match( CLOSE_MATCH, true ) last_tag = @tags.pop
raise REXML::ParseException.new( "Missing end tag for "+ #md = @source.match_to_consume( '>', CLOSE_MATCH)
md = @source.match( CLOSE_MATCH, true )
raise REXML::ParseException.new( "Missing end tag for "+
"'#{last_tag}' (got \"#{md[1]}\")", "'#{last_tag}' (got \"#{md[1]}\")",
@source) unless last_tag == md[1] @source) unless last_tag == md[1]
return [ :end_element, last_tag ] return [ :end_element, last_tag ]
elsif @source.buffer[1] == ?! elsif @source.buffer[1] == ?!
md = @source.match(/\A(\s*[^>]*>)/um) md = @source.match(/\A(\s*[^>]*>)/um)
#puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}" #STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
raise REXML::ParseException.new("Malformed node", @source) unless md raise REXML::ParseException.new("Malformed node", @source) unless md
if md[0][2] == ?- if md[0][2] == ?-
md = @source.match( COMMENT_PATTERN, true ) md = @source.match( COMMENT_PATTERN, true )
return [ :comment, md[1] ] if md return [ :comment, md[1] ] if md
else else
md = @source.match( CDATA_PATTERN, true ) md = @source.match( CDATA_PATTERN, true )
return [ :cdata, md[1] ] if md return [ :cdata, md[1] ] if md
end end
raise REXML::ParseException.new( "Declarations can only occur "+ raise REXML::ParseException.new( "Declarations can only occur "+
"in the doctype declaration.", @source) "in the doctype declaration.", @source)
elsif @source.buffer[1] == ?? elsif @source.buffer[1] == ??
md = @source.match( INSTRUCTION_PATTERN, true ) md = @source.match( INSTRUCTION_PATTERN, true )
return [ :processing_instruction, md[1], md[2] ] if md return [ :processing_instruction, md[1], md[2] ] if md
raise REXML::ParseException.new( "Bad instruction declaration", raise REXML::ParseException.new( "Bad instruction declaration",
@source) @source)
else else
# Get the next tag # Get the next tag
md = @source.match(TAG_MATCH, true) md = @source.match(TAG_MATCH, true)
raise REXML::ParseException.new("malformed XML: missing tag start", @source) unless md raise REXML::ParseException.new("malformed XML: missing tag start", @source) unless md
attrs = [] attrs = []
if md[2].size > 0 if md[2].size > 0
attrs = md[2].scan( ATTRIBUTE_PATTERN ) attrs = md[2].scan( ATTRIBUTE_PATTERN )
raise REXML::ParseException.new( "error parsing attributes: [#{attrs.join ', '}], excess = \"#$'\"", @source) if $' and $'.strip.size > 0 raise REXML::ParseException.new( "error parsing attributes: [#{attrs.join ', '}], excess = \"#$'\"", @source) if $' and $'.strip.size > 0
end end
if md[4] if md[4]
@closed = md[1] @closed = md[1]
else else
@tags.push( md[1] ) @tags.push( md[1] )
end end
attributes = {} attributes = {}
attrs.each { |a,b,c| attributes[a] = c } attrs.each { |a,b,c| attributes[a] = c }
return [ :start_element, md[1], attributes ] return [ :start_element, md[1], attributes ]
end end
else else
md = @source.match( TEXT_PATTERN, true ) md = @source.match( TEXT_PATTERN, true )
if md[0].length == 0 if md[0].length == 0
#puts "EMPTY = #{empty?}" puts "EMPTY = #{empty?}"
#puts "BUFFER = \"#{@source.buffer}\"" puts "BUFFER = \"#{@source.buffer}\""
@source.match( /(\s+)/, true ) @source.match( /(\s+)/, true )
end end
#STDERR.puts "GOT #{md[1].inspect}" unless md[0].length == 0
#return [ :text, "" ] if md[0].length == 0 #return [ :text, "" ] if md[0].length == 0
# unnormalized = Text::unnormalize( md[1], self ) # unnormalized = Text::unnormalize( md[1], self )
# return PullEvent.new( :text, md[1], unnormalized ) # return PullEvent.new( :text, md[1], unnormalized )
return [ :text, md[1] ] return [ :text, md[1] ]
end end
rescue REXML::ParseException rescue REXML::ParseException
raise raise
rescue Exception, NameError => error rescue Exception, NameError => error
raise REXML::ParseException.new( "Exception parsing", raise REXML::ParseException.new( "Exception parsing",
@source, self, (error ? error : $!) ) @source, self, (error ? error : $!) )
end end
return [ :dummy ] return [ :dummy ]
end end
def entity( reference, entities ) def entity( reference, entities )
value = nil value = nil
value = entities[ reference ] if entities value = entities[ reference ] if entities
if not value if not value
value = DEFAULT_ENTITIES[ reference ] value = DEFAULT_ENTITIES[ reference ]
value = value[2] if value value = value[2] if value
end end
unnormalize( value, entities ) if value unnormalize( value, entities ) if value
end end
# Escapes all possible entities # Escapes all possible entities
def normalize( input, entities=nil, entity_filter=nil ) def normalize( input, entities=nil, entity_filter=nil )
copy = input.clone copy = input.clone
# Doing it like this rather than in a loop improves the speed # Doing it like this rather than in a loop improves the speed
copy.gsub!( EREFERENCE, '&amp;' ) copy.gsub!( EREFERENCE, '&amp;' )
entities.each do |key, value| entities.each do |key, value|
copy.gsub!( value, "&#{key};" ) unless entity_filter and copy.gsub!( value, "&#{key};" ) unless entity_filter and
entity_filter.include?(entity) entity_filter.include?(entity)
end if entities end if entities
copy.gsub!( EREFERENCE, '&amp;' ) copy.gsub!( EREFERENCE, '&amp;' )
DEFAULT_ENTITIES.each do |key, value| DEFAULT_ENTITIES.each do |key, value|
copy.gsub!( value[3], value[1] ) copy.gsub!( value[3], value[1] )
end end
copy copy
end end
# Unescapes all possible entities # Unescapes all possible entities
def unnormalize( string, entities=nil, filter=nil ) def unnormalize( string, entities=nil, filter=nil )
rv = string.clone rv = string.clone
rv.gsub!( /\r\n?/, "\n" ) rv.gsub!( /\r\n?/, "\n" )
matches = rv.scan( REFERENCE_RE ) matches = rv.scan( REFERENCE_RE )
return rv if matches.size == 0 return rv if matches.size == 0
rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {|m| rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {|m|
m=$1 m=$1
m = "0#{m}" if m[0] == ?x m = "0#{m}" if m[0] == ?x
[Integer(m)].pack('U*') [Integer(m)].pack('U*')
} }
matches.collect!{|x|x[0]}.compact! matches.collect!{|x|x[0]}.compact!
if matches.size > 0 if matches.size > 0
matches.each do |entity_reference| matches.each do |entity_reference|
unless filter and filter.include?(entity_reference) unless filter and filter.include?(entity_reference)
entity_value = entity( entity_reference, entities ) entity_value = entity( entity_reference, entities )
if entity_value if entity_value
re = /&#{entity_reference};/ re = /&#{entity_reference};/
rv.gsub!( re, entity_value ) rv.gsub!( re, entity_value )
end end
end end
end end
matches.each do |entity_reference| matches.each do |entity_reference|
unless filter and filter.include?(entity_reference) unless filter and filter.include?(entity_reference)
er = DEFAULT_ENTITIES[entity_reference] er = DEFAULT_ENTITIES[entity_reference]
rv.gsub!( er[0], er[2] ) if er rv.gsub!( er[0], er[2] ) if er
end end
end end
rv.gsub!( /&amp;/, '&' ) rv.gsub!( /&amp;/, '&' )
end end
rv rv
end end
end end
end end
end end
=begin =begin
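The numeric character reference step in `unnormalize` above can be tried in isolation. This is a minimal sketch rather than REXML's own code: `decode_char_refs` is a hypothetical helper name, and it reuses only the regular expression and the `pack('U*')` call that appear in the diff.

```ruby
# Hypothetical helper: decodes &#NN; and &#xHH; references into characters.
def decode_char_refs(text)
  text.gsub(/&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/) do
    digits = $1
    # Integer() understands "0x41" but not "x41", so prefix hex forms with 0.
    digits = "0#{digits}" if digits.start_with?('x')
    # Turn the code point into a UTF-8 encoded character.
    [Integer(digits)].pack('U*')
  end
end

puts decode_char_refs("&#65;&#x42;&#067;")   # prints ABC
```

The `0*` in the pattern strips leading zeros before the capture, which keeps `Integer()` from reading a decimal reference such as `&#067;` as octal.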


@ -1,96 +1,100 @@
require 'forwardable'
require 'rexml/parseexception' require 'rexml/parseexception'
require 'rexml/parsers/baseparser' require 'rexml/parsers/baseparser'
require 'rexml/xmltokens' require 'rexml/xmltokens'
module REXML module REXML
module Parsers module Parsers
# = Using the Pull Parser # = Using the Pull Parser
# <em>This API is experimental, and subject to change.</em> # <em>This API is experimental, and subject to change.</em>
# parser = PullParser.new( "<a>text<b att='val'/>txet</a>" ) # parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
# while parser.has_next? # while parser.has_next?
# res = parser.next # res = parser.next
# puts res[1]['att'] if res.start_tag? and res[0] == 'b' # puts res[1]['att'] if res.start_tag? and res[0] == 'b'
# end # end
# See the PullEvent class for information on the content of the results. # See the PullEvent class for information on the content of the results.
# The data is identical to the arguments passed for the various events to # The data is identical to the arguments passed for the various events to
# the StreamListener API. # the StreamListener API.
# #
# Notice that: # Notice that:
# parser = PullParser.new( "<a>BAD DOCUMENT" ) # parser = PullParser.new( "<a>BAD DOCUMENT" )
# while parser.has_next? # while parser.has_next?
# res = parser.next # res = parser.next
# raise res[1] if res.error? # raise res[1] if res.error?
# end # end
# #
# Nat Price gave me some good ideas for the API. # Nat Price gave me some good ideas for the API.
class PullParser class PullParser
include XMLTokens include XMLTokens
extend Forwardable
def initialize stream def_delegators( :@parser, :has_next? )
@entities = {} def_delegators( :@parser, :entity )
def_delegators( :@parser, :empty? )
def_delegators( :@parser, :source )
def initialize stream
@entities = {}
@listeners = nil @listeners = nil
@parser = BaseParser.new( stream ) @parser = BaseParser.new( stream )
end @my_stack = []
end
def add_listener( listener ) def add_listener( listener )
@listeners = [] unless @listeners @listeners = [] unless @listeners
@listeners << listener @listeners << listener
end end
def each def each
while has_next? while has_next?
yield self.pull yield self.pull
end end
end
def peek depth=0
PullEvent.new(@parser.peek(depth))
end
def has_next?
@parser.has_next?
end end
def pull def peek depth=0
event = @parser.pull if @my_stack.length <= depth
case event[0] (depth - @my_stack.length + 1).times {
when :entitydecl e = PullEvent.new(@parser.pull)
@entities[ event[1] ] = @my_stack.push(e)
event[2] unless event[2] =~ /PUBLIC|SYSTEM/ }
when :text end
unnormalized = @parser.unnormalize( event[1], @entities ) @my_stack[depth]
event << unnormalized end
end
PullEvent.new( event ) def pull
end return @my_stack.shift if @my_stack.length > 0
event = @parser.pull
case event[0]
when :entitydecl
@entities[ event[1] ] =
event[2] unless event[2] =~ /PUBLIC|SYSTEM/
when :text
unnormalized = @parser.unnormalize( event[1], @entities )
event << unnormalized
end
PullEvent.new( event )
end
def unshift token def unshift token
@parser.unshift token @my_stack.unshift token
end end
end
def entity reference # A parsing event. The contents of the event are accessed as an +Array+,
@parser.entity( reference ) # and the type is given either by the ...? methods, or by accessing the
# +type+ accessor. The contents of this object vary from event to event,
# but are identical to the arguments passed to +StreamListener+s for each
# event.
class PullEvent
# The type of this event. Will be one of :tag_start, :tag_end, :text,
# :processing_instruction, :comment, :doctype, :attlistdecl, :entitydecl,
# :notationdecl, :entity, :cdata, :xmldecl, or :error.
def initialize(arg)
@contents = arg
end end
def empty?
@parser.empty?
end
end
# A parsing event. The contents of the event are accessed as an +Array+,
# and the type is given either by the ...? methods, or by accessing the
# +type+ accessor. The contents of this object vary from event to event,
# but are identical to the arguments passed to +StreamListener+s for each
# event.
class PullEvent
# The type of this event. Will be one of :tag_start, :tag_end, :text,
# :processing_instruction, :comment, :doctype, :attlistdecl, :entitydecl,
# :notationdecl, :entity, :cdata, :xmldecl, or :error.
def initialize(arg)
@contents = arg
end
def []( start, endd=nil) def []( start, endd=nil)
if start.kind_of? Range if start.kind_of? Range
@contents.slice( start.begin+1 .. start.end ) @contents.slice( start.begin+1 .. start.end )
@ -103,90 +107,90 @@ module REXML
else else
raise "Illegal argument #{start.inspect} (#{start.class})" raise "Illegal argument #{start.inspect} (#{start.class})"
end end
end end
def event_type def event_type
@contents[0] @contents[0]
end end
# Content: [ String tag_name, Hash attributes ] # Content: [ String tag_name, Hash attributes ]
def start_element? def start_element?
@contents[0] == :start_element @contents[0] == :start_element
end end
# Content: [ String tag_name ] # Content: [ String tag_name ]
def end_element? def end_element?
@contents[0] == :end_element @contents[0] == :end_element
end end
# Content: [ String raw_text, String unnormalized_text ] # Content: [ String raw_text, String unnormalized_text ]
def text? def text?
@contents[0] == :text @contents[0] == :text
end end
# Content: [ String text ] # Content: [ String text ]
def instruction? def instruction?
@contents[0] == :processing_instruction @contents[0] == :processing_instruction
end end
# Content: [ String text ] # Content: [ String text ]
def comment? def comment?
@contents[0] == :comment @contents[0] == :comment
end end
# Content: [ String name, String pub_sys, String long_name, String uri ] # Content: [ String name, String pub_sys, String long_name, String uri ]
def doctype? def doctype?
@contents[0] == :start_doctype @contents[0] == :start_doctype
end end
# Content: [ String text ] # Content: [ String text ]
def attlistdecl? def attlistdecl?
@contents[0] == :attlistdecl @contents[0] == :attlistdecl
end end
# Content: [ String text ] # Content: [ String text ]
def elementdecl? def elementdecl?
@contents[0] == :elementdecl @contents[0] == :elementdecl
end end
# Due to the wonders of DTDs, an entity declaration can be just about # Due to the wonders of DTDs, an entity declaration can be just about
# anything. There's no way to normalize it; you'll have to interpret the # anything. There's no way to normalize it; you'll have to interpret the
# content yourself. However, the following is true: # content yourself. However, the following is true:
# #
# * If the entity declaration is an internal entity: # * If the entity declaration is an internal entity:
# [ String name, String value ] # [ String name, String value ]
# Content: [ String text ] # Content: [ String text ]
def entitydecl? def entitydecl?
@contents[0] == :entitydecl @contents[0] == :entitydecl
end end
# Content: [ String text ] # Content: [ String text ]
def notationdecl? def notationdecl?
@contents[0] == :notationdecl @contents[0] == :notationdecl
end end
# Content: [ String text ] # Content: [ String text ]
def entity? def entity?
@contents[0] == :entity @contents[0] == :entity
end end
# Content: [ String text ] # Content: [ String text ]
def cdata? def cdata?
@contents[0] == :cdata @contents[0] == :cdata
end end
# Content: [ String version, String encoding, String standalone ] # Content: [ String version, String encoding, String standalone ]
def xmldecl? def xmldecl?
@contents[0] == :xmldecl @contents[0] == :xmldecl
end end
def error? def error?
@contents[0] == :error @contents[0] == :error
end end
def inspect def inspect
@contents[0].to_s + ": " + @contents[1..-1].inspect @contents[0].to_s + ": " + @contents[1..-1].inspect
end end
end end
end end
end end
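The new `peek`/`pull`/`unshift` trio in PullParser buffers events in `@my_stack` so that looking ahead does not lose them. The sketch below shows the same lookahead-buffer idea without REXML. `LookaheadReader` and its array-backed source are assumptions made for illustration, not part of the library.

```ruby
# Hypothetical stand-in for the buffering that PullParser gains in this diff.
class LookaheadReader
  def initialize(events)
    @source = events.each   # external Enumerator, plays the role of @parser
    @buffer = []            # plays the role of @my_stack
  end

  # Return the event +depth+ positions ahead without consuming it.
  def peek(depth = 0)
    @buffer.push(@source.next) while @buffer.length <= depth
    @buffer[depth]
  end

  # Serve buffered events first, then fall back to the underlying source.
  def pull
    return @buffer.shift unless @buffer.empty?
    @source.next
  end

  # Push an event back so the next pull returns it.
  def unshift(event)
    @buffer.unshift(event)
  end
end

reader = LookaheadReader.new([:a, :b, :c])
puts reader.peek(1).inspect   # prints :b
puts reader.pull.inspect      # prints :a
```

One thing worth checking against the real code: in the diff, `peek` wraps `@parser.pull` directly, so events that were peeked first appear to skip the `:entitydecl` and `:text` handling that `pull` applies to unbuffered events.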


@ -1,9 +1,11 @@
require 'rexml/parsers/baseparser' require 'rexml/parsers/baseparser'
require 'rexml/parseexception' require 'rexml/parseexception'
require 'rexml/namespace' require 'rexml/namespace'
require 'rexml/text'
module REXML module REXML
module Parsers module Parsers
# SAX2Parser
class SAX2Parser class SAX2Parser
def initialize source def initialize source
@parser = BaseParser.new(source) @parser = BaseParser.new(source)
@ -36,6 +38,10 @@ module REXML
# :start_prefix_mapping, :end_prefix_mapping, :characters, # :start_prefix_mapping, :end_prefix_mapping, :characters,
# :processing_instruction, :doctype, :attlistdecl, :elementdecl, # :processing_instruction, :doctype, :attlistdecl, :elementdecl,
# :entitydecl, :notationdecl, :cdata, :xmldecl, :comment # :entitydecl, :notationdecl, :cdata, :xmldecl, :comment
#
# There is an additional symbol that can be listened for: :progress.
# This will be called for every event generated, passing in the current
# stream position.
# #
# Array contains regular expressions or strings which will be matched # Array contains regular expressions or strings which will be matched
# against fully qualified element names. # against fully qualified element names.
@ -161,6 +167,7 @@ module REXML
:elementdecl, :cdata, :notationdecl, :xmldecl :elementdecl, :cdata, :notationdecl, :xmldecl
handle( *event ) handle( *event )
end end
handle( :progress, @parser.position )
end end
end end
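A short usage sketch for the new `:progress` symbol follows. It assumes the `rexml` library is installed and that `SAX2Parser#listen` accepts `:progress` with a block in the same way as the other event symbols. `collect_progress` is a hypothetical helper name.

```ruby
require 'rexml/document'
require 'rexml/parsers/sax2parser'

# Hypothetical helper: parses +xml+ and returns every position reported
# through the :progress event.
def collect_progress(xml)
  positions = []
  parser = REXML::Parsers::SAX2Parser.new(xml)
  # The block is called once per generated event with the stream position.
  parser.listen(:progress) { |position| positions << position }
  parser.parse
  positions
end

positions = collect_progress("<a>text<b att='val'/>more</a>")
puts "progress callbacks: #{positions.size}"
```

For pipe-backed IO sources, the `IOSource#position` added in this diff reports 0, so the reported values are only meaningful for seekable input.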


@ -1,42 +1,46 @@
module REXML module REXML
module Parsers module Parsers
class StreamParser class StreamParser
def initialize source, listener def initialize source, listener
@listener = listener @listener = listener
@parser = BaseParser.new( source ) @parser = BaseParser.new( source )
end end
def add_listener( listener ) def add_listener( listener )
@parser.add_listener( listener ) @parser.add_listener( listener )
end end
def parse def parse
# entity string # entity string
while true while true
event = @parser.pull event = @parser.pull
case event[0] case event[0]
when :end_document when :end_document
return return
when :start_element when :start_element
attrs = event[2].each do |n, v| attrs = event[2].each do |n, v|
event[2][n] = @parser.unnormalize( v ) event[2][n] = @parser.unnormalize( v )
end end
@listener.tag_start( event[1], attrs ) @listener.tag_start( event[1], attrs )
when :end_element when :end_element
@listener.tag_end( event[1] ) @listener.tag_end( event[1] )
when :text when :text
normalized = @parser.unnormalize( event[1] ) normalized = @parser.unnormalize( event[1] )
@listener.text( normalized ) @listener.text( normalized )
when :processing_instruction when :processing_instruction
@listener.instruction( *event[1,2] ) @listener.instruction( *event[1,2] )
when :start_doctype when :start_doctype
@listener.doctype( *event[1..-1] ) @listener.doctype( *event[1..-1] )
when :comment, :attlistdecl, :notationdecl, :elementdecl, when :end_doctype
:entitydecl, :cdata, :xmldecl, :attlistdecl # FIXME: remove this condition for milestone:3.2
@listener.send( event[0].to_s, *event[1..-1] ) @listener.doctype_end if @listener.respond_to? :doctype_end
end when :comment, :attlistdecl, :cdata, :xmldecl, :elementdecl
end @listener.send( event[0].to_s, *event[1..-1] )
end when :entitydecl, :notationdecl
end @listener.send( event[0].to_s, event[1..-1] )
end end
end
end
end
end
end end
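The StreamParser change does two things: it calls `doctype_end` only when the listener defines it, and it passes `:entitydecl` and `:notationdecl` content as a single array rather than splatting it. The sketch below illustrates both dispatch styles with hypothetical names (`RecordingListener`, `dispatch`) and does not use REXML.

```ruby
# Hypothetical listener that records what it receives.
class RecordingListener
  attr_reader :calls

  def initialize
    @calls = []
  end

  def comment(text)
    @calls << [:comment, text]
  end

  # Receives the whole declaration as one Array argument.
  def entitydecl(content)
    @calls << [:entitydecl, content]
  end
  # doctype_end is deliberately not defined.
end

# Hypothetical dispatcher mirroring the shape of the case statement above.
def dispatch(listener, event)
  case event[0]
  when :end_doctype
    # Optional callback: older listeners without the method are left alone.
    listener.doctype_end if listener.respond_to?(:doctype_end)
  when :comment
    listener.send(event[0].to_s, *event[1..-1])   # splatted arguments
  when :entitydecl
    listener.send(event[0].to_s, event[1..-1])    # one Array argument
  end
end

listener = RecordingListener.new
dispatch(listener, [:end_doctype])
dispatch(listener, [:comment, "hi"])
dispatch(listener, [:entitydecl, "s", "sean"])
p listener.calls
```

The `respond_to?` guard is what lets existing listeners keep working until the `FIXME` in the diff is resolved.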


@ -19,8 +19,12 @@ module REXML
begin begin
while true while true
event = @parser.pull event = @parser.pull
#STDERR.puts "TREEPARSER GOT #{event.inspect}"
case event[0] case event[0]
when :end_document when :end_document
unless tag_stack.empty?
raise ParseException.new("No close tag for #{tag_stack.inspect}")
end
return return
when :start_element when :start_element
tag_stack.push(event[1]) tag_stack.push(event[1])
@ -35,10 +39,10 @@ module REXML
@build_context[-1] << event[1] @build_context[-1] << event[1]
else else
@build_context.add( @build_context.add(
Text.new( event[1], @build_context.whitespace, nil, true ) Text.new(event[1], @build_context.whitespace, nil, true)
) unless ( ) unless (
event[1].strip.size==0 and @build_context.ignore_whitespace_nodes and
@build_context.ignore_whitespace_nodes event[1].strip.size==0
) )
end end
end end


@ -10,8 +10,8 @@
# #
# Main page:: http://www.germane-software.com/software/rexml # Main page:: http://www.germane-software.com/software/rexml
# Author:: Sean Russell <serATgermaneHYPHENsoftwareDOTcom> # Author:: Sean Russell <serATgermaneHYPHENsoftwareDOTcom>
# Version:: 3.1.3 # Version:: 3.1.4
# Date:: +2005/139 # Date:: 2006/104
# #
# This API documentation can be downloaded from the REXML home page, or can # This API documentation can be downloaded from the REXML home page, or can
# be accessed online[http://www.germane-software.com/software/rexml_doc] # be accessed online[http://www.germane-software.com/software/rexml_doc]
@ -20,7 +20,10 @@
# or can be accessed # or can be accessed
# online[http://www.germane-software.com/software/rexml/docs/tutorial.html] # online[http://www.germane-software.com/software/rexml/docs/tutorial.html]
module REXML module REXML
Copyright = "Copyright © 2001-2005 Sean Russell <ser@germane-software.com>" COPYRIGHT = "Copyright © 2001-2006 Sean Russell <ser@germane-software.com>"
Date = "+2005/139" DATE = "2006/104"
Version = "3.1.3" VERSION = "3.1.4"
Copyright = COPYRIGHT
Version = VERSION
end end
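The commit message explains that `Date` was renamed to `DATE` to avoid conflicts with the `Date` class. The sketch below shows the kind of shadowing involved. The module names `Shadowed` and `Renamed` are made up for the example.

```ruby
require 'date'

module Shadowed
  Date = "2006/104"      # same shape as the old REXML::Date

  def self.resolve
    Date                 # finds Shadowed::Date, a String
  end
end

module Renamed
  DATE = "2006/104"      # same shape as the new REXML::DATE

  def self.resolve
    Date                 # nothing local to find, so this is ::Date
  end
end

puts Shadowed.resolve.class   # prints String
puts Renamed.resolve          # prints Date
```

`Copyright` and `Version` do not collide with core classes, which is presumably why the diff can keep them as aliases of the upper-case constants.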


@ -84,11 +84,14 @@ module REXML
# @p version the version attribute value. EG, "1.0" # @p version the version attribute value. EG, "1.0"
# @p encoding the encoding attribute value, or nil. EG, "utf" # @p encoding the encoding attribute value, or nil. EG, "utf"
# @p standalone the standalone attribute value, or nil. EG, nil # @p standalone the standalone attribute value, or nil. EG, nil
# @p spaced the declaration is followed by a line break
def xmldecl version, encoding, standalone def xmldecl version, encoding, standalone
end end
# Called when a comment is encountered. # Called when a comment is encountered.
# @p comment The content of the comment # @p comment The content of the comment
def comment comment def comment comment
end end
def progress position
end
end end
end end


@ -7,12 +7,19 @@ module REXML
# @param arg Either a String, or an IO # @param arg Either a String, or an IO
# @return a Source, or nil if a bad argument was given # @return a Source, or nil if a bad argument was given
def SourceFactory::create_from arg#, slurp=true def SourceFactory::create_from arg#, slurp=true
if arg.kind_of? String if arg.kind_of? String
source = Source.new(arg) Source.new(arg)
elsif arg.kind_of? IO elsif arg.respond_to? :read and
source = IOSource.new(arg) arg.respond_to? :readline and
end arg.respond_to? :nil? and
source arg.respond_to? :eof?
IOSource.new(arg)
elsif arg.kind_of? Source
arg
else
raise "#{arg.class} is not a valid input stream. It must walk \n"+
"like either a String, IO, or Source."
end
end end
end end
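`SourceFactory::create_from` now accepts anything that responds to the reading methods instead of requiring an `IO` subclass, which is what the changelog's Tempfile patch needs. The sketch below shows the difference using `StringIO`. `classify_input` is a hypothetical helper, not a REXML method.

```ruby
require 'stringio'

# Hypothetical helper mirroring the duck-typing test in create_from.
def classify_input(arg)
  if arg.kind_of?(String)
    :string
  elsif arg.respond_to?(:read) &&
        arg.respond_to?(:readline) &&
        arg.respond_to?(:eof?)
    :io_like
  else
    raise ArgumentError, "#{arg.class} is not a valid input stream"
  end
end

stream = StringIO.new("<a/>")
puts stream.kind_of?(IO)       # prints false: the old check rejected it
puts classify_input(stream)    # prints io_like
```

Objects that quack like an `IO` but do not inherit from it, `StringIO` here, would have fallen through the old `kind_of? IO` branch and produced a `nil` source.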
@ -98,6 +105,10 @@ module REXML
@buffer == "" @buffer == ""
end end
def position
@orig.index( @buffer )
end
# @return the current line in the source # @return the current line in the source
def current_line def current_line
lines = @orig.split lines = @orig.split
@ -194,6 +205,10 @@ module REXML
super and ( @source.nil? || @source.eof? ) super and ( @source.nil? || @source.eof? )
end end
def position
@er_source.stat.pipe? ? 0 : @er_source.pos
end
# @return the current line in the source # @return the current line in the source
def current_line def current_line
begin begin


@ -39,6 +39,9 @@ module REXML
# @p uri the uri of the doctype, or nil. EG, "bar" # @p uri the uri of the doctype, or nil. EG, "bar"
def doctype name, pub_sys, long_name, uri def doctype name, pub_sys, long_name, uri
end end
# Called when the doctype is done
def doctype_end
end
# If a doctype includes an ATTLIST declaration, it will cause this # If a doctype includes an ATTLIST declaration, it will cause this
# method to be called. The content is the declaration itself, unparsed. # method to be called. The content is the declaration itself, unparsed.
# EG, <!ATTLIST el attr CDATA #REQUIRED> will come to this method as "el # EG, <!ATTLIST el attr CDATA #REQUIRED> will come to this method as "el


@ -39,8 +39,10 @@ module REXML
# text. If this value is nil (the default), then the raw value of the # text. If this value is nil (the default), then the raw value of the
# parent will be used as the raw value for this node. If there is no raw # parent will be used as the raw value for this node. If there is no raw
# value for the parent, and no value is supplied, the default is false. # value for the parent, and no value is supplied, the default is false.
# Use this field if you have entities defined for some text, and you don't
# want REXML to escape that text in output.
# Text.new( "<&", false, nil, false ) #-> "&lt;&amp;" # Text.new( "<&", false, nil, false ) #-> "&lt;&amp;"
# Text.new( "<&", false, nil, true ) #-> IllegalArgumentException # Text.new( "<&", false, nil, true ) #-> Parse exception
# Text.new( "&lt;&amp;", false, nil, true ) #-> "&lt;&amp;" # Text.new( "&lt;&amp;", false, nil, true ) #-> "&lt;&amp;"
# # Assume that the entity "s" is defined to be "sean" # # Assume that the entity "s" is defined to be "sean"
# # and that the entity "r" is defined to be "russell" # # and that the entity "r" is defined to be "russell"
@ -156,11 +158,11 @@ module REXML
# # Assume that the entity "s" is defined to be "sean", and that the # # Assume that the entity "s" is defined to be "sean", and that the
# # entity "r" is defined to be "russell" # # entity "r" is defined to be "russell"
# t = Text.new( "< & sean russell", false, nil, false, ['s'] ) # t = Text.new( "< & sean russell", false, nil, false, ['s'] )
# t.string #-> "< & sean russell" # t.value #-> "< & sean russell"
# t = Text.new( "< & &s; russell", false, nil, false ) # t = Text.new( "< & &s; russell", false, nil, false )
# t.string #-> "< & sean russell" # t.value #-> "< & sean russell"
# u = Text.new( "sean russell", false, nil, true ) # u = Text.new( "sean russell", false, nil, true )
# u.string #-> "sean russell" # u.value #-> "sean russell"
def value def value
return @unnormalized if @unnormalized return @unnormalized if @unnormalized
doctype = nil doctype = nil
@ -282,9 +284,10 @@ module REXML
EREFERENCE = /&(?!#{Entity::NAME};)/ EREFERENCE = /&(?!#{Entity::NAME};)/
# Escapes all possible entities # Escapes all possible entities
def Text::normalize( input, doctype=nil, entity_filter=nil ) def Text::normalize( input, doctype=nil, entity_filter=nil )
copy = input.clone copy = input
# Doing it like this rather than in a loop improves the speed # Doing it like this rather than in a loop improves the speed
if doctype if doctype
# Replace all ampersands that aren't part of an entity
copy = copy.gsub( EREFERENCE, '&amp;' ) copy = copy.gsub( EREFERENCE, '&amp;' )
doctype.entities.each_value do |entity| doctype.entities.each_value do |entity|
copy = copy.gsub( entity.value, copy = copy.gsub( entity.value,
@ -292,6 +295,7 @@ module REXML
not( entity_filter and entity_filter.include?(entity) ) not( entity_filter and entity_filter.include?(entity) )
end end
else else
# Replace all ampersands that aren't part of an entity
copy = copy.gsub( EREFERENCE, '&amp;' ) copy = copy.gsub( EREFERENCE, '&amp;' )
DocType::DEFAULT_ENTITIES.each_value do |entity| DocType::DEFAULT_ENTITIES.each_value do |entity|
copy = copy.gsub(entity.value, "&#{entity.name};" ) copy = copy.gsub(entity.value, "&#{entity.name};" )
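The "replace all ampersands that aren't part of an entity" step relies on a negative lookahead. The sketch below reproduces the idea with a simplified name pattern: `NAME` here is an assumption standing in for `Entity::NAME`, and `escape_bare_ampersands` is a hypothetical helper.

```ruby
# Simplified, hypothetical approximation of an XML name.
NAME = /[A-Za-z_:][\w.\-:]*/
# An ampersand that is NOT followed by "name;".
BARE_AMPERSAND = /&(?!#{NAME};)/

def escape_bare_ampersands(text)
  text.gsub(BARE_AMPERSAND, '&amp;')
end

puts escape_bare_ampersands("a & b &amp; c &lt")
# prints: a &amp; b &amp; c &amp;lt
```

The existing `&amp;` is left alone because it already forms a complete reference, while the unterminated `&lt` is treated as a bare ampersand and escaped.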


@ -82,10 +82,13 @@ module REXML
@event_arg = event_arg @event_arg = event_arg
end end
attr_reader :done?
attr_reader :event_type attr_reader :event_type
attr_accessor :event_arg attr_accessor :event_arg
def done?
@done
end
def single? def single?
return (@event_type != :start_element and @event_type != :start_attribute) return (@event_type != :start_element and @event_type != :start_attribute)
end end
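The hunk above swaps `attr_reader :done?` for an explicit predicate method. As background not taken from the diff, recent Ruby interpreters reject attribute names ending in `?`, so a hand-written method is the usual form. A minimal sketch follows, with the hypothetical class name `Step`.

```ruby
# Hypothetical class showing the explicit predicate form.
class Step
  def initialize
    @done = false
  end

  # Predicate written by hand; attr_reader cannot generate a "?" name.
  def done?
    @done
  end

  def finish!
    @done = true
  end
end

step = Step.new
puts step.done?   # prints false
step.finish!
puts step.done?   # prints true
```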


@ -2,71 +2,71 @@ require 'rexml/encoding'
require 'rexml/source' require 'rexml/source'
module REXML module REXML
# NEEDS DOCUMENTATION # NEEDS DOCUMENTATION
class XMLDecl < Child class XMLDecl < Child
include Encoding include Encoding
DEFAULT_VERSION = "1.0"; DEFAULT_VERSION = "1.0";
DEFAULT_ENCODING = "UTF-8"; DEFAULT_ENCODING = "UTF-8";
DEFAULT_STANDALONE = "no"; DEFAULT_STANDALONE = "no";
START = '<\?xml'; START = '<\?xml';
STOP = '\?>'; STOP = '\?>';
attr_accessor :version, :standalone attr_accessor :version, :standalone
attr_reader :writeencoding attr_reader :writeencoding
def initialize(version=DEFAULT_VERSION, encoding=nil, standalone=nil) def initialize(version=DEFAULT_VERSION, encoding=nil, standalone=nil)
@writethis = true @writethis = true
@writeencoding = !encoding.nil? @writeencoding = !encoding.nil?
if version.kind_of? XMLDecl if version.kind_of? XMLDecl
super() super()
@version = version.version @version = version.version
self.encoding = version.encoding self.encoding = version.encoding
@writeencoding = version.writeencoding @writeencoding = version.writeencoding
@standalone = version.standalone @standalone = version.standalone
else else
super() super()
@version = version @version = version
self.encoding = encoding self.encoding = encoding
@standalone = standalone @standalone = standalone
end end
@version = DEFAULT_VERSION if @version.nil? @version = DEFAULT_VERSION if @version.nil?
end end
def clone def clone
XMLDecl.new(self) XMLDecl.new(self)
end end
def write writer, indent_level=-1, transitive=false, ie_hack=false def write writer, indent=-1, transitive=false, ie_hack=false
return nil unless @writethis or writer.kind_of? Output return nil unless @writethis or writer.kind_of? Output
indent( writer, indent_level ) indent( writer, indent )
writer << START.sub(/\\/u, '') writer << START.sub(/\\/u, '')
if writer.kind_of? Output if writer.kind_of? Output
writer << " #{content writer.encoding}" writer << " #{content writer.encoding}"
else else
writer << " #{content encoding}" writer << " #{content encoding}"
end end
writer << STOP.sub(/\\/u, '') writer << STOP.sub(/\\/u, '')
end end
def ==( other ) def ==( other )
other.kind_of?(XMLDecl) and other.kind_of?(XMLDecl) and
other.version == @version and other.version == @version and
other.encoding == self.encoding and other.encoding == self.encoding and
other.standalone == @standalone other.standalone == @standalone
end end
def xmldecl version, encoding, standalone def xmldecl version, encoding, standalone
@version = version @version = version
self.encoding = encoding self.encoding = encoding
@standalone = standalone @standalone = standalone
end end
def node_type def node_type
:xmldecl :xmldecl
end end
alias :stand_alone? :standalone alias :stand_alone? :standalone
alias :old_enc= :encoding= alias :old_enc= :encoding=
def encoding=( enc ) def encoding=( enc )
@ -80,6 +80,11 @@ module REXML
self.dowrite self.dowrite
end end
# Only use this if you do not want the XML declaration to be written;
# this object is ignored by the XML writer. Otherwise, instantiate your
# own XMLDecl and add it to the document.
#
# Note that XML 1.1 documents *must* include an XML declaration
def XMLDecl.default def XMLDecl.default
rv = XMLDecl.new( "1.0" ) rv = XMLDecl.new( "1.0" )
rv.nowrite rv.nowrite
@ -98,12 +103,12 @@ module REXML
START.sub(/\\/u, '') + " ... " + STOP.sub(/\\/u, '') START.sub(/\\/u, '') + " ... " + STOP.sub(/\\/u, '')
end end
private private
def content(enc) def content(enc)
rv = "version='#@version'" rv = "version='#@version'"
rv << " encoding='#{enc}'" if @writeencoding || enc !~ /utf-8/i rv << " encoding='#{enc}'" if @writeencoding || enc !~ /utf-8/i
rv << " standalone='#@standalone'" if @standalone rv << " standalone='#@standalone'" if @standalone
rv rv
end end
end end
end end


@ -2,76 +2,65 @@ require 'rexml/functions'
require 'rexml/xpath_parser' require 'rexml/xpath_parser'
module REXML module REXML
# Wrapper class. Use this class to access the XPath functions. # Wrapper class. Use this class to access the XPath functions.
class XPath class XPath
include Functions include Functions
EMPTY_HASH = {} EMPTY_HASH = {}
# Finds and returns the first node that matches the supplied xpath. # Finds and returns the first node that matches the supplied xpath.
# element:: # element::
# The context element # The context element
# path:: # path::
# The xpath to search for. If not supplied or nil, returns the first # The xpath to search for. If not supplied or nil, returns the first
# node matching '*'. # node matching '*'.
# namespaces:: # namespaces::
# If supplied, a Hash which defines a namespace mapping. # If supplied, a Hash which defines a namespace mapping.
# #
# XPath.first( node ) # XPath.first( node )
# XPath.first( doc, "//b" ) # XPath.first( doc, "//b" )
# XPath.first( node, "a/x:b", { "x"=>"http://doofus" } ) # XPath.first( node, "a/x:b", { "x"=>"http://doofus" } )
def XPath::first element, path=nil, namespaces={}, variables={} def XPath::first element, path=nil, namespaces={}, variables={}
=begin
raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash
raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash
parser = XPathParser.new parser = XPathParser.new
parser.namespaces = namespaces parser.namespaces = namespaces
parser.variables = variables parser.variables = variables
path = "*" unless path path = "*" unless path
parser.first( path, element ); element = [element] unless element.kind_of? Array
=end parser.parse(path, element).flatten[0]
#=begin end
raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash
raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash
parser = XPathParser.new
parser.namespaces = namespaces
parser.variables = variables
path = "*" unless path
element = [element] unless element.kind_of? Array
parser.parse(path, element).flatten[0]
#=end
end
# Iterates over nodes that match the given path, calling the supplied # Iterates over nodes that match the given path, calling the supplied
# block with the match. # block with the match.
# element:: # element::
# The context element # The context element
# path:: # path::
# The xpath to search for. If not supplied or nil, defaults to '*' # The xpath to search for. If not supplied or nil, defaults to '*'
# namespaces:: # namespaces::
# If supplied, a Hash which defines a namespace mapping # If supplied, a Hash which defines a namespace mapping
# #
# XPath.each( node ) { |el| ... } # XPath.each( node ) { |el| ... }
# XPath.each( node, "/*[@attr='v']" ) { |el| ... } # XPath.each( node, "/*[@attr='v']" ) { |el| ... }
# XPath.each( node, 'ancestor::x' ) { |el| ... } # XPath.each( node, 'ancestor::x' ) { |el| ... }
def XPath::each element, path=nil, namespaces={}, variables={}, &block def XPath::each element, path=nil, namespaces={}, variables={}, &block
raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash
raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash
parser = XPathParser.new parser = XPathParser.new
parser.namespaces = namespaces parser.namespaces = namespaces
parser.variables = variables parser.variables = variables
path = "*" unless path path = "*" unless path
element = [element] unless element.kind_of? Array element = [element] unless element.kind_of? Array
parser.parse(path, element).each( &block ) parser.parse(path, element).each( &block )
end end
# Returns an array of nodes matching a given XPath. # Returns an array of nodes matching a given XPath.
def XPath::match element, path=nil, namespaces={}, variables={} def XPath::match element, path=nil, namespaces={}, variables={}
parser = XPathParser.new parser = XPathParser.new
parser.namespaces = namespaces parser.namespaces = namespaces
parser.variables = variables parser.variables = variables
path = "*" unless path path = "*" unless path
element = [element] unless element.kind_of? Array element = [element] unless element.kind_of? Array
parser.parse(path,element) parser.parse(path,element)
end end
end end
end end
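The three wrapper methods can be compared side by side. The sketch below assumes the `rexml` library is available. `b_ids` is a hypothetical helper and the sample document is invented for the example.

```ruby
require 'rexml/document'

# Hypothetical helper: runs the same path through first, match and each.
def b_ids(xml)
  doc = REXML::Document.new(xml)

  first = REXML::XPath.first(doc, "//b")   # a single node, or nil
  all   = REXML::XPath.match(doc, "//b")   # an Array of nodes

  seen = []
  REXML::XPath.each(doc, "//b") { |el| seen << el.attributes['id'] }

  { first: first.attributes['id'], count: all.size, each: seen }
end

p b_ids("<a><b id='1'/><c><b id='2'/></c></a>")
```

After this change `first` is implemented as `parse(...).flatten[0]`, so it evaluates the full path just as `match` does and then takes the first result.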


@ -76,6 +76,8 @@ module REXML
# Performs a depth-first (document order) XPath search, and returns the # Performs a depth-first (document order) XPath search, and returns the
# first match. This is the fastest, lightest way to return a single result. # first match. This is the fastest, lightest way to return a single result.
#
# FIXME: This method is incomplete!
def first( path_stack, node ) def first( path_stack, node )
#puts "#{depth}) Entering match( #{path.inspect}, #{tree.inspect} )" #puts "#{depth}) Entering match( #{path.inspect}, #{tree.inspect} )"
return nil if path.size == 0 return nil if path.size == 0
@ -123,14 +125,6 @@ module REXML
r = expr( path_stack, nodeset ) r = expr( path_stack, nodeset )
#puts "MAIN EXPR => #{r.inspect}" #puts "MAIN EXPR => #{r.inspect}"
r r
#while ( path_stack.size > 0 and nodeset.size > 0 )
# #puts "MATCH: #{path_stack.inspect} '#{nodeset.collect{|n|n.class}.inspect}'"
# nodeset = expr( path_stack, nodeset )
# #puts "NODESET: #{nodeset.inspect}"
# #puts "PATH_STACK: #{path_stack.inspect}"
#end
#nodeset
end end
private private
@ -158,9 +152,10 @@ module REXML
#puts "IN QNAME" #puts "IN QNAME"
prefix = path_stack.shift prefix = path_stack.shift
name = path_stack.shift name = path_stack.shift
ns = @namespaces[prefix] default_ns = @namespaces[prefix]
ns = ns ? ns : '' default_ns = default_ns ? default_ns : ''
nodeset.delete_if do |node| nodeset.delete_if do |node|
ns = default_ns
# FIXME: This DOUBLES the time XPath searches take # FIXME: This DOUBLES the time XPath searches take
ns = node.namespace( prefix ) if node.node_type == :element and ns == '' ns = node.namespace( prefix ) if node.node_type == :element and ns == ''
#puts "NS = #{ns.inspect}" #puts "NS = #{ns.inspect}"
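The `default_ns` change resets `ns` for every node instead of letting the first node's namespace stick for the rest of the set. The sketch below is a simplified model of that bug, using a hypothetical `Node` struct and helper names rather than REXML elements.

```ruby
# Hypothetical stand-in for an element with a resolved namespace.
Node = Struct.new(:name, :ns)

# Old shape: ns is set once outside the block and carried across nodes.
def names_buggy(nodes, default_ns)
  ns = default_ns
  nodes.select do |node|
    ns = node.ns if ns == ''   # only takes effect for the first node
    node.ns == ns
  end.map(&:name)
end

# New shape: ns starts from the default again for each node.
def names_fixed(nodes, default_ns)
  nodes.select do |node|
    ns = default_ns
    ns = node.ns if ns == ''
    node.ns == ns
  end.map(&:name)
end

nodes = [Node.new('a', 'urn:one'), Node.new('b', 'urn:two')]
p names_buggy(nodes, '')   # prints ["a"]
p names_fixed(nodes, '')   # prints ["a", "b"]
```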
@ -353,7 +348,7 @@ module REXML
preceding_siblings = all_siblings[ 0 .. current_index-1 ].reverse preceding_siblings = all_siblings[ 0 .. current_index-1 ].reverse
#results += expr( path_stack.dclone, preceding_siblings ) #results += expr( path_stack.dclone, preceding_siblings )
end end
nodeset = preceding_siblings nodeset = preceding_siblings || []
node_types = ELEMENTS node_types = ELEMENTS
when :preceding when :preceding
@ -385,10 +380,13 @@ module REXML
return @variables[ var_name ] return @variables[ var_name ]
# :and, :or, :eq, :neq, :lt, :lteq, :gt, :gteq # :and, :or, :eq, :neq, :lt, :lteq, :gt, :gteq
# TODO: Special case for :or and :and -- not evaluate the right
# operand if the left alone determines result (i.e. is true for
# :or and false for :and).
when :eq, :neq, :lt, :lteq, :gt, :gteq, :and, :or when :eq, :neq, :lt, :lteq, :gt, :gteq, :and, :or
left = expr( path_stack.shift, nodeset, context ) left = expr( path_stack.shift, nodeset.dup, context )
#puts "LEFT => #{left.inspect} (#{left.class.name})" #puts "LEFT => #{left.inspect} (#{left.class.name})"
right = expr( path_stack.shift, nodeset, context ) right = expr( path_stack.shift, nodeset.dup, context )
#puts "RIGHT => #{right.inspect} (#{right.class.name})" #puts "RIGHT => #{right.inspect} (#{right.class.name})"
res = equality_relational_compare( left, op, right ) res = equality_relational_compare( left, op, right )
#puts "RES => #{res.inspect}" #puts "RES => #{res.inspect}"
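Passing `nodeset.dup` to each operand matters because evaluation can filter the array in place, as the `nodeset.delete_if` call in the QNAME branch of this file does. The sketch below shows the effect with plain integers. The two filter methods are hypothetical stand-ins for `expr`.

```ruby
# Hypothetical in-place filters standing in for the left and right operands.
def evens!(set)
  set.delete_if(&:odd?)
end

def at_least_three!(set)
  set.delete_if { |n| n < 3 }
end

# Shared array: the right operand only sees what the left one kept.
nodes = [1, 2, 3, 4]
evens!(nodes)
shared_right = at_least_three!(nodes)
p shared_right                          # prints [4]

# Duplicated array: each operand starts from the full set.
nodes = [1, 2, 3, 4]
left  = evens!(nodes.dup)
right = at_least_three!(nodes.dup)
p left, right                           # prints [2, 4] then [3, 4]
```

`dup` makes a shallow copy, so the nodes themselves are still shared. Only the containing array is protected from destructive filtering.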
@ -472,8 +470,11 @@ module REXML
def descendant_or_self( path_stack, nodeset ) def descendant_or_self( path_stack, nodeset )
rs = [] rs = []
#puts "#"*80
#puts "PATH_STACK = #{path_stack.inspect}"
#puts "NODESET = #{nodeset.collect{|n|n.inspect}.inspect}"
d_o_s( path_stack, nodeset, rs ) d_o_s( path_stack, nodeset, rs )
#puts "RS = #{rs.collect{|n|n.to_s}.inspect}" #puts "RS = #{rs.collect{|n|n.inspect}.inspect}"
document_order(rs.flatten.compact) document_order(rs.flatten.compact)
#rs.flatten.compact #rs.flatten.compact
end end