1
0
Fork 0
mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00
ruby--ruby/lib/rexml/parsers/pullparser.rb
ser 21e8df5c10 Merged in development from the main REXML repository.
* Fixed bug #34, typo in xpath_parser.
* Previous fix, (include? -> includes?) was incorrect.
* Added another test for encoding
* Started AnyName support in RelaxNG
* Added Element#Attributes#to_a, so that it does something intelligent.
  This was needed by XPath, for '@*'
* Fixed XPath so that @* works.
* Added xmlgrep to the bin/ directory.  A little tool allowing you to grep
  for XPaths in an XML document.
* Fixed a CDATA pretty-printing bug. (#39)
* Fixed a buffering bug in Source.rb that affected the SAX parser
  This bug was related to how REXML determines the encoding of a file, and
  evinced itself by hanging on input when using the SAX parser.
* The unit test for the previous patch.  Forgot to commit it.
* Minor pretty printing fix.
* Applied Curt Sampson's optimization improvements
* Issue #9; 3.1.3: The SAX parser was not denormalizing entity references
  in incoming text.  All declared internal entities, as well as numeric
  entities, should now be denormalized.  There was a related bug in that the
  SAX parser was actually double-encoding entities; this is also fixed.
* bin/* programs should now be executable.  Setting bin apps to executable
* Issue 14; 3.1.3: DTD events are now all being passed by StreamParser
  Some of the DTD events were not being passed through by the stream parser.
* #26: Element#add_element(nil) now raises an error Changed XPath searches so
  that if a non-Hash is passed, an error is raised Fixed a spurrious undefined
  method error in encoding.  #29: XPath ordering bug fixed by Mark Williams.
  Incidentally, Mark supplied a superlative bug report, including a full unit
  test.  Then he went ahead and fixed the bug.  It doesn't get any better than
  this, folks.
* Fixed a broken link.  Thanks to Dick Davies for pointing it out.  Added
  functions courtesy of Michael Neumann <mneumann@xxxx.de>.
  Example code to follow.
* Added Michael's sample code.  Merged the changes in from branches/xpath_V
* Fixed preceding:: and following:: axis Fixed the ordering bug that Martin
  Fowler reported.
* Uncommented some code commented for testing Applied Nobu's changes to the
  Encoding infrastructure, which should fix potential threading issues.
* Added more tests, and the missing syncenumerator class.  Fixed the
  inheritance bug in the pull parser that James Britt found.  Indentation
  changes, and changed some exceptions to runtime
  exceptions.
* Changes by Matz, mostly of indent -> indent_level, to avoid
  function/variable naming conflicts
* Tabs -> spaces (whitespace)

Note the addition of syncenumerator.rb.  This is a stopgap, until I can work on
the class enough to get it accepted as a replacement for the SyncEnumerator
that comes with the Generator class.  My version is orders of magnitude faster
than the Generator SyncEnumerator, but is currently missing a couple of
features of the original.  Eventually, I expect this class to migrate to
another part of the source tree.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@8483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2005-05-19 02:58:11 +00:00

192 lines
4.4 KiB
Ruby

require 'rexml/parseexception'
require 'rexml/parsers/baseparser'
require 'rexml/xmltokens'
module REXML
module Parsers
# = Using the Pull Parser
# <em>This API is experimental, and subject to change.</em>
# parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
# while parser.has_next?
# res = parser.next
# puts res[1]['att'] if res.start_tag? and res[0] == 'b'
# end
# See the PullEvent class for information on the content of the results.
# The data is identical to the arguments passed for the various events to
# the StreamListener API.
#
# Notice that:
# parser = PullParser.new( "<a>BAD DOCUMENT" )
# while parser.has_next?
# res = parser.next
# raise res[1] if res.error?
# end
#
# Nat Price gave me some good ideas for the API.
class PullParser
include XMLTokens
def initialize stream
@entities = {}
@listeners = nil
@parser = BaseParser.new( stream )
end
def add_listener( listener )
@listeners = [] unless @listeners
@listeners << listener
end
def each
while has_next?
yield self.pull
end
end
def peek depth=0
PullEvent.new(@parser.peek(depth))
end
def has_next?
@parser.has_next?
end
def pull
event = @parser.pull
case event[0]
when :entitydecl
@entities[ event[1] ] =
event[2] unless event[2] =~ /PUBLIC|SYSTEM/
when :text
unnormalized = @parser.unnormalize( event[1], @entities )
event << unnormalized
end
PullEvent.new( event )
end
def unshift token
@parser.unshift token
end
def entity reference
@parser.entity( reference )
end
def empty?
@parser.empty?
end
end
# A parsing event. The contents of the event are accessed as an +Array?,
# and the type is given either by the ...? methods, or by accessing the
# +type+ accessor. The contents of this object vary from event to event,
# but are identical to the arguments passed to +StreamListener+s for each
# event.
class PullEvent
# The type of this event. Will be one of :tag_start, :tag_end, :text,
# :processing_instruction, :comment, :doctype, :attlistdecl, :entitydecl,
# :notationdecl, :entity, :cdata, :xmldecl, or :error.
def initialize(arg)
@contents = arg
end
def []( start, endd=nil)
if start.kind_of? Range
@contents.slice( start.begin+1 .. start.end )
elsif start.kind_of? Numeric
if endd.nil?
@contents.slice( start+1 )
else
@contents.slice( start+1, endd )
end
else
raise "Illegal argument #{start.inspect} (#{start.class})"
end
end
def event_type
@contents[0]
end
# Content: [ String tag_name, Hash attributes ]
def start_element?
@contents[0] == :start_element
end
# Content: [ String tag_name ]
def end_element?
@contents[0] == :end_element
end
# Content: [ String raw_text, String unnormalized_text ]
def text?
@contents[0] == :text
end
# Content: [ String text ]
def instruction?
@contents[0] == :processing_instruction
end
# Content: [ String text ]
def comment?
@contents[0] == :comment
end
# Content: [ String name, String pub_sys, String long_name, String uri ]
def doctype?
@contents[0] == :start_doctype
end
# Content: [ String text ]
def attlistdecl?
@contents[0] == :attlistdecl
end
# Content: [ String text ]
def elementdecl?
@contents[0] == :elementdecl
end
# Due to the wonders of DTDs, an entity declaration can be just about
# anything. There's no way to normalize it; you'll have to interpret the
# content yourself. However, the following is true:
#
# * If the entity declaration is an internal entity:
# [ String name, String value ]
# Content: [ String text ]
def entitydecl?
@contents[0] == :entitydecl
end
# Content: [ String text ]
def notationdecl?
@contents[0] == :notationdecl
end
# Content: [ String text ]
def entity?
@contents[0] == :entity
end
# Content: [ String text ]
def cdata?
@contents[0] == :cdata
end
# Content: [ String version, String encoding, String standalone ]
def xmldecl?
@contents[0] == :xmldecl
end
def error?
@contents[0] == :error
end
def inspect
@contents[0].to_s + ": " + @contents[1..-1].inspect
end
end
end
end