rails--rails/actionpack/lib/action_controller/vendor/html-scanner/html/tokenizer.rb

require 'strscan'

module HTML #:nodoc:
  
  # A simple HTML tokenizer. It breaks a stream of text into tokens, where each
  # token is a string holding either a run of text or a single tag (a regular
  # tag, a comment, or a doctype declaration).
  #
  # This currently assumes valid XHTML, which means no free < or > characters.
  #
  # Usage:
  #
  #   tokenizer = HTML::Tokenizer.new(text)
  #   while token = tokenizer.next
  #     p token
  #   end
  class Tokenizer #:nodoc:
    
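    # The (byte) position in the text at which the most recently returned token starts
    attr_reader :position
    
    # The line number on which the most recently returned token starts
    # (1-based; 0 until #next has been called)
    attr_reader :line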
    
    # Create a new Tokenizer for the given text.
    def initialize(text)
      @scanner = StringScanner.new(text)
      @position = 0
      @line = 0
      @current_line = 1
    end

    # Return the next token in the sequence, or +nil+ if there are no more tokens in
    # the stream.
    def next
      return nil if @scanner.eos?
      @position = @scanner.pos
      @line = @current_line
      if @scanner.check(/<\S/)
        update_current_line(scan_tag)
      else
        update_current_line(scan_text)
      end
    end
  
    private

      # Treat the text at the current position as a tag, and scan it. Supports
      # comments, doctype tags, and regular tags, and ignores less-than and
      # greater-than characters within quoted strings.
      def scan_tag
        tag = @scanner.getch
        if @scanner.scan(/!--/) # comment
          tag << @scanner.matched
          tag << (@scanner.scan_until(/--\s*>/) || @scanner.scan_until(/\Z/))
        elsif @scanner.scan(/!/) # doctype
          tag << @scanner.matched
          tag << consume_quoted_regions
        else
          tag << consume_quoted_regions
        end
        tag
      end

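      # Scan all text up to the next < character and return it. The first
      # character is always consumed, so a < that does not start a tag (one
      # followed by whitespace or by the end of input) is returned as text.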
      def scan_text
        "#{@scanner.getch}#{@scanner.scan(/[^<]*/)}"
      end
      
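      # Counts the number of newlines in the text and updates the current line
      # accordingly. Returns +text+, since String#scan returns its receiver
      # when given a block; #next relies on this to return the token.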
      def update_current_line(text)
        text.scan(/\r?\n/) { @current_line += 1 }
      end
      
      # Skips over quoted strings, so that less-than and greater-than characters
      # within the strings are ignored.
      def consume_quoted_regions
        text = ""
        loop do
          match = @scanner.scan_until(/['"<>]/) or break

          delim = @scanner.matched
          if delim == "<"
            match = match.chop
            @scanner.pos -= 1
          end

          text << match
          break if delim == "<" || delim == ">"

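          # consume the quoted region
          while match = @scanner.scan_until(/[\\#{delim}]/)
            text << match
            break if @scanner.matched == delim
            # skip the escaped character; getch returns nil when the backslash
            # is the last character of the input, so stop there
            escaped = @scanner.getch or break
            text << escaped
          end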
        end
        text
      end
  end
  
end
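
=begin
A minimal, self-contained sketch of the quote-aware scanning idea used by
scan_tag and consume_quoted_regions above. read_tag_sketch is a hypothetical
helper, not part of HTML::Tokenizer. Unlike the real methods, it does not
handle backslash escapes, comments, doctype tags, or a stray < inside a tag.

```ruby
require 'strscan'

# Hypothetical helper: reads one tag from the start of +input+, skipping over
# quoted attribute values so that a ">" inside quotes does not end the tag.
# Returns nil when +input+ does not start with a tag.
def read_tag_sketch(input)
  scanner = StringScanner.new(input)
  # Same lookahead test as Tokenizer#next: "<" followed by a non-space character.
  return nil unless scanner.check(/<\S/)

  tag = scanner.getch # consume the leading "<"
  while (chunk = scanner.scan_until(/['">]/))
    tag << chunk
    delim = scanner.matched
    break if delim == ">"

    # Inside a quoted value: copy everything up to the closing quote.
    rest = scanner.scan_until(/#{delim}/) or break
    tag << rest
  end
  tag
end

puts read_tag_sketch(%q{<a title="1 > 0" href='x'>link</a>})
```

Run as a script, this prints <a title="1 > 0" href='x'>, showing that the ">"
inside the quoted title does not terminate the tag.
=end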

# Revision history, oldest first (from the repository blame view;
# git-svn-id base http://svn-commit.rubyonrails.org/rails/trunk,
# repository 5ecf4fe2-1ee6-0310-87b1-e25e094e27de):
#
# * 2005-04-17 12:43:48 -04:00, trunk@1195: Added assert_tag and assert_no_tag
#   as a much improved alternative to the deprecated
#   assert_template_xpath_match #1126 [Jamis Buck]
# * 2005-05-06 12:42:01 -04:00, trunk@1291: Added functionality to assert_tag,
#   so you can now do tests on the siblings of a node, to assert that some
#   element comes before or after the element in question, or just to assert
#   that some element exists as a sibling #1226 [Jamis Buck]
# * 2005-05-09 07:20:19 -04:00, trunk@1297: Fixed the HTML scanner used by
#   assert_tag where an infinite loop could be caused by a stray less-than
#   sign in the input #1270 [Jamis Buck]
# * 2005-06-14 06:30:36 -04:00, trunk@1416: Updated vendor copy of
#   html-scanner lib, for bug fixes and optimizations