mirror of
https://github.com/ruby/ruby.git
synced 2022-11-09 12:17:21 -05:00
478 lines
14 KiB
Ruby
478 lines
14 KiB
Ruby
|
# = Introduction
|
||
|
#
|
||
|
# SimpleMarkup parses plain text documents and attempts to decompose
|
||
|
# them into their constituent parts. Some of these parts are high-level:
|
||
|
# paragraphs, chunks of verbatim text, list entries and the like. Other
|
||
|
# parts happen at the character level: a piece of bold text, a word in
|
||
|
# code font. This markup is similar in spirit to that used on WikiWiki
|
||
|
# webs, where folks create web pages using a simple set of formatting
|
||
|
# rules.
|
||
|
#
|
||
|
# SimpleMarkup itself does no output formatting: this is left to a
|
||
|
# different set of classes.
|
||
|
#
|
||
|
# SimpleMarkup is extendable at runtime: you can add new markup
|
||
|
# elements to be recognised in the documents that SimpleMarkup parses.
|
||
|
#
|
||
|
# SimpleMarkup is intended to be the basis for a family of tools which
|
||
|
# share the common requirement that simple, plain-text should be
|
||
|
# rendered in a variety of different output formats and media. It is
|
||
|
# envisaged that SimpleMarkup could be the basis for formating RDoc
|
||
|
# style comment blocks, Wiki entries, and online FAQs.
|
||
|
#
|
||
|
# = Basic Formatting
|
||
|
#
|
||
|
# * SimpleMarkup looks for a document's natural left margin. This is
|
||
|
# used as the initial margin for the document.
|
||
|
#
|
||
|
# * Consecutive lines starting at this margin are considered to be a
|
||
|
# paragraph.
|
||
|
#
|
||
|
# * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
|
||
|
# taken to be the start of a list. The margin in increased to be the
|
||
|
# first non-space following the list start flag. Subsequent lines
|
||
|
# should be indented to this new margin until the list ends. For
|
||
|
# example:
|
||
|
#
|
||
|
# * this is a list with three paragraphs in
|
||
|
# the first item. This is the first paragraph.
|
||
|
#
|
||
|
# And this is the second paragraph.
|
||
|
#
|
||
|
# 1. This is an indented, numbered list.
|
||
|
# 2. This is the second item in that list
|
||
|
#
|
||
|
# This is the third conventional paragraph in the
|
||
|
# first list item.
|
||
|
#
|
||
|
# * This is the second item in the original list
|
||
|
#
|
||
|
# * You can also construct labeled lists, sometimes called description
|
||
|
# or definition lists. Do this by putting the label in square brackets
|
||
|
# and indenting the list body:
|
||
|
#
|
||
|
# [cat] a small furry mammal
|
||
|
# that seems to sleep a lot
|
||
|
#
|
||
|
# [ant] a little insect that is known
|
||
|
# to enjoy picnics
|
||
|
#
|
||
|
# A minor variation on labeled lists uses two colons to separate the
|
||
|
# label from the list body:
|
||
|
#
|
||
|
# cat:: a small furry mammal
|
||
|
# that seems to sleep a lot
|
||
|
#
|
||
|
# ant:: a little insect that is known
|
||
|
# to enjoy picnics
|
||
|
#
|
||
|
# This latter style guarantees that the list bodies' left margins are
|
||
|
# aligned: think of them as a two column table.
|
||
|
#
|
||
|
# * Any line that starts to the right of the current margin is treated
|
||
|
# as verbatim text. This is useful for code listings. The example of a
|
||
|
# list above is also verbatim text.
|
||
|
#
|
||
|
# * A line starting with an equals sign (=) is treated as a
|
||
|
# heading. Level one headings have one equals sign, level two headings
|
||
|
# have two,and so on.
|
||
|
#
|
||
|
# * A line starting with three or more hyphens (at the current indent)
|
||
|
# generates a horizontal rule. THe more hyphens, the thicker the rule
|
||
|
# (within reason, and if supported by the output device)
|
||
|
#
|
||
|
# * You can use markup within text (except verbatim) to change the
|
||
|
# appearance of parts of that text. Out of the box, SimpleMarkup
|
||
|
# supports word-based and general markup.
|
||
|
#
|
||
|
# Word-based markup uses flag characters around individual words:
|
||
|
#
|
||
|
# [\*word*] displays word in a *bold* font
|
||
|
# [\_word_] displays word in an _emphasized_ font
|
||
|
# [\+word+] displays word in a +code+ font
|
||
|
#
|
||
|
# General markup affects text between a start delimiter and and end
|
||
|
# delimiter. Not surprisingly, these delimiters look like HTML markup.
|
||
|
#
|
||
|
# [\<b>text...</b>] displays word in a *bold* font
|
||
|
# [\<em>text...</em>] displays word in an _emphasized_ font
|
||
|
# [\<i>text...</i>] displays word in an _emphasized_ font
|
||
|
# [\<tt>text...</tt>] displays word in a +code+ font
|
||
|
#
|
||
|
# Unlike conventional Wiki markup, general markup can cross line
|
||
|
# boundaries. You can turn off the interpretation of markup by
|
||
|
# preceding the first character with a backslash, so \\\<b>bold
|
||
|
# text</b> and \\\*bold* produce \<b>bold text</b> and \*bold
|
||
|
# respectively.
|
||
|
#
|
||
|
# = Using SimpleMarkup
|
||
|
#
|
||
|
# For information on using SimpleMarkup programatically,
|
||
|
# see SM::SimpleMarkup.
|
||
|
#
|
||
|
# Author:: Dave Thomas, dave@pragmaticprogrammer.com
|
||
|
# Version:: 0.0
|
||
|
# License:: Ruby license
|
||
|
|
||
|
|
||
|
|
||
|
require 'rdoc/markup/simple_markup/fragments'
|
||
|
require 'rdoc/markup/simple_markup/lines.rb'
|
||
|
|
||
|
module SM #:nodoc:
|
||
|
|
||
|
# == Synopsis
|
||
|
#
|
||
|
# This code converts <tt>input_string</tt>, which is in the format
|
||
|
# described in markup/simple_markup.rb, to HTML. The conversion
|
||
|
# takes place in the +convert+ method, so you can use the same
|
||
|
# SimpleMarkup object to convert multiple input strings.
|
||
|
#
|
||
|
# require 'rdoc/markup/simple_markup'
|
||
|
# require 'rdoc/markup/simple_markup/to_html'
|
||
|
#
|
||
|
# p = SM::SimpleMarkup.new
|
||
|
# h = SM::ToHtml.new
|
||
|
#
|
||
|
# puts p.convert(input_string, h)
|
||
|
#
|
||
|
# You can extend the SimpleMarkup parser to recognise new markup
|
||
|
# sequences, and to add special processing for text that matches a
|
||
|
# regular epxression. Here we make WikiWords significant to the parser,
|
||
|
# and also make the sequences {word} and \<no>text...</no> signify
|
||
|
# strike-through text. When then subclass the HTML output class to deal
|
||
|
# with these:
|
||
|
#
|
||
|
# require 'rdoc/markup/simple_markup'
|
||
|
# require 'rdoc/markup/simple_markup/to_html'
|
||
|
#
|
||
|
# class WikiHtml < SM::ToHtml
|
||
|
# def handle_special_WIKIWORD(special)
|
||
|
# "<font color=red>" + special.text + "</font>"
|
||
|
# end
|
||
|
# end
|
||
|
#
|
||
|
# p = SM::SimpleMarkup.new
|
||
|
# p.add_word_pair("{", "}", :STRIKE)
|
||
|
# p.add_html("no", :STRIKE)
|
||
|
#
|
||
|
# p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
|
||
|
#
|
||
|
# h = WikiHtml.new
|
||
|
# h.add_tag(:STRIKE, "<strike>", "</strike>")
|
||
|
#
|
||
|
# puts "<body>" + p.convert(ARGF.read, h) + "</body>"
|
||
|
#
|
||
|
# == Output Formatters
|
||
|
#
|
||
|
# _missing_
|
||
|
#
|
||
|
#
|
||
|
|
||
|
class SimpleMarkup
|
||
|
|
||
|
SPACE = ?\s
|
||
|
|
||
|
# List entries look like:
|
||
|
# * text
|
||
|
# 1. text
|
||
|
# [label] text
|
||
|
# label:: text
|
||
|
#
|
||
|
# Flag it as a list entry, and
|
||
|
# work out the indent for subsequent lines
|
||
|
|
||
|
SIMPLE_LIST_RE = /^(
|
||
|
( \* (?# bullet)
|
||
|
|- (?# bullet)
|
||
|
|\d+\. (?# numbered )
|
||
|
|[A-Za-z]\. (?# alphabetically numbered )
|
||
|
)
|
||
|
\s+
|
||
|
)\S/x
|
||
|
|
||
|
LABEL_LIST_RE = /^(
|
||
|
( \[.*?\] (?# labeled )
|
||
|
|\S.*:: (?# note )
|
||
|
)(?=\s|$)
|
||
|
\s*
|
||
|
)/x
|
||
|
|
||
|
|
||
|
##
|
||
|
# take a block of text and use various heuristics to determine
|
||
|
# it's structure (paragraphs, lists, and so on). Invoke an
|
||
|
# event handler as we identify significant chunks.
|
||
|
#
|
||
|
|
||
|
def initialize
|
||
|
@am = AttributeManager.new
|
||
|
@output = nil
|
||
|
end
|
||
|
|
||
|
##
|
||
|
# Add to the sequences used to add formatting to an individual word
|
||
|
# (such as *bold*). Matching entries will generate attibutes
|
||
|
# that the output formatters can recognize by their +name+
|
||
|
|
||
|
def add_word_pair(start, stop, name)
|
||
|
@am.add_word_pair(start, stop, name)
|
||
|
end
|
||
|
|
||
|
##
|
||
|
# Add to the sequences recognized as general markup
|
||
|
#
|
||
|
|
||
|
def add_html(tag, name)
|
||
|
@am.add_html(tag, name)
|
||
|
end
|
||
|
|
||
|
##
|
||
|
# Add to other inline sequences. For example, we could add
|
||
|
# WikiWords using something like:
|
||
|
#
|
||
|
# parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
|
||
|
#
|
||
|
# Each wiki word will be presented to the output formatter
|
||
|
# via the accept_special method
|
||
|
#
|
||
|
|
||
|
def add_special(pattern, name)
|
||
|
@am.add_special(pattern, name)
|
||
|
end
|
||
|
|
||
|
|
||
|
# We take a string, split it into lines, work out the type of
|
||
|
# each line, and from there deduce groups of lines (for example
|
||
|
# all lines in a paragraph). We then invoke the output formatter
|
||
|
# using a Visitor to display the result
|
||
|
|
||
|
def convert(str, op)
|
||
|
@lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
|
||
|
Line.new(aLine) })
|
||
|
return "" if @lines.empty?
|
||
|
@lines.normalize
|
||
|
assign_types_to_lines
|
||
|
group = group_lines
|
||
|
# call the output formatter to handle the result
|
||
|
# group.to_a.each {|i| p i}
|
||
|
group.accept(@am, op)
|
||
|
end
|
||
|
|
||
|
|
||
|
#######
|
||
|
private
|
||
|
#######
|
||
|
|
||
|
|
||
|
##
|
||
|
# Look through the text at line indentation. We flag each line as being
|
||
|
# Blank, a paragraph, a list element, or verbatim text
|
||
|
#
|
||
|
|
||
|
def assign_types_to_lines(margin = 0, level = 0)
|
||
|
|
||
|
while line = @lines.next
|
||
|
if line.isBlank?
|
||
|
line.stamp(Line::BLANK, level)
|
||
|
next
|
||
|
end
|
||
|
|
||
|
# if a line contains non-blanks before the margin, then it must belong
|
||
|
# to an outer level
|
||
|
|
||
|
text = line.text
|
||
|
|
||
|
for i in 0...margin
|
||
|
if text[i] != SPACE
|
||
|
@lines.unget
|
||
|
return
|
||
|
end
|
||
|
end
|
||
|
|
||
|
active_line = text[margin..-1]
|
||
|
|
||
|
# Rules (horizontal lines) look like
|
||
|
#
|
||
|
# --- (three or more hyphens)
|
||
|
#
|
||
|
# The more hyphens, the thicker the rule
|
||
|
#
|
||
|
|
||
|
if /^(---+)\s*$/ =~ active_line
|
||
|
line.stamp(Line::RULE, level, $1.length-2)
|
||
|
next
|
||
|
end
|
||
|
|
||
|
# Then look for list entries. First the ones that have to have
|
||
|
# text following them (* xxx, - xxx, and dd. xxx)
|
||
|
|
||
|
if SIMPLE_LIST_RE =~ active_line
|
||
|
|
||
|
offset = margin + $1.length
|
||
|
prefix = $2
|
||
|
prefix_length = prefix.length
|
||
|
|
||
|
flag = case prefix
|
||
|
when "*","-" then ListBase::BULLET
|
||
|
when /^\d/ then ListBase::NUMBER
|
||
|
when /^[A-Z]/ then ListBase::UPPERALPHA
|
||
|
when /^[a-z]/ then ListBase::LOWERALPHA
|
||
|
else raise "Invalid List Type: #{self.inspect}"
|
||
|
end
|
||
|
|
||
|
line.stamp(Line::LIST, level+1, prefix, flag)
|
||
|
text[margin, prefix_length] = " " * prefix_length
|
||
|
assign_types_to_lines(offset, level + 1)
|
||
|
next
|
||
|
end
|
||
|
|
||
|
|
||
|
if LABEL_LIST_RE =~ active_line
|
||
|
offset = margin + $1.length
|
||
|
prefix = $2
|
||
|
prefix_length = prefix.length
|
||
|
|
||
|
next if handled_labeled_list(line, level, margin, offset, prefix)
|
||
|
end
|
||
|
|
||
|
# Headings look like
|
||
|
# = Main heading
|
||
|
# == Second level
|
||
|
# === Third
|
||
|
#
|
||
|
# Headings reset the level to 0
|
||
|
|
||
|
if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
|
||
|
prefix_length = $1.length
|
||
|
prefix_length = 6 if prefix_length > 6
|
||
|
line.stamp(Line::HEADING, 0, prefix_length)
|
||
|
line.strip_leading(margin + prefix_length)
|
||
|
next
|
||
|
end
|
||
|
|
||
|
# If the character's a space, then we have verbatim text,
|
||
|
# otherwise
|
||
|
|
||
|
if active_line[0] == SPACE
|
||
|
line.strip_leading(margin) if margin > 0
|
||
|
line.stamp(Line::VERBATIM, level)
|
||
|
else
|
||
|
line.stamp(Line::PARAGRAPH, level)
|
||
|
end
|
||
|
end
|
||
|
end
|
||
|
|
||
|
# Handle labeled list entries, We have a special case
|
||
|
# to deal with. Because the labels can be long, they force
|
||
|
# the remaining block of text over the to right:
|
||
|
#
|
||
|
# this is a long label that I wrote:: and here is the
|
||
|
# block of text with
|
||
|
# a silly margin
|
||
|
#
|
||
|
# So we allow the special case. If the label is followed
|
||
|
# by nothing, and if the following line is indented, then
|
||
|
# we take the indent of that line as the new margin
|
||
|
#
|
||
|
# this is a long label that I wrote::
|
||
|
# here is a more reasonably indented block which
|
||
|
# will ab attached to the label.
|
||
|
#
|
||
|
|
||
|
def handled_labeled_list(line, level, margin, offset, prefix)
|
||
|
prefix_length = prefix.length
|
||
|
text = line.text
|
||
|
flag = nil
|
||
|
case prefix
|
||
|
when /^\[/
|
||
|
flag = ListBase::LABELED
|
||
|
prefix = prefix[1, prefix.length-2]
|
||
|
when /:$/
|
||
|
flag = ListBase::NOTE
|
||
|
prefix.chop!
|
||
|
else raise "Invalid List Type: #{self.inspect}"
|
||
|
end
|
||
|
|
||
|
# body is on the next line
|
||
|
|
||
|
if text.length <= offset
|
||
|
original_line = line
|
||
|
line = @lines.next
|
||
|
return(false) unless line
|
||
|
text = line.text
|
||
|
|
||
|
for i in 0..margin
|
||
|
if text[i] != SPACE
|
||
|
@lines.unget
|
||
|
return false
|
||
|
end
|
||
|
end
|
||
|
i = margin
|
||
|
i += 1 while text[i] == SPACE
|
||
|
if i >= text.length
|
||
|
@lines.unget
|
||
|
return false
|
||
|
else
|
||
|
offset = i
|
||
|
prefix_length = 0
|
||
|
@lines.delete(original_line)
|
||
|
end
|
||
|
end
|
||
|
|
||
|
line.stamp(Line::LIST, level+1, prefix, flag)
|
||
|
text[margin, prefix_length] = " " * prefix_length
|
||
|
assign_types_to_lines(offset, level + 1)
|
||
|
return true
|
||
|
end
|
||
|
|
||
|
# Return a block consisting of fragments which are
|
||
|
# paragraphs, list entries or verbatim text. We merge consecutive
|
||
|
# lines of the same type and level together. We are also slightly
|
||
|
# tricky with lists: the lines following a list introduction
|
||
|
# look like paragraph lines at the next level, and we remap them
|
||
|
# into list entries instead
|
||
|
|
||
|
def group_lines
|
||
|
@lines.rewind
|
||
|
|
||
|
inList = false
|
||
|
wantedType = wantedLevel = nil
|
||
|
|
||
|
block = LineCollection.new
|
||
|
group = nil
|
||
|
|
||
|
while line = @lines.next
|
||
|
if line.level == wantedLevel and line.type == wantedType
|
||
|
group.add_text(line.text)
|
||
|
else
|
||
|
group = block.fragment_for(line)
|
||
|
block.add(group)
|
||
|
if line.type == Line::LIST
|
||
|
wantedType = Line::PARAGRAPH
|
||
|
else
|
||
|
wantedType = line.type
|
||
|
end
|
||
|
wantedLevel = line.level
|
||
|
end
|
||
|
end
|
||
|
|
||
|
block.normalize
|
||
|
block
|
||
|
end
|
||
|
|
||
|
## for debugging, we allow access to our line contents as text
|
||
|
def content
|
||
|
@lines.as_text
|
||
|
end
|
||
|
public :content
|
||
|
|
||
|
## for debugging, return the list of line types
|
||
|
def get_line_types
|
||
|
@lines.line_types
|
||
|
end
|
||
|
public :get_line_types
|
||
|
end
|
||
|
|
||
|
end
|