[Haml] Add support for a workaround for fake ASCII input strings.

Closes gh-3

This is a complicated issue, but I'll do my best to explain it here.
By default, Haml encodes its templates as Encoding.default_internal,
which is usually UTF-8. This means that strings printed to the
template should be either UTF-8 or UTF-8-compatible ASCII. So far, all
well and good.

Now, it's possible to have strings that are marked as ASCII-8bit but
contain bytes outside the ASCII range. This includes valid UTF-8
strings that are forced into an ASCII-8bit encoding. If one of these
strings is concatenated to a UTF-8 string that also contains non-ASCII
characters, Ruby says "I don't know what to do with these non-ASCII
characters!" and raises an Encoding::CompatibilityError. I call this
sort of string "fake ASCII."
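
As a minimal sketch of that failure in plain Ruby 1.9+ (the sample strings
are made up for illustration, not taken from the issue):

```ruby
# Hypothetical sample data; \u escapes keep the literals UTF-8
# regardless of the source file's encoding.
template   = "<p>f\u00F6\u00F6</p>"                        # genuine UTF-8
fake_ascii = "b\u00E2r".dup.force_encoding("ascii-8bit")   # UTF-8 bytes, binary label

# Both sides contain non-ASCII bytes and the labels differ,
# so Ruby refuses to join them.
error = begin
  template + fake_ascii
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts error.class

# An ASCII-only UTF-8 string is still compatible with the binary string;
# the result simply takes on the binary label.
ascii_only = "<p>" + fake_ascii
puts ascii_only.encoding
```

The second half shows why the problem only appears once the template
itself contains non-ASCII text.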

This is what was happening in the referenced GitHub issue (or at least
in the sample app Adam Salter created at
http://github.com/adamsalter/test-project/tree/haml_utf8). The
template was UTF-8 encoded and was being passed a fake ASCII string
(marked as ASCII-8bit but containing UTF-8 byte sequences), so
rendering failed with an encoding error.

The issue now becomes: where is this fake ASCII string coming from?
From the database. The database drivers used by Rails aren't Ruby 1.9
compatible. Despite storing UTF-8 strings in the database, the drivers
return fake ASCII strings.

The best solution to this is clearly to fix the database drivers, but
that will probably take some time. One stop-gap would be to call
`force_encoding("utf-8")` on all the database values somewhere, which
is still a little annoying. Finally, the solution provided in this
commit is to set `:encoding => "ascii-8bit"` for Haml. This makes the
Haml template itself fake ASCII, which is wrong but will help prevent
encoding errors.
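
The two stop-gaps can be sketched in plain Ruby as follows. This is only an
illustration under stated assumptions: `from_db` is a hypothetical stand-in
for a value returned by a driver, and the last step imitates the effect of
`:encoding => "ascii-8bit"` on a string rather than calling Haml itself.

```ruby
# Hypothetical driver result: UTF-8 bytes carrying an ASCII-8BIT label.
from_db  = "b\u00E2r".dup.force_encoding(Encoding::BINARY)
template = "<p>f\u00F6\u00F6</p>"   # UTF-8 template text

# Stop-gap 1: relabel the database value as UTF-8. The bytes are untouched;
# only the encoding tag changes, so concatenation now works.
fixed_value = from_db.dup.force_encoding("utf-8")
utf8_page   = template + fixed_value

# Stop-gap 2: relabel the template as binary instead (roughly what the
# commit does to the precompiled template). Both sides are now "fake ASCII".
binary_template = template.dup.force_encoding(Encoding::BINARY)
binary_page     = binary_template + from_db

puts utf8_page.encoding
puts binary_page.encoding
puts utf8_page.bytes == binary_page.bytes
```

Both pages contain the same bytes; they differ only in the encoding label
that Ruby attaches to the result.
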
Nathan Weizenbaum 2009-11-08 15:59:52 -08:00
parent 44c86ec89e
commit 76bd406875
4 changed files with 28 additions and 1 deletions

@ -3,6 +3,13 @@
* Table of contents
{:toc}
## 2.2.13 (Unreleased)
* Allow users to specify {file:HAML_REFERENCE.md#encoding_option `:encoding => "ascii-8bit"`}
even for templates that include non-ASCII byte sequences.
This makes Haml templates not crash when given non-ASCII input
that's marked as having an ASCII encoding.
## [2.2.12](http://github.com/nex3/haml/commit/2.2.12)
There were no changes made to Sass between versions 2.2.11 and 2.2.12.

@ -192,6 +192,14 @@ Available options are:
before being passed into the Haml template.
Defaults to `Encoding.default_internal` or, if that's not set, `"utf-8"`.
Many Ruby database drivers are not yet Ruby 1.9 compatible;
in particular, they return strings marked as ASCII-encoded
even when those strings contain non-ASCII characters (such as UTF-8).
**This will cause encoding errors** if the Haml encoding isn't set to `"ascii-8bit"`.
To solve this, either call `#force_encoding` on all the strings returned from the database,
set `:encoding` to `"ascii-8bit"`, or try to get the authors of the database drivers
to make them Ruby 1.9 compatible.
## Plain Text
A substantial portion of any HTML document is its content,

@ -58,7 +58,9 @@ module Haml
# @return [String]
def precompiled
return @precompiled if ruby1_8?
- return @precompiled.encode(Encoding.find(@options[:encoding]))
+ encoding = Encoding.find(@options[:encoding])
+ return @precompiled.force_encoding(encoding) if encoding == Encoding::BINARY
+ return @precompiled.encode(encoding)
end
# Precompiles the Haml template.
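
The special case for `Encoding::BINARY` in `precompiled` is presumably needed
because `String#encode` transcodes characters, while `String#force_encoding`
only changes the label. A minimal sketch of the difference, using a
hypothetical stand-in string rather than a real precompiled template:

```ruby
# Hypothetical stand-in for template source containing non-ASCII text.
precompiled = "%p f\u00F6\u00F6"

# Encoding.find accepts the same name a user would pass as :encoding.
binary = Encoding.find("ascii-8bit")

# Transcoding fails: there is no conversion for U+00F6 into ASCII-8BIT.
encode_error = begin
  precompiled.encode(binary)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Relabeling succeeds and leaves every byte as it was.
relabeled = precompiled.dup.force_encoding(binary)

puts encode_error.class
puts relabeled.encoding
puts relabeled.bytes == precompiled.bytes
```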

@ -1189,6 +1189,16 @@ HTML
HAML
end
def test_fake_ascii_encoding
assert_equal(<<HTML.force_encoding("ascii-8bit"), render(<<HAML, :encoding => "ascii-8bit"))
<p>bâr</p>
<p>föö</p>
HTML
%p bâr
%p föö
HAML
end
def test_convert_template_render_proc
assert_converts_template_properly {|e| e.render_proc.call}
end