1
0
Fork 0
mirror of https://github.com/jashkenas/coffeescript.git synced 2022-11-09 12:23:24 -05:00
Commit graph

16 commits

Author SHA1 Message Date
Alan Pierce
ce971b766f Change OUTDENT tokens to be positioned at the end of the previous token
This commit adds another post-processing step after normal lexing that sets the
locationData on all OUTDENT tokens to be at the last character of the previous
token. This does feel like a little bit of a hack. Ideally the location data
would be set correctly in the first place and not in a post-processing step, but
I tried that and some temporary intermediate tokens were causing problems, so I
decided to set the location data once those intermediate tokens were removed.
Also, having this as a separate processing step makes it more robust and
isolated.

This fixes the problem in https://github.com/decaffeinate/decaffeinate/issues/371 .
In that issue, the CoffeeScript tokens had three OUTDENT tokens in a row, and
the last two overlapped with the `]`. Since at least one of those OUTDENT tokens
was considered part of the function body, the function expression had an ending
position just after the end of the `]`.

OUTDENT tokens are sort of a weird case in the lexer anyway, since they often
don't correspond to an actual location in the source code. It seems like the
code in `lexer.coffee` makes an attempt at finding a good place for them, but in
some cases, it has a bad result. This seems hard to avoid in the general case.
For example, in this code:
```coffee
[->
  a]
```
There must be an OUTDENT between the `a` and the `]`, but CoffeeScript tokens
have an inclusive start and end, so they must always be at least one character
wide (I think). In this case, the lexer was choosing the `]` as the location,
and the parser ended up generating correct location data, I believe because
it ignores the outermost INDENT and OUTDENT tokens. However, with multiple
OUTDENT tokens in a row, the parser ends up producing location data that is
wrong.

It seems to me like there isn't a solid answer to "what location do OUTDENT
tokens have", since it hasn't mattered much, but for this commit, I'm defining
it: they always have the location of the last character of the previous token.
This should hopefully be fairly safe because tokens are still in the same order
relative to each other. Also, it's worth noting that this makes the start
location for OUTDENT tokens awkward. However, OUTDENT tokens are always used to
mark the end of something, so their `last_line` and `last_column` values are
always what matter when determining AST node bounds, so it is most important for
those to be correct.
2016-10-06 19:39:31 -07:00
Alan Pierce
feb42e5128 Add a test that tokens have locations that are in order 2016-08-01 20:28:56 -07:00
Simon Lydell
76c076db55 Fix #3597: Allow interpolations in object keys
The following is now allowed:

    o =
      a: 1
      b: 2
      "#{'c'}": 3
      "#{'d'}": 4
      e: 5
      "#{'f'}": 6
      g: 7

It compiles to:

    o = (
      obj = {
        a: 1,
        b: 2
      },
      obj["" + 'c'] = 3,
      obj["" + 'd'] = 4,
      obj.e = 5,
      obj["" + 'f'] = 6,
      obj.g = 7,
      obj
    );

- Closes #3039. Empty interpolations in object keys are now _supposed_ to be
  allowed.
- Closes #1131. No need to improve error messages for attempted key
  interpolation anymore.
- Implementing this required fixing the following bug: `("" + a): 1` used to
  error out on the colon, saying "unexpected colon". But really, it is the
  attempted object key that is unexpected. Now the error is on the opening
  parenthesis instead.
- However, the above fix broke some error message tests for regexes. The easiest
  way to fix this was to make a seemingly unrelated change: The error messages
  for unexpected identifiers, numbers, strings and regexes now say for example
  'unexpected string' instead of 'unexpected """some #{really long} string"""'.
  In other words, the tag _name_ is used instead of the tag _value_.
  This was way easier to implement, and is more helpful to the user. Using the
  tag value is good for operators, reserved words and the like, but not for
  tokens which can contain any text. For example, 'unexpected identifier' is
  better than 'unexpected expected' (if a variable called 'expected' was used
  erraneously).
- While writing tests for the above point I found a few minor bugs with string
  locations which have been fixed.
2015-02-09 17:32:37 +01:00
Simon Lydell
ffa25aae77 Improve error messages for unexpected regexes 2015-02-03 20:42:50 +01:00
Simon Lydell
f8c366c479 Fix #3822: Include delimiters in string/regex locations 2015-02-03 18:55:38 +01:00
Simon Lydell
05b3707506 Fix #1316: Interpolate interpolations safely
Instead of compiling to `"" + + (+"-");`, `"#{+}-"'` now gives an appropriate
error message:

    [stdin]:1:5: error: unexpected end of interpolation
    "#{+}-"
        ^

This is done by _always_ (instead of just sometimes) wrapping the interpolations
in parentheses in the lexer. Unnecessary parentheses won't be output anyway.

I got tired of updating the tests in test/location.coffee (which I had enough of
in #3770), which relies on implementation details (the exact amount of tokens
generated for a given string of code) to do their testing, so I refactored them
to be less fragile.
2015-01-16 17:19:42 +01:00
Simon Lydell
0dcff507fb Refactor interpolation (and string and regex) handling in lexer
- Fix #3394: Unclosed single-quoted strings (both regular ones and heredocs)
  used to pass through the lexer, causing a parsing error later, while
  double-quoted strings caused an error already in the lexing phase. Now both
  single and double-quoted unclosed strings error out in the lexer (which is the
  more logical option) with consistent error messages. This also fixes the last
  comment by @satyr in #3301.

- Similar to the above, unclosed heregexes also used to pass through the lexer
  and not error until in the parsing phase, which resulted in confusing error
  messages. This has been fixed, too.

- Fix #3348, by adding passing tests.

- Fix #3529: If a string starts with an interpolation, an empty string is no
  longer emitted before the interpolation (unless it is needed to coerce the
  interpolation into a string).

- Block comments cannot contain `*/`. Now the error message also shows exactly
  where the offending `*/`. This improvement might seem unrelated, but I had to
  touch that code anyway to refactor string and regex related code, and the
  change was very trivial. Moreover, it's consistent with the next two points.

- Regexes cannot start with `*`. Now the error message also shows exactly where
  the offending `*` is. (It might actually not be exatly at the start in
  heregexes.) It is a very minor improvement, but it was trivial to add.

- Octal escapes in strings are forbidden in CoffeeScript (just like in
  JavaScript strict mode). However, this used to be the case only for regular
  strings. Now they are also forbidden in heredocs. Moreover, the errors now
  point at the offending octal escape.

- Invalid regex flags are no longer allowed. This includes repeated modifiers
  and unknown ones. Moreover, invalid modifiers do not stop a heregex from
  being matched, which results in better error messages.

- Fix #3621: `///a#{1}///` compiles to `RegExp("a" + 1)`. So does
  `RegExp("a#{1}")`. Still, those two code snippets used to generate different
  tokens, which is a bit weird, but more importantly causes problems for
  coffeelint (see clutchski/coffeelint#340). This required lots of tests in
  test/location.coffee to be updated. Note that some updates to those tests are
  unrelated to this point; some have been updated to be more consistent (I
  discovered this because the refactored code happened to be seemingly more
  correct).

- Regular regex literals used to erraneously allow newlines to be escaped,
  causing invalid JavaScript output. This has been fixed.

- Heregexes may now be completely empty (`//////`), instead of erroring out with
  a confusing message.

- Fix #2388: Heredocs and heregexes used to be lexed simply, which meant that
  you couldn't nest a heredoc within a heredoc (double-quoted, that is) or a
  heregex inside a heregex.

- Fix #2321: If you used division inside interpolation and then a slash later in
  the string containing that interpolation, the division slash and the latter
  slash was erraneously matched as a regex. This has been fixed.

- Indentation inside interpolations in heredocs no longer affect how much
  indentation is removed from each line of the heredoc (which is more
  intuitive).

- Whitespace is now correctly trimmed from the start and end of strings in a few
  edge cases.

- Last but not least, the lexing of interpolated strings now seems to be more
  efficient. For a regular double-quoted string, we used to use a custom
  function to find the end of it (taking interpolations and interpolations
  within interpolations etc. into account). Then we used to re-find the
  interpolations and recursively lex their contents. In effect, the same string
  was processed twice, or even more in the case of deeper nesting of
  interpolations. Now the same string is processed just once.

- Code duplication between regular strings, heredocs, regular regexes and
  heregexes has been reduced.

- The above two points should result in more easily read code, too.
2015-01-04 07:47:09 +01:00
minodisk
deead4bfad Fix wrong location issue in heregex interpolation 2014-07-13 16:39:41 +09:00
minodisk
5920939e23 Fix wrong location issue in "string" interpolation 2014-07-03 13:11:20 +09:00
minodisk
2b539ebea8 Fix wrong location issue in string interpolation starting with line break 2014-07-01 11:28:21 +09:00
dabbler0
159d562230 Fix off-by-one issue with string interpolation in lexer 2014-07-01 10:55:17 +09:00
Marc Häfner
4fd5e9a3ab Better handling of initial indent at file start.
* Detect initial indentation before the first token and enforce it.
  * Don't add `INDENT` token (or the matching `OUTDENT, TERMINATOR`).
2013-06-14 00:28:45 +02:00
Marc Häfner
3c38a34ab2 Fix line numbers when first line is indented.
* Offset @chunkLine for inserted line break.
* Avoid line break insertion for blank lines.
2013-03-01 21:30:07 +01:00
Jason Walton
f67da27d2f Add unit tests, fix last_column reporting. 2013-01-14 17:11:07 -05:00
Jason Walton
97bc9f4730 Add quick unit test for location data. 2013-01-14 15:20:47 -05:00
Jason Walton
cee4f4ab6e Location test. 2012-12-24 08:34:16 -05:00