mirror of https://github.com/ruby/ruby.git (synced 2022-11-09 12:17:21 -05:00)
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the script
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
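The four readings can be reproduced with plain String methods, which makes the ambiguity concrete. This is only an illustrative sketch: it does not reflect how the csv parser works internally, and the variable names are invented for the example.

```ruby
line = "foo\t\tbar"

# Reading 1: the tabs are ordinary content, so the line is a single field.
as_single_field = [line]

# Reading 2: a run of tabs collapses into one separator.
collapsed = line.split(/\t+/)

# Reading 3: every tab is a separator, so the middle field is empty.
empty_middle = line.split("\t", -1)

# Reading 4: same as reading 3, with the empty field reported as nil.
nil_middle = empty_middle.map { |field| field.empty? ? nil : field }

p as_single_field, collapsed, empty_middle, nil_middle
```

Each result is a defensible answer for the same input, which is why the options are treated as ambiguous rather than the data as malformed.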
This commit adds a check that raises an ArgumentError when this situation
is encountered. Specifically, it checks whether the column separator starts
or ends with a character that would be stripped away.
This commit also adds unit tests and updates the documentation.
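The idea behind the check can be sketched outside the parser. The helper name `ambiguous_strip_and_col_sep?` and the constant `ASSUMED_STRIP_CHARS` are hypothetical, and the set of characters removed by `strip: true` is an assumption made for this sketch, not something stated in this commit.

```ruby
# Characters assumed to be removed when strip: true is given.
# This set is an assumption for the sketch, not taken from the commit.
ASSUMED_STRIP_CHARS = " \t\f\v"

# Hypothetical helper: returns true when col_sep starts or ends with
# something that strip would remove.
def ambiguous_strip_and_col_sep?(strip, col_sep)
  return false unless strip

  if strip.is_a?(String)
    # An explicit strip string is compared against col_sep directly.
    col_sep.start_with?(strip) || col_sep.end_with?(strip)
  else
    # strip: true is compared character by character against the assumed set.
    ASSUMED_STRIP_CHARS.each_char.any? do |char|
      col_sep.start_with?(char) || col_sep.end_with?(char)
    end
  end
end

p ambiguous_strip_and_col_sep?(true, "\t")  # tab is both separator and stripped
p ambiguous_strip_and_col_sep?(true, ",")   # comma is never stripped
p ambiguous_strip_and_col_sep?("|", "|,|")  # separator begins with the strip string
p ambiguous_strip_and_col_sep?(nil, "\t")   # no stripping requested
```

Only the first and last characters of the separator are checked, so this catches the obvious conflicts without claiming to detect every possible ambiguity.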
cc317dd42d
parent 47c53af168
commit c70dc3cafb
Notes: git 2021-12-24 14:35:55 +09:00
3 changed files with 59 additions and 6 deletions
@@ -361,6 +361,7 @@ class CSV
       prepare_skip_lines
       prepare_strip
       prepare_separators
+      validate_strip_and_col_sep_options
       prepare_quoted
       prepare_unquoted
       prepare_line
@@ -531,6 +532,28 @@ class CSV
       @not_line_end = Regexp.new("[^\r\n]+".encode(@encoding))
     end
 
+    # This method verifies that there are no (obvious) ambiguities with the
+    # provided +col_sep+ and +strip+ parsing options. For example, if +col_sep+
+    # and +strip+ were both equal to +\t+, then there would be no clear way to
+    # parse the input.
+    def validate_strip_and_col_sep_options
+      return unless @strip
+
+      if @strip.is_a?(String)
+        if @column_separator.start_with?(@strip) || @column_separator.end_with?(@strip)
+          raise ArgumentError,
+                "The provided strip (#{@escaped_strip}) and " \
+                "col_sep (#{@escaped_column_separator}) options are incompatible."
+        end
+      else
+        if Regexp.new("\\A[#{@escaped_strip}]|[#{@escaped_strip}]\\z").match?(@column_separator)
+          raise ArgumentError,
+                "The provided strip (true) and " \
+                "col_sep (#{@escaped_column_separator}) options are incompatible."
+        end
+      end
+    end
+
     def prepare_quoted
       if @quote_character
         @quotes = Regexp.new(@escaped_quote_character +