2009-08-16 02:31:13 -04:00
|
|
|
# encoding: US-ASCII
|
2015-12-01 21:40:02 -05:00
|
|
|
# frozen_string_literal: true
|
2007-12-24 21:46:26 -05:00
|
|
|
# = csv.rb -- CSV Reading and Writing
|
|
|
|
#
|
2018-05-09 00:39:16 -04:00
|
|
|
# Created by James Edward Gray II on 2005-10-31.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# See CSV for documentation.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# == Description
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# Welcome to the new and improved CSV.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# This version of the CSV library began its life as FasterCSV. FasterCSV was
|
|
|
|
# intended as a replacement to Ruby's then standard CSV library. It was
|
2007-12-24 21:46:26 -05:00
|
|
|
# designed to address concerns users of that library had and it had three
|
|
|
|
# primary goals:
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# 1. Be significantly faster than CSV while remaining a pure Ruby library.
|
2019-10-12 01:03:21 -04:00
|
|
|
# 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
|
|
|
|
# grew larger, was also but considerably richer in features. The parsing
|
2007-12-24 21:46:26 -05:00
|
|
|
# core remains quite small.)
|
|
|
|
# 3. Improve on the CSV interface.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# Obviously, the last one is subjective. I did try to defer to the original
|
2007-12-24 21:46:26 -05:00
|
|
|
# interface whenever I didn't have a compelling reason to change it though, so
|
|
|
|
# hopefully this won't be too radically different.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# We must have met our goals because FasterCSV was renamed to CSV and replaced
|
2011-05-26 09:32:40 -04:00
|
|
|
# the original library as of Ruby 1.9. If you are migrating code from 1.8 or
|
|
|
|
# earlier, you may have to change your code to comply with the new interface.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# == What's the Different From the Old CSV?
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# I'm sure I'll miss something, but I'll try to mention most of the major
|
|
|
|
# differences I am aware of, to help others quickly get up to speed:
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# === \CSV Parsing
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# * This parser is m17n aware. See CSV for full details.
|
2007-12-24 21:46:26 -05:00
|
|
|
# * This library has a stricter parser and will throw MalformedCSVErrors on
|
|
|
|
# problematic data.
|
2019-10-12 01:03:21 -04:00
|
|
|
# * This library has a less liberal idea of a line ending than CSV. What you
|
|
|
|
# set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
|
2007-12-24 21:46:26 -05:00
|
|
|
# though.
|
2019-10-12 01:03:21 -04:00
|
|
|
# * The old library returned empty lines as <tt>[nil]</tt>. This library calls
|
2007-12-24 21:46:26 -05:00
|
|
|
# them <tt>[]</tt>.
|
|
|
|
# * This library has a much faster parser.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# === Interface
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# * CSV now uses Hash-style parameters to set options.
|
|
|
|
# * CSV no longer has generate_row() or parse_row().
|
|
|
|
# * The old CSV's Reader and Writer classes have been dropped.
|
|
|
|
# * CSV::open() is now more like Ruby's open().
|
|
|
|
# * CSV objects now support most standard IO methods.
|
|
|
|
# * CSV now has a new() method used to wrap objects like String and IO for
|
|
|
|
# reading and writing.
|
|
|
|
# * CSV::generate() is different from the old method.
|
2019-10-12 01:03:21 -04:00
|
|
|
# * CSV no longer supports partial reads. It works line-by-line.
|
2007-12-24 21:46:26 -05:00
|
|
|
# * CSV no longer allows the instance methods to override the separators for
|
2019-10-12 01:03:21 -04:00
|
|
|
# performance reasons. They must be set in the constructor.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# If you use this library and find yourself missing any functionality I have
|
|
|
|
# trimmed, please {let me know}[mailto:james@grayproductions.net].
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# == Documentation
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# See CSV for documentation.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# == What is CSV, really?
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# CSV maintains a pretty strict definition of CSV taken directly from
|
2019-10-12 01:03:21 -04:00
|
|
|
# {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
|
|
|
|
# place and that is to make using this library easier. CSV will parse all valid
|
2007-12-24 21:46:26 -05:00
|
|
|
# CSV.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# What you don't want to do is to feed CSV invalid data. Because of the way the
|
2007-12-24 21:46:26 -05:00
|
|
|
# CSV format works, it's common for a parser to need to read until the end of
|
2019-10-12 01:03:21 -04:00
|
|
|
# the file to be sure a field is invalid. This consumes a lot of time and memory.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# Luckily, when working with invalid CSV, Ruby's built-in methods will almost
|
2019-10-12 01:03:21 -04:00
|
|
|
# always be superior in every way. For example, parsing non-quoted fields is as
|
2007-12-24 21:46:26 -05:00
|
|
|
# easy as:
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# data.split(",")
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# == Questions and/or Comments
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
|
|
|
|
# with any questions.
|
|
|
|
|
|
|
|
require "forwardable"
|
|
|
|
require "date"
|
|
|
|
require "stringio"
|
2018-05-09 00:39:16 -04:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
require_relative "csv/fields_converter"
|
2021-09-11 18:34:15 -04:00
|
|
|
require_relative "csv/input_record_separator"
|
2018-12-23 02:00:35 -05:00
|
|
|
require_relative "csv/match_p"
|
|
|
|
require_relative "csv/parser"
|
|
|
|
require_relative "csv/row"
|
|
|
|
require_relative "csv/table"
|
|
|
|
require_relative "csv/writer"
|
2018-05-09 00:39:16 -04:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
using CSV::MatchP if CSV.const_defined?(:MatchP)
|
2007-12-24 21:46:26 -05:00
|
|
|
|
2020-06-30 21:30:49 -04:00
|
|
|
# == \CSV
|
|
|
|
# \CSV (comma-separated variables) data is a text representation of a table:
|
|
|
|
# - A _row_ _separator_ delimits table rows.
|
|
|
|
# A common row separator is the newline character <tt>"\n"</tt>.
|
|
|
|
# - A _column_ _separator_ delimits fields in a row.
|
|
|
|
# A common column separator is the comma character <tt>","</tt>.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# This \CSV \String, with row separator <tt>"\n"</tt>
|
|
|
|
# and column separator <tt>","</tt>,
|
|
|
|
# has three rows and two columns:
|
|
|
|
# "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
#
|
|
|
|
# Despite the name \CSV, a \CSV representation can use different separators.
|
|
|
|
#
|
2020-08-20 17:21:36 -04:00
|
|
|
# For more about tables, see the Wikipedia article
|
|
|
|
# "{Table (information)}[https://en.wikipedia.org/wiki/Table_(information)]",
|
|
|
|
# especially its section
|
|
|
|
# "{Simple table}[https://en.wikipedia.org/wiki/Table_(information)#Simple_table]"
|
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# == \Class \CSV
|
|
|
|
#
|
|
|
|
# Class \CSV provides methods for:
|
|
|
|
# - Parsing \CSV data from a \String object, a \File (via its file path), or an \IO object.
|
|
|
|
# - Generating \CSV data to a \String object.
|
|
|
|
#
|
|
|
|
# To make \CSV available:
|
2020-05-26 17:13:05 -04:00
|
|
|
# require 'csv'
|
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# All examples here assume that this has been done.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# == Keeping It Simple
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# A \CSV object has dozens of instance methods that offer fine-grained control
|
|
|
|
# of parsing and generating \CSV data.
|
|
|
|
# For many needs, though, simpler approaches will do.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# This section summarizes the singleton methods in \CSV
|
|
|
|
# that allow you to parse and generate without explicitly
|
|
|
|
# creating \CSV objects.
|
|
|
|
# For details, follow the links.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# === Simple Parsing
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# Parsing methods commonly return either of:
|
|
|
|
# - An \Array of Arrays of Strings:
|
|
|
|
# - The outer \Array is the entire "table".
|
|
|
|
# - Each inner \Array is a row.
|
|
|
|
# - Each \String is a field.
|
|
|
|
# - A CSV::Table object. For details, see
|
|
|
|
# {\CSV with Headers}[#class-CSV-label-CSV+with+Headers].
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Parsing a \String
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# The input to be parsed can be a string:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# \Method CSV.parse returns the entire \CSV data:
|
|
|
|
# CSV.parse(string) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# \Method CSV.parse_line returns only the first row:
|
|
|
|
# CSV.parse_line(string) # => ["foo", "0"]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# \CSV extends class \String with instance method String#parse_csv,
|
|
|
|
# which also returns only the first row:
|
|
|
|
# string.parse_csv # => ["foo", "0"]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Parsing Via a \File Path
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# The input to be parsed can be in a file:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# \Method CSV.read returns the entire \CSV data:
|
|
|
|
# CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# \Method CSV.foreach iterates, passing each row to the given block:
|
|
|
|
# CSV.foreach(path) do |row|
|
|
|
|
# p row
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
|
|
|
# \Method CSV.table returns the entire \CSV data as a CSV::Table object:
|
|
|
|
# CSV.table(path) # => #<CSV::Table mode:col_or_row row_count:3>
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Parsing from an Open \IO Stream
|
|
|
|
#
|
|
|
|
# The input to be parsed can be in an open \IO stream:
|
|
|
|
#
|
|
|
|
# \Method CSV.read returns the entire \CSV data:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.read(file)
|
|
|
|
# end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# As does method CSV.parse:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.parse(file)
|
|
|
|
# end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# \Method CSV.parse_line returns only the first row:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.parse_line(file)
|
|
|
|
# end # => ["foo", "0"]
|
|
|
|
#
|
|
|
|
# \Method CSV.foreach iterates, passing each row to the given block:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.foreach(file) do |row|
|
|
|
|
# p row
|
|
|
|
# end
|
2007-12-24 21:46:26 -05:00
|
|
|
# end
|
2020-06-30 21:30:49 -04:00
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
|
|
|
# \Method CSV.table returns the entire \CSV data as a CSV::Table object:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.table(file)
|
|
|
|
# end # => #<CSV::Table mode:col_or_row row_count:3>
|
|
|
|
#
|
|
|
|
# === Simple Generating
|
|
|
|
#
|
|
|
|
# \Method CSV.generate returns a \String;
|
|
|
|
# this example uses method CSV#<< to append the rows
|
|
|
|
# that are to be generated:
|
|
|
|
# output_string = CSV.generate do |csv|
|
|
|
|
# csv << ['foo', 0]
|
|
|
|
# csv << ['bar', 1]
|
|
|
|
# csv << ['baz', 2]
|
|
|
|
# end
|
|
|
|
# output_string # => "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
#
|
|
|
|
# \Method CSV.generate_line returns a \String containing the single row
|
|
|
|
# constructed from an \Array:
|
|
|
|
# CSV.generate_line(['foo', '0']) # => "foo,0\n"
|
|
|
|
#
|
|
|
|
# \CSV extends class \Array with instance method <tt>Array#to_csv</tt>,
|
|
|
|
# which forms an \Array into a \String:
|
|
|
|
# ['foo', '0'].to_csv # => "foo,0\n"
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# === "Filtering" \CSV
|
|
|
|
#
|
|
|
|
# \Method CSV.filter provides a Unix-style filter for \CSV data.
|
|
|
|
# The input data is processed to form the output data:
|
|
|
|
# in_string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# out_string = ''
|
|
|
|
# CSV.filter(in_string, out_string) do |row|
|
|
|
|
# row[0] = row[0].upcase
|
|
|
|
# row[1] *= 4
|
|
|
|
# end
|
|
|
|
# out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n"
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# == \CSV Objects
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# There are three ways to create a \CSV object:
|
|
|
|
# - \Method CSV.new returns a new \CSV object.
|
|
|
|
# - \Method CSV.instance returns a new or cached \CSV object.
|
|
|
|
# - \Method \CSV() also returns a new or cached \CSV object.
|
2011-04-26 11:57:04 -04:00
|
|
|
#
|
2020-07-01 16:15:13 -04:00
|
|
|
# === Instance Methods
|
|
|
|
#
|
|
|
|
# \CSV has three groups of instance methods:
|
|
|
|
# - Its own internally defined instance methods.
|
|
|
|
# - Methods included by module Enumerable.
|
|
|
|
# - Methods delegated to class IO. See below.
|
|
|
|
#
|
|
|
|
# ==== Delegated Methods
|
2020-06-11 17:31:52 -04:00
|
|
|
#
|
|
|
|
# For convenience, a CSV object will delegate to many methods in class IO.
|
|
|
|
# (A few have wrapper "guard code" in \CSV.) You may call:
|
|
|
|
# * IO#binmode
|
|
|
|
# * #binmode?
|
|
|
|
# * IO#close
|
|
|
|
# * IO#close_read
|
|
|
|
# * IO#close_write
|
|
|
|
# * IO#closed?
|
|
|
|
# * #eof
|
|
|
|
# * #eof?
|
|
|
|
# * IO#external_encoding
|
|
|
|
# * IO#fcntl
|
|
|
|
# * IO#fileno
|
|
|
|
# * #flock
|
|
|
|
# * IO#flush
|
|
|
|
# * IO#fsync
|
|
|
|
# * IO#internal_encoding
|
|
|
|
# * #ioctl
|
|
|
|
# * IO#isatty
|
|
|
|
# * #path
|
|
|
|
# * IO#pid
|
|
|
|
# * IO#pos
|
|
|
|
# * IO#pos=
|
|
|
|
# * IO#reopen
|
|
|
|
# * #rewind
|
|
|
|
# * IO#seek
|
|
|
|
# * #stat
|
|
|
|
# * IO#string
|
|
|
|
# * IO#sync
|
|
|
|
# * IO#sync=
|
|
|
|
# * IO#tell
|
|
|
|
# * #to_i
|
|
|
|
# * #to_io
|
|
|
|
# * IO#truncate
|
|
|
|
# * IO#tty?
|
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# === Options
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
|
|
|
# The default values for options are:
|
|
|
|
# DEFAULT_OPTIONS = {
|
|
|
|
# # For both parsing and generating.
|
|
|
|
# col_sep: ",",
|
|
|
|
# row_sep: :auto,
|
|
|
|
# quote_char: '"',
|
|
|
|
# # For parsing.
|
|
|
|
# field_size_limit: nil,
|
|
|
|
# converters: nil,
|
|
|
|
# unconverted_fields: nil,
|
|
|
|
# headers: false,
|
|
|
|
# return_headers: false,
|
|
|
|
# header_converters: nil,
|
|
|
|
# skip_blanks: false,
|
|
|
|
# skip_lines: nil,
|
|
|
|
# liberal_parsing: false,
|
|
|
|
# nil_value: nil,
|
|
|
|
# empty_value: "",
|
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the file
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
This commit adds code that raises an exception when this situation is
encountered. Specifically, it checks if the column separator either ends
with or starts with the characters that would be stripped away.
This commit also adds unit tests and updates the documentation.
https://github.com/ruby/csv/commit/cc317dd42d
2021-11-18 16:20:09 -05:00
|
|
|
# strip: false,
|
2020-05-12 17:42:45 -04:00
|
|
|
# # For generating.
|
|
|
|
# write_headers: nil,
|
|
|
|
# quote_empty: true,
|
|
|
|
# force_quotes: false,
|
|
|
|
# write_converters: nil,
|
|
|
|
# write_nil_value: nil,
|
|
|
|
# write_empty_value: "",
|
|
|
|
# }
|
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Options for Parsing
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Options for parsing, described in detail below, include:
|
|
|
|
# - +row_sep+: Specifies the row separator; used to delimit rows.
|
|
|
|
# - +col_sep+: Specifies the column separator; used to delimit fields.
|
|
|
|
# - +quote_char+: Specifies the quote character; used to quote fields.
|
|
|
|
# - +field_size_limit+: Specifies the maximum field size allowed.
|
|
|
|
# - +converters+: Specifies the field converters to be used.
|
|
|
|
# - +unconverted_fields+: Specifies whether unconverted fields are to be available.
|
|
|
|
# - +headers+: Specifies whether data contains headers,
|
|
|
|
# or specifies the headers themselves.
|
|
|
|
# - +return_headers+: Specifies whether headers are to be returned.
|
|
|
|
# - +header_converters+: Specifies the header converters to be used.
|
|
|
|
# - +skip_blanks+: Specifies whether blanks lines are to be ignored.
|
|
|
|
# - +skip_lines+: Specifies how comments lines are to be recognized.
|
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the file
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
This commit adds code that raises an exception when this situation is
encountered. Specifically, it checks if the column separator either ends
with or starts with the characters that would be stripped away.
This commit also adds unit tests and updates the documentation.
https://github.com/ruby/csv/commit/cc317dd42d
2021-11-18 16:20:09 -05:00
|
|
|
# - +strip+: Specifies whether leading and trailing whitespace are to be
|
|
|
|
# stripped from fields. This must be compatible with +col_sep+; if it is not,
|
|
|
|
# then an +ArgumentError+ exception will be raised.
|
2020-07-02 22:06:26 -04:00
|
|
|
# - +liberal_parsing+: Specifies whether \CSV should attempt to parse
|
|
|
|
# non-compliant data.
|
|
|
|
# - +nil_value+: Specifies the object that is to be substituted for each null (no-text) field.
|
|
|
|
# - +empty_value+: Specifies the object that is to be substituted for each empty field.
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/row_sep.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/col_sep.rdoc
|
2020-07-02 22:06:26 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/quote_char.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/field_size_limit.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/converters.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/unconverted_fields.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/headers.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/return_headers.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/header_converters.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/skip_blanks.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/skip_lines.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/strip.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/liberal_parsing.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/nil_value.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/parsing/empty_value.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Options for Generating
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Options for generating, described in detail below, include:
|
|
|
|
# - +row_sep+: Specifies the row separator; used to delimit rows.
|
|
|
|
# - +col_sep+: Specifies the column separator; used to delimit fields.
|
|
|
|
# - +quote_char+: Specifies the quote character; used to quote fields.
|
|
|
|
# - +write_headers+: Specifies whether headers are to be written.
|
|
|
|
# - +force_quotes+: Specifies whether each output field is to be quoted.
|
|
|
|
# - +quote_empty+: Specifies whether each empty output field is to be quoted.
|
|
|
|
# - +write_converters+: Specifies the field converters to be used in writing.
|
|
|
|
# - +write_nil_value+: Specifies the object that is to be substituted for each +nil+-valued field.
|
|
|
|
# - +write_empty_value+: Specifies the object that is to be substituted for each empty field.
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/row_sep.rdoc
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/col_sep.rdoc
|
2020-07-02 22:06:26 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/common/quote_char.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/write_headers.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/force_quotes.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/quote_empty.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/write_converters.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/write_nil_value.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/options/generating/write_empty_value.rdoc
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# === \CSV with Headers
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
|
|
|
# CSV allows to specify column names of CSV file, whether they are in data, or
|
2019-10-12 01:03:21 -04:00
|
|
|
# provided separately. If headers are specified, reading methods return an instance
|
2018-05-09 00:39:16 -04:00
|
|
|
# of CSV::Table, consisting of CSV::Row.
|
|
|
|
#
|
|
|
|
# # Headers are part of data
|
|
|
|
# data = CSV.parse(<<~ROWS, headers: true)
|
|
|
|
# Name,Department,Salary
|
2018-12-23 02:00:35 -05:00
|
|
|
# Bob,Engineering,1000
|
2018-05-09 00:39:16 -04:00
|
|
|
# Jane,Sales,2000
|
|
|
|
# John,Management,5000
|
|
|
|
# ROWS
|
2011-04-26 11:57:04 -04:00
|
|
|
#
|
2018-05-09 00:39:16 -04:00
|
|
|
# data.class #=> CSV::Table
|
2018-12-23 02:00:35 -05:00
|
|
|
# data.first #=> #<CSV::Row "Name":"Bob" "Department":"Engineering" "Salary":"1000">
|
|
|
|
# data.first.to_h #=> {"Name"=>"Bob", "Department"=>"Engineering", "Salary"=>"1000"}
|
2011-04-26 11:57:04 -04:00
|
|
|
#
|
2018-05-09 00:39:16 -04:00
|
|
|
# # Headers provided by developer
|
2019-08-13 02:27:46 -04:00
|
|
|
# data = CSV.parse('Bob,Engineering,1000', headers: %i[name department salary])
|
2018-12-23 02:00:35 -05:00
|
|
|
# data.first #=> #<CSV::Row name:"Bob" department:"Engineering" salary:"1000">
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# === \Converters
|
|
|
|
#
|
|
|
|
# By default, each value (field or header) parsed by \CSV is formed into a \String.
|
|
|
|
# You can use a _field_ _converter_ or _header_ _converter_
|
|
|
|
# to intercept and modify the parsed values:
|
|
|
|
# - See {Field Converters}[#class-CSV-label-Field+Converters].
|
|
|
|
# - See {Header Converters}[#class-CSV-label-Header+Converters].
|
|
|
|
#
|
|
|
|
# Also by default, each value to be written during generation is written 'as-is'.
|
|
|
|
# You can use a _write_ _converter_ to modify values before writing.
|
|
|
|
# - See {Write Converters}[#class-CSV-label-Write+Converters].
|
|
|
|
#
|
|
|
|
# ==== Specifying \Converters
|
|
|
|
#
|
|
|
|
# You can specify converters for parsing or generating in the +options+
|
|
|
|
# argument to various \CSV methods:
|
|
|
|
# - Option +converters+ for converting parsed field values.
|
|
|
|
# - Option +header_converters+ for converting parsed header values.
|
|
|
|
# - Option +write_converters+ for converting values to be written (generated).
|
|
|
|
#
|
|
|
|
# There are three forms for specifying converters:
|
|
|
|
# - A converter proc: executable code to be used for conversion.
|
|
|
|
# - A converter name: the name of a stored converter.
|
|
|
|
# - A converter list: an array of converter procs, converter names, and converter lists.
|
|
|
|
#
|
|
|
|
# ===== Converter Procs
|
|
|
|
#
|
|
|
|
# This converter proc, +strip_converter+, accepts a value +field+
|
|
|
|
# and returns <tt>field.strip</tt>:
|
|
|
|
# strip_converter = proc {|field| field.strip }
|
|
|
|
# In this call to <tt>CSV.parse</tt>,
|
|
|
|
# the keyword argument <tt>converters: string_converter</tt>
|
|
|
|
# specifies that:
|
|
|
|
# - \Proc +string_converter+ is to be called for each parsed field.
|
|
|
|
# - The converter's return value is to replace the +field+ value.
|
|
|
|
# Example:
|
|
|
|
# string = " foo , 0 \n bar , 1 \n baz , 2 \n"
|
|
|
|
# array = CSV.parse(string, converters: strip_converter)
|
|
|
|
# array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# A converter proc can receive a second argument, +field_info+,
|
|
|
|
# that contains details about the field.
|
|
|
|
# This modified +strip_converter+ displays its arguments:
|
|
|
|
# strip_converter = proc do |field, field_info|
|
|
|
|
# p [field, field_info]
|
|
|
|
# field.strip
|
|
|
|
# end
|
|
|
|
# string = " foo , 0 \n bar , 1 \n baz , 2 \n"
|
|
|
|
# array = CSV.parse(string, converters: strip_converter)
|
|
|
|
# array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
# Output:
|
|
|
|
# [" foo ", #<struct CSV::FieldInfo index=0, line=1, header=nil>]
|
|
|
|
# [" 0 ", #<struct CSV::FieldInfo index=1, line=1, header=nil>]
|
|
|
|
# [" bar ", #<struct CSV::FieldInfo index=0, line=2, header=nil>]
|
|
|
|
# [" 1 ", #<struct CSV::FieldInfo index=1, line=2, header=nil>]
|
|
|
|
# [" baz ", #<struct CSV::FieldInfo index=0, line=3, header=nil>]
|
|
|
|
# [" 2 ", #<struct CSV::FieldInfo index=1, line=3, header=nil>]
|
2021-08-05 16:40:05 -04:00
|
|
|
# Each CSV::FieldInfo object shows:
|
2020-07-15 16:37:17 -04:00
|
|
|
# - The 0-based field index.
|
|
|
|
# - The 1-based line index.
|
|
|
|
# - The field header, if any.
|
|
|
|
#
|
|
|
|
# ===== Stored \Converters
|
|
|
|
#
|
|
|
|
# A converter may be given a name and stored in a structure where
|
|
|
|
# the parsing methods can find it by name.
|
|
|
|
#
|
|
|
|
# The storage structure for field converters is the \Hash CSV::Converters.
|
|
|
|
# It has several built-in converter procs:
|
|
|
|
# - <tt>:integer</tt>: converts each \String-embedded integer into a true \Integer.
|
|
|
|
# - <tt>:float</tt>: converts each \String-embedded float into a true \Float.
|
|
|
|
# - <tt>:date</tt>: converts each \String-embedded date into a true \Date.
|
|
|
|
# - <tt>:date_time</tt>: converts each \String-embedded date-time into a true \DateTime
|
|
|
|
# .
|
|
|
|
# This example creates a converter proc, then stores it:
|
|
|
|
# strip_converter = proc {|field| field.strip }
|
|
|
|
# CSV::Converters[:strip] = strip_converter
|
|
|
|
# Then the parsing method call can refer to the converter
|
|
|
|
# by its name, <tt>:strip</tt>:
|
|
|
|
# string = " foo , 0 \n bar , 1 \n baz , 2 \n"
|
|
|
|
# array = CSV.parse(string, converters: :strip)
|
|
|
|
# array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# The storage structure for header converters is the \Hash CSV::HeaderConverters,
|
|
|
|
# which works in the same way.
|
|
|
|
# It also has built-in converter procs:
|
|
|
|
# - <tt>:downcase</tt>: Downcases each header.
|
|
|
|
# - <tt>:symbol</tt>: Converts each header to a \Symbol.
|
|
|
|
#
|
|
|
|
# There is no such storage structure for write headers.
|
|
|
|
#
|
2021-10-10 22:21:42 -04:00
|
|
|
# In order for the parsing methods to access stored converters in non-main-Ractors, the
|
|
|
|
# storage structure must be made shareable first.
|
|
|
|
# Therefore, <tt>Ractor.make_shareable(CSV::Converters)</tt> and
|
|
|
|
# <tt>Ractor.make_shareable(CSV::HeaderConverters)</tt> must be called before the creation
|
|
|
|
# of Ractors that use the converters stored in these structures. (Since making the storage
|
|
|
|
# structures shareable involves freezing them, any custom converters that are to be used
|
|
|
|
# must be added first.)
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ===== Converter Lists
|
|
|
|
#
|
|
|
|
# A _converter_ _list_ is an \Array that may include any assortment of:
|
|
|
|
# - Converter procs.
|
|
|
|
# - Names of stored converters.
|
|
|
|
# - Nested converter lists.
|
|
|
|
#
|
|
|
|
# Examples:
|
|
|
|
# numeric_converters = [:integer, :float]
|
|
|
|
# date_converters = [:date, :date_time]
|
|
|
|
# [numeric_converters, strip_converter]
|
|
|
|
# [strip_converter, date_converters, :float]
|
|
|
|
#
|
|
|
|
# Like a converter proc, a converter list may be named and stored in either
|
|
|
|
# \CSV::Converters or CSV::HeaderConverters:
|
|
|
|
# CSV::Converters[:custom] = [strip_converter, date_converters, :float]
|
|
|
|
# CSV::HeaderConverters[:custom] = [:downcase, :symbol]
|
|
|
|
#
|
|
|
|
# There are two built-in converter lists:
|
|
|
|
# CSV::Converters[:numeric] # => [:integer, :float]
|
|
|
|
# CSV::Converters[:all] # => [:date_time, :numeric]
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ==== Field \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# With no conversion, all parsed fields in all rows become Strings:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# ary = CSV.parse(string)
|
|
|
|
# ary # => # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# When you specify a field converter, each parsed field is passed to the converter;
|
|
|
|
# its return value becomes the stored value for the field.
|
2020-05-12 17:42:45 -04:00
|
|
|
# A converter might, for example, convert an integer embedded in a \String
|
|
|
|
# into a true \Integer.
|
|
|
|
# (In fact, that's what built-in field converter +:integer+ does.)
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# There are three ways to use field \converters.
|
|
|
|
#
|
|
|
|
# - Using option {converters}[#class-CSV-label-Option+converters] with a parsing method:
|
|
|
|
# ary = CSV.parse(string, converters: :integer)
|
|
|
|
# ary # => [0, 1, 2] # => [["foo", 0], ["bar", 1], ["baz", 2]]
|
|
|
|
# - Using option {converters}[#class-CSV-label-Option+converters] with a new \CSV instance:
|
|
|
|
# csv = CSV.new(string, converters: :integer)
|
|
|
|
# # Field converters in effect:
|
|
|
|
# csv.converters # => [:integer]
|
|
|
|
# csv.read # => [["foo", 0], ["bar", 1], ["baz", 2]]
|
|
|
|
# - Using method #convert to add a field converter to a \CSV instance:
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# # Add a converter.
|
|
|
|
# csv.convert(:integer)
|
|
|
|
# csv.converters # => [:integer]
|
|
|
|
# csv.read # => [["foo", 0], ["bar", 1], ["baz", 2]]
|
|
|
|
#
|
|
|
|
# Installing a field converter does not affect already-read rows:
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.shift # => ["foo", "0"]
|
2020-05-12 17:42:45 -04:00
|
|
|
# # Add a converter.
|
|
|
|
# csv.convert(:integer)
|
|
|
|
# csv.converters # => [:integer]
|
2020-07-15 16:37:17 -04:00
|
|
|
# csv.read # => [["bar", 1], ["baz", 2]]
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# There are additional built-in \converters, and custom \converters are also supported.
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ===== Built-In Field \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# The built-in field converters are in \Hash CSV::Converters:
|
|
|
|
# - Each key is a field converter name.
|
|
|
|
# - Each value is one of:
|
|
|
|
# - A \Proc field converter.
|
|
|
|
# - An \Array of field converter names.
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Display:
|
|
|
|
# CSV::Converters.each_pair do |name, value|
|
|
|
|
# if value.kind_of?(Proc)
|
|
|
|
# p [name, value.class]
|
|
|
|
# else
|
|
|
|
# p [name, value]
|
|
|
|
# end
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# [:integer, Proc]
|
|
|
|
# [:float, Proc]
|
|
|
|
# [:numeric, [:integer, :float]]
|
|
|
|
# [:date, Proc]
|
|
|
|
# [:date_time, Proc]
|
|
|
|
# [:all, [:date_time, :numeric]]
|
|
|
|
#
|
|
|
|
# Each of these converters transcodes values to UTF-8 before attempting conversion.
|
|
|
|
# If a value cannot be transcoded to UTF-8 the conversion will
|
|
|
|
# fail and the value will remain unconverted.
|
|
|
|
#
|
|
|
|
# Converter +:integer+ converts each field that Integer() accepts:
|
2020-05-12 17:42:45 -04:00
|
|
|
# data = '0,1,2,x'
|
|
|
|
# # Without the converter
|
|
|
|
# csv = CSV.parse_line(data)
|
|
|
|
# csv # => ["0", "1", "2", "x"]
|
|
|
|
# # With the converter
|
|
|
|
# csv = CSV.parse_line(data, converters: :integer)
|
|
|
|
# csv # => [0, 1, 2, "x"]
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Converter +:float+ converts each field that Float() accepts:
|
2020-05-12 17:42:45 -04:00
|
|
|
# data = '1.0,3.14159,x'
|
|
|
|
# # Without the converter
|
|
|
|
# csv = CSV.parse_line(data)
|
|
|
|
# csv # => ["1.0", "3.14159", "x"]
|
|
|
|
# # With the converter
|
|
|
|
# csv = CSV.parse_line(data, converters: :float)
|
|
|
|
# csv # => [1.0, 3.14159, "x"]
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-05-12 17:42:45 -04:00
|
|
|
# Converter +:numeric+ converts with both +:integer+ and +:float+..
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Converter +:date+ converts each field that Date::parse accepts:
|
2020-05-12 17:42:45 -04:00
|
|
|
# data = '2001-02-03,x'
|
|
|
|
# # Without the converter
|
|
|
|
# csv = CSV.parse_line(data)
|
|
|
|
# csv # => ["2001-02-03", "x"]
|
|
|
|
# # With the converter
|
|
|
|
# csv = CSV.parse_line(data, converters: :date)
|
|
|
|
# csv # => [#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, "x"]
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Converter +:date_time+ converts each field that DateTime::parse accepts:
|
2020-05-12 17:42:45 -04:00
|
|
|
# data = '2020-05-07T14:59:00-05:00,x'
|
|
|
|
# # Without the converter
|
|
|
|
# csv = CSV.parse_line(data)
|
|
|
|
# csv # => ["2020-05-07T14:59:00-05:00", "x"]
|
|
|
|
# # With the converter
|
|
|
|
# csv = CSV.parse_line(data, converters: :date_time)
|
|
|
|
# csv # => [#<DateTime: 2020-05-07T14:59:00-05:00 ((2458977j,71940s,0n),-18000s,2299161j)>, "x"]
|
2018-05-09 00:39:16 -04:00
|
|
|
#
|
2020-05-12 17:42:45 -04:00
|
|
|
# Converter +:numeric+ converts with both +:date_time+ and +:numeric+..
|
|
|
|
#
|
|
|
|
# As seen above, method #convert adds \converters to a \CSV instance,
|
|
|
|
# and method #converters returns an \Array of the \converters in effect:
|
|
|
|
# csv = CSV.new('0,1,2')
|
|
|
|
# csv.converters # => []
|
|
|
|
# csv.convert(:integer)
|
|
|
|
# csv.converters # => [:integer]
|
|
|
|
# csv.convert(:date)
|
|
|
|
# csv.converters # => [:integer, :date]
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ===== Custom Field \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# You can define a custom field converter:
|
|
|
|
# strip_converter = proc {|field| field.strip }
|
|
|
|
# string = " foo , 0 \n bar , 1 \n baz , 2 \n"
|
|
|
|
# array = CSV.parse(string, converters: strip_converter)
|
|
|
|
# array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
2020-09-21 18:11:33 -04:00
|
|
|
# You can register the converter in \Converters \Hash,
|
|
|
|
# which allows you to refer to it by name:
|
|
|
|
# CSV::Converters[:strip] = strip_converter
|
|
|
|
# string = " foo , 0 \n bar , 1 \n baz , 2 \n"
|
|
|
|
# array = CSV.parse(string, converters: :strip)
|
|
|
|
# array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-06-30 21:30:49 -04:00
|
|
|
# ==== Header \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
|
|
|
# Header converters operate only on headers (and not on other rows).
|
|
|
|
#
|
|
|
|
# There are three ways to use header \converters;
|
|
|
|
# these examples use built-in header converter +:dowhcase+,
|
|
|
|
# which downcases each parsed header.
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# - Option +header_converters+ with a singleton parsing method:
|
|
|
|
# string = "Name,Count\nFoo,0\n,Bar,1\nBaz,2"
|
|
|
|
# tbl = CSV.parse(string, headers: true, header_converters: :downcase)
|
|
|
|
# tbl.class # => CSV::Table
|
|
|
|
# tbl.headers # => ["name", "count"]
|
|
|
|
#
|
|
|
|
# - Option +header_converters+ with a new \CSV instance:
|
|
|
|
# csv = CSV.new(string, header_converters: :downcase)
|
|
|
|
# # Header converters in effect:
|
|
|
|
# csv.header_converters # => [:downcase]
|
|
|
|
# tbl = CSV.parse(string, headers: true)
|
|
|
|
# tbl.headers # => ["Name", "Count"]
|
|
|
|
#
|
|
|
|
# - Method #header_convert adds a header converter to a \CSV instance:
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# # Add a header converter.
|
|
|
|
# csv.header_convert(:downcase)
|
|
|
|
# csv.header_converters # => [:downcase]
|
|
|
|
# tbl = CSV.parse(string, headers: true)
|
|
|
|
# tbl.headers # => ["Name", "Count"]
|
|
|
|
#
|
|
|
|
# ===== Built-In Header \Converters
|
|
|
|
#
|
|
|
|
# The built-in header \converters are in \Hash CSV::HeaderConverters.
|
|
|
|
# The keys there are the names of the \converters:
|
2020-05-12 17:42:45 -04:00
|
|
|
# CSV::HeaderConverters.keys # => [:downcase, :symbol]
|
|
|
|
#
|
|
|
|
# Converter +:downcase+ converts each header by downcasing it:
|
2020-07-15 16:37:17 -04:00
|
|
|
# string = "Name,Count\nFoo,0\n,Bar,1\nBaz,2"
|
|
|
|
# tbl = CSV.parse(string, headers: true, header_converters: :downcase)
|
2020-05-12 17:42:45 -04:00
|
|
|
# tbl.class # => CSV::Table
|
|
|
|
# tbl.headers # => ["name", "count"]
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Converter +:symbol+ converts each header by making it into a \Symbol:
|
|
|
|
# string = "Name,Count\nFoo,0\n,Bar,1\nBaz,2"
|
|
|
|
# tbl = CSV.parse(string, headers: true, header_converters: :symbol)
|
2020-05-12 17:42:45 -04:00
|
|
|
# tbl.headers # => [:name, :count]
|
|
|
|
# Details:
|
|
|
|
# - Strips leading and trailing whitespace.
|
|
|
|
# - Downcases the header.
|
|
|
|
# - Replaces embedded spaces with underscores.
|
|
|
|
# - Removes non-word characters.
|
|
|
|
# - Makes the string into a \Symbol.
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ===== Custom Header \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# You can define a custom header converter:
|
|
|
|
# upcase_converter = proc {|header| header.upcase }
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
2020-09-21 18:11:33 -04:00
|
|
|
# table = CSV.parse(string, headers: true, header_converters: upcase_converter)
|
2020-07-15 16:37:17 -04:00
|
|
|
# table # => #<CSV::Table mode:col_or_row row_count:4>
|
2020-09-21 18:11:33 -04:00
|
|
|
# table.headers # => ["NAME", "VALUE"]
|
|
|
|
# You can register the converter in \HeaderConverters \Hash,
|
|
|
|
# which allows you to refer to it by name:
|
|
|
|
# CSV::HeaderConverters[:upcase] = upcase_converter
|
|
|
|
# table = CSV.parse(string, headers: true, header_converters: :upcase)
|
|
|
|
# table # => #<CSV::Table mode:col_or_row row_count:4>
|
|
|
|
# table.headers # => ["NAME", "VALUE"]
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ===== Write \Converters
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# When you specify a write converter for generating \CSV,
|
|
|
|
# each field to be written is passed to the converter;
|
|
|
|
# its return value becomes the new value for the field.
|
|
|
|
# A converter might, for example, strip whitespace from a field.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-09-21 18:11:33 -04:00
|
|
|
# Using no write converter (all fields unmodified):
|
|
|
|
# output_string = CSV.generate do |csv|
|
|
|
|
# csv << [' foo ', 0]
|
|
|
|
# csv << [' bar ', 1]
|
|
|
|
# csv << [' baz ', 2]
|
|
|
|
# end
|
|
|
|
# output_string # => " foo ,0\n bar ,1\n baz ,2\n"
|
|
|
|
# Using option +write_converters+ with two custom write converters:
|
|
|
|
# strip_converter = proc {|field| field.respond_to?(:strip) ? field.strip : field }
|
|
|
|
# upcase_converter = proc {|field| field.respond_to?(:upcase) ? field.upcase : field }
|
|
|
|
# write_converters = [strip_converter, upcase_converter]
|
|
|
|
# output_string = CSV.generate(write_converters: write_converters) do |csv|
|
|
|
|
# csv << [' foo ', 0]
|
|
|
|
# csv << [' bar ', 1]
|
|
|
|
# csv << [' baz ', 2]
|
|
|
|
# end
|
|
|
|
# output_string # => "FOO,0\nBAR,1\nBAZ,2\n"
|
2020-07-15 16:37:17 -04:00
|
|
|
#
|
|
|
|
# === Character Encodings (M17n or Multilingualization)
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
|
2019-10-12 01:03:21 -04:00
|
|
|
# or String object being read from or written to. Your data is never transcoded
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# (unless you ask Ruby to transcode it for you) and will literally be parsed in
|
2019-10-12 01:03:21 -04:00
|
|
|
# the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
|
|
|
|
# Encoding of your data. This is accomplished by transcoding the parser itself
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# into your Encoding.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# Some transcoding must take place, of course, to accomplish this multiencoding
|
2019-10-12 01:03:21 -04:00
|
|
|
# support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
|
|
|
|
# makes the entire process feel transparent, since CSV's defaults should just
|
2019-10-12 01:03:21 -04:00
|
|
|
# magically work for your data. However, you can set these values manually in
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# the target Encoding to avoid the translation.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# It's also important to note that while all of CSV's core parser is now
|
2019-10-12 01:03:21 -04:00
|
|
|
# Encoding agnostic, some features are not. For example, the built-in
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# converters will try to transcode data to UTF-8 before making conversions.
|
|
|
|
# Again, you can provide custom converters that are aware of your Encodings to
|
2019-10-12 01:03:21 -04:00
|
|
|
# avoid this translation. It's just too hard for me to support native
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# conversions in all of Ruby's Encodings.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
# Anyway, the practical side of this is simple: make sure IO and String objects
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# passed into CSV have the proper Encoding set and everything should just work.
|
|
|
|
# CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
|
|
|
|
# CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# One minor exception comes when generating CSV into a String with an Encoding
|
2019-10-12 01:03:21 -04:00
|
|
|
# that is not ASCII compatible. There's no existing data for CSV to use to
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# prepare itself and thus you will probably need to manually specify the desired
|
2019-10-12 01:03:21 -04:00
|
|
|
# Encoding for most of those cases. It will try to guess using the fields in a
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# row of output though, when using CSV::generate_line() or Array#to_csv().
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# I try to point out any other Encoding issues in the documentation of methods
|
|
|
|
# as they come up.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# This has been tested to the best of my ability with all non-"dummy" Encodings
|
2019-10-12 01:03:21 -04:00
|
|
|
# Ruby ships with. However, it is brave new code and may have some bugs.
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# Please feel free to {report}[mailto:james@grayproductions.net] any issues you
|
|
|
|
# find with it.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
class CSV
|
|
|
|
|
2018-05-09 00:39:16 -04:00
|
|
|
# The error thrown when the parser encounters illegal CSV formatting.
|
|
|
|
class MalformedCSVError < RuntimeError
|
|
|
|
attr_reader :line_number
|
|
|
|
alias_method :lineno, :line_number
|
|
|
|
def initialize(message, line_number)
|
|
|
|
@line_number = line_number
|
|
|
|
super("#{message} in line #{line_number}.")
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# A FieldInfo Struct contains details about a field's position in the data
|
|
|
|
# source it was read from. CSV will pass this Struct to some blocks that make
|
|
|
|
# decisions based on field structure. See CSV.convert_fields() for an
|
|
|
|
# example.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# <b><tt>index</tt></b>:: The zero-based index of the field in its row.
|
|
|
|
# <b><tt>line</tt></b>:: The line of the data source this row is from.
|
|
|
|
# <b><tt>header</tt></b>:: The header for the column, when available.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
FieldInfo = Struct.new(:index, :line, :header)
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
# A Regexp used to find and convert some common Date formats.
|
|
|
|
DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
|
|
|
|
\d{4}-\d{2}-\d{2} )\z /x
|
|
|
|
# A Regexp used to find and convert some common DateTime formats.
|
|
|
|
DateTimeMatcher =
|
|
|
|
/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
|
2018-05-09 00:39:16 -04:00
|
|
|
\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} |
|
|
|
|
# ISO-8601
|
|
|
|
\d{4}-\d{2}-\d{2}
|
|
|
|
(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?
|
|
|
|
)\z /x
|
2009-03-05 22:56:38 -05:00
|
|
|
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# The encoding used by all converters.
|
|
|
|
ConverterEncoding = Encoding.find("UTF-8")
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-15 16:37:17 -04:00
|
|
|
# A \Hash containing the names and \Procs for the built-in field converters.
|
|
|
|
# See {Built-In Field Converters}[#class-CSV-label-Built-In+Field+Converters].
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# This \Hash is intentionally left unfrozen, and may be extended with
|
|
|
|
# custom field converters.
|
|
|
|
# See {Custom Field Converters}[#class-CSV-label-Custom+Field+Converters].
|
2015-12-17 22:55:29 -05:00
|
|
|
Converters = {
|
|
|
|
integer: lambda { |f|
|
|
|
|
Integer(f.encode(ConverterEncoding)) rescue f
|
|
|
|
},
|
|
|
|
float: lambda { |f|
|
|
|
|
Float(f.encode(ConverterEncoding)) rescue f
|
|
|
|
},
|
|
|
|
numeric: [:integer, :float],
|
|
|
|
date: lambda { |f|
|
|
|
|
begin
|
|
|
|
e = f.encode(ConverterEncoding)
|
Improve CSV performance
If it will not use special variables (like $1, $&, $`...),
it can improve the performance by using Regexp#match? or String#match? instead of Regexp#=~ or String#=~.
This patch is same idea as https://github.com/ruby/ruby/pull/1836
[Fix GH-1842]
## Environment
* OS : Ubuntu 17.10
* Compiler : gcc version 7.2.0
* CPU : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
* Memory : 16 GB
## TL;DR
Methods | Before | After | Speed up
----------- | ------ | ------ | --------
CSV.foreach | 44.825 | 48.201 | 7.5%
CSV#shift | 45.200 | 49.584 | 9.7%
CSV.read | 42.968 | 46.853 | 9.0%
CSV.table | 10.933 | 11.277 | 3.1%
## Before
```
Calculating -------------------------------------
CSV.foreach 44.825 (± 0.0%) i/s - 228.000 in 5.086576s
CSV#shift 45.200 (± 0.0%) i/s - 228.000 in 5.044297s
CSV.read 42.968 (± 0.0%) i/s - 216.000 in 5.027504s
CSV.table 10.933 (± 0.0%) i/s - 55.000 in 5.031098s
```
## After
```
Calculating -------------------------------------
CSV.foreach 48.201 (± 0.0%) i/s - 244.000 in 5.062256s
CSV#shift 49.584 (± 0.0%) i/s - 248.000 in 5.001652s
CSV.read 46.853 (± 0.0%) i/s - 236.000 in 5.037044s
CSV.table 11.277 (± 0.0%) i/s - 57.000 in 5.054694s
```
## Benchmark code
```ruby
require 'csv'
require 'benchmark/ips'
CSV.open("/tmp/file.csv", "w") do |csv|
csv << ["player", "gameA", "gameB"]
1000.times do
csv << ['"Alice"', "84.0", "79.5"]
csv << ['"Bob"', "20.0", "56.5"]
end
end
Benchmark.ips do |x|
x.report "CSV.foreach" do
CSV.foreach("/tmp/file.csv") do |row|
end
end
x.report "CSV#shift" do
CSV.open("/tmp/file.csv") do |csv|
while line = csv.shift
end
end
end
x.report "CSV.read" do
CSV.read("/tmp/file.csv")
end
x.report "CSV.table" do
CSV.table("/tmp/file.csv")
end
end
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62806 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-18 06:28:58 -04:00
|
|
|
e.match?(DateMatcher) ? Date.parse(e) : f
|
2015-12-17 22:55:29 -05:00
|
|
|
rescue # encoding conversion or date parse errors
|
|
|
|
f
|
|
|
|
end
|
|
|
|
},
|
|
|
|
date_time: lambda { |f|
|
|
|
|
begin
|
|
|
|
e = f.encode(ConverterEncoding)
|
Improve CSV performance
If it will not use special variables (like $1, $&, $`...),
it can improve the performance by using Regexp#match? or String#match? instead of Regexp#=~ or String#=~.
This patch is same idea as https://github.com/ruby/ruby/pull/1836
[Fix GH-1842]
## Environment
* OS : Ubuntu 17.10
* Compiler : gcc version 7.2.0
* CPU : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
* Memory : 16 GB
## TL;DR
Methods | Before | After | Speed up
----------- | ------ | ------ | --------
CSV.foreach | 44.825 | 48.201 | 7.5%
CSV#shift | 45.200 | 49.584 | 9.7%
CSV.read | 42.968 | 46.853 | 9.0%
CSV.table | 10.933 | 11.277 | 3.1%
## Before
```
Calculating -------------------------------------
CSV.foreach 44.825 (± 0.0%) i/s - 228.000 in 5.086576s
CSV#shift 45.200 (± 0.0%) i/s - 228.000 in 5.044297s
CSV.read 42.968 (± 0.0%) i/s - 216.000 in 5.027504s
CSV.table 10.933 (± 0.0%) i/s - 55.000 in 5.031098s
```
## After
```
Calculating -------------------------------------
CSV.foreach 48.201 (± 0.0%) i/s - 244.000 in 5.062256s
CSV#shift 49.584 (± 0.0%) i/s - 248.000 in 5.001652s
CSV.read 46.853 (± 0.0%) i/s - 236.000 in 5.037044s
CSV.table 11.277 (± 0.0%) i/s - 57.000 in 5.054694s
```
## Benchmark code
```ruby
require 'csv'
require 'benchmark/ips'
CSV.open("/tmp/file.csv", "w") do |csv|
csv << ["player", "gameA", "gameB"]
1000.times do
csv << ['"Alice"', "84.0", "79.5"]
csv << ['"Bob"', "20.0", "56.5"]
end
end
Benchmark.ips do |x|
x.report "CSV.foreach" do
CSV.foreach("/tmp/file.csv") do |row|
end
end
x.report "CSV#shift" do
CSV.open("/tmp/file.csv") do |csv|
while line = csv.shift
end
end
end
x.report "CSV.read" do
CSV.read("/tmp/file.csv")
end
x.report "CSV.table" do
CSV.table("/tmp/file.csv")
end
end
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62806 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-18 06:28:58 -04:00
|
|
|
e.match?(DateTimeMatcher) ? DateTime.parse(e) : f
|
2015-12-17 22:55:29 -05:00
|
|
|
rescue # encoding conversion or date parse errors
|
|
|
|
f
|
|
|
|
end
|
|
|
|
},
|
|
|
|
all: [:date_time, :numeric],
|
|
|
|
}
|
2007-12-24 21:46:26 -05:00
|
|
|
|
2020-07-15 16:37:17 -04:00
|
|
|
# A \Hash containing the names and \Procs for the built-in header converters.
|
|
|
|
# See {Built-In Header Converters}[#class-CSV-label-Built-In+Header+Converters].
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# This \Hash is intentionally left unfrozen, and may be extended with
|
|
|
|
# custom field converters.
|
|
|
|
# See {Custom Header Converters}[#class-CSV-label-Custom+Header+Converters].
|
2007-12-24 21:46:26 -05:00
|
|
|
HeaderConverters = {
|
2008-10-10 11:09:34 -04:00
|
|
|
downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
|
|
|
|
symbol: lambda { |h|
|
2017-05-20 21:01:10 -04:00
|
|
|
h.encode(ConverterEncoding).downcase.gsub(/[^\s\w]+/, "").strip.
|
2017-05-16 05:32:32 -04:00
|
|
|
gsub(/\s+/, "_").to_sym
|
2007-12-24 21:46:26 -05:00
|
|
|
}
|
|
|
|
}
|
2021-10-10 22:21:42 -04:00
|
|
|
|
2020-05-12 17:42:45 -04:00
|
|
|
# Default values for method options.
|
2015-12-17 22:55:29 -05:00
|
|
|
DEFAULT_OPTIONS = {
|
2020-05-12 17:42:45 -04:00
|
|
|
# For both parsing and generating.
|
2015-12-17 22:55:29 -05:00
|
|
|
col_sep: ",",
|
|
|
|
row_sep: :auto,
|
|
|
|
quote_char: '"',
|
2020-05-12 17:42:45 -04:00
|
|
|
# For parsing.
|
2015-12-17 22:55:29 -05:00
|
|
|
field_size_limit: nil,
|
|
|
|
converters: nil,
|
|
|
|
unconverted_fields: nil,
|
|
|
|
headers: false,
|
|
|
|
return_headers: false,
|
|
|
|
header_converters: nil,
|
|
|
|
skip_blanks: false,
|
|
|
|
skip_lines: nil,
|
2015-12-31 21:44:48 -05:00
|
|
|
liberal_parsing: false,
|
2020-05-12 17:42:45 -04:00
|
|
|
nil_value: nil,
|
|
|
|
empty_value: "",
|
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the file
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
This commit adds code that raises an exception when this situation is
encountered. Specifically, it checks if the column separator either ends
with or starts with the characters that would be stripped away.
This commit also adds unit tests and updates the documentation.
https://github.com/ruby/csv/commit/cc317dd42d
2021-11-18 16:20:09 -05:00
|
|
|
strip: false,
|
2020-05-12 17:42:45 -04:00
|
|
|
# For generating.
|
|
|
|
write_headers: nil,
|
2019-01-25 01:49:59 -05:00
|
|
|
quote_empty: true,
|
2020-05-12 17:42:45 -04:00
|
|
|
force_quotes: false,
|
|
|
|
write_converters: nil,
|
|
|
|
write_nil_value: nil,
|
|
|
|
write_empty_value: "",
|
2015-12-17 22:55:29 -05:00
|
|
|
}.freeze
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
class << self
|
2020-06-11 17:31:52 -04:00
|
|
|
# :call-seq:
|
|
|
|
# instance(string, **options)
|
|
|
|
# instance(io = $stdout, **options)
|
|
|
|
# instance(string, **options) {|csv| ... }
|
|
|
|
# instance(io = $stdout, **options) {|csv| ... }
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# Creates or retrieves cached \CSV objects.
|
|
|
|
# For arguments and options, see CSV.new.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2021-10-10 22:21:42 -04:00
|
|
|
# This API is not Ractor-safe.
|
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block given, returns a \CSV object.
|
|
|
|
#
|
|
|
|
# The first call to +instance+ creates and caches a \CSV object:
|
|
|
|
# s0 = 's0'
|
|
|
|
# csv0 = CSV.instance(s0)
|
|
|
|
# csv0.class # => CSV
|
|
|
|
#
|
|
|
|
# Subsequent calls to +instance+ with that _same_ +string+ or +io+
|
|
|
|
# retrieve that same cached object:
|
|
|
|
# csv1 = CSV.instance(s0)
|
|
|
|
# csv1.class # => CSV
|
|
|
|
# csv1.equal?(csv0) # => true # Same CSV object
|
|
|
|
#
|
|
|
|
# A subsequent call to +instance+ with a _different_ +string+ or +io+
|
|
|
|
# creates and caches a _different_ \CSV object.
|
|
|
|
# s1 = 's1'
|
|
|
|
# csv2 = CSV.instance(s1)
|
|
|
|
# csv2.equal?(csv0) # => false # Different CSV object
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# All the cached objects remains available:
|
|
|
|
# csv3 = CSV.instance(s0)
|
|
|
|
# csv3.equal?(csv0) # true # Same CSV object
|
|
|
|
# csv4 = CSV.instance(s1)
|
|
|
|
# csv4.equal?(csv2) # true # Same CSV object
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# When a block is given, calls the block with the created or retrieved
|
|
|
|
# \CSV object; returns the block's return value:
|
|
|
|
# CSV.instance(s0) {|csv| :foo } # => :foo
|
2019-10-12 01:03:21 -04:00
|
|
|
def instance(data = $stdout, **options)
|
|
|
|
# create a _signature_ for this method call, data object and options
|
|
|
|
sig = [data.object_id] +
|
|
|
|
options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
|
|
|
|
|
|
|
|
# fetch or create the instance for this signature
|
|
|
|
@@instances ||= Hash.new
|
|
|
|
instance = (@@instances[sig] ||= new(data, **options))
|
|
|
|
|
|
|
|
if block_given?
|
|
|
|
yield instance # run block, if given, returning result
|
|
|
|
else
|
|
|
|
instance # or return the instance
|
|
|
|
end
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
# :call-seq:
|
2020-06-28 17:25:31 -04:00
|
|
|
# filter(**options) {|row| ... }
|
|
|
|
# filter(in_string, **options) {|row| ... }
|
|
|
|
# filter(in_io, **options) {|row| ... }
|
|
|
|
# filter(in_string, out_string, **options) {|row| ... }
|
|
|
|
# filter(in_string, out_io, **options) {|row| ... }
|
|
|
|
# filter(in_io, out_string, **options) {|row| ... }
|
|
|
|
# filter(in_io, out_io, **options) {|row| ... }
|
|
|
|
#
|
|
|
|
# Reads \CSV input and writes \CSV output.
|
|
|
|
#
|
|
|
|
# For each input row:
|
|
|
|
# - Forms the data into:
|
|
|
|
# - A CSV::Row object, if headers are in use.
|
|
|
|
# - An \Array of Arrays, otherwise.
|
|
|
|
# - Calls the block with that object.
|
|
|
|
# - Appends the block's return value to the output.
|
|
|
|
#
|
|
|
|
# Arguments:
|
|
|
|
# * \CSV source:
|
|
|
|
# * Argument +in_string+, if given, should be a \String object;
|
|
|
|
# it will be put into a new StringIO object positioned at the beginning.
|
|
|
|
# * Argument +in_io+, if given, should be an IO object that is
|
|
|
|
# open for reading; on return, the IO object will be closed.
|
|
|
|
# * If neither +in_string+ nor +in_io+ is given,
|
|
|
|
# the input stream defaults to {ARGF}[https://ruby-doc.org/core/ARGF.html].
|
|
|
|
# * \CSV output:
|
|
|
|
# * Argument +out_string+, if given, should be a \String object;
|
|
|
|
# it will be put into a new StringIO object positioned at the beginning.
|
|
|
|
# * Argument +out_io+, if given, should be an IO object that is
|
|
|
|
# ppen for writing; on return, the IO object will be closed.
|
|
|
|
# * If neither +out_string+ nor +out_io+ is given,
|
|
|
|
# the output stream defaults to <tt>$stdout</tt>.
|
|
|
|
# * Argument +options+ should be keyword arguments.
|
|
|
|
# - Each argument name that is prefixed with +in_+ or +input_+
|
|
|
|
# is stripped of its prefix and is treated as an option
|
|
|
|
# for parsing the input.
|
|
|
|
# Option +input_row_sep+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>.
|
|
|
|
# - Each argument name that is prefixed with +out_+ or +output_+
|
|
|
|
# is stripped of its prefix and is treated as an option
|
|
|
|
# for generating the output.
|
|
|
|
# Option +output_row_sep+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>.
|
|
|
|
# - Each argument not prefixed as above is treated as an option
|
|
|
|
# both for parsing the input and for generating the output.
|
|
|
|
# - See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
|
|
|
|
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
|
|
|
|
#
|
|
|
|
# Example:
|
|
|
|
# in_string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# out_string = ''
|
|
|
|
# CSV.filter(in_string, out_string) do |row|
|
|
|
|
# row[0] = row[0].upcase
|
|
|
|
# row[1] *= 4
|
|
|
|
# end
|
|
|
|
# out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n"
|
2019-10-12 01:03:21 -04:00
|
|
|
def filter(input=nil, output=nil, **options)
|
|
|
|
# parse options for input, output, or both
|
2021-09-11 18:34:15 -04:00
|
|
|
in_options, out_options = Hash.new, {row_sep: InputRecordSeparator.value}
|
2019-10-12 01:03:21 -04:00
|
|
|
options.each do |key, value|
|
|
|
|
case key.to_s
|
|
|
|
when /\Ain(?:put)?_(.+)\Z/
|
|
|
|
in_options[$1.to_sym] = value
|
|
|
|
when /\Aout(?:put)?_(.+)\Z/
|
|
|
|
out_options[$1.to_sym] = value
|
|
|
|
else
|
|
|
|
in_options[key] = value
|
|
|
|
out_options[key] = value
|
|
|
|
end
|
|
|
|
end
|
2020-09-11 17:36:01 -04:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
# build input and output wrappers
|
2020-09-11 17:36:01 -04:00
|
|
|
input = new(input || ARGF, **in_options)
|
2019-10-12 01:03:21 -04:00
|
|
|
output = new(output || $stdout, **out_options)
|
|
|
|
|
2020-09-11 17:36:01 -04:00
|
|
|
# process headers
|
|
|
|
need_manual_header_output =
|
|
|
|
(in_options[:headers] and
|
|
|
|
out_options[:headers] == true and
|
|
|
|
out_options[:write_headers])
|
|
|
|
if need_manual_header_output
|
|
|
|
first_row = input.shift
|
|
|
|
if first_row
|
|
|
|
if first_row.is_a?(Row)
|
|
|
|
headers = first_row.headers
|
|
|
|
yield headers
|
|
|
|
output << headers
|
|
|
|
end
|
|
|
|
yield first_row
|
|
|
|
output << first_row
|
|
|
|
end
|
|
|
|
end
|
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
# read, yield, write
|
|
|
|
input.each do |row|
|
|
|
|
yield row
|
|
|
|
output << row
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
end
|
2019-10-12 01:03:21 -04:00
|
|
|
|
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# :call-seq:
|
2020-06-28 17:25:31 -04:00
|
|
|
# foreach(path, mode='r', **options) {|row| ... )
|
|
|
|
# foreach(io, mode='r', **options {|row| ... )
|
|
|
|
# foreach(path, mode='r', headers: ..., **options) {|row| ... )
|
|
|
|
# foreach(io, mode='r', headers: ..., **options {|row| ... )
|
2020-05-26 17:13:05 -04:00
|
|
|
# foreach(path, mode='r', **options) -> new_enumerator
|
|
|
|
# foreach(io, mode='r', **options -> new_enumerator
|
|
|
|
#
|
|
|
|
# Calls the block with each row read from source +path+ or +io+.
|
|
|
|
#
|
|
|
|
# * Argument +path+, if given, must be the path to a file.
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/arguments/io.rdoc
|
2020-05-26 17:13:05 -04:00
|
|
|
# * Argument +mode+, if given, must be a \File mode
|
|
|
|
# See {Open Mode}[IO.html#method-c-new-label-Open+Mode].
|
|
|
|
# * Arguments <tt>**options</tt> must be keyword options.
|
|
|
|
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
|
|
|
|
# * This method optionally accepts an additional <tt>:encoding</tt> option
|
|
|
|
# that you can use to specify the Encoding of the data read from +path+ or +io+.
|
|
|
|
# You must provide this unless your data is in the encoding
|
|
|
|
# given by <tt>Encoding::default_external</tt>.
|
|
|
|
# Parsing will use this to determine how to parse the data.
|
|
|
|
# You may provide a second Encoding to
|
|
|
|
# have the data transcoded as it is read. For example,
|
|
|
|
# encoding: 'UTF-32BE:UTF-8'
|
|
|
|
# would read +UTF-32BE+ data from the file
|
|
|
|
# but transcode it to +UTF-8+ before parsing.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-18 16:21:37 -04:00
|
|
|
# ====== Without Option +headers+
|
|
|
|
#
|
|
|
|
# Without option +headers+, returns each row as an \Array object.
|
2020-05-26 17:13:05 -04:00
|
|
|
#
|
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Read rows from a file at +path+:
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.foreach(path) {|row| p row }
|
2020-05-26 17:13:05 -04:00
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
|
|
|
# Read rows from an \IO object:
|
2020-06-14 20:09:58 -04:00
|
|
|
# File.open(path) do |file|
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.foreach(file) {|row| p row }
|
2020-06-14 20:09:58 -04:00
|
|
|
# end
|
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
|
|
|
# Returns a new \Enumerator if no block given:
|
|
|
|
# CSV.foreach(path) # => #<Enumerator: CSV:foreach("t.csv", "r")>
|
2020-06-13 14:26:30 -04:00
|
|
|
# CSV.foreach(File.open(path)) # => #<Enumerator: CSV:foreach(#<File:t.csv>, "r")>
|
2020-05-26 17:13:05 -04:00
|
|
|
#
|
|
|
|
# Issues a warning if an encoding is unsupported:
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.foreach(File.open(path), encoding: 'foo:bar') {|row| }
|
2020-05-26 17:13:05 -04:00
|
|
|
# Output:
|
|
|
|
# warning: Unsupported encoding foo ignored
|
|
|
|
# warning: Unsupported encoding bar ignored
|
|
|
|
#
|
2020-06-18 16:21:37 -04:00
|
|
|
# ====== With Option +headers+
|
|
|
|
#
|
|
|
|
# With {option +headers+}[#class-CSV-label-Option+headers],
|
|
|
|
# returns each row as a CSV::Row object.
|
|
|
|
#
|
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "Name,Count\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Read rows from a file at +path+:
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.foreach(path, headers: true) {|row| p row }
|
2020-06-18 16:21:37 -04:00
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# #<CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
# #<CSV::Row "Name":"bar" "Count":"1">
|
|
|
|
# #<CSV::Row "Name":"baz" "Count":"2">
|
|
|
|
#
|
|
|
|
# Read rows from an \IO object:
|
|
|
|
# File.open(path) do |file|
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.foreach(file, headers: true) {|row| p row }
|
2020-06-18 16:21:37 -04:00
|
|
|
# end
|
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# #<CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
# #<CSV::Row "Name":"bar" "Count":"1">
|
|
|
|
# #<CSV::Row "Name":"baz" "Count":"2">
|
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if +path+ is a \String, but not the path to a readable file:
|
|
|
|
# # Raises Errno::ENOENT (No such file or directory @ rb_sysopen - nosuch.csv):
|
|
|
|
# CSV.foreach('nosuch.csv') {|row| }
|
|
|
|
#
|
|
|
|
# Raises an exception if +io+ is an \IO object, but not open for reading:
|
|
|
|
# io = File.open(path, 'w') {|row| }
|
|
|
|
# # Raises TypeError (no implicit conversion of nil into String):
|
|
|
|
# CSV.foreach(io) {|row| }
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# Raises an exception if +mode+ is invalid:
|
2020-06-13 14:26:30 -04:00
|
|
|
# # Raises ArgumentError (invalid access mode nosuch):
|
|
|
|
# CSV.foreach(path, 'nosuch') {|row| }
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
def foreach(path, mode="r", **options, &block)
|
|
|
|
return to_enum(__method__, path, mode, **options) unless block_given?
|
|
|
|
open(path, mode, **options) do |csv|
|
|
|
|
csv.each(&block)
|
|
|
|
end
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
# :call-seq:
|
2020-05-26 17:13:05 -04:00
|
|
|
# generate(csv_string, **options) {|csv| ... }
|
|
|
|
# generate(**options) {|csv| ... }
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# * Argument +csv_string+, if given, must be a \String object;
|
|
|
|
# defaults to a new empty \String.
|
2020-06-18 16:21:37 -04:00
|
|
|
# * Arguments +options+, if given, should be generating options.
|
|
|
|
# See {Options for Generating}[#class-CSV-label-Options+for+Generating].
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Creates a new \CSV object via <tt>CSV.new(csv_string, **options)</tt>;
|
|
|
|
# calls the block with the \CSV object, which the block may modify;
|
|
|
|
# returns the \String generated from the \CSV object.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# Note that a passed \String *is* modified by this method.
|
|
|
|
# Pass <tt>csv_string</tt>.dup if the \String must be preserved.
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
|
|
|
# This method has one additional option: <tt>:encoding</tt>,
|
|
|
|
# which sets the base Encoding for the output if no no +str+ is specified.
|
|
|
|
# CSV needs this hint if you plan to output non-ASCII compatible data.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-26 17:13:05 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Add lines:
|
|
|
|
# input_string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# output_string = CSV.generate(input_string) do |csv|
|
|
|
|
# csv << ['bat', 3]
|
|
|
|
# csv << ['bam', 4]
|
|
|
|
# end
|
|
|
|
# output_string # => "foo,0\nbar,1\nbaz,2\nbat,3\nbam,4\n"
|
|
|
|
# input_string # => "foo,0\nbar,1\nbaz,2\nbat,3\nbam,4\n"
|
|
|
|
# output_string.equal?(input_string) # => true # Same string, modified
|
|
|
|
#
|
|
|
|
# Add lines into new string, preserving old string:
|
|
|
|
# input_string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# output_string = CSV.generate(input_string.dup) do |csv|
|
|
|
|
# csv << ['bat', 3]
|
|
|
|
# csv << ['bam', 4]
|
|
|
|
# end
|
|
|
|
# output_string # => "foo,0\nbar,1\nbaz,2\nbat,3\nbam,4\n"
|
|
|
|
# input_string # => "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# output_string.equal?(input_string) # => false # Different strings
|
|
|
|
#
|
|
|
|
# Create lines from nothing:
|
|
|
|
# output_string = CSV.generate do |csv|
|
|
|
|
# csv << ['foo', 0]
|
|
|
|
# csv << ['bar', 1]
|
|
|
|
# csv << ['baz', 2]
|
|
|
|
# end
|
|
|
|
# output_string # => "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if +csv_string+ is not a \String object:
|
|
|
|
# # Raises TypeError (no implicit conversion of Integer into String)
|
|
|
|
# CSV.generate(0)
|
|
|
|
#
|
2019-10-12 01:03:21 -04:00
|
|
|
def generate(str=nil, **options)
|
2019-11-24 20:06:59 -05:00
|
|
|
encoding = options[:encoding]
|
2019-10-12 01:03:21 -04:00
|
|
|
# add a default empty String, if none was given
|
|
|
|
if str
|
|
|
|
str = StringIO.new(str)
|
|
|
|
str.seek(0, IO::SEEK_END)
|
2019-11-24 20:06:59 -05:00
|
|
|
str.set_encoding(encoding) if encoding
|
2019-10-12 01:03:21 -04:00
|
|
|
else
|
|
|
|
str = +""
|
|
|
|
str.force_encoding(encoding) if encoding
|
|
|
|
end
|
|
|
|
csv = new(str, **options) # wrap
|
|
|
|
yield csv # yield for appending
|
|
|
|
csv.string # return final String
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
2020-05-17 17:01:57 -04:00
|
|
|
# :call-seq:
|
|
|
|
# CSV.generate_line(ary)
|
|
|
|
# CSV.generate_line(ary, **options)
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Returns the \String created by generating \CSV from +ary+
|
|
|
|
# using the specified +options+.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Argument +ary+ must be an \Array.
|
|
|
|
#
|
|
|
|
# Special options:
|
2021-09-11 18:34:15 -04:00
|
|
|
# * Option <tt>:row_sep</tt> defaults to <tt>"\n"> on Ruby 3.0 or later
|
|
|
|
# and <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>) otherwise.:
|
2020-05-17 17:01:57 -04:00
|
|
|
# $INPUT_RECORD_SEPARATOR # => "\n"
|
|
|
|
# * This method accepts an additional option, <tt>:encoding</tt>, which sets the base
|
|
|
|
# Encoding for the output. This method will try to guess your Encoding from
|
|
|
|
# the first non-+nil+ field in +row+, if possible, but you may need to use
|
|
|
|
# this parameter as a backup plan.
|
|
|
|
#
|
|
|
|
# For other +options+,
|
|
|
|
# see {Options for Generating}[#class-CSV-label-Options+for+Generating].
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Returns the \String generated from an \Array:
|
|
|
|
# CSV.generate_line(['foo', '0']) # => "foo,0\n"
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# ---
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Raises an exception if +ary+ is not an \Array:
|
|
|
|
# # Raises NoMethodError (undefined method `find' for :foo:Symbol)
|
|
|
|
# CSV.generate_line(:foo)
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
def generate_line(row, **options)
|
2021-09-11 18:34:15 -04:00
|
|
|
options = {row_sep: InputRecordSeparator.value}.merge(options)
|
2019-01-25 01:49:59 -05:00
|
|
|
str = +""
|
2019-10-12 01:03:21 -04:00
|
|
|
if options[:encoding]
|
|
|
|
str.force_encoding(options[:encoding])
|
2020-07-18 17:25:05 -04:00
|
|
|
else
|
|
|
|
fallback_encoding = nil
|
|
|
|
output_encoding = nil
|
|
|
|
row.each do |field|
|
|
|
|
next unless field.is_a?(String)
|
|
|
|
fallback_encoding ||= field.encoding
|
|
|
|
next if field.ascii_only?
|
|
|
|
output_encoding = field.encoding
|
|
|
|
break
|
|
|
|
end
|
|
|
|
output_encoding ||= fallback_encoding
|
|
|
|
if output_encoding
|
|
|
|
str.force_encoding(output_encoding)
|
|
|
|
end
|
2019-10-12 01:03:21 -04:00
|
|
|
end
|
|
|
|
(new(str, **options) << row).string
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
# :call-seq:
|
2020-06-13 14:26:30 -04:00
|
|
|
# open(file_path, mode = "rb", **options ) -> new_csv
|
|
|
|
# open(io, mode = "rb", **options ) -> new_csv
|
2020-06-14 20:09:58 -04:00
|
|
|
# open(file_path, mode = "rb", **options ) { |csv| ... } -> object
|
|
|
|
# open(io, mode = "rb", **options ) { |csv| ... } -> object
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-01 19:30:25 -04:00
|
|
|
# possible options elements:
|
|
|
|
# hash form:
|
2020-06-03 23:33:47 -04:00
|
|
|
# :invalid => nil # raise error on invalid byte sequence (default)
|
|
|
|
# :invalid => :replace # replace invalid byte sequence
|
|
|
|
# :undef => :replace # replace undefined conversion
|
|
|
|
# :replace => string # replacement string ("?" or "\uFFFD" if not specified)
|
2020-06-01 19:30:25 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# * Argument +path+, if given, must be the path to a file.
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/arguments/io.rdoc
|
2020-06-13 14:26:30 -04:00
|
|
|
# * Argument +mode+, if given, must be a \File mode
|
|
|
|
# See {Open Mode}[IO.html#method-c-new-label-Open+Mode].
|
|
|
|
# * Arguments <tt>**options</tt> must be keyword options.
|
|
|
|
# See {Options for Generating}[#class-CSV-label-Options+for+Generating].
|
|
|
|
# * This method optionally accepts an additional <tt>:encoding</tt> option
|
|
|
|
# that you can use to specify the Encoding of the data read from +path+ or +io+.
|
|
|
|
# You must provide this unless your data is in the encoding
|
|
|
|
# given by <tt>Encoding::default_external</tt>.
|
|
|
|
# Parsing will use this to determine how to parse the data.
|
|
|
|
# You may provide a second Encoding to
|
|
|
|
# have the data transcoded as it is read. For example,
|
|
|
|
# encoding: 'UTF-32BE:UTF-8'
|
|
|
|
# would read +UTF-32BE+ data from the file
|
|
|
|
# but transcode it to +UTF-8+ before parsing.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# ---
|
2020-05-12 17:42:45 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block given, returns a new \CSV object.
|
|
|
|
#
|
|
|
|
# Create a \CSV object using a file path:
|
|
|
|
# csv = CSV.open(path)
|
2020-07-02 22:06:26 -04:00
|
|
|
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# Create a \CSV object using an open \File:
|
|
|
|
# csv = CSV.open(File.open(path))
|
2020-07-02 22:06:26 -04:00
|
|
|
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2020-06-13 14:26:30 -04:00
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
2020-06-14 20:09:58 -04:00
|
|
|
# With a block given, calls the block with the created \CSV object;
|
|
|
|
# returns the block's return value:
|
2020-06-13 14:26:30 -04:00
|
|
|
#
|
|
|
|
# Using a file path:
|
|
|
|
# csv = CSV.open(path) {|csv| p csv}
|
2020-07-02 22:06:26 -04:00
|
|
|
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2020-06-13 14:26:30 -04:00
|
|
|
# Output:
|
2020-07-02 22:06:26 -04:00
|
|
|
# #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2020-06-13 14:26:30 -04:00
|
|
|
#
|
|
|
|
# Using an open \File:
|
|
|
|
# csv = CSV.open(File.open(path)) {|csv| p csv}
|
2020-07-02 22:06:26 -04:00
|
|
|
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2020-06-13 14:26:30 -04:00
|
|
|
# Output:
|
2020-07-02 22:06:26 -04:00
|
|
|
# #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# ---
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# Raises an exception if the argument is not a \String object or \IO object:
|
|
|
|
# # Raises TypeError (no implicit conversion of Symbol into String)
|
|
|
|
# CSV.open(:foo)
|
2019-10-12 01:03:21 -04:00
|
|
|
def open(filename, mode="r", **options)
|
|
|
|
# wrap a File opened with the remaining +args+ with no newline
|
|
|
|
# decorator
|
|
|
|
file_opts = {universal_newline: false}.merge(options)
|
2020-06-03 23:33:47 -04:00
|
|
|
options.delete(:invalid)
|
2020-06-01 19:30:25 -04:00
|
|
|
options.delete(:undef)
|
|
|
|
options.delete(:replace)
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
begin
|
|
|
|
f = File.open(filename, mode, **file_opts)
|
|
|
|
rescue ArgumentError => e
|
|
|
|
raise unless /needs binmode/.match?(e.message) and mode == "r"
|
|
|
|
mode = "rb"
|
|
|
|
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
|
|
|
|
retry
|
|
|
|
end
|
|
|
|
begin
|
|
|
|
csv = new(f, **options)
|
|
|
|
rescue Exception
|
|
|
|
f.close
|
|
|
|
raise
|
|
|
|
end
|
2017-07-28 03:46:20 -04:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
# handle blocks like Ruby's open(), not like the CSV library
|
|
|
|
if block_given?
|
|
|
|
begin
|
|
|
|
yield csv
|
|
|
|
ensure
|
|
|
|
csv.close
|
|
|
|
end
|
|
|
|
else
|
|
|
|
csv
|
|
|
|
end
|
2014-05-29 06:44:59 -04:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
# :call-seq:
|
2020-06-13 14:26:30 -04:00
|
|
|
# parse(string) -> array_of_arrays
|
|
|
|
# parse(io) -> array_of_arrays
|
2020-06-16 19:35:28 -04:00
|
|
|
# parse(string, headers: ..., **options) -> csv_table
|
|
|
|
# parse(io, headers: ..., **options) -> csv_table
|
2020-06-28 17:25:31 -04:00
|
|
|
# parse(string, **options) {|row| ... }
|
|
|
|
# parse(io, **options) {|row| ... }
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# Parses +string+ or +io+ using the specified +options+.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# - Argument +string+ should be a \String object;
|
|
|
|
# it will be put into a new StringIO object positioned at the beginning.
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/arguments/io.rdoc
|
2020-06-13 14:26:30 -04:00
|
|
|
# - Argument +options+: see {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
|
|
|
|
#
|
2020-06-18 16:21:37 -04:00
|
|
|
# ====== Without Option +headers+
|
|
|
|
#
|
2020-06-28 17:25:31 -04:00
|
|
|
# Without {option +headers+}[#class-CSV-label-Option+headers] case.
|
2020-06-18 16:21:37 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block given, returns an \Array of Arrays formed from the source.
|
|
|
|
#
|
|
|
|
# Parse a \String:
|
|
|
|
# a_of_a = CSV.parse(string)
|
|
|
|
# a_of_a # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# Parse an open \File:
|
2020-06-16 19:35:28 -04:00
|
|
|
# a_of_a = File.open(path) do |file|
|
|
|
|
# CSV.parse(file)
|
|
|
|
# end
|
2020-06-13 14:26:30 -04:00
|
|
|
# a_of_a # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With a block given, calls the block with each parsed row:
|
|
|
|
#
|
|
|
|
# Parse a \String:
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.parse(string) {|row| p row }
|
2020-06-13 14:26:30 -04:00
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
|
|
|
# Parse an open \File:
|
2020-06-16 19:35:28 -04:00
|
|
|
# File.open(path) do |file|
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.parse(file) {|row| p row }
|
2020-06-16 19:35:28 -04:00
|
|
|
# end
|
2020-06-13 14:26:30 -04:00
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
|
|
|
#
|
2020-06-16 19:35:28 -04:00
|
|
|
# ====== With Option +headers+
|
|
|
|
#
|
2020-06-28 17:25:31 -04:00
|
|
|
# With {option +headers+}[#class-CSV-label-Option+headers] case.
|
2020-06-16 19:35:28 -04:00
|
|
|
#
|
2020-06-18 16:21:37 -04:00
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "Name,Count\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
2020-06-16 19:35:28 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block given, returns a CSV::Table object formed from the source.
|
|
|
|
#
|
|
|
|
# Parse a \String:
|
|
|
|
# csv_table = CSV.parse(string, headers: ['Name', 'Count'])
|
|
|
|
# csv_table # => #<CSV::Table mode:col_or_row row_count:5>
|
|
|
|
#
|
|
|
|
# Parse an open \File:
|
|
|
|
# csv_table = File.open(path) do |file|
|
|
|
|
# CSV.parse(file, headers: ['Name', 'Count'])
|
|
|
|
# end
|
|
|
|
# csv_table # => #<CSV::Table mode:col_or_row row_count:4>
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With a block given, calls the block with each parsed row,
|
|
|
|
# which has been formed into a CSV::Row object:
|
|
|
|
#
|
|
|
|
# Parse a \String:
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.parse(string, headers: ['Name', 'Count']) {|row| p row }
|
2020-06-16 19:35:28 -04:00
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# # <CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
# # <CSV::Row "Name":"bar" "Count":"1">
|
|
|
|
# # <CSV::Row "Name":"baz" "Count":"2">
|
|
|
|
#
|
|
|
|
# Parse an open \File:
|
|
|
|
# File.open(path) do |file|
|
2020-06-28 17:25:31 -04:00
|
|
|
# CSV.parse(file, headers: ['Name', 'Count']) {|row| p row }
|
2020-06-16 19:35:28 -04:00
|
|
|
# end
|
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# # <CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
# # <CSV::Row "Name":"bar" "Count":"1">
|
|
|
|
# # <CSV::Row "Name":"baz" "Count":"2">
|
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# ---
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-13 14:26:30 -04:00
|
|
|
# Raises an exception if the argument is not a \String object or \IO object:
|
|
|
|
# # Raises NoMethodError (undefined method `close' for :foo:Symbol)
|
|
|
|
# CSV.parse(:foo)
|
2019-10-12 01:03:21 -04:00
|
|
|
def parse(str, **options, &block)
|
|
|
|
csv = new(str, **options)
|
|
|
|
|
|
|
|
return csv.each(&block) if block_given?
|
|
|
|
|
|
|
|
# slurp contents, if no block is given
|
2007-12-24 21:46:26 -05:00
|
|
|
begin
|
2019-10-12 01:03:21 -04:00
|
|
|
csv.read
|
2007-12-24 21:46:26 -05:00
|
|
|
ensure
|
2017-08-29 06:22:47 -04:00
|
|
|
csv.close
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
end
|
2018-05-09 00:39:16 -04:00
|
|
|
|
2020-05-17 17:01:57 -04:00
|
|
|
# :call-seq:
|
2020-06-18 18:02:02 -04:00
|
|
|
# CSV.parse_line(string) -> new_array or nil
|
|
|
|
# CSV.parse_line(io) -> new_array or nil
|
|
|
|
# CSV.parse_line(string, **options) -> new_array or nil
|
|
|
|
# CSV.parse_line(io, **options) -> new_array or nil
|
|
|
|
# CSV.parse_line(string, headers: true, **options) -> csv_row or nil
|
|
|
|
# CSV.parse_line(io, headers: true, **options) -> csv_row or nil
|
|
|
|
#
|
|
|
|
# Returns the data created by parsing the first line of +string+ or +io+
|
2020-05-17 17:01:57 -04:00
|
|
|
# using the specified +options+.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# - Argument +string+ should be a \String object;
|
|
|
|
# it will be put into a new StringIO object positioned at the beginning.
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/arguments/io.rdoc
|
2020-06-13 14:26:30 -04:00
|
|
|
# - Argument +options+: see {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
2020-06-18 18:02:02 -04:00
|
|
|
# ====== Without Option +headers+
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
2020-06-18 18:02:02 -04:00
|
|
|
# Without option +headers+, returns the first row as a new \Array.
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
2020-06-18 18:02:02 -04:00
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Parse the first line from a \String object:
|
|
|
|
# CSV.parse_line(string) # => ["foo", "0"]
|
|
|
|
#
|
|
|
|
# Parse the first line from a File object:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.parse_line(file) # => ["foo", "0"]
|
|
|
|
# end # => ["foo", "0"]
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
|
|
|
# Returns +nil+ if the argument is an empty \String:
|
|
|
|
# CSV.parse_line('') # => nil
|
|
|
|
#
|
2020-06-18 18:02:02 -04:00
|
|
|
# ====== With Option +headers+
|
|
|
|
#
|
|
|
|
# With {option +headers+}[#class-CSV-label-Option+headers],
|
|
|
|
# returns the first row as a CSV::Row object.
|
|
|
|
#
|
|
|
|
# These examples assume prior execution of:
|
|
|
|
# string = "Name,Count\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Parse the first line from a \String object:
|
|
|
|
# CSV.parse_line(string, headers: true) # => #<CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
#
|
|
|
|
# Parse the first line from a File object:
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.parse_line(file, headers: true)
|
|
|
|
# end # => #<CSV::Row "Name":"foo" "Count":"0">
|
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if the argument is +nil+:
|
|
|
|
# # Raises ArgumentError (Cannot parse nil as CSV):
|
|
|
|
# CSV.parse_line(nil)
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
|
|
|
def parse_line(line, **options)
|
2020-05-16 23:02:55 -04:00
|
|
|
new(line, **options).each.first
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# :call-seq:
|
|
|
|
# read(source, **options) -> array_of_arrays
|
|
|
|
# read(source, headers: true, **options) -> csv_table
|
|
|
|
#
|
|
|
|
# Opens the given +source+ with the given +options+ (see CSV.open),
|
|
|
|
# reads the source (see CSV#read), and returns the result,
|
|
|
|
# which will be either an \Array of Arrays or a CSV::Table.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Without headers:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# With headers:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# CSV.read(path, headers: true) # => #<CSV::Table mode:col_or_row row_count:4>
|
2019-10-12 01:03:21 -04:00
|
|
|
def read(path, **options)
|
|
|
|
open(path, **options) { |csv| csv.read }
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-06-26 17:29:57 -04:00
|
|
|
# :call-seq:
|
|
|
|
# CSV.readlines(source, **options)
|
|
|
|
#
|
|
|
|
# Alias for CSV.read.
|
2019-10-12 01:03:21 -04:00
|
|
|
def readlines(path, **options)
|
|
|
|
read(path, **options)
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-06-26 17:29:57 -04:00
|
|
|
# :call-seq:
|
|
|
|
# CSV.table(source, **options)
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Calls CSV.read with +source+, +options+, and certain default options:
|
|
|
|
# - +headers+: +true+
|
2020-12-26 23:47:42 -05:00
|
|
|
# - +converters+: +:numeric+
|
2020-06-26 17:29:57 -04:00
|
|
|
# - +header_converters+: +:symbol+
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Returns a CSV::Table object.
|
2019-10-12 01:03:21 -04:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Example:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
2020-06-30 21:30:49 -04:00
|
|
|
# CSV.table(path) # => #<CSV::Table mode:col_or_row row_count:4>
|
2019-10-12 01:03:21 -04:00
|
|
|
def table(path, **options)
|
|
|
|
default_options = {
|
|
|
|
headers: true,
|
|
|
|
converters: :numeric,
|
|
|
|
header_converters: :symbol,
|
|
|
|
}
|
|
|
|
options = default_options.merge(options)
|
|
|
|
read(path, **options)
|
|
|
|
end
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-05-17 17:01:57 -04:00
|
|
|
# :call-seq:
|
|
|
|
# CSV.new(string)
|
|
|
|
# CSV.new(io)
|
|
|
|
# CSV.new(string, **options)
|
|
|
|
# CSV.new(io, **options)
|
|
|
|
#
|
|
|
|
# Returns the new \CSV object created using +string+ or +io+
|
|
|
|
# and the specified +options+.
|
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# - Argument +string+ should be a \String object;
|
|
|
|
# it will be put into a new StringIO object positioned at the beginning.
|
2020-07-20 03:02:47 -04:00
|
|
|
# :include: ../doc/csv/arguments/io.rdoc
|
2020-06-11 17:31:52 -04:00
|
|
|
# - Argument +options+: See:
|
|
|
|
# * {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
|
|
|
|
# * {Options for Generating}[#class-CSV-label-Options+for+Generating]
|
|
|
|
# For performance reasons, the options cannot be overridden
|
|
|
|
# in a \CSV object, so those specified here will endure.
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
2020-06-11 17:31:52 -04:00
|
|
|
# In addition to the \CSV instance methods, several \IO methods are delegated.
|
|
|
|
# See {Delegated Methods}[#class-CSV-label-Delegated+Methods].
|
2020-05-17 17:01:57 -04:00
|
|
|
#
|
|
|
|
# ---
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Create a \CSV object from a \String object:
|
|
|
|
# csv = CSV.new('foo,0')
|
|
|
|
# csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Create a \CSV object from a \File object:
|
|
|
|
# File.write('t.csv', 'foo,0')
|
|
|
|
# csv = CSV.new(File.open('t.csv'))
|
|
|
|
# csv # => #<CSV io_type:File io_path:"t.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# ---
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-05-17 17:01:57 -04:00
|
|
|
# Raises an exception if the argument is +nil+:
|
|
|
|
# # Raises ArgumentError (Cannot parse nil as CSV):
|
|
|
|
# CSV.new(nil)
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2018-12-23 02:00:35 -05:00
|
|
|
def initialize(data,
|
|
|
|
col_sep: ",",
|
|
|
|
row_sep: :auto,
|
|
|
|
quote_char: '"',
|
|
|
|
field_size_limit: nil,
|
|
|
|
converters: nil,
|
|
|
|
unconverted_fields: nil,
|
|
|
|
headers: false,
|
|
|
|
return_headers: false,
|
|
|
|
write_headers: nil,
|
|
|
|
header_converters: nil,
|
|
|
|
skip_blanks: false,
|
|
|
|
force_quotes: false,
|
|
|
|
skip_lines: nil,
|
|
|
|
liberal_parsing: false,
|
|
|
|
internal_encoding: nil,
|
|
|
|
external_encoding: nil,
|
|
|
|
encoding: nil,
|
2018-05-09 00:39:16 -04:00
|
|
|
nil_value: nil,
|
2019-01-25 01:49:59 -05:00
|
|
|
empty_value: "",
|
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the file
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
This commit adds code that raises an exception when this situation is
encountered. Specifically, it checks if the column separator either ends
with or starts with the characters that would be stripped away.
This commit also adds unit tests and updates the documentation.
https://github.com/ruby/csv/commit/cc317dd42d
2021-11-18 16:20:09 -05:00
|
|
|
strip: false,
|
2019-04-14 17:01:51 -04:00
|
|
|
quote_empty: true,
|
|
|
|
write_converters: nil,
|
|
|
|
write_nil_value: nil,
|
[ruby/csv] Add handling for ambiguous parsing options (https://github.com/ruby/csv/pull/226)
GitHub: fix GH-225
With Ruby 3.0.2 and csv 3.2.1, the file
```ruby
require "csv"
File.open("example.tsv", "w") { |f| f.puts("foo\t\tbar") }
CSV.read("example.tsv", col_sep: "\t", strip: true)
```
produces the error
```
lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful
message in line 1. (CSV::MalformedCSVError)
```
However, the CSV in this example is not malformed; instead, ambiguous
options were provided to the parser. It is not obvious (to me) whether
the string should be parsed as
- `["foo\t\tbar"]`,
- `["foo", "bar"]`,
- `["foo", "", "bar"]`, or
- `["foo", nil, "bar"]`.
This commit adds code that raises an exception when this situation is
encountered. Specifically, it checks if the column separator either ends
with or starts with the characters that would be stripped away.
This commit also adds unit tests and updates the documentation.
https://github.com/ruby/csv/commit/cc317dd42d
2021-11-18 16:20:09 -05:00
|
|
|
write_empty_value: "")
|
2017-07-28 03:46:20 -04:00
|
|
|
raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-10-12 01:03:21 -04:00
|
|
|
if data.is_a?(String)
|
|
|
|
@io = StringIO.new(data)
|
|
|
|
@io.set_encoding(encoding || data.encoding)
|
|
|
|
else
|
|
|
|
@io = data
|
|
|
|
end
|
2018-05-09 00:39:16 -04:00
|
|
|
@encoding = determine_encoding(encoding, internal_encoding)
|
2018-12-23 02:00:35 -05:00
|
|
|
|
|
|
|
@base_fields_converter_options = {
|
|
|
|
nil_value: nil_value,
|
|
|
|
empty_value: empty_value,
|
|
|
|
}
|
2019-04-14 17:01:51 -04:00
|
|
|
@write_fields_converter_options = {
|
|
|
|
nil_value: write_nil_value,
|
|
|
|
empty_value: write_empty_value,
|
|
|
|
}
|
2018-12-23 02:00:35 -05:00
|
|
|
@initial_converters = converters
|
|
|
|
@initial_header_converters = header_converters
|
2019-04-14 17:01:51 -04:00
|
|
|
@initial_write_converters = write_converters
|
2018-12-23 02:00:35 -05:00
|
|
|
|
|
|
|
@parser_options = {
|
|
|
|
column_separator: col_sep,
|
|
|
|
row_separator: row_sep,
|
|
|
|
quote_character: quote_char,
|
|
|
|
field_size_limit: field_size_limit,
|
|
|
|
unconverted_fields: unconverted_fields,
|
|
|
|
headers: headers,
|
|
|
|
return_headers: return_headers,
|
|
|
|
skip_blanks: skip_blanks,
|
|
|
|
skip_lines: skip_lines,
|
|
|
|
liberal_parsing: liberal_parsing,
|
|
|
|
encoding: @encoding,
|
|
|
|
nil_value: nil_value,
|
|
|
|
empty_value: empty_value,
|
2019-04-14 17:01:51 -04:00
|
|
|
strip: strip,
|
2018-12-23 02:00:35 -05:00
|
|
|
}
|
|
|
|
@parser = nil
|
2019-04-17 09:02:40 -04:00
|
|
|
@parser_enumerator = nil
|
|
|
|
@eof_error = nil
|
2018-12-23 02:00:35 -05:00
|
|
|
|
|
|
|
@writer_options = {
|
|
|
|
encoding: @encoding,
|
|
|
|
force_encoding: (not encoding.nil?),
|
|
|
|
force_quotes: force_quotes,
|
|
|
|
headers: headers,
|
|
|
|
write_headers: write_headers,
|
|
|
|
column_separator: col_sep,
|
|
|
|
row_separator: row_sep,
|
|
|
|
quote_character: quote_char,
|
2019-01-25 01:49:59 -05:00
|
|
|
quote_empty: quote_empty,
|
2018-12-23 02:00:35 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
@writer = nil
|
|
|
|
writer if @writer_options[:write_headers]
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.col_sep -> string
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the encoded column separator; used for parsing and writing;
|
|
|
|
# see {Option +col_sep+}[#class-CSV-label-Option+col_sep]:
|
|
|
|
# CSV.new('').col_sep # => ","
|
2018-12-23 02:00:35 -05:00
|
|
|
def col_sep
|
|
|
|
parser.column_separator
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.row_sep -> string
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the encoded row separator; used for parsing and writing;
|
|
|
|
# see {Option +row_sep+}[#class-CSV-label-Option+row_sep]:
|
|
|
|
# CSV.new('').row_sep # => "\n"
|
2018-12-23 02:00:35 -05:00
|
|
|
def row_sep
|
|
|
|
parser.row_separator
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.quote_char -> character
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the encoded quote character; used for parsing and writing;
|
|
|
|
# see {Option +quote_char+}[#class-CSV-label-Option+quote_char]:
|
|
|
|
# CSV.new('').quote_char # => "\""
|
2018-12-23 02:00:35 -05:00
|
|
|
def quote_char
|
|
|
|
parser.quote_character
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.field_size_limit -> integer or nil
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the limit for field size; used for parsing;
|
|
|
|
# see {Option +field_size_limit+}[#class-CSV-label-Option+field_size_limit]:
|
|
|
|
# CSV.new('').field_size_limit # => nil
|
2018-12-23 02:00:35 -05:00
|
|
|
def field_size_limit
|
|
|
|
parser.field_size_limit
|
|
|
|
end
|
2012-08-20 16:52:36 -04:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.skip_lines -> regexp or nil
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the \Regexp used to identify comment lines; used for parsing;
|
|
|
|
# see {Option +skip_lines+}[#class-CSV-label-Option+skip_lines]:
|
|
|
|
# CSV.new('').skip_lines # => nil
|
2018-12-23 02:00:35 -05:00
|
|
|
def skip_lines
|
|
|
|
parser.skip_lines
|
|
|
|
end
|
2012-08-20 16:52:36 -04:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.converters -> array
|
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Returns an \Array containing field converters;
|
|
|
|
# see {Field Converters}[#class-CSV-label-Field+Converters]:
|
|
|
|
# csv = CSV.new('')
|
|
|
|
# csv.converters # => []
|
|
|
|
# csv.convert(:integer)
|
|
|
|
# csv.converters # => [:integer]
|
|
|
|
# csv.convert(proc {|x| x.to_s })
|
|
|
|
# csv.converters
|
2021-10-10 22:21:42 -04:00
|
|
|
#
|
|
|
|
# Notes that you need to call
|
|
|
|
# +Ractor.make_shareable(CSV::Converters)+ on the main Ractor to use
|
|
|
|
# this method.
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
def converters
|
2019-04-14 17:01:51 -04:00
|
|
|
parser_fields_converter.map do |converter|
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
name = Converters.rassoc(converter)
|
|
|
|
name ? name.first : converter
|
|
|
|
end
|
|
|
|
end
|
2019-10-12 01:03:21 -04:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.unconverted_fields? -> object
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether unconverted fields are to be
|
|
|
|
# available; used for parsing;
|
|
|
|
# see {Option +unconverted_fields+}[#class-CSV-label-Option+unconverted_fields]:
|
|
|
|
# CSV.new('').unconverted_fields? # => nil
|
2018-12-23 02:00:35 -05:00
|
|
|
def unconverted_fields?
|
|
|
|
parser.unconverted_fields?
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.headers -> object
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether headers are used; used for parsing;
|
|
|
|
# see {Option +headers+}[#class-CSV-label-Option+headers]:
|
|
|
|
# CSV.new('').headers # => nil
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
def headers
|
2018-12-23 02:00:35 -05:00
|
|
|
if @writer
|
|
|
|
@writer.headers
|
|
|
|
else
|
|
|
|
parsed_headers = parser.headers
|
|
|
|
return parsed_headers if parsed_headers
|
|
|
|
raw_headers = @parser_options[:headers]
|
|
|
|
raw_headers = nil if raw_headers == false
|
|
|
|
raw_headers
|
|
|
|
end
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2020-07-02 22:06:26 -04:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.return_headers? -> true or false
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether headers are to be returned; used for parsing;
|
|
|
|
# see {Option +return_headers+}[#class-CSV-label-Option+return_headers]:
|
|
|
|
# CSV.new('').return_headers? # => false
|
2018-12-23 02:00:35 -05:00
|
|
|
def return_headers?
|
|
|
|
parser.return_headers?
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.write_headers? -> true or false
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether headers are to be written; used for generating;
|
|
|
|
# see {Option +write_headers+}[#class-CSV-label-Option+write_headers]:
|
|
|
|
# CSV.new('').write_headers? # => nil
|
2018-12-23 02:00:35 -05:00
|
|
|
def write_headers?
|
|
|
|
@writer_options[:write_headers]
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.header_converters -> array
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns an \Array containing header converters; used for parsing;
|
2020-07-15 16:37:17 -04:00
|
|
|
# see {Header Converters}[#class-CSV-label-Header+Converters]:
|
2020-07-02 22:06:26 -04:00
|
|
|
# CSV.new('').header_converters # => []
|
2021-10-10 22:21:42 -04:00
|
|
|
#
|
|
|
|
# Notes that you need to call
|
|
|
|
# +Ractor.make_shareable(CSV::HeaderConverters)+ on the main Ractor
|
|
|
|
# to use this method.
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
def header_converters
|
2018-12-23 02:00:35 -05:00
|
|
|
header_fields_converter.map do |converter|
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
name = HeaderConverters.rassoc(converter)
|
|
|
|
name ? name.first : converter
|
|
|
|
end
|
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.skip_blanks? -> true or false
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether blank lines are to be ignored; used for parsing;
|
|
|
|
# see {Option +skip_blanks+}[#class-CSV-label-Option+skip_blanks]:
|
|
|
|
# CSV.new('').skip_blanks? # => false
|
2018-12-23 02:00:35 -05:00
|
|
|
def skip_blanks?
|
|
|
|
parser.skip_blanks?
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.force_quotes? -> true or false
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether all output fields are to be quoted;
|
|
|
|
# used for generating;
|
|
|
|
# see {Option +force_quotes+}[#class-CSV-label-Option+force_quotes]:
|
|
|
|
# CSV.new('').force_quotes? # => false
|
2018-12-23 02:00:35 -05:00
|
|
|
def force_quotes?
|
|
|
|
@writer_options[:force_quotes]
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.liberal_parsing? -> true or false
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the value that determines whether illegal input is to be handled; used for parsing;
|
|
|
|
# see {Option +liberal_parsing+}[#class-CSV-label-Option+liberal_parsing]:
|
|
|
|
# CSV.new('').liberal_parsing? # => false
|
2018-12-23 02:00:35 -05:00
|
|
|
def liberal_parsing?
|
|
|
|
parser.liberal_parsing?
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.encoding -> endcoding
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the encoding used for parsing and generating;
|
2020-07-15 16:37:17 -04:00
|
|
|
# see {Character Encodings (M17n or Multilingualization)}[#class-CSV-label-Character+Encodings+-28M17n+or+Multilingualization-29]:
|
2020-07-02 22:06:26 -04:00
|
|
|
# CSV.new('').encoding # => #<Encoding:UTF-8>
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
attr_reader :encoding
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.line_no -> integer
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the count of the rows parsed or generated.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Parsing:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# CSV.open(path) do |csv|
|
|
|
|
# csv.each do |row|
|
|
|
|
# p [csv.lineno, row]
|
|
|
|
# end
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# [1, ["foo", "0"]]
|
|
|
|
# [2, ["bar", "1"]]
|
|
|
|
# [3, ["baz", "2"]]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Generating:
|
|
|
|
# CSV.generate do |csv|
|
|
|
|
# p csv.lineno; csv << ['foo', 0]
|
|
|
|
# p csv.lineno; csv << ['bar', 1]
|
|
|
|
# p csv.lineno; csv << ['baz', 2]
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# 0
|
|
|
|
# 1
|
|
|
|
# 2
|
2018-12-23 02:00:35 -05:00
|
|
|
def lineno
|
|
|
|
if @writer
|
|
|
|
@writer.lineno
|
|
|
|
else
|
|
|
|
parser.lineno
|
|
|
|
end
|
|
|
|
end
|
|
|
|
|
2020-07-04 10:25:31 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.line -> array
|
|
|
|
#
|
2020-07-02 22:06:26 -04:00
|
|
|
# Returns the line most recently read:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# CSV.open(path) do |csv|
|
|
|
|
# csv.each do |row|
|
|
|
|
# p [csv.lineno, csv.line]
|
|
|
|
# end
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# [1, "foo,0\n"]
|
|
|
|
# [2, "bar,1\n"]
|
|
|
|
# [3, "baz,2\n"]
|
2018-12-23 02:00:35 -05:00
|
|
|
def line
|
|
|
|
parser.line
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
### IO and StringIO Delegation ###
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
extend Forwardable
|
2019-04-14 17:01:51 -04:00
|
|
|
def_delegators :@io, :binmode, :close, :close_read, :close_write,
|
|
|
|
:closed?, :external_encoding, :fcntl,
|
|
|
|
:fileno, :flush, :fsync, :internal_encoding,
|
|
|
|
:isatty, :pid, :pos, :pos=, :reopen,
|
|
|
|
:seek, :string, :sync, :sync=, :tell,
|
|
|
|
:truncate, :tty?
|
|
|
|
|
|
|
|
def binmode?
|
|
|
|
if @io.respond_to?(:binmode?)
|
|
|
|
@io.binmode?
|
|
|
|
else
|
|
|
|
false
|
|
|
|
end
|
|
|
|
end
|
|
|
|
|
|
|
|
def flock(*args)
|
|
|
|
raise NotImplementedError unless @io.respond_to?(:flock)
|
|
|
|
@io.flock(*args)
|
|
|
|
end
|
|
|
|
|
|
|
|
def ioctl(*args)
|
|
|
|
raise NotImplementedError unless @io.respond_to?(:ioctl)
|
|
|
|
@io.ioctl(*args)
|
|
|
|
end
|
|
|
|
|
|
|
|
def path
|
|
|
|
@io.path if @io.respond_to?(:path)
|
|
|
|
end
|
|
|
|
|
|
|
|
def stat(*args)
|
|
|
|
raise NotImplementedError unless @io.respond_to?(:stat)
|
|
|
|
@io.stat(*args)
|
|
|
|
end
|
|
|
|
|
|
|
|
def to_i
|
|
|
|
raise NotImplementedError unless @io.respond_to?(:to_i)
|
|
|
|
@io.to_i
|
|
|
|
end
|
|
|
|
|
|
|
|
def to_io
|
|
|
|
@io.respond_to?(:to_io) ? @io.to_io : @io
|
|
|
|
end
|
|
|
|
|
|
|
|
def eof?
|
2019-04-17 09:02:40 -04:00
|
|
|
return false if @eof_error
|
2019-04-14 17:01:51 -04:00
|
|
|
begin
|
|
|
|
parser_enumerator.peek
|
|
|
|
false
|
2019-04-17 09:02:40 -04:00
|
|
|
rescue MalformedCSVError => error
|
|
|
|
@eof_error = error
|
|
|
|
false
|
2019-04-14 17:01:51 -04:00
|
|
|
rescue StopIteration
|
|
|
|
true
|
|
|
|
end
|
|
|
|
end
|
|
|
|
alias_method :eof, :eof?
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
# Rewinds the underlying IO object and resets CSV's lineno() counter.
|
|
|
|
def rewind
|
2018-12-23 02:00:35 -05:00
|
|
|
@parser = nil
|
|
|
|
@parser_enumerator = nil
|
2019-04-17 09:02:40 -04:00
|
|
|
@eof_error = nil
|
2018-12-23 02:00:35 -05:00
|
|
|
@writer.rewind if @writer
|
2007-12-24 21:46:26 -05:00
|
|
|
@io.rewind
|
|
|
|
end
|
|
|
|
|
|
|
|
### End Delegation ###
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-15 16:37:17 -04:00
|
|
|
# :call-seq:
|
2020-07-19 20:42:28 -04:00
|
|
|
# csv << row -> self
|
2020-07-15 16:37:17 -04:00
|
|
|
#
|
|
|
|
# Appends a row to +self+.
|
|
|
|
#
|
|
|
|
# - Argument +row+ must be an \Array object or a CSV::Row object.
|
|
|
|
# - The output stream must be open for writing.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ---
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Append Arrays:
|
|
|
|
# CSV.generate do |csv|
|
|
|
|
# csv << ['foo', 0]
|
|
|
|
# csv << ['bar', 1]
|
|
|
|
# csv << ['baz', 2]
|
|
|
|
# end # => "foo,0\nbar,1\nbaz,2\n"
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# Append CSV::Rows:
|
|
|
|
# headers = []
|
|
|
|
# CSV.generate do |csv|
|
|
|
|
# csv << CSV::Row.new(headers, ['foo', 0])
|
|
|
|
# csv << CSV::Row.new(headers, ['bar', 1])
|
|
|
|
# csv << CSV::Row.new(headers, ['baz', 2])
|
|
|
|
# end # => "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
#
|
|
|
|
# Headers in CSV::Row objects are not appended:
|
|
|
|
# headers = ['Name', 'Count']
|
|
|
|
# CSV.generate do |csv|
|
|
|
|
# csv << CSV::Row.new(headers, ['foo', 0])
|
|
|
|
# csv << CSV::Row.new(headers, ['bar', 1])
|
|
|
|
# csv << CSV::Row.new(headers, ['baz', 2])
|
|
|
|
# end # => "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if +row+ is not an \Array or \CSV::Row:
|
|
|
|
# CSV.generate do |csv|
|
|
|
|
# # Raises NoMethodError (undefined method `collect' for :foo:Symbol)
|
|
|
|
# csv << :foo
|
|
|
|
# end
|
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Raises an exception if the output stream is not opened for writing:
|
2020-07-15 16:37:17 -04:00
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, '')
|
|
|
|
# File.open(path) do |file|
|
|
|
|
# CSV.open(file) do |csv|
|
|
|
|
# # Raises IOError (not opened for writing)
|
|
|
|
# csv << ['foo', 0]
|
|
|
|
# end
|
|
|
|
# end
|
2007-12-24 21:46:26 -05:00
|
|
|
def <<(row)
|
2018-12-23 02:00:35 -05:00
|
|
|
writer << row
|
|
|
|
self
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
alias_method :add_row, :<<
|
|
|
|
alias_method :puts, :<<
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
# :call-seq:
|
2020-07-15 16:37:17 -04:00
|
|
|
# convert(converter_name) -> array_of_procs
|
|
|
|
# convert {|field, field_info| ... } -> array_of_procs
|
|
|
|
#
|
|
|
|
# - With no block, installs a field converter (a \Proc).
|
|
|
|
# - With a block, defines and installs a custom field converter.
|
|
|
|
# - Returns the \Array of installed field converters.
|
|
|
|
#
|
|
|
|
# - Argument +converter_name+, if given, should be the name
|
|
|
|
# of an existing field converter.
|
|
|
|
#
|
|
|
|
# See {Field Converters}[#class-CSV-label-Field+Converters].
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block, installs a field converter:
|
|
|
|
# csv = CSV.new('')
|
|
|
|
# csv.convert(:integer)
|
|
|
|
# csv.convert(:float)
|
|
|
|
# csv.convert(:date)
|
|
|
|
# csv.converters # => [:integer, :float, :date]
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# The block, if given, is called for each field:
|
|
|
|
# - Argument +field+ is the field value.
|
|
|
|
# - Argument +field_info+ is a CSV::FieldInfo object
|
|
|
|
# containing details about the field.
|
|
|
|
#
|
|
|
|
# The examples here assume the prior execution of:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Example giving a block:
|
|
|
|
# csv = CSV.open(path)
|
|
|
|
# csv.convert {|field, field_info| p [field, field_info]; field.upcase }
|
|
|
|
# csv.read # => [["FOO", "0"], ["BAR", "1"], ["BAZ", "2"]]
|
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# ["foo", #<struct CSV::FieldInfo index=0, line=1, header=nil>]
|
|
|
|
# ["0", #<struct CSV::FieldInfo index=1, line=1, header=nil>]
|
|
|
|
# ["bar", #<struct CSV::FieldInfo index=0, line=2, header=nil>]
|
|
|
|
# ["1", #<struct CSV::FieldInfo index=1, line=2, header=nil>]
|
|
|
|
# ["baz", #<struct CSV::FieldInfo index=0, line=3, header=nil>]
|
|
|
|
# ["2", #<struct CSV::FieldInfo index=1, line=3, header=nil>]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# The block need not return a \String object:
|
|
|
|
# csv = CSV.open(path)
|
|
|
|
# csv.convert {|field, field_info| field.to_sym }
|
|
|
|
# csv.read # => [[:foo, :"0"], [:bar, :"1"], [:baz, :"2"]]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# If +converter_name+ is given, the block is not called:
|
|
|
|
# csv = CSV.open(path)
|
|
|
|
# csv.convert(:integer) {|field, field_info| fail 'Cannot happen' }
|
|
|
|
# csv.read # => [["foo", 0], ["bar", 1], ["baz", 2]]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises a parse-time exception if +converter_name+ is not the name of a built-in
|
|
|
|
# field converter:
|
|
|
|
# csv = CSV.open(path)
|
|
|
|
# csv.convert(:nosuch) => [nil]
|
|
|
|
# # Raises NoMethodError (undefined method `arity' for nil:NilClass)
|
|
|
|
# csv.read
|
2007-12-24 21:46:26 -05:00
|
|
|
def convert(name = nil, &converter)
|
2019-04-14 17:01:51 -04:00
|
|
|
parser_fields_converter.add_converter(name, &converter)
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
|
|
|
# :call-seq:
|
2020-07-15 16:37:17 -04:00
|
|
|
# header_convert(converter_name) -> array_of_procs
|
|
|
|
# header_convert {|header, field_info| ... } -> array_of_procs
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# - With no block, installs a header converter (a \Proc).
|
|
|
|
# - With a block, defines and installs a custom header converter.
|
|
|
|
# - Returns the \Array of installed header converters.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# - Argument +converter_name+, if given, should be the name
|
|
|
|
# of an existing header converter.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-15 16:37:17 -04:00
|
|
|
# See {Header Converters}[#class-CSV-label-Header+Converters].
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# With no block, installs a header converter:
|
|
|
|
# csv = CSV.new('')
|
|
|
|
# csv.header_convert(:symbol)
|
|
|
|
# csv.header_convert(:downcase)
|
|
|
|
# csv.header_converters # => [:symbol, :downcase]
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# The block, if given, is called for each header:
|
|
|
|
# - Argument +header+ is the header value.
|
|
|
|
# - Argument +field_info+ is a CSV::FieldInfo object
|
|
|
|
# containing details about the header.
|
|
|
|
#
|
|
|
|
# The examples here assume the prior execution of:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
#
|
|
|
|
# Example giving a block:
|
|
|
|
# csv = CSV.open(path, headers: true)
|
|
|
|
# csv.header_convert {|header, field_info| p [header, field_info]; header.upcase }
|
|
|
|
# table = csv.read
|
|
|
|
# table # => #<CSV::Table mode:col_or_row row_count:4>
|
|
|
|
# table.headers # => ["NAME", "VALUE"]
|
|
|
|
#
|
|
|
|
# Output:
|
|
|
|
# ["Name", #<struct CSV::FieldInfo index=0, line=1, header=nil>]
|
|
|
|
# ["Value", #<struct CSV::FieldInfo index=1, line=1, header=nil>]
|
|
|
|
|
|
|
|
# The block need not return a \String object:
|
|
|
|
# csv = CSV.open(path, headers: true)
|
|
|
|
# csv.header_convert {|header, field_info| header.to_sym }
|
|
|
|
# table = csv.read
|
|
|
|
# table.headers # => [:Name, :Value]
|
|
|
|
#
|
|
|
|
# If +converter_name+ is given, the block is not called:
|
|
|
|
# csv = CSV.open(path, headers: true)
|
|
|
|
# csv.header_convert(:downcase) {|header, field_info| fail 'Cannot happen' }
|
|
|
|
# table = csv.read
|
|
|
|
# table.headers # => ["name", "value"]
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises a parse-time exception if +converter_name+ is not the name of a built-in
|
|
|
|
# field converter:
|
|
|
|
# csv = CSV.open(path, headers: true)
|
|
|
|
# csv.header_convert(:nosuch)
|
|
|
|
# # Raises NoMethodError (undefined method `arity' for nil:NilClass)
|
|
|
|
# csv.read
|
2007-12-24 21:46:26 -05:00
|
|
|
def header_convert(name = nil, &converter)
|
2018-12-23 02:00:35 -05:00
|
|
|
header_fields_converter.add_converter(name, &converter)
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
include Enumerable
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-19 20:42:28 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.each -> enumerator
|
|
|
|
# csv.each {|row| ...}
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Calls the block with each successive row.
|
|
|
|
# The data source must be opened for reading.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Without headers:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.each do |row|
|
|
|
|
# p row
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# ["foo", "0"]
|
|
|
|
# ["bar", "1"]
|
|
|
|
# ["baz", "2"]
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# With headers:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string, headers: true)
|
|
|
|
# csv.each do |row|
|
|
|
|
# p row
|
|
|
|
# end
|
|
|
|
# Output:
|
|
|
|
# <CSV::Row "Name":"foo" "Value":"0">
|
|
|
|
# <CSV::Row "Name":"bar" "Value":"1">
|
|
|
|
# <CSV::Row "Name":"baz" "Value":"2">
|
|
|
|
#
|
|
|
|
# ---
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Raises an exception if the source is not opened for reading:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.close
|
|
|
|
# # Raises IOError (not opened for reading)
|
|
|
|
# csv.each do |row|
|
|
|
|
# p row
|
|
|
|
# end
|
2018-12-23 02:00:35 -05:00
|
|
|
def each(&block)
|
2019-04-14 17:01:51 -04:00
|
|
|
parser_enumerator.each(&block)
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-06-26 17:29:57 -04:00
|
|
|
# :call-seq:
|
2020-07-19 20:42:28 -04:00
|
|
|
# csv.read -> array or csv_table
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Forms the remaining rows from +self+ into:
|
|
|
|
# - A CSV::Table object, if headers are in use.
|
2020-07-19 20:42:28 -04:00
|
|
|
# - An \Array of Arrays, otherwise.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# The data source must be opened for reading.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-06-26 17:29:57 -04:00
|
|
|
# Without headers:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# csv = CSV.open(path)
|
|
|
|
# csv.read # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
|
|
|
|
#
|
|
|
|
# With headers:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# path = 't.csv'
|
|
|
|
# File.write(path, string)
|
|
|
|
# csv = CSV.open(path, headers: true)
|
|
|
|
# csv.read # => #<CSV::Table mode:col_or_row row_count:4>
|
2020-07-19 20:42:28 -04:00
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if the source is not opened for reading:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.close
|
|
|
|
# # Raises IOError (not opened for reading)
|
|
|
|
# csv.read
|
2007-12-24 21:46:26 -05:00
|
|
|
def read
|
|
|
|
rows = to_a
|
2019-01-25 01:49:59 -05:00
|
|
|
if parser.use_headers?
|
|
|
|
Table.new(rows, headers: parser.headers)
|
2007-12-24 21:46:26 -05:00
|
|
|
else
|
|
|
|
rows
|
|
|
|
end
|
|
|
|
end
|
|
|
|
alias_method :readlines, :read
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-19 20:42:28 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.header_row? -> true or false
|
|
|
|
#
|
|
|
|
# Returns +true+ if the next row to be read is a header row\;
|
|
|
|
# +false+ otherwise.
|
|
|
|
#
|
|
|
|
# Without headers:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.header_row? # => false
|
|
|
|
#
|
|
|
|
# With headers:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string, headers: true)
|
|
|
|
# csv.header_row? # => true
|
|
|
|
# csv.shift # => #<CSV::Row "Name":"foo" "Value":"0">
|
|
|
|
# csv.header_row? # => false
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if the source is not opened for reading:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.close
|
|
|
|
# # Raises IOError (not opened for reading)
|
|
|
|
# csv.header_row?
|
2007-12-24 21:46:26 -05:00
|
|
|
def header_row?
|
2018-12-23 02:00:35 -05:00
|
|
|
parser.header_row?
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-19 20:42:28 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.shift -> array, csv_row, or nil
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Returns the next row of data as:
|
|
|
|
# - An \Array if no headers are used.
|
|
|
|
# - A CSV::Row object if headers are used.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# The data source must be opened for reading.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Without headers:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.shift # => ["foo", "0"]
|
|
|
|
# csv.shift # => ["bar", "1"]
|
|
|
|
# csv.shift # => ["baz", "2"]
|
|
|
|
# csv.shift # => nil
|
|
|
|
#
|
|
|
|
# With headers:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string, headers: true)
|
|
|
|
# csv.shift # => #<CSV::Row "Name":"foo" "Value":"0">
|
|
|
|
# csv.shift # => #<CSV::Row "Name":"bar" "Value":"1">
|
|
|
|
# csv.shift # => #<CSV::Row "Name":"baz" "Value":"2">
|
|
|
|
# csv.shift # => nil
|
|
|
|
#
|
|
|
|
# ---
|
|
|
|
#
|
|
|
|
# Raises an exception if the source is not opened for reading:
|
|
|
|
# string = "foo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string)
|
|
|
|
# csv.close
|
|
|
|
# # Raises IOError (not opened for reading)
|
|
|
|
# csv.shift
|
2007-12-24 21:46:26 -05:00
|
|
|
def shift
|
2019-04-17 09:02:40 -04:00
|
|
|
if @eof_error
|
|
|
|
eof_error, @eof_error = @eof_error, nil
|
|
|
|
raise eof_error
|
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
begin
|
2019-04-14 17:01:51 -04:00
|
|
|
parser_enumerator.next
|
2018-12-23 02:00:35 -05:00
|
|
|
rescue StopIteration
|
|
|
|
nil
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
end
|
|
|
|
alias_method :gets, :shift
|
|
|
|
alias_method :readline, :shift
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2020-07-19 20:42:28 -04:00
|
|
|
# :call-seq:
|
|
|
|
# csv.inspect -> string
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2020-07-19 20:42:28 -04:00
|
|
|
# Returns a \String showing certain properties of +self+:
|
|
|
|
# string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
|
|
|
|
# csv = CSV.new(string, headers: true)
|
|
|
|
# s = csv.inspect
|
|
|
|
# s # => "#<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:\",\" row_sep:\"\\n\" quote_char:\"\\\"\" headers:true>"
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
def inspect
|
2019-07-25 03:39:28 -04:00
|
|
|
str = ["#<", self.class.to_s, " io_type:"]
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
# show type of wrapped IO
|
|
|
|
if @io == $stdout then str << "$stdout"
|
|
|
|
elsif @io == $stdin then str << "$stdin"
|
|
|
|
elsif @io == $stderr then str << "$stderr"
|
|
|
|
else str << @io.class.to_s
|
|
|
|
end
|
|
|
|
# show IO.path(), if available
|
|
|
|
if @io.respond_to?(:path) and (p = @io.path)
|
|
|
|
str << " io_path:" << p.inspect
|
|
|
|
end
|
|
|
|
# show encoding
|
|
|
|
str << " encoding:" << @encoding.name
|
|
|
|
# show other attributes
|
2018-12-23 02:00:35 -05:00
|
|
|
["lineno", "col_sep", "row_sep", "quote_char"].each do |attr_name|
|
|
|
|
if a = __send__(attr_name)
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
str << " " << attr_name << ":" << a.inspect
|
|
|
|
end
|
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
["skip_blanks", "liberal_parsing"].each do |attr_name|
|
|
|
|
if a = __send__("#{attr_name}?")
|
|
|
|
str << " " << attr_name << ":" << a.inspect
|
|
|
|
end
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
_headers = headers
|
|
|
|
str << " headers:" << _headers.inspect if _headers
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
str << ">"
|
2008-12-26 11:53:13 -05:00
|
|
|
begin
|
2010-12-25 01:58:58 -05:00
|
|
|
str.join('')
|
2008-12-26 11:53:13 -05:00
|
|
|
rescue # any encoding error
|
|
|
|
str.map do |s|
|
|
|
|
e = Encoding::Converter.asciicompat_encoding(s.encoding)
|
|
|
|
e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
|
2010-12-25 01:58:58 -05:00
|
|
|
end.join('')
|
2008-12-26 11:53:13 -05:00
|
|
|
end
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2007-12-24 21:46:26 -05:00
|
|
|
private
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-05-09 00:39:16 -04:00
|
|
|
def determine_encoding(encoding, internal_encoding)
|
|
|
|
# honor the IO encoding if we can, otherwise default to ASCII-8BIT
|
2018-12-23 02:00:35 -05:00
|
|
|
io_encoding = raw_encoding
|
2018-05-09 00:39:16 -04:00
|
|
|
return io_encoding if io_encoding
|
|
|
|
|
|
|
|
return Encoding.find(internal_encoding) if internal_encoding
|
|
|
|
|
|
|
|
if encoding
|
|
|
|
encoding, = encoding.split(":", 2) if encoding.is_a?(String)
|
|
|
|
return Encoding.find(encoding)
|
|
|
|
end
|
|
|
|
|
|
|
|
Encoding.default_internal || Encoding.default_external
|
|
|
|
end
|
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def normalize_converters(converters)
|
|
|
|
converters ||= []
|
|
|
|
unless converters.is_a?(Array)
|
|
|
|
converters = [converters]
|
2012-08-20 16:52:36 -04:00
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
converters.collect do |converter|
|
|
|
|
case converter
|
|
|
|
when Proc # custom code block
|
|
|
|
[nil, converter]
|
|
|
|
else # by name
|
|
|
|
[converter, nil]
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
end
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
# Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
|
2019-10-12 01:03:21 -04:00
|
|
|
# if +headers+ is passed as +true+, returning the converted field set. Any
|
2007-12-24 21:46:26 -05:00
|
|
|
# converter that changes the field into something other than a String halts
|
2019-10-12 01:03:21 -04:00
|
|
|
# the pipeline of conversion for that field. This is primarily an efficiency
|
2007-12-24 21:46:26 -05:00
|
|
|
# shortcut.
|
2009-03-05 22:56:38 -05:00
|
|
|
#
|
2007-12-24 21:46:26 -05:00
|
|
|
def convert_fields(fields, headers = false)
|
2018-05-09 00:39:16 -04:00
|
|
|
if headers
|
2018-12-23 02:00:35 -05:00
|
|
|
header_fields_converter.convert(fields, nil, 0)
|
2018-05-09 00:39:16 -04:00
|
|
|
else
|
2019-04-14 17:01:51 -04:00
|
|
|
parser_fields_converter.convert(fields, @headers, lineno)
|
2018-05-09 00:39:16 -04:00
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
#
|
|
|
|
# Returns the encoding of the internal IO object.
|
|
|
|
#
|
|
|
|
def raw_encoding
|
|
|
|
if @io.respond_to? :internal_encoding
|
|
|
|
@io.internal_encoding || @io.external_encoding
|
|
|
|
elsif @io.respond_to? :encoding
|
|
|
|
@io.encoding
|
|
|
|
else
|
|
|
|
nil
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2019-04-14 17:01:51 -04:00
|
|
|
def parser_fields_converter
|
|
|
|
@parser_fields_converter ||= build_parser_fields_converter
|
2018-12-23 02:00:35 -05:00
|
|
|
end
|
|
|
|
|
2019-04-14 17:01:51 -04:00
|
|
|
def build_parser_fields_converter
|
2018-12-23 02:00:35 -05:00
|
|
|
specific_options = {
|
2021-09-15 02:58:57 -04:00
|
|
|
builtin_converters_name: :Converters,
|
2018-12-23 02:00:35 -05:00
|
|
|
}
|
|
|
|
options = @base_fields_converter_options.merge(specific_options)
|
2019-04-14 17:01:51 -04:00
|
|
|
build_fields_converter(@initial_converters, options)
|
2018-12-23 02:00:35 -05:00
|
|
|
end
|
2007-12-24 21:46:26 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def header_fields_converter
|
|
|
|
@header_fields_converter ||= build_header_fields_converter
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def build_header_fields_converter
|
|
|
|
specific_options = {
|
2021-09-15 02:58:57 -04:00
|
|
|
builtin_converters_name: :HeaderConverters,
|
2018-12-23 02:00:35 -05:00
|
|
|
accept_nil: true,
|
|
|
|
}
|
|
|
|
options = @base_fields_converter_options.merge(specific_options)
|
2019-04-14 17:01:51 -04:00
|
|
|
build_fields_converter(@initial_header_converters, options)
|
|
|
|
end
|
|
|
|
|
|
|
|
def writer_fields_converter
|
|
|
|
@writer_fields_converter ||= build_writer_fields_converter
|
|
|
|
end
|
|
|
|
|
|
|
|
def build_writer_fields_converter
|
|
|
|
build_fields_converter(@initial_write_converters,
|
|
|
|
@write_fields_converter_options)
|
|
|
|
end
|
|
|
|
|
|
|
|
def build_fields_converter(initial_converters, options)
|
2018-12-23 02:00:35 -05:00
|
|
|
fields_converter = FieldsConverter.new(options)
|
2019-04-14 17:01:51 -04:00
|
|
|
normalize_converters(initial_converters).each do |name, converter|
|
2018-12-23 02:00:35 -05:00
|
|
|
fields_converter.add_converter(name, &converter)
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2018-12-23 02:00:35 -05:00
|
|
|
fields_converter
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def parser
|
|
|
|
@parser ||= Parser.new(@io, parser_options)
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def parser_options
|
2019-04-14 17:01:51 -04:00
|
|
|
@parser_options.merge(header_fields_converter: header_fields_converter,
|
|
|
|
fields_converter: parser_fields_converter)
|
|
|
|
end
|
|
|
|
|
|
|
|
def parser_enumerator
|
|
|
|
@parser_enumerator ||= parser.parse
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
2009-03-05 22:56:38 -05:00
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def writer
|
|
|
|
@writer ||= Writer.new(@io, writer_options)
|
* lib/csv/csv.rb: Reworked CSV's parser and generator to be m17n. Data
is now parsed in the Encoding it is in without need for translation.
* lib/csv/csv.rb: Improved inspect() messages for better IRb support.
* lib/csv/csv.rb: Fixed header writing bug reported by Dov Murik.
* lib/csv/csv.rb: Use custom separators in parsing header Strings as
suggested by Shmulik Regev.
* lib/csv/csv.rb: Added a :write_headers option for outputting headers.
* lib/csv/csv.rb: Handle open() calls in binary mode whenever we can to
workaround a Windows issue where line-ending translation can cause an
off-by-one error in seeking back to a non-zero starting position after
auto-discovery for :row_sep as suggested by Robert Battle.
* lib/csv/csv.rb: Improved the parser to fail faster when fed some forms
of invalid CSV that can be detected without reading ahead.
* lib/csv/csv.rb: Added a :field_size_limit option to control CSV's
lookahead and prevent the parser from biting off more data than
it can chew.
* lib/csv/csv.rb: Added readers for CSV attributes: col_sep(), row_sep(),
quote_char(), field_size_limit(), converters(), unconverted_fields?(),
headers(), return_headers?(), write_headers?(), header_converters(),
skip_blanks?(), and force_quotes?().
* lib/csv/csv.rb: Cleaned up code syntax to be more inline with
Ruby 1.9 than 1.8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-20 20:39:03 -04:00
|
|
|
end
|
|
|
|
|
2018-12-23 02:00:35 -05:00
|
|
|
def writer_options
|
2019-04-14 17:01:51 -04:00
|
|
|
@writer_options.merge(header_fields_converter: header_fields_converter,
|
|
|
|
fields_converter: writer_fields_converter)
|
2009-10-15 14:19:15 -04:00
|
|
|
end
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
2012-09-19 18:07:44 -04:00
|
|
|
# Passes +args+ to CSV::instance.
|
|
|
|
#
|
|
|
|
# CSV("CSV,data").read
|
|
|
|
# #=> [["CSV", "data"]]
|
|
|
|
#
|
|
|
|
# If a block is given, the instance is passed the block and the return value
|
|
|
|
# becomes the return value of the block.
|
|
|
|
#
|
|
|
|
# CSV("CSV,data") { |c|
|
|
|
|
# c.read.any? { |a| a.include?("data") }
|
|
|
|
# } #=> true
|
|
|
|
#
|
|
|
|
# CSV("CSV,data") { |c|
|
|
|
|
# c.read.any? { |a| a.include?("zombies") }
|
2012-09-20 03:06:06 -04:00
|
|
|
# } #=> false
|
2012-09-19 18:07:44 -04:00
|
|
|
#
|
2021-05-10 20:41:26 -04:00
|
|
|
# CSV options may also be given.
|
|
|
|
#
|
|
|
|
# io = StringIO.new
|
|
|
|
# CSV(io, col_sep: ";") { |csv| csv << ["a", "b", "c"] }
|
|
|
|
#
|
2021-10-10 22:21:42 -04:00
|
|
|
# This API is not Ractor-safe.
|
|
|
|
#
|
2021-05-10 20:41:26 -04:00
|
|
|
def CSV(*args, **options, &block)
|
|
|
|
CSV.instance(*args, **options, &block)
|
2007-12-24 21:46:26 -05:00
|
|
|
end
|
|
|
|
|
2018-05-09 00:39:16 -04:00
|
|
|
require_relative "csv/version"
|
|
|
|
require_relative "csv/core_ext/array"
|
|
|
|
require_relative "csv/core_ext/string"
|