This class provides a complete interface to CSV files and data. It offers tools for reading CSV from, and writing CSV to, Strings or IO objects.
Reading
From a File
A Line at a Time
CSV.foreach("path/to/file.csv") do |row|
  # use row here...
end
All at Once
arr_of_arrs = CSV.read("path/to/file.csv")
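The sketch below runs both file-reading styles against a real file. It uses Ruby's standard `tempfile` library to create a throwaway sample file so the example is self-contained; the file name and its contents are illustrative assumptions, not part of the CSV API.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data written to a temporary file.
file = Tempfile.new(["sample", ".csv"])
file.write("name,qty\napple,3\npear,5\n")
file.close

# A line at a time: foreach yields each parsed row as an Array of Strings.
names = []
CSV.foreach(file.path) do |row|
  names << row[0]
end

# All at once: read returns every row as an Array of Arrays.
rows = CSV.read(file.path)

file.unlink
```

Note that without extra options the header line is returned as an ordinary row.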
From a String
A Line at a Time
CSV.parse("CSV,data,String") do |row|
  # use row here...
end
All at Once
arr_of_arrs = CSV.parse("CSV,data,String")
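A small sketch of parsing from a String, using made-up sample data that includes a quoted field containing a comma and a quoted field with doubled quotes, to show that parsed fields come back unquoted.

```ruby
require "csv"

# Illustrative data; the second and third rows use CSV quoting rules.
data = <<~TEXT
  id,comment
  1,"hello, world"
  2,"say ""hi"""
TEXT

# A line at a time: parse yields each row to the block.
comments = []
CSV.parse(data) { |row| comments << row[1] }

# All at once: parse returns an Array of Arrays of Strings.
rows = CSV.parse(data)
```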
Writing
To a File
CSV.open("path/to/file.csv", "wb") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
To a String
csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
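To make the output of CSV.generate concrete, this sketch adds a third row (an illustrative assumption) with a field that needs quoting, a nil, and an Integer, then parses the result back. It assumes the default "\n" row separator.

```ruby
require "csv"

csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # The comma forces quoting; nil becomes an empty field; 42 is written as "42".
  csv << ["needs, quoting", nil, 42]
end

# Round trip: empty unquoted fields parse back as nil, numbers as Strings.
parsed = CSV.parse(csv_string)
```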
Convert a Single Line
csv_string = ["CSV", "data"].to_csv   # to CSV
csv_array  = "CSV,String".parse_csv   # from CSV
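Both single-line helpers accept the same options as CSV::new. The sketch below passes :col_sep and shows how an empty unquoted field differs from an empty quoted one when parsing; the sample values are illustrative.

```ruby
require "csv"

line   = ["CSV", "data"].to_csv                # "CSV,data\n"
semi   = ["CSV", "data"].to_csv(col_sep: ";")  # "CSV;data\n"
fields = "CSV,String".parse_csv                # ["CSV", "String"]

# An empty unquoted field parses as nil; an empty quoted field as "".
blanks = ',"",x'.parse_csv                     # [nil, "", "x"]
```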
Shortcut Interface
CSV { |csv_out| csv_out << %w{my data here} }               # to $stdout
CSV(csv = "") { |csv_str| csv_str << %w{my data here} }     # to a String
CSV($stderr) { |csv_err| csv_err << %w{my data here} }      # to $stderr
CSV($stdin) { |csv_in| csv_in.each { |row| p row } }        # from $stdin
Advanced Usage
Wrap an IO Object
csv = CSV.new(io, options)
# ... read (with gets() or each()) from and write (with <<) to csv here ...
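A sketch of wrapping IO objects, assuming a StringIO stands in for a real file or socket and using a semicolon :col_sep as an example option. One CSV object reads with gets() and each(); a second writes with <<.

```ruby
require "csv"
require "stringio"

# Reading: any IO-like object works; StringIO is used here for illustration.
input  = StringIO.new("name;qty\napple;3\npear;5\n")
reader = CSV.new(input, col_sep: ";")

header = reader.gets                 # first row
rows = []
reader.each { |row| rows << row }    # remaining rows

# Writing: non-String fields are converted with to_s.
output = StringIO.new
writer = CSV.new(output, col_sep: ";")
writer << ["apple", 3]
writer << ["pear", 5]
```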
CSV and Character Encodings (M17n or Multilingualization)
This new CSV parser is m17n savvy. The parser works in the Encoding of the IO or String object being read from or written to. Your data is never transcoded (unless you ask Ruby to transcode it for you) and will literally be parsed in the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the Encoding of your data. This is accomplished by transcoding the parser itself into your Encoding.
Some transcoding must take place, of course, to accomplish this multiencoding support. For example, :col_sep, :row_sep, and :quote_char must be transcoded to match your data. Hopefully this makes the entire process feel transparent, since CSV’s defaults should just work for your data. However, you can set these values manually in the target Encoding to avoid the translation.
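A sketch of both points above, assuming ISO-8859-1 sample data (the byte 0xE9 is "é" in that Encoding but is not valid UTF-8). The parsed fields stay in ISO-8859-1, and the :col_sep is supplied already encoded in the data's Encoding so it needs no translation.

```ruby
require "csv"

# Hypothetical Latin-1 data: "café;crème" as raw ISO-8859-1 bytes.
data = String.new("caf\xE9;cr\xE8me\n", encoding: Encoding::ISO_8859_1)

# Supply :col_sep in the target Encoding to skip transcoding it.
sep = ";".encode(Encoding::ISO_8859_1)
row = CSV.parse_line(data, col_sep: sep)
# Each field keeps the Encoding of the input; nothing is converted to UTF-8.
```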
It’s also important to note that while CSV’s core parser is now Encoding agnostic, some features are not. For example, the built-in converters will try to transcode data to UTF-8 before making conversions. Again, you can provide custom converters that are aware of your Encodings to avoid this translation. It’s just too hard for me to support native conversions in all of Ruby’s Encodings.
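As a sketch of an Encoding-aware custom converter, the hypothetical `yes_no` lambda below compares each field against literals encoded in that field's own Encoding, rather than transcoding to UTF-8 as the built-in converters do. The yes/no mapping and the ISO-8859-1 sample data are illustrative assumptions.

```ruby
require "csv"

# Hypothetical converter: maps "yes"/"no" to booleans in the field's own Encoding.
yes_no = lambda do |field|
  case field
  when "yes".encode(field.encoding) then true
  when "no".encode(field.encoding)  then false
  else field # leave everything else untouched
  end
end

data = String.new("yes,no,caf\xE9\n", encoding: Encoding::ISO_8859_1)
row  = CSV.parse_line(data, converters: [yes_no])
```

Fields the converter does not recognize are returned unchanged, still in ISO-8859-1.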
Anyway, the practical side of this is simple: make sure IO and String objects passed into CSV have the proper Encoding set and everything should just work. CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(), CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
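For example, passing an :encoding option to CSV::read(). This sketch assumes a temporary file (via the standard `tempfile` library) holding illustrative ISO-8859-1 bytes; the same option applies to CSV::foreach(), CSV::open(), and CSV::readlines().

```ruby
require "csv"
require "tempfile"

# Write raw ISO-8859-1 bytes ("café,bar") to a throwaway file.
file = Tempfile.new(["latin1", ".csv"])
file.binmode
file.write("caf\xE9,bar\n".b)
file.close

# Tell CSV (and the underlying File) which Encoding the data is in.
rows = CSV.read(file.path, encoding: "ISO-8859-1")
file.unlink
```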
One minor exception comes when generating CSV into a String with an Encoding that is not ASCII compatible. There’s no existing data for CSV to use to prepare itself, so you will probably need to specify the desired Encoding manually in most of those cases. When you use CSV::generate_line() or Array#to_csv(), though, CSV will try to guess the Encoding from the fields in the row of output.
I try to point out any other Encoding issues in the documentation of methods as they come up.
This has been tested to the best of my ability with all non-“dummy” Encodings Ruby ships with. However, it is brave new code and may have some bugs. Please feel free to report any issues you find with it.
Constants
VERSION = "2.4.8".freeze
FieldInfo = Struct.new(:index, :line, :header)
DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
                      \d{4}-\d{2}-\d{2} )\z /x
DateTimeMatcher =
  / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
          \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
ConverterEncoding = Encoding.find("UTF-8")
Converters = {
  integer:   lambda { |f|
    Integer(f.encode(ConverterEncoding)) rescue f
  },
  float:     lambda { |f|
    Float(f.encode(ConverterEncoding)) rescue f
  },
  numeric:   [:integer, :float],
  date:      lambda { |f|
    begin
      e = f.encode(ConverterEncoding)
      e =~ DateMatcher ? Date.parse(e) : f
    rescue  # encoding conversion or date parse errors
      f
    end
  },
  date_time: lambda { |f|
    begin
      e = f.encode(ConverterEncoding)
      e =~ DateTimeMatcher ? DateTime.parse(e) : f
    rescue  # encoding conversion or date parse errors
      f
    end
  },
  all:       [:date_time, :numeric]
}
HeaderConverters = {
  downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
  symbol:   lambda { |h|
    h.encode(ConverterEncoding).downcase.gsub(/\s+/, "_").
                                         gsub(/\W+/, "").to_sym
  }
}
DEFAULT_OPTIONS = {
  col_sep:            ",",
  row_sep:            :auto,
  quote_char:         '"',
  field_size_limit:   nil,
  converters:         nil,
  unconverted_fields: nil,
  headers:            false,
  return_headers:     false,
  header_converters:  nil,
  skip_blanks:        false,
  force_quotes:       false
}.freeze
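A sketch of the built-in converters and header converters in use, selected by name through the :converters and :header_converters options. The sample rows are illustrative assumptions; the date example uses the "YYYY-MM-DD" form that DateMatcher accepts.

```ruby
require "csv"
require "date"

# :numeric expands to [:integer, :float]; unconvertible fields stay Strings.
nums = CSV.parse_line("1,2.5,foo", converters: :numeric)

# :date converts only fields that match DateMatcher.
dates = CSV.parse_line("2017-01-31,not a date", converters: :date)

# With headers, :symbol turns "First Name" into :first_name;
# :integer converts data fields but not the header row.
table = CSV.parse("First Name,Age\nAda,36\n",
                  headers: true,
                  header_converters: :symbol,
                  converters: :integer)
```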
Attributes
| Attribute | Access | Description |
|---|---|---|
| col_sep | R | The encoded :col_sep used in parsing and writing. See CSV::new for details. |
| row_sep | R | The encoded :row_sep used in parsing and writing. See CSV::new for details. |
| quote_char | R | The encoded :quote_char used in parsing and writing. See CSV::new for details. |
| field_size_limit | R | The limit for field size, if any. See CSV::new for details. |
| encoding | R | The Encoding CSV is parsing or writing in. This will be the Encoding you receive parsed data in and/or the Encoding data will be written in. |
| lineno | R | The line number of the last row read from this file. Fields with nested line-end characters will not affect this count. |
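A short sketch reading several of these attributes after parsing two rows from an illustrative semicolon-separated String. It assumes the script's source Encoding is UTF-8 (Ruby's default), which is why the String, and therefore the CSV object, reports UTF-8.

```ruby
require "csv"

csv = CSV.new("a;b\nc;d\n", col_sep: ";")
csv.shift   # read the first row
csv.shift   # read the second row

sep      = csv.col_sep     # ";"
quote    = csv.quote_char  # "\""
line_no  = csv.lineno      # 2, one per row read
encoding = csv.encoding    # Encoding of the wrapped String
```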