Module: RedAmber::DataFrameLoadSave::ClassMethods

Defined in:
lib/red_amber/data_frame_loadsave.rb

Overview

Enables `self.load` as a class method of DataFrame.

Instance Method Details

#load(input, format: nil, compression: nil, schema: nil, skip_lines: nil) ⇒ DataFrame

Loads a DataFrame via Arrow::Table.load.

When format is not given, it is detected automatically from the file extension.

Examples:

Load a tsv file

DataFrame.load("file.tsv")

Load a csv.gz file

DataFrame.load("file.csv.gz")

Load from URI

DataFrame.load(URI("https://some_uri/file.csv"))

Load from a Buffer

DataFrame.load(Arrow::Buffer.new(<<~BUFFER), format: :csv)
  name,age
  Yasuko,68
  Rui,49
  Hinata,28
BUFFER

Load from a Buffer skipping comment line

DataFrame.load(Arrow::Buffer.new(<<~BUFFER), format: :csv, skip_lines: /\A#/)
  # comment
  name,age
  Yasuko,68
  Rui,49
  Hinata,28
BUFFER
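
Which lines a skip_lines pattern selects (plain Ruby sketch)

The pattern /\A#/ in the example above matches lines that begin with "#". The sketch below only illustrates which lines such a pattern selects. It assumes the pattern is tested against each raw line, and it does not call RedAmber or Arrow. The names lines and kept are illustrative.

```ruby
# Assumption: skip_lines is matched against each raw input line.
pattern = /\A#/

lines = [
  "# comment",
  "name,age",
  "Yasuko,68",
  "Rui,49",
  "Hinata,28",
]

# Drop every line that the pattern matches; the header and data rows remain.
kept = lines.reject { |line| line.match?(pattern) }

puts kept
```

Only the first line starts with "#", so the header row and the three data rows are kept.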

Parameters:

  • input (path, URI, Arrow::Buffer)

    source path, URI, or buffer.

  • format (:arrow_file, :batch, :arrows, :arrow_stream, :stream, :csv, :tsv) (defaults to: nil)

    format specifier.

  • compression (:gzip, nil) (defaults to: nil)

    compression type.

  • schema (Arrow::Schema) (defaults to: nil)

    schema of table.

  • skip_lines (Regexp) (defaults to: nil)

    pattern of rows to skip.

Returns:

  • (DataFrame)

    the loaded DataFrame.

# File 'lib/red_amber/data_frame_loadsave.rb', line 55

def load(input, **options)
  DataFrame.new(Arrow::Table.load(input, options))
end
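
How the keyword arguments are forwarded (sketch with stand-in classes)

The method body above collects all keyword arguments into a Hash with **options and passes that Hash as the second positional argument to the underlying loader. The sketch below reproduces only this forwarding pattern. FakeTable and FakeFrame are hypothetical stand-ins, not part of RedAmber or Red Arrow, so the snippet runs without either gem.

```ruby
# Hypothetical stand-in for Arrow::Table: it only records what it receives.
module FakeTable
  def self.load(input, options = {})
    { input: input, options: options }
  end
end

# Hypothetical stand-in for DataFrame with the same shape of class method.
class FakeFrame
  attr_reader :table

  def initialize(table)
    @table = table
  end

  # Keyword arguments are gathered into a Hash and forwarded positionally.
  def self.load(input, **options)
    new(FakeTable.load(input, options))
  end
end

frame = FakeFrame.load("file.csv", format: :csv, skip_lines: /\A#/)
p frame.table[:input]
p frame.table[:options].keys
```

In this pattern the wrapper does not interpret any option itself. Options that are not passed never appear in the Hash, so the loader's own defaults apply.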