Class: RedAmber::DataFrame

Inherits: Object

Includes: DataFrameCombinable, DataFrameDisplayable, DataFrameIndexable, DataFrameLoadSave, DataFrameReshaping, DataFrameSelectable, DataFrameVariableOperation, Helper

Defined in: lib/red_amber/data_frame.rb

Overview

Class to represent a data frame. Variable @table holds an Arrow::Table object.

Instance Attribute Summary

#table ⇒ Arrow::Table (also: #to_arrow) readonly

Returns the table held within.

Class Method Summary

.create(table) ⇒ DataFrame

Quicker DataFrame constructor from an `Arrow::Table`.
.new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value) ⇒ DataFrame

Return new DataFrame for specified schema and value.

Instance Method Summary

#==(other) ⇒ true, false

Compare DataFrames.
#build_subframes(subset_specifier = nil, &block) ⇒ Object

Generic builder of sub-dataframes from self.
#each_row ⇒ Object

Enumerate for each row.
#empty? ⇒ true, false

Check if it is an empty DataFrame.
#group(*group_keys, &block) ⇒ Object

Create a Group object.
#initialize(*args) ⇒ DataFrame constructor

Creates a new DataFrame.
#key?(key) ⇒ Boolean (also: #has_key?)

Returns true if self has a specified key in the argument.
#key_index(key) ⇒ Integer (also: #find_index, #index)

Returns index of specified key in the Array keys.
#keys ⇒ Array (also: #column_names, #var_names)

Returns an Array of keys.
#method_missing(name, *args, &block) ⇒ Object

Catch variable (column) key as method name.
#n_keys ⇒ Integer (also: #n_variables, #n_vars, #n_cols)

Returns the number of variables (columns).
#propagate(scalar = nil, &block) ⇒ Object

Returns a Vector such that all elements have value `scalar` and have same size as self.
#respond_to_missing?(name, include_private) ⇒ Boolean

Catch variable (column) key as method name.
#schema ⇒ Hash

Returns column name and data type in a Hash.
#shape ⇒ Array

Returns the numbers of rows and columns.
#size ⇒ Integer (also: #n_records, #n_obs, #n_rows)

Returns the number of records (rows).
#sub_by_enum(enumerator_method, *args) ⇒ SubFrames (also: #subframes_by_enum)

Create SubFrames by grouping/windowing by position from an enumerator method.
#sub_by_kernel(kernel, step: 1) ⇒ SubFrames (also: #subframes_by_kernel)

Create SubFrames by windowing with a kernel (i.e. masked window) and step.
#sub_by_value(*keys) ⇒ SubFrames (also: #subframes_by_value, #sub_group)

Create SubFrames by value grouping.
#sub_by_window(from: 0, size: nil, step: 1) ⇒ SubFrames (also: #subframes_by_window)

Create SubFrames by windowing with `from`, `size` and `step`.
#to_a ⇒ Array (also: #raw_records)

Returns a row-oriented array without header.
#to_h ⇒ Hash

Returns column-oriented data in a Hash.
#to_rover ⇒ Rover::DataFrame

Returns self in a `Rover::DataFrame`.
#type_classes ⇒ Array

Returns an Array of Classes of data type.
#types ⇒ Array

Returns abbreviated type names in an Array.
#variables ⇒ Hash (also: #vars)

Returns a Hash of key and Vector pairs in the columns.
#vectors ⇒ Array

Returns Vectors in an Array.

Constructor Details

#initialize(hash) ⇒ `DataFrame`
#initialize(table) ⇒ `DataFrame`
#initialize(schema, row_oriented_array) ⇒ `DataFrame`
#initialize(arrowable) ⇒ `DataFrame`
#initialize(rover_like) ⇒ `DataFrame`
#initialize ⇒ `DataFrame`
#initialize(empty) ⇒ `DataFrame`

Creates a new DataFrame.

Overloads:

#initialize(hash) ⇒ DataFrame

Initialize a DataFrame by a Hash.
Examples:

Initialize by a Hash
```
hash = { x: [1, 2, 3], y: %w[A B C] }
DataFrame.new(hash)
```
Initialize by Hash-like arguments.
```
DataFrame.new(x: [1, 2, 3], y: %w[A B C])
```
Initialize from objects that respond to #to_arrow_array.
```
# Array-like objects that respond to #to_arrow_array are also accepted.
require 'arrow-numo-narray'
DataFrame.new(numo: Numo::DFloat.new(3).rand)
```
Parameters:
- hash (Hash<key => <Array, Arrow::Array, #to_arrow_array>>) —
  
  a Hash mapping each `key` to an array-like of column values. Each `key` is a Symbol or String.
#initialize(table) ⇒ DataFrame

Initialize a DataFrame by an `Arrow::Table`.
Examples:

Initialize by a Table
```
table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])
DataFrame.new(table)
```
Parameters:
- table (Arrow::Table) —
  
  a table to have in the DataFrame.
#initialize(schema, row_oriented_array) ⇒ DataFrame

Initialize a DataFrame by schema and row_oriented_array.
Examples:

Initialize by a schema and a row_oriented_array.
```
schema = { x: :uint8, y: :string }
row_oriented_array = [[1, 'A'], [2, 'B'], [3, 'C']]
DataFrame.new(schema, row_oriented_array)
```
Parameters:
- schema (Hash<key => type>) —
  
  a schema of key and data type.
- row_oriented_array (Array) —
  
  an Array of rows.
#initialize(arrowable) ⇒ DataFrame

Note:

`RedAmber::DataFrame` itself can be read this way.

Note:

Hash is refined to respond to `#to_arrow` in this class.

Initialize a DataFrame from an object that responds to `#to_arrow`.
Examples:

Initialize by a Red Datasets object.
```
require 'datasets-arrow'
dataset = Datasets::Penguins.new
penguins = DataFrame.new(dataset)
```
Parameters:
- arrowable (#to_arrow) —
  
  Any object which responds to `#to_arrow`. `#to_arrow` must return an `Arrow::Table`.
Since:
- 0.2.2
#initialize(rover_like) ⇒ DataFrame

Note:

`Rover::DataFrame` can be read this way.

Initialize a DataFrame from a `Rover::DataFrame`-like object that responds to `#to_h`.
Parameters:
- rover_like (#to_h) —
  
  Any object which responds to `#to_h`. `#to_h` must return a Hash which is convertible by `Arrow::Table.new`.
#initialize ⇒ DataFrame

Create an empty DataFrame.
Examples:
```
DataFrame.new
```
#initialize(empty) ⇒ DataFrame

Create an empty DataFrame.
Examples:

Return empty DataFrame.
```
DataFrame.new([])
DataFrame.new({})
DataFrame.new(nil)
```
Parameters:
- empty (nil, [], {})

# File 'lib/red_amber/data_frame.rb', line 134

def initialize(*args)
  case args
  in nil | [nil] | [] | {} | [[]] | [{}]
    @table = Arrow::Table.new({}, [])
  in [Arrow::Table => table]
    @table = table
  in [arrowable] if arrowable.respond_to?(:to_arrow)
    table = arrowable.to_arrow
    unless table.is_a?(Arrow::Table)
      raise DataFrameTypeError,
            "to_arrow must return an Arrow::Table but #{table.class}: #{arrowable}"
    end
    @table = table
  in [rover_like] if rover_like.respond_to?(:to_h)
    begin
      # Accepts Rover::DataFrame
      @table = Arrow::Table.new(rover_like.to_h)
    rescue StandardError
      raise DataFrameTypeError, "to_h must return Arrowable object: #{rover_like}"
    end
  else
    begin
      @table = Arrow::Table.new(*args)
    rescue StandardError
      raise DataFrameTypeError, "invalid argument to create Arrow::Table: #{args}"
    end
  end

  name_unnamed_keys
  check_duplicate_keys(keys)
end
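
The constructor above dispatches on the shape of `args` with Ruby pattern matching. The following is only a plain-Ruby sketch of that idea: `classify_args` is a hypothetical helper, not part of RedAmber, and it labels the argument shape instead of building an `Arrow::Table`.

```ruby
# Hypothetical helper (not a RedAmber API): report which constructor
# overload a given argument list would roughly correspond to.
def classify_args(*args)
  case args
  in [] | [nil] | [[]] | [{}]
    :empty                  # no data at all
  in [Hash]
    :hash                   # column-oriented Hash
  in [Hash, Array]
    :schema_and_rows        # schema plus row-oriented array
  in [obj] if obj.respond_to?(:to_h)
    :to_h_like              # anything convertible through #to_h
  else
    :unknown
  end
end

classify_args(x: [1, 2, 3])               # => :hash
classify_args({ x: :uint8 }, [[1], [2]])  # => :schema_and_rows
classify_args(nil)                        # => :empty
```

Order matters in such a dispatch: the empty cases are listed first so that `{}` is not treated as an ordinary Hash.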

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(name, *args, &block) ⇒ `Object`

Catch variable (column) key as method name.

# File 'lib/red_amber/data_frame.rb', line 775

def method_missing(name, *args, &block)
  return variables[name] if args.empty? && key?(name)

  super
end

Instance Attribute Details

#table ⇒ `Arrow::Table` (readonly) Also known as: to_arrow

Returns the table held within.

Returns:

(Arrow::Table) —

the table within.



# File 'lib/red_amber/data_frame.rb', line 171

def table
  @table
end

Class Method Details

.create(table) ⇒ `DataFrame`

Note:

This method allocates the instance directly, without calling #initialize, and is meant to be used inside other methods.

Note:

`table` must have unique keys.

Quicker DataFrame constructor from an `Arrow::Table`.

Parameters:

table (Arrow::Table) —

A table to have in the DataFrame.

Returns:

(DataFrame) —

Initialized DataFrame.

# File 'lib/red_amber/data_frame.rb', line 31

def create(table)
  instance = allocate
  instance.instance_variable_set(:@table, table)
  instance
end
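
Because `create` uses `allocate`, `#initialize` never runs, so none of its checks are applied. A minimal plain-Ruby sketch of that pattern follows; `Wrapper` and its `validated` flag are hypothetical stand-ins, not RedAmber classes.

```ruby
# Hypothetical class illustrating the allocate-based fast constructor.
class Wrapper
  attr_reader :table, :validated

  def initialize(table)
    @table = table
    @validated = true # stands in for checks done by a full constructor
  end

  # Skip #initialize entirely: allocate a bare instance and set the ivar.
  def self.create(table)
    instance = allocate
    instance.instance_variable_set(:@table, table)
    instance
  end
end

Wrapper.new(:t).validated    # => true
Wrapper.create(:t).validated # => nil, because #initialize never ran
```

This is why the caller is responsible for passing a table that already satisfies the constructor's invariants, such as unique keys.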

.new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value) ⇒ `DataFrame`

Return new DataFrame for specified schema and value.

Parameters:

dataframe_for_schema (DataFrame) —

schema of this dataframe will be used.
dataframe_for_value (DataFrame) —

column values of this dataframe will be used.

Returns:

(DataFrame) —

created DataFrame.

Since:

0.4.1

# File 'lib/red_amber/data_frame.rb', line 47

def new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value)
  DataFrame.create(
    Arrow::Table.new(dataframe_for_schema.table.schema,
                     dataframe_for_value.table.columns)
  )
end

Instance Method Details

#==(other) ⇒ `true`, `false`

Compare DataFrames.

Returns:

(true, false) —

true if other is a DataFrame and its table is equal to this one; otherwise false.



# File 'lib/red_amber/data_frame.rb', line 323

def ==(other)
  other.is_a?(DataFrame) && @table == other.table
end

#build_subframes(subset_specifier) ⇒ `SubFrames` #build_subframes {|self| ... } ⇒ `Object`

Generic builder of sub-dataframes from self.

Experimental feature: this method may be removed or be changed in the future.

Overloads:

#build_subframes(subset_specifier) ⇒ SubFrames

Create a new SubFrames object.

Examples:

df.build_subframes([[0, 2, 4], [1, 3, 5]])

# =>
#<RedAmber::SubFrames : 0x000000000000fe9c>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
2 SubFrames: [3, 3] in sizes.
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000feb0>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       3 B        false
2       5 B        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fec4>
        x y        z
  <uint8> <string> <boolean>
0       2 A        true
1       4 B        (nil)
2       6 C        false

Parameters:

subset_specifier (Array<Vector>, Array<array-like>) —

an Array of numeric indices or boolean filters to create subsets of DataFrame.

Returns:

(SubFrames) —

new SubFrames.

#build_subframes {|self| ... } ⇒ Object

Create a new SubFrames object by block.

Examples:

dataframe.build_subframes do
  even = indices.map(&:even?)
  [even, !even]
end

# =>
#<RedAmber::SubFrames : 0x000000000000fe60>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
2 SubFrames: [3, 3] in sizes.
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fe74>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       3 B        false
2       5 B        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fe88>
        x y        z
  <uint8> <string> <boolean>
0       2 A        true
1       4 B        (nil)
2       6 C        false

Yields:

(self) —

the block is called within the context of self. (Block is called by instance_eval(&block). )

Yield Returns:

(Array<numeric_array_like>, Array<boolean_array_like>) —

an Array of index or boolean array-likes to create subsets of DataFrame. All array-likes must respond to #numeric? or #boolean?.

Since:

0.4.0

# File 'lib/red_amber/data_frame.rb', line 693

def build_subframes(subset_specifier = nil, &block)
  if block
    SubFrames.new(self, instance_eval(&block))
  else
    SubFrames.new(self, subset_specifier)
  end
end
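
A subset specifier can be either a list of row positions or a boolean filter. The sketch below shows both forms on a plain Array; `take_subset` is a hypothetical helper and the data is illustrative, so this is not how `SubFrames` is implemented.

```ruby
x = [1, 2, 3, 4, 5, 6] # hypothetical column values

# Hypothetical helper: pick elements by positions or by a boolean mask.
def take_subset(values, specifier)
  if specifier.all? { |e| e == true || e == false || e.nil? }
    # Boolean filter: keep elements whose mask entry is truthy.
    values.select.with_index { |_, i| specifier[i] }
  else
    # Numeric indices: pick elements at the given positions.
    values.values_at(*specifier)
  end
end

by_index = [[0, 2, 4], [1, 3, 5]].map { |s| take_subset(x, s) }
# => [[1, 3, 5], [2, 4, 6]]

even = x.each_index.map(&:even?)
by_mask = [even, even.map(&:!)].map { |s| take_subset(x, s) }
# => [[1, 3, 5], [2, 4, 6]]
```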

#each_row ⇒ `Enumerator` #each_row {|key_row_pairs| ... } ⇒ `Integer`

Enumerate for each row.

Overloads:

#each_row ⇒ Enumerator

Returns Enumerator when no block given.
Returns:
- (Enumerator) —
  
  enumerator over each row.
#each_row {|key_row_pairs| ... } ⇒ Integer

Yields with key and row pairs.
Yield Parameters:
- key_row_pairs (Hash) —
  
  key and row pairs.
Yield Returns:
- (Integer) —
  
  size of the DataFrame.
Returns:
- (Integer) —
  
  returns size.

# File 'lib/red_amber/data_frame.rb', line 354

def each_row
  return enum_for(:each_row) unless block_given?

  size.times do |i|
    key_row_pairs =
      vectors.each_with_object({}) do |v, h|
        h[v.key] = v.data[i]
      end
    yield key_row_pairs
  end
end
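
The row enumeration above can be sketched without Arrow by treating the frame as a Hash of columns. `ColumnStore` is a hypothetical stand-in class used only to show the enumerator-or-block pattern.

```ruby
# Hypothetical column-oriented container standing in for a DataFrame.
class ColumnStore
  def initialize(columns)
    @columns = columns
  end

  def size
    @columns.values.first&.size || 0
  end

  # Yield one { key => value } Hash per row, or return an Enumerator
  # when no block is given.
  def each_row
    return enum_for(:each_row) unless block_given?

    size.times do |i|
      yield @columns.transform_values { |values| values[i] }
    end
  end
end

store = ColumnStore.new({ x: [1, 2, 3], y: %w[A B C] })
rows = store.each_row.to_a
# rows.first == { x: 1, y: 'A' }
```

With a block, the return value is whatever `Integer#times` returns, which is the row count.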

#empty? ⇒ `true`, `false`

Check if it is an empty DataFrame.

Returns:

(true, false) —

true if it has no columns.



# File 'lib/red_amber/data_frame.rb', line 332

def empty?
  variables.empty?
end

#group(group_keys) ⇒ `Group` #group(group_keys) {|group| ... } ⇒ `DataFrame`

Create a Group object. Or create a Group and summarize it.

Overloads:

#group(*group_keys) ⇒ Group

Create a Group object.

Examples:

Create a Group

penguins.group(:species)

# =>
#<RedAmber::Group : 0x000000000000c3c8>
  species   group_count
  <string>      <uint8>
0 Adelie            152
1 Chinstrap          68
2 Gentoo            124

Parameters:

group_keys (Array<Symbol, String>) —

keys for grouping.

Returns:

(Group) —

Group object.

#group(*group_keys) {|group| ... } ⇒ DataFrame

Create a Group and summarize it by aggregation functions from the block.
Examples:

Create a group and summarize it.
```
penguins.group(:species)  { mean(:bill_length_mm) }

# =>
#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f3fc>
  species   mean(bill_length_mm)
  <string>              <double>
0 Adelie                   38.79
1 Chinstrap                48.83
2 Gentoo                    47.5
```
Yield Parameters:
- group (Group) —
  
  passes Group object.
Yield Returns:
- (DataFrame, Array<DataFrame>) —
  
  an aggregated DataFrame or an array of aggregated DataFrames.
Returns:
- (DataFrame) —
  
  summarized DataFrame.

# File 'lib/red_amber/data_frame.rb', line 416

def group(*group_keys, &block)
  g = Group.new(self, group_keys)
  g = g.summarize(&block) if block
  g
end

#key?(key) ⇒ `Boolean` Also known as: has_key?

Returns true if self has a specified key in the argument.

Parameters:

key (Symbol, String) —

key to test.

Returns:

(Boolean) —

true if self has the key, compared as a Symbol.



# File 'lib/red_amber/data_frame.rb', line 236

def key?(key)
  keys.include?(key.to_sym)
end

#key_index(key) ⇒ `Integer` Also known as: find_index, index

Returns index of specified key in the Array keys.

Parameters:

key (Symbol, String) —

key to look up.

Returns:

(Integer) —

index of key in the Array keys.



# File 'lib/red_amber/data_frame.rb', line 248

def key_index(key)
  keys.find_index(key.to_sym)
end

#keys ⇒ `Array` Also known as: column_names, var_names

Returns an Array of keys.

Returns:

(Array) —

keys in an Array.



# File 'lib/red_amber/data_frame.rb', line 223

def keys
  @keys ||= init_instance_vars(:keys)
end

#n_keys ⇒ `Integer` Also known as: n_variables, n_vars, n_cols

Returns the number of variables (columns).

Returns:

(Integer) —

number of variables (columns).



# File 'lib/red_amber/data_frame.rb', line 191

def n_keys
  @table.n_columns
end

#propagate(scalar) ⇒ `Vector` #propagate {|self| ... } ⇒ `Vector`

Returns a Vector such that all elements have value `scalar` and have same size as self.

Overloads:

#propagate(scalar) ⇒ Vector

Specifies scalar as an argument.

Examples:

propagate a value

df
# =>
#<RedAmber::DataFrame : 6 x 3 Vectors, 0x00000000000849a4>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
4       5 B        true
5       6 C        false

df.assign(:sum_x) { propagate(x.sum) }
# =>
#<RedAmber::DataFrame : 6 x 4 Vectors, 0x000000000007bd04>
        x y        z           sum_x
  <uint8> <string> <boolean> <uint8>
0       1 A        false          21
1       2 A        true           21
2       3 B        false          21
3       4 B        (nil)          21
4       5 B        true           21
5       6 C        false          21

# Using `Vector#propagate` like below has same result as above.
df.assign(:sum_x) { x.propagate(:sum) }

# Also it is same as creating column from an Array.
df.assign(:sum_x) { [x.sum] * size }

Parameters:

scalar (scalar) —

a value to propagate in Vector.

Returns:

(Vector) —

created Vector.

#propagate {|self| ... } ⇒ Vector

Returns created Vector.

Examples:

propagate the value from the block

df.assign(:range) { propagate { x.max - x.min } }
# =>
#<RedAmber::DataFrame : 6 x 4 Vectors, 0x00000000000e603c>
        x y        z           range
  <uint8> <string> <boolean> <uint8>
0       1 A        false           5
1       2 A        true            5
2       3 B        false           5
3       4 B        (nil)           5
4       5 B        true            5
5       6 C        false           5

Yield Parameters:

self (DataFrame) —

gives self to the block.

Yield Returns:

(scalar) —

a value to propagate in Vector

Returns:

(Vector) —

created Vector.

Since:

0.5.0

# File 'lib/red_amber/data_frame.rb', line 765

def propagate(scalar = nil, &block)
  if block
    raise VectorArgumentError, "can't specify both function and block" if scalar

    scalar = instance_eval(&block)
  end
  Vector.new([scalar] * size)
end
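
The core of `propagate` is `[scalar] * size`. A plain-Ruby sketch on Arrays, assuming a hypothetical `propagate_value` helper that takes the row count explicitly and returns an Array rather than a `Vector`:

```ruby
# Hypothetical helper: repeat one value once per row.
def propagate_value(size, scalar = nil)
  if block_given?
    raise ArgumentError, "can't specify both scalar and block" unless scalar.nil?

    scalar = yield
  end
  [scalar] * size
end

x = [1, 2, 3, 4, 5, 6]
propagate_value(x.size, x.sum)            # => [21, 21, 21, 21, 21, 21]
propagate_value(x.size) { x.max - x.min } # => [5, 5, 5, 5, 5, 5]
```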

#respond_to_missing?(name, include_private) ⇒ `Boolean`

Catch variable (column) key as method name.

Returns:

(Boolean)

# File 'lib/red_amber/data_frame.rb', line 782

def respond_to_missing?(name, include_private)
  return true if key?(name)

  super
end
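
`#method_missing` and `#respond_to_missing?` work as a pair so that column keys behave like reader methods. A minimal plain-Ruby sketch of the pairing, using a hypothetical `TinyFrame` class that stores columns in a Hash:

```ruby
# Hypothetical stand-in: columns live in a Hash keyed by Symbol.
class TinyFrame
  def initialize(columns)
    @columns = columns.transform_keys(&:to_sym)
  end

  def key?(key)
    @columns.key?(key.to_sym)
  end

  # Resolve `frame.x` to the column named :x when called without arguments.
  def method_missing(name, *args, &block)
    return @columns[name] if args.empty? && key?(name)

    super
  end

  # Keep `respond_to?` consistent with `method_missing`.
  def respond_to_missing?(name, include_private = false)
    key?(name) || super
  end
end

frame = TinyFrame.new({ x: [1, 2, 3], 'y' => %w[A B C] })
frame.x               # => [1, 2, 3]
frame.respond_to?(:y) # => true
frame.respond_to?(:z) # => false
```

Calling a name that is not a key falls through to `super` and raises `NoMethodError` as usual.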

#schema ⇒ `Hash`

Returns column name and data type in a Hash.

Examples:

RedAmber::DataFrame.new(x: [1, 2, 3], y: %w[A B C]).schema
# => {:x=>:uint8, :y=>:string}

Returns:

(Hash) —

column name and data type.



# File 'lib/red_amber/data_frame.rb', line 313

def schema
  keys.zip(types).to_h
end

#shape ⇒ `Array`

Returns the numbers of rows and columns.

Returns:

(Array) —

number of rows and number of columns in an array. Same as [size, n_keys].



# File 'lib/red_amber/data_frame.rb', line 204

def shape
  [size, n_keys]
end

#size ⇒ `Integer` Also known as: n_records, n_obs, n_rows

Returns the number of records (rows).

Returns:

(Integer) —

number of records (rows).



# File 'lib/red_amber/data_frame.rb', line 179

def size
  @table.n_rows
end

#sub_by_enum(enumerator_method, *args) ⇒ `SubFrames` Also known as: subframes_by_enum

Create SubFrames by grouping/windowing by position from an enumerator method.

This method will process the indices of self by enumerator.

Experimental feature: this method may be removed or be changed in the future.

Examples:

Create a SubFrames object sliced by 3 rows.

df.sub_by_enum(:each_slice, 3)

# =>
#<RedAmber::SubFrames : 0x000000000000fd20>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
2 SubFrames: [3, 3] in sizes.
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fd34>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fd48>
        x y        z
  <uint8> <string> <boolean>
0       4 B        (nil)
1       5 B        true
2       6 C        false

Create a SubFrames object for each consecutive 4 rows.

df.sub_by_enum(:each_cons, 4)

# =>
#<RedAmber::SubFrames : 0x000000000000fd98>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
3 SubFrames: [4, 4, 4] in sizes.
---
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fdac>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
---
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fdc0>
        x y        z
  <uint8> <string> <boolean>
0       2 A        true
1       3 B        false
2       4 B        (nil)
3       5 B        true
---
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fdd4>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
3       6 C        false

Parameters:

enumerator_method (Symbol) —

name of the enumerator method to call on the indices (e.g. :each_slice).
args (<Object>) —

arguments for the enumerator method.

Returns:

(SubFrames) —

a created SubFrames.

Since:

0.4.0



# File 'lib/red_amber/data_frame.rb', line 575

def sub_by_enum(enumerator_method, *args)
  SubFrames.new(self, indices.send(enumerator_method, *args).to_a)
end
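
The index groups passed to `SubFrames.new` come from calling the named enumerator method on the row indices. The same computation on a plain Array of positions (an assumption for illustration; in RedAmber `indices` is not a plain Array) looks like this:

```ruby
indices = (0...6).to_a # row positions of a hypothetical 6-row frame

# Equivalent of `indices.send(enumerator_method, *args).to_a`
slices = indices.send(:each_slice, 3).to_a
# => [[0, 1, 2], [3, 4, 5]]

windows = indices.send(:each_cons, 4).to_a
# => [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```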

#sub_by_kernel(kernel, step: 1) ⇒ `SubFrames` Also known as: subframes_by_kernel

Create SubFrames by windowing with a kernel (i.e. masked window) and step.

Experimental feature: this method may be removed or be changed in the future.

Examples:

kernel = [true, false, false, true]
df.sub_by_kernel(kernel, step: 2)

# =>
#<RedAmber::SubFrames : 0x000000000000fde8>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
2 SubFrames: [2, 2] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fdfc>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       4 B        (nil)
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fe10>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       6 C        false

Parameters:

kernel (Array<true, false>, Vector) —

boolean array-like to pick records in the window. Kernel is a boolean Array and it behaves like a masked window.
step (Integer) (defaults to: 1) —

moving step of window.

Returns:

(SubFrames) —

a created SubFrames.

Since:

0.4.0

# File 'lib/red_amber/data_frame.rb', line 613

def sub_by_kernel(kernel, step: 1)
  limit_size = size - kernel.size
  kernel_vector = Vector.new(kernel.concat([nil] * limit_size))
  SubFrames.new(self) do
    0.step(by: step, to: limit_size).map do |i|
      kernel_vector.shift(i)
    end
  end
end
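
A kernel acts as a mask that slides over the rows: at each step, only positions where the kernel is `true` are picked. The sketch below computes those positions with plain Arrays; `kernel_windows` is a hypothetical helper, and it returns index lists rather than the shifted boolean Vectors used above.

```ruby
# Hypothetical helper: row positions picked by a boolean kernel
# sliding over `size` rows.
def kernel_windows(size, kernel, step: 1)
  limit = size - kernel.size
  # Offsets inside the window where the kernel is true.
  offsets = kernel.each_index.select { |k| kernel[k] }
  0.step(by: step, to: limit).map do |start|
    offsets.map { |k| start + k }
  end
end

kernel_windows(6, [true, false, false, true], step: 2)
# => [[0, 3], [2, 5]]
```

These positions correspond to the rows with `x` values 1, 4 and 3, 6 in the example output above.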

#sub_by_value(*keys) ⇒ `SubFrames` Also known as: subframes_by_value, sub_group

Create SubFrames by value grouping.

Experimental feature: this method may be removed or be changed in the future.

Examples:

df.sub_by_value(:y)

# =>
#<RedAmber::SubFrames : 0x000000000000fc08>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fc1c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fc30>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000fc44>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Parameters:

keys (List<Symbol, String>, Array<Symbol, String>) —

grouping keys.

Returns:

(SubFrames) —

a created SubFrames grouped by column values on `keys`.

Since:

0.4.0



# File 'lib/red_amber/data_frame.rb', line 457

def sub_by_value(*keys)
  SubFrames.new(self, group(keys.flatten).filters)
end
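
Value grouping yields one boolean filter per distinct value of the key column. A plain-Ruby sketch of building such filters from an Array, using illustrative data that matches the `y` column in the example:

```ruby
y = %w[A A B B B C] # hypothetical key column

# One boolean mask per distinct value, in order of first appearance.
filters = y.uniq.map { |value| y.map { |e| e == value } }
# filters[0] => [true, true, false, false, false, false]

sizes = filters.map { |mask| mask.count(true) }
# => [2, 3, 1]
```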

#sub_by_window(from: 0, size: nil, step: 1) ⇒ `SubFrames` Also known as: subframes_by_window

Create SubFrames by windowing with `from`, `size` and `step`.

Experimental feature: this method may be removed or be changed in the future.

Examples:

df.sub_by_window(size: 4, step: 2)

# =>
#<RedAmber::SubFrames : 0x000000000000fc58>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
2 SubFrames: [4, 4] in sizes.
---
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fc6c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
---
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fc80>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
3       6 C        false

Parameters:

from (Integer) (defaults to: 0) —

start position of window.
size (Integer) (defaults to: nil) —

window size.
step (Integer) (defaults to: 1) —

moving step of window.

Returns:

(SubFrames) —

a created SubFrames.

Since:

0.4.0

# File 'lib/red_amber/data_frame.rb', line 500

def sub_by_window(from: 0, size: nil, step: 1)
  SubFrames.new(self) do
    from.step(by: step, to: (size() - size)).map do |i| # rubocop:disable Style/MethodCallWithoutArgsParentheses
      [*i...(i + size)]
    end
  end
end
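
Each window is a run of `size` consecutive row positions, and its start advances by `step`. A plain-Ruby sketch follows; `window_indices` is a hypothetical helper that takes the row count explicitly, and defaulting `size` to the row count is an assumption of this sketch, not the library's behavior.

```ruby
# Hypothetical helper: index ranges for a sliding window over n_rows rows.
def window_indices(n_rows, from: 0, size: n_rows, step: 1)
  from.step(by: step, to: n_rows - size).map do |start|
    (start...(start + size)).to_a
  end
end

window_indices(6, size: 4, step: 2)
# => [[0, 1, 2, 3], [2, 3, 4, 5]]
```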

#to_a ⇒ `Array` Also known as: raw_records

Note:

If you need column-oriented array, use `.to_h.to_a`.

Returns a row-oriented array without header.

Returns:

(Array) —

row-oriented data without header.



# File 'lib/red_amber/data_frame.rb', line 299

def to_a
  @table.raw_records
end

#to_h ⇒ `Hash`

Returns column-oriented data in a Hash.

Returns:

(Hash) —

a Hash of `key => column_in_an_array`.



# File 'lib/red_amber/data_frame.rb', line 288

def to_h
  variables.transform_values(&:to_a)
end
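
The difference between the row-oriented `#to_a` and the column-oriented `#to_h` can be illustrated with a plain Hash of columns. The data is illustrative and no RedAmber call is made:

```ruby
columns = { x: [1, 2, 3], y: %w[A B C] } # column-oriented, like #to_h

# Row-oriented records without header, like #to_a.
rows = columns.values.transpose
# => [[1, "A"], [2, "B"], [3, "C"]]

# Column-oriented array of [key, values] pairs, like `.to_h.to_a`.
column_pairs = columns.to_a
# => [[:x, [1, 2, 3]], [:y, ["A", "B", "C"]]]
```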

#to_rover ⇒ `Rover::DataFrame`

Returns self in a `Rover::DataFrame`.

Returns:

(Rover::DataFrame) —

a `Rover::DataFrame`.

# File 'lib/red_amber/data_frame.rb', line 371

def to_rover
  require 'rover'
  Rover::DataFrame.new(to_h)
end

#type_classes ⇒ `Array`

Returns an Array of Classes of data type.

Returns:

(Array) —

an Array of Red Arrow data type Classes.



# File 'lib/red_amber/data_frame.rb', line 270

def type_classes
  @type_classes ||= @table.columns.map { |column| column.data_type.class }
end

#types ⇒ `Array`

Returns abbreviated type names in an Array.

Returns:

(Array) —

abbreviated Red Arrow data type names.

# File 'lib/red_amber/data_frame.rb', line 259

def types
  @types ||= @table.columns.map do |column|
    column.data.value_type.nick.to_sym
  end
end

#variables ⇒ `Hash` Also known as: vars

Returns a Hash of key and Vector pairs in the columns.

Returns:

(Hash) —

`key => Vector` pairs for each column.



# File 'lib/red_amber/data_frame.rb', line 213

def variables
  @variables ||= init_instance_vars(:variables)
end

#vectors ⇒ `Array`

Returns Vectors in an Array.

Returns:

(Array) —

an Array of Vector.



# File 'lib/red_amber/data_frame.rb', line 279

def vectors
  @vectors ||= init_instance_vars(:vectors)
end
