Class: RedAmber::SubFrames

Inherits:
Object
Includes:
Enumerable, Helper
Defined in:
lib/red_amber/subframes.rb

Overview

Class SubFrames handles subsets of a DataFrame.

Experimental feature

Class SubFrames may be removed or be changed in the future.

Constructor Details

#initialize(dataframe, subset_specifier) ⇒ SubFrames
#initialize(dataframe) {|dataframe| ... } ⇒ SubFrames

Create a new SubFrames object from a DataFrame and an array of indices or filters.

Overloads:

  • #initialize(dataframe, subset_specifier) ⇒ SubFrames

    Create a new SubFrames object.

    Examples:

    dataframe
    
    # =>
    #<RedAmber::DataFrame : 6 x 3 Vectors, 0x00000000000039e4>
            x y        z
      <uint8> <string> <boolean>
    0       1 A        false
    1       2 A        true
    2       3 B        false
    3       4 B        (nil)
    4       5 B        true
    5       6 C        false
    
    # --- This object is used as common source in this class ---
    subframes = SubFrames.new(dataframe, [[0, 1], [2, 3, 4], [5]])
    
    # =>
    #<RedAmber::SubFrames : 0x000000000000cf6c>
    @baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000cf80>
    3 SubFrames: [2, 3, 1] in sizes.
    ---
    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000cf94>
            x y        z
      <uint8> <string> <boolean>
    0       1 A        false
    1       2 A        true
    ---
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000cfa8>
            x y        z
      <uint8> <string> <boolean>
    0       3 B        false
    1       4 B        (nil)
    2       5 B        true
    ---
    #<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000cfbc>
            x y        z
      <uint8> <string> <boolean>
    0       6 C        false

    Parameters:

    • dataframe (DataFrame)

      a source dataframe.

    • subset_specifier (Array<Vector>, Array<array-like>)

      an Array of numeric indices or boolean filters to create subsets of DataFrame.

  • #initialize(dataframe) {|dataframe| ... } ⇒ SubFrames

    Create a new SubFrames object by block.

    Examples:

    SubFrames.new(dataframe) do |df|
      booleans = df[:z]
      [booleans, !booleans]
    end
    
    # =>
    #<RedAmber::SubFrames : 0x0000000000003aac>
    @baseframe=#<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000003ac0>
    2 SubFrames: [2, 3] in sizes.
    ---
    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x0000000000003ad4>
            x y        z
      <uint8> <string> <boolean>
    0       2 A        true
    1       5 B        true
    ---
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000003ae8>
            x y        z
      <uint8> <string> <boolean>
    0       1 A        false
    1       3 B        false
    2       6 C        false

    Parameters:

    • dataframe (DataFrame)

      a source dataframe.

    Yield Parameters:

    • dataframe (DataFrame)

      the block is called with `dataframe`.

    Yield Returns:

    • (Array<numeric_array_like>, Array<boolean_array_like>)

      an Array of index or boolean array-likes to create subsets of DataFrame. All array-likes must respond to #numeric? or #boolean?.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 288

def initialize(dataframe, selectors = nil, &block)
  unless dataframe.is_a?(DataFrame)
    raise SubFramesArgumentError, "not a DataFrame: #{dataframe}"
  end

  if block
    unless selectors.nil?
      raise SubFramesArgumentError, 'Must not specify both arguments and block.'
    end

    selectors = yield(dataframe)
  end

  if dataframe.empty? || selectors.nil? || selectors.size.zero? # rubocop:disable Style/ZeroLengthPredicate
    @baseframe = DataFrame.new
    @selectors = Selectors.new([])
  else
    @baseframe = dataframe
    @selectors =
      if selectors.first.boolean?
        Filters.new(selectors)
      elsif selectors.first.numeric?
        Indices.new(selectors)
      else
        raise SubFramesArgumentError, "illegal type: #{selectors}"
      end
  end
  @frames = []
end
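
The constructor above chooses between boolean filters and numeric indices by inspecting the first selector. The following is a minimal pure-Ruby sketch of that dispatch, assuming plain Arrays of Hashes in place of a DataFrame. The helper `pick_subsets` and the sample rows are hypothetical and are not part of RedAmber.

```ruby
# Hypothetical stand-in data: each Hash plays the role of one DataFrame row.
rows = [
  { x: 1, y: 'A' }, { x: 2, y: 'A' }, { x: 3, y: 'B' },
  { x: 4, y: 'B' }, { x: 5, y: 'B' }, { x: 6, y: 'C' }
]

# pick_subsets is an illustrative helper, not a RedAmber method.
def pick_subsets(rows, selectors)
  # Mirror the "empty input gives empty result" branch of the constructor.
  return [] if rows.empty? || selectors.nil? || selectors.empty?

  first = selectors.first
  if first.all? { |v| v == true || v == false || v.nil? }
    # Boolean filters: keep the rows whose flag is true.
    selectors.map { |flags| rows.select.with_index { |_, i| flags[i] } }
  elsif first.all?(Integer)
    # Numeric indices: pick rows by position.
    selectors.map { |indices| rows.values_at(*indices) }
  else
    raise ArgumentError, "illegal type: #{selectors}"
  end
end

by_indices = pick_subsets(rows, [[0, 1], [2, 3, 4], [5]])
by_filters = pick_subsets(rows, [[true, true, false, false, false, false],
                                 [false, false, true, true, true, false]])

p by_indices.map(&:size)
p by_filters.map { |subset| subset.map { |row| row[:x] } }
```

As in the listing, only the first selector decides which branch is taken, so mixing filters and indices in one call is not detected by this sketch.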

Class Method Details

.by_dataframes(dataframes) ⇒ SubFrames

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Note:

All dataframes must have the same schema.

Create a new SubFrames from an Array of DataFrames.

Parameters:

  • dataframes (Array<DataFrame>)

    an Array of DataFrames that have the same schema.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 157

def by_dataframes(dataframes)
  instance = allocate
  case Array(dataframes)
  when [] || [nil] # NOTE: `[] || [nil]` evaluates to `[]`, so [nil] falls through to the else branch
    instance.instance_variable_set(:@baseframe, DataFrame.new)
    instance.instance_variable_set(:@selectors, [])
    instance.instance_variable_set(:@frames, [])
  else
    instance.instance_variable_set(:@baseframe, nil)
    instance.instance_variable_set(:@selectors, nil)
    instance.instance_variable_set(:@frames, dataframes)
  end
  instance
end
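
The `when [] || [nil]` clause in the listing above deserves a closer look. In Ruby, `[] || [nil]` is evaluated before the comparison and yields `[]`, because an empty Array is truthy. The sketch below contrasts that form with a comma-separated `when`, which tests each pattern in turn. Both helper methods are hypothetical and exist only to illustrate the language behavior.

```ruby
# Illustrative helpers, not part of RedAmber.
def classify_with_or(value)
  case Array(value)
  when [] || [nil] then :empty # same as `when []`
  else :non_empty
  end
end

def classify_with_list(value)
  case Array(value)
  when [], [nil] then :empty   # matches [] or [nil]
  else :non_empty
  end
end

p classify_with_or(nil)     # Array(nil) is [], so both forms agree here
p classify_with_or([nil])
p classify_with_list([nil])
```

Whether `[nil]` should count as empty input is a design decision for the library; the sketch only shows how the two spellings differ.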

.by_filters(dataframe, subset_filters) ⇒ SubFrames

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Note:

This method does not check its arguments.

Create a new SubFrames object from a DataFrame and an array of filters.

Parameters:

  • dataframe (DataFrame)

    a source dataframe.

  • subset_filters (Array, Array<Vector>)

    an Array of boolean filters to specify subsets of DataFrame. Each filter must have the same length as the dataframe.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 139

def by_filters(dataframe, subset_filters)
  instance = allocate
  instance.instance_variable_set(:@baseframe, dataframe)
  instance.instance_variable_set(:@selectors, Filters.new(subset_filters))
  instance.instance_variable_set(:@frames, [])
  instance
end

.by_group(group) ⇒ SubFrames

Create SubFrames from a Group.

Experimental feature

This method may be removed or changed in the future.

Examples:

dataframe

# =>
#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fba4>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
4       5 B        true
5       6 C        false

group = Group.new(dataframe, [:y])
sf = SubFrames.by_group(group)

# =>
#<RedAmber::SubFrames : 0x000000000000fbb8>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000fb7c>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fbcc>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fbe0>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000fbf4>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Parameters:

  • group (Group)

    a Group to be used to create SubFrames.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 102

def by_group(group)
  SubFrames.by_filters(group.dataframe, group.filters)
end

.by_indices(dataframe, subset_indices) ⇒ SubFrames

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Note:

This method does not check its arguments.

Create a new SubFrames object from a DataFrame and an array of indices.

Parameters:

  • dataframe (DataFrame)

    a source dataframe.

  • subset_indices (Array, Array<Vector>)

    an Array of numeric indices to create subsets of DataFrame.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 118

def by_indices(dataframe, subset_indices)
  instance = allocate
  instance.instance_variable_set(:@baseframe, dataframe)
  instance.instance_variable_set(:@selectors, Indices.new(subset_indices))
  instance.instance_variable_set(:@frames, [])
  instance
end

Instance Method Details

#aggregate(keys) {|dataframe| ... } ⇒ DataFrame
#aggregate {|dataframe| ... } ⇒ DataFrame
#aggregate {|dataframe| ... } ⇒ DataFrame
#aggregate(group_keys, aggregations) ⇒ DataFrame
#aggregate(group_keys, aggregations) ⇒ DataFrame

Note:

This method does not check whether an aggregation function is actually used.

Aggregate SubFrames to create a DataFrame.

This method creates a DataFrame with one row for each sub DataFrame.

Overloads:

  • #aggregate(keys) {|dataframe| ... } ⇒ DataFrame

    Aggregate SubFrames, creating a DataFrame with the labels `keys` and column values from the block.

    Examples:

    Aggregate by key labels in arguments and values from block.

    subframes.aggregate(:y, :sum_x) { [y.one, x.sum] }
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000003b24>
      y          sum_x
      <string> <uint8>
    0 A              3
    1 B             12
    2 C              6

    Aggregate by key labels in an Array and values from block.

    subframes.aggregate([:y, :sum_x]) { [y.one, x.sum] }
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000003b24>
      y          sum_x
      <string> <uint8>
    0 A              3
    1 B             12
    2 C              6

    Parameters:

    • keys (Symbol, Array<Symbol>)

      a key or keys of result. Key names may be renamed to new label.

    Yield Parameters:

    • dataframe (DataFrame)

      passes each dataframe in self to the block. Block is called by instance_eval, so inside of the block is the context of passed dataframe.

    Yield Returns:

    • (Array)

      aggregated values from the columns of passed dataframe.

    Returns:

    • (DataFrame)

  • #aggregate {|dataframe| ... } ⇒ DataFrame

    Aggregate SubFrames creating DataFrame with pairs of key and aggregated values in Hash from the block.

    Examples:

    Aggregate by key and value pairs from block.

    subframes.aggregate do
      { y: y.one, sum_x: x.sum }
    end
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000003b24>
      y          sum_x
      <string> <uint8>
    0 A              3
    1 B             12
    2 C              6

    Yield Parameters:

    • dataframe (DataFrame)

      passes each dataframe in self to the block. Block is called by instance_eval, so inside of the block is the context of passed dataframe.

    Yield Returns:

    • (Hash<key => aggregated_value>)

      pairs of key name and aggregated values from the columns of passed dataframe. Key names may be renamed to new label in the result.

    Returns:

    • (DataFrame)

  • #aggregate {|dataframe| ... } ⇒ DataFrame

    Aggregate SubFrames creating DataFrame with an Array of key and aggregated value from the block.

    Examples:

    Aggregate by key and value arrays from block.

    subframes.aggregate do
      [[:y, y.first], [:sum_x, x.sum]]
    end
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000003b24>
      y          sum_x
      <string> <uint8>
    0 A              3
    1 B             12
    2 C              6

    Yield Parameters:

    • dataframe (DataFrame)

      passes each dataframe in self to the block. Block is called by instance_eval, so inside of the block is the context of passed dataframe.

    Yield Returns:

    • (Array<key, aggregated_value>)

      pairs of key name and aggregated values from the columns of passed dataframe. Key names may be renamed to new label in the result.

    Returns:

    • (DataFrame)

  • #aggregate(group_keys, aggregations) ⇒ DataFrame

    Aggregate SubFrames for first values of the columns of `group_keys` and the aggregated results of key-function pairs.

    Experimental

    This API may be changed in the future.

    Examples:

    Aggregate with a group key and key function pairs by a Hash.

    subframes.aggregate(:y, { x: :sum, z: :count })
    
    # =>
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000003b24>
      y          sum_x count_z
      <string> <uint8> <uint8>
    0 A              3       2
    1 B             12       2
    2 C              6       1

    Parameters:

    • group_keys (Symbol, String, Array<Symbol, String>)

      group key name(s) to output values.

    • aggregations (Hash<Array<Symbol, String> => Array<:Symbol>>)

      a Hash of variable (column) name and Vector aggregate function name to apply.

    Returns:

    • (DataFrame)

  • #aggregate(group_keys, aggregations) ⇒ DataFrame

    Aggregate SubFrames for first values of the columns of `group_keys` and the aggregated results of all combinations of supplied keys and functions.

    Experimental

    This API may be changed in the future.

    Examples:

    Aggregate with group keys and keys and functions by an Array.

    sf.aggregate(:y, [[:x, :z], [:count, :sum]])
    
    # =>
    #<RedAmber::DataFrame : 3 x 5 Vectors, 0x000000000000fcbc>
      y        count_x   sum_x count_z   sum_z
      <string> <uint8> <uint8> <uint8> <uint8>
    0 A              2       3       2       1
    1 B              3      12       2       1
    2 C              1       6       1       0

    Parameters:

    • group_keys (Symbol, String, Array<Symbol, String>)

      group key name(s) to output values.

    • aggregations (Array[Array<Symbol, String>, Array<:Symbol>])

      an Array of Array of variable (column) names and Array of Vector aggregate function names to apply.

    Returns:

    • (DataFrame)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 561

def aggregate(*args, &block)
  aggregator =
    if block
      if args.empty?
        # aggregate { {key => value} or [[key, value], ...] }
        each_with_object(Hash.new { |h, k| h[k] = [] }) do |df, hash|
          df.instance_eval(&block).to_h.each do |k, v|
            hash[k] << v
          end
        end
      else
        # aggregate(keys) { values }
        values = each.map { |df| Array(df.instance_eval(&block)) }.transpose
        args.flatten.zip(values)
      end
    else
      # These functions may be removed in the future.
      case args
      in [group_keys1, Hash => h]
        # aggregate(group_keys, { key => func })
        ary = Array(group_keys1).map { |key| [:first, key] }
        ary.concat(h.to_a.map { [_2, _1] }) # rubocop:disable Style/NumberedParametersLimit
      in [group_keys2, [Array => keys, Array => funcs]]
        # aggregate(group_keys, [keys, funcs])
        ary = Array(group_keys2).map { |key| [:first, key] }
        ary.concat(funcs.product(keys))
      else
        raise SubFramesArgumentError, "invalid argument: #{args}"
      end
      sf = self
      ary.map do |func, key|
        label = func == :first ? key : "#{func}_#{key}"
        [label, sf.each.map { |df| df[key].send(func) }]
      end
    end
  DataFrame.new(aggregator)
end
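
The two block-based branches in the listing above assemble result columns in different ways. The following pure-Ruby sketch imitates both, assuming Hashes of Arrays as stand-ins for sub DataFrames. The `groups` data and all variable names are hypothetical, and nothing here calls RedAmber.

```ruby
# Hypothetical stand-ins: each Hash maps a column name to its values.
groups = [
  { x: [1, 2],    y: %w[A A] },
  { x: [3, 4, 5], y: %w[B B B] },
  { x: [6],       y: %w[C] }
]

# Style 1: aggregate(keys) { values }
# Each group yields one row of values; transposing the rows gives columns,
# which are then paired with the key labels.
rows = groups.map { |g| [g[:y].first, g[:x].sum] }
with_keys = %i[y sum_x].zip(rows.transpose).to_h

# Style 2: aggregate { { key => value } }
# Each group yields a Hash; values are appended to the column for each key.
from_hashes = groups.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, acc|
  { y: g[:y].first, sum_x: g[:x].sum }.each { |k, v| acc[k] << v }
end

p with_keys
p from_hashes
```

Both styles end with the same column-oriented structure, one value per group in each column, which is what the listing passes to `DataFrame.new`.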

#assign(key) {|dataframe| ... } ⇒ SubFrames
#assign(keys) {|dataframe| ... } ⇒ SubFrames
#assign {|dataframe| ... } ⇒ SubFrames

Update existing column(s) or create new column(s) for each DataFrame in self.

Column values are updated by an overloaded common operation.

Overloads:

  • #assign(key) {|dataframe| ... } ⇒ SubFrames

    Assign a column by argument and block.

    Examples:

    subframes.assign(:x_plus1) { x + 1 }
    
    # =>
    #<RedAmber::SubFrames : 0x000000000000c3a0>
    @baseframe=#<RedAmber::DataFrame : 6 x 4 Vectors, 0x000000000000c3b4>
    3 SubFrames: [2, 3, 1] in sizes.
    ---
    #<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000c3c8>
            x y        z         x_plus1
      <uint8> <string> <boolean> <uint8>
    0       1 A        false           2
    1       2 A        true            3
    ---
    #<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000c3dc>
            x y        z         x_plus1
      <uint8> <string> <boolean> <uint8>
    0       3 B        false           4
    1       4 B        (nil)           5
    2       5 B        true            6
    ---
    #<RedAmber::DataFrame : 1 x 4 Vectors, 0x000000000000c3f0>
            x y        z         x_plus1
      <uint8> <string> <boolean> <uint8>
    0       6 C        false           7

    Parameters:

    • key (Symbol, String)

      a key of column to assign.

    Yield Parameters:

    • dataframe (DataFrame)

      gives overloaded dataframe in self to the block.

    Yield Returns:

    • (Vector, Array, Arrow::Array)

      an updated column value, which is overloaded.

    Returns:

    • (SubFrames)

      a new SubFrames object with updated DataFrames.

  • #assign(keys) {|dataframe| ... } ⇒ SubFrames

    Assign columns by arguments and block.

    Examples:

    subframes.assign(:sum_x, :frac_x) do
      group_sum = x.sum
      [[group_sum] * size, x / group_sum.to_f]
    end
    
    # =>
    #<RedAmber::SubFrames : 0x000000000000fce4>
    @baseframe=#<RedAmber::DataFrame : 6 x 5 Vectors, 0x000000000000fcf8>
    3 SubFrames: [2, 3, 1] in sizes.
    ---
    #<RedAmber::DataFrame : 2 x 5 Vectors, 0x000000000000fd0c>
            x y        z           sum_x   frac_x
      <uint8> <string> <boolean> <uint8> <double>
    0       1 A        false           3     0.33
    1       2 A        true            3     0.67
    ---
    #<RedAmber::DataFrame : 3 x 5 Vectors, 0x000000000000fd20>
            x y        z           sum_x   frac_x
      <uint8> <string> <boolean> <uint8> <double>
    0       3 B        false          12     0.25
    1       4 B        (nil)          12     0.33
    2       5 B        true           12     0.42
    ---
    #<RedAmber::DataFrame : 1 x 5 Vectors, 0x000000000000fd34>
            x y        z           sum_x   frac_x
      <uint8> <string> <boolean> <uint8> <double>
    0       6 C        false           6      1.0

    Parameters:

    • keys (Array<Symbol, String>)

      keys of columns to assign.

    Yield Parameters:

    • dataframe (DataFrame)

      gives overloaded dataframes in self to the block.

    Yield Returns:

    • (Array<Vector, Array, Arrow::Array>)

      updated column values, which are overloaded.

    Returns:

    • (SubFrames)

      a new SubFrames object with updated DataFrames.

  • #assign {|dataframe| ... } ⇒ SubFrames

    Assign column(s) by block.

    Examples:

    Compute 'x * z' where (true, not_true) in z maps to (1, 0).

    subframes.assign do
      { 'x*z': x * z.if_else(1, 0) }
    end
    
    # =>
    #<RedAmber::SubFrames : 0x000000000000fd98>
    @baseframe=#<RedAmber::DataFrame : 6 x 4 Vectors, 0x000000000000fdac>
    3 SubFrames: [2, 3, 1] in sizes.
    ---
    #<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000fdc0>
            x y        z             x*z
      <uint8> <string> <boolean> <uint8>
    0       1 A        false           0
    1       2 A        true            2
    ---
    #<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000fdd4>
            x y        z             x*z
      <uint8> <string> <boolean> <uint8>
    0       3 B        false           0
    1       4 B        (nil)       (nil)
    2       5 B        true            5
    ---
    #<RedAmber::DataFrame : 1 x 4 Vectors, 0x000000000000fde8>
            x y        z             x*z
      <uint8> <string> <boolean> <uint8>
    0       6 C        false           0

    Yield Parameters:

    • dataframe (DataFrame)

      gives overloaded dataframes in self to the block.

    Yield Returns:

    • (Hash, Array)

      pairs of keys and column values which are overloaded.

    Returns:

    • (SubFrames)

      a new SubFrames object with updated DataFrames.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 780

def assign(...)
  map { |df| df.assign(...) }
end

#baseframe ⇒ DataFrame Also known as: concatenate, concat

Return concatenated SubFrames as a DataFrame.

Once evaluated, the result is memoized as @baseframe.

Returns:

  • (DataFrame)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 325

def baseframe
  @baseframe ||= reduce(&:concatenate)
end
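
The `@baseframe ||= reduce(&:concatenate)` idiom above computes the concatenation once and reuses it afterwards. Here is a small pure-Ruby sketch of the same pattern, assuming Arrays in place of DataFrames and `+` in place of `concatenate`. The class `Pieces` and its counter are hypothetical.

```ruby
# Pieces is a hypothetical stand-in for SubFrames; Arrays stand in for DataFrames.
class Pieces
  attr_reader :concat_runs

  def initialize(parts)
    @parts = parts
    @whole = nil
    @concat_runs = 0 # how many times the concatenation actually ran
  end

  # Concatenate all parts on first use, then return the stored result.
  def whole
    @whole ||= begin
      @concat_runs += 1
      @parts.reduce(:+)
    end
  end
end

pieces = Pieces.new([[1, 2], [3, 4, 5], [6]])
first_call = pieces.whole
second_call = pieces.whole
p first_call
p pieces.concat_runs
```

Note that `||=` recomputes whenever the stored value is nil or false, so an empty collection (where `reduce` returns nil) is not cached by this idiom.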

#each ⇒ Enumerator
#each {|subframe| ... } ⇒ self

Iterates over sub DataFrames or returns an Enumerator.

This method memoizes the sub DataFrames, so repeated calls yield the same objects. SubFrames includes the Enumerable module, so many Enumerable methods are available.

Examples:

Returns Enumerator

subframes.each

# =>
#<Enumerator: ...>

`to_a` from Enumerable.

subframes.to_a

# =>
[#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000002a120>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
,
 #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000002a134>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
,
 #<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000002a148>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false
]

Concatenate SubFrames. This example is used in #concatenate.

subframes.reduce(&:concatenate)

# =>
#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000004883c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
4       5 B        true
5       6 C        false

Overloads:

  • #each ⇒ Enumerator

    Returns a new Enumerator if no block given.

    Returns:

    • (Enumerator)

      an Enumerator over each element.

  • #each {|subframe| ... } ⇒ self

    When a block is given, passes each sub DataFrame to the block.

    Yield Parameters:

    • subframe (DataFrame)

      each sub DataFrame is passed as the block parameter.

    Yield Returns:

    • (Object)

      evaluated result value from the block.

    Returns:

    • (self)

      returns self.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 398

def each(&block)
  return enum_for(__method__) { size } unless block

  if @selectors
    @selectors.each.with_index do |selector, i|
      if i < @frames.size
        yield @frames[i]
      else
        frame = get_subframe(selector)
        @frames << frame
        yield frame
      end
    end
  else
    @frames.each(&block)
  end
  nil
end
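
The listing above builds each sub DataFrame only when it is first reached and serves cached frames on later passes. The sketch below reproduces that lazy, memoizing `each` in plain Ruby, assuming an Array of rows and index lists as selectors. The class `LazySubsets` and its `builds` counter are hypothetical and not RedAmber's implementation.

```ruby
# LazySubsets is a hypothetical stand-in for SubFrames.
class LazySubsets
  include Enumerable

  attr_reader :builds

  def initialize(rows, index_lists)
    @rows = rows
    @index_lists = index_lists
    @frames = [] # cache of subsets built so far
    @builds = 0  # counts how many subsets were materialized
  end

  def each
    # Without a block, return a sized Enumerator.
    return enum_for(__method__) { @index_lists.size } unless block_given?

    @index_lists.each_with_index do |indices, i|
      if i < @frames.size
        yield @frames[i] # reuse the cached subset
      else
        @builds += 1
        frame = @rows.values_at(*indices)
        @frames << frame
        yield frame
      end
    end
    self # this sketch returns the receiver
  end
end

subsets = LazySubsets.new(%w[a b c d e f], [[0, 1], [2, 3, 4], [5]])
first_pass = subsets.to_a
second_pass = subsets.to_a
p first_pass
p subsets.builds
```

Because the cache fills in order, a partial iteration such as `subsets.first` materializes only the subsets it actually visits.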

#empty? ⇒ true, false

Test whether self is empty (has no subsets).

Returns:

  • (true, false)

    true if self is an empty subset.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 1024

def empty?
  size.zero?
end

#filter_map {|dataframe| ... } ⇒ SubFrames

Returns a SubFrames containing truthy DataFrames returned by the block.

With a block given, calls the block with successive DataFrames; returns a SubFrames of the truthy DataFrames returned by the block. Results that are nil or false are dropped.

Use `#each.filter_map` if you want the DataFrames as an Array.

Returns an Enumerator with no block given.

Examples:

Keep sub DataFrames whose size is larger than 1 and append a number to column ‘y’.

subframes.filter_map do |df|
  if df.size > 1
    df.assign(:y) do
      y.merge(indices('1'), sep: '')
    end
  end
end

# =>
#<RedAmber::SubFrames : 0x000000000001da88>
@baseframe=#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000001da9c>
2 SubFrames: [2, 3] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000001dab0>
        x y        z
  <uint8> <string> <boolean>
0       1 A1       false
1       2 A2       true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000001dac4>
        x y        z
  <uint8> <string> <boolean>
0       3 B1       false
1       4 B2       (nil)
2       5 B3       true

Yield Parameters:

  • dataframe (DataFrame)

    gives each element.

Yield Returns:

  • (Array<DataFrame>)

    the block should return DataFrames with the same schema.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 943

define_subframable_method :filter_map
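
The filtering rule of `filter_map` comes from Ruby's `Enumerable#filter_map`: the block's result is kept when it is truthy and discarded when it is nil or false. A minimal sketch on plain Arrays (hypothetical data, no RedAmber involved) shows how a conditional without an else branch drops elements.

```ruby
# Arrays of Integers stand in for sub DataFrames (hypothetical data).
groups = [[1, 2], [3, 4, 5], [6]]

# Transform groups with more than one element; the trailing `if` returns nil
# for the single-element group, so that group is dropped from the result.
kept = groups.filter_map do |group|
  group.map { |v| v * 10 } if group.size > 1
end

p kept
```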

#frames ⇒ Array<DataFrame>
#frames(n_frames) ⇒ Array<DataFrame>

Return an Array of sub DataFrames

Overloads:

  • #frames ⇒ Array<DataFrame>

    Returns all sub dataframes.

    Returns:

    • (Array<DataFrame>)

  • #frames(n_frames) ⇒ Array<DataFrame>

    Returns partial sub dataframes.

    Parameters:

    • n_frames (Integer)

      number of dataframes to retrieve.

    Returns:

    • (Array<DataFrame>)

Since:

  • 0.4.2



# File 'lib/red_amber/subframes.rb', line 1163

def frames(n_frames = nil)
  n_frames = size if n_frames.nil?

  if @frames.size < n_frames
    @frames = each.take(n_frames)
  else
    @frames.take(n_frames)
  end
end

#inspect(limit: 5) ⇒ String

Return summary information of self.

Examples:

df

# =>
#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000caa8>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
4       5 B        true
5       6 C        false

SubFrames.new(df, [[0, 1], [2, 3, 4], [5]])

# =>
#<RedAmber::SubFrames : 0x000000000000c1fc>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000c170>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000002a120>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000002a134>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000002a148>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Parameters:

  • limit (Integer) (defaults to: 5)

    maximum number of DataFrames to show.

Returns:

  • (String)

    return class name, object id, universal DataFrame, size and subset sizes in a String.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 1130

def inspect(limit: 5)
  shape =
    if @baseframe.nil?
      '(Not prepared)'
    else
      baseframe.shape_str(with_id: true)
    end
  sizes_truncated = (size > limit ? sizes.take(limit) << '...' : sizes).join(', ')
  "#<#{self.class} : #{format('0x%016x', object_id)}>\n" \
    "@baseframe=#<#{shape}>\n" \
    "#{size} SubFrame#{pl(size)}: " \
    "[#{sizes_truncated}] in size#{pl(size)}.\n" \
    "---\n#{_to_s(limit: limit, with_id: true)}"
end
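
The `sizes_truncated` expression above shortens long size lists for display. The sketch below isolates that step in a hypothetical helper, `sizes_summary`, so the effect of `limit` can be seen on plain Arrays.

```ruby
# sizes_summary is an illustrative helper, not a RedAmber method.
def sizes_summary(sizes, limit: 5)
  # `take` returns a new Array, so appending '...' leaves `sizes` unchanged.
  shown = sizes.size > limit ? sizes.take(limit) << '...' : sizes
  "[#{shown.join(', ')}]"
end

p sizes_summary([2, 3, 1])
p sizes_summary((1..7).to_a, limit: 5)
```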

#map {|dataframe| ... } ⇒ SubFrames Also known as: collect

Returns a SubFrames containing DataFrames returned by the block.

Use `#each.map` if you want the DataFrames as an Array.

Returns an Enumerator with no block given.

Examples:

Map as it is.

subframes.map { _1 }

# This creates a new SubFrames and a new baseframe,
# but each element DataFrame is reused.
# =>
#<RedAmber::SubFrames : 0x000000000001e6cc>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000001e6e0>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x00000000000135c4>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000135d8>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x00000000000135ec>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Assign a new column.

subframes.map { |df| df.assign(x_plus1: df[:x] + 1) }

# =>
#<RedAmber::SubFrames : 0x0000000000040948>
@baseframe=#<RedAmber::DataFrame : 6 x 4 Vectors, 0x000000000004095c>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 4 Vectors, 0x0000000000040970>
        x y        z         x_plus1
  <uint8> <string> <boolean> <uint8>
0       1 A        false           2
1       2 A        true            3
---
#<RedAmber::DataFrame : 3 x 4 Vectors, 0x0000000000040984>
        x y        z         x_plus1
  <uint8> <string> <boolean> <uint8>
0       3 B        false           4
1       4 B        (nil)           5
2       5 B        true            6
---
#<RedAmber::DataFrame : 1 x 4 Vectors, 0x0000000000040998>
        x y        z         x_plus1
  <uint8> <string> <boolean> <uint8>
0       6 C        false           7

Yield Parameters:

  • dataframe (DataFrame)

    gives each element.

Yield Returns:

  • (Array<DataFrame>)

    the block should return DataFrames with the same schema.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 657

define_subframable_method :map

#offset_indices ⇒ Array<Integer>

Index of the first row of each sub DataFrame.

Examples:

When `sizes` is [2, 3, 1].

subframes.offset_indices # => [0, 2, 5]

Returns:

  • (Array<Integer>)

    offset index of each sub DataFrame.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 1003

def offset_indices
  case @selectors
  when Filters
    @selectors.selectors.map do |selector|
      selector.each.with_index.find { |x, _| x }[1]
    end
  else # Indices, nil
    sum = 0
    sizes.map do |size|
      sum += size
      sum - size
    end
  end
end
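
The listing above computes offsets in two ways: from the first truthy position of each filter, or from a running sum of sizes. The following pure-Ruby sketch shows both computations on plain Arrays. The helper names are hypothetical.

```ruby
# Illustrative helpers, not part of RedAmber.

# Offsets from sizes: each offset is the total size of all earlier subsets.
def offsets_from_sizes(sizes)
  sum = 0
  sizes.map do |size|
    sum += size
    sum - size
  end
end

# Offsets from boolean filters: position of the first truthy flag in each filter.
def offsets_from_filters(filters)
  filters.map { |flags| flags.index { |flag| flag } }
end

filters = [
  [true, true, false, false, false, false],
  [false, false, true, true, true, false],
  [false, false, false, false, false, true]
]

p offsets_from_sizes([2, 3, 1])
p offsets_from_filters(filters)
```

The two results agree only when each filter selects a contiguous block of rows in order; for interleaved filters the first-true positions and the running sums differ.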

#reject {|dataframe| ... } ⇒ SubFrames

Returns a SubFrames containing DataFrames rejected by the block.

With a block given, calls the block with successive DataFrames; returns a SubFrames of those DataFrames for which the block returns nil or false.

Use `#each.reject` if you want the DataFrames as an Array.

Returns an Enumerator with no block given.

Examples:

Reject all.

subframes.reject { true }

# =>
#<RedAmber::SubFrames : 0x00000000000238c0>
@baseframe=#<RedAmber::DataFrame : (empty), 0x00000000000238d4>
0 SubFrame: [] in size.
---

Reject nothing.

subframes.reject { false }

# =>
#<RedAmber::SubFrames : 0x0000000000003a84>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x0000000000003a98>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x0000000000003a0c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000003a20>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x0000000000003a34>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Reject if Vector `:z` has any true.

subframes.reject { |df| df[:z].any? }

# =>
#<RedAmber::SubFrames : 0x0000000000038d74>
@baseframe=#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000001ad10>
1 SubFrame: [1] in size.
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000001ad10>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Yield Parameters:

  • dataframe (DataFrame)

    gives each element.

Yield Returns:

  • (Array<DataFrame>)

    the block should return DataFrames with the same schema.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 907

define_subframable_method :reject

#select {|dataframe| ... } ⇒ SubFrames Also known as: filter, find_all

Returns a SubFrames containing DataFrames selected by the block.

With a block given, calls the block with successive DataFrames; returns a SubFrames of those DataFrames for which the block returns a truthy value.

Use `#each.select` if you want the DataFrames as an Array.

Returns an Enumerator with no block given.

Examples:

Select all.

subframes.select { true }

# =>
#<RedAmber::SubFrames : 0x0000000000003a84>
@baseframe=#<RedAmber::DataFrame : 6 x 3 Vectors, 0x0000000000003a98>
3 SubFrames: [2, 3, 1] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x0000000000003a0c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000003a20>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x0000000000003a34>
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Select nothing.

subframes.select { false }

# =>
#<RedAmber::SubFrames : 0x00000000000238c0>
@baseframe=#<RedAmber::DataFrame : (empty), 0x00000000000238d4>
0 SubFrame: [] in size.
---

Select if Vector `:z` has any true.

subframes.select { |df| df[:z].any? }

# =>
#<RedAmber::SubFrames : 0x000000000000fba4>
@baseframe=#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000fbb8>
2 SubFrames: [2, 3] in sizes.
---
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x0000000000003a0c>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000003a20>
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true

Yield Parameters:

  • dataframe (DataFrame)

    gives each element.

Yield Returns:

  • (Array<DataFrame>)

    the block should return DataFrames with the same schema.

Returns:

  • (SubFrames)

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 848

define_subframable_method :select

#size ⇒ Integer

Number of subsets.

Returns:

  • (Integer)

    number of subsets in self.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 971

def size
  @size ||=
    if @selectors
      @selectors.size
    else
      @frames.size
    end
end

#sizes ⇒ Array<Integer>

Size list of subsets.

Returns:

  • (Array<Integer>)

    sizes of sub DataFrames.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 986

def sizes
  @sizes ||=
    if @selectors
      @selectors.sizes
    else
      @frames.map(&:size)
    end
end

#take(num) ⇒ SubFrames

Return the first num sub-dataframes (0...num) in self.

Parameters:

  • num (Integer, Float)

    number of sub-dataframes to pick up. `num` must be positive or zero.

Returns:

  • (SubFrames)

    A new SubFrames. If num == 0, it returns an empty SubFrames. If num >= size, it returns self.

Since:

  • 0.4.2



# File 'lib/red_amber/subframes.rb', line 955

def take(num)
  if num.zero?
    SubFrames.new(DataFrame.new, [])
  elsif num >= size
    self
  else
    SubFrames.by_dataframes(frames(num))
  end
end
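
The three branches of `take` above can be summarized as: nothing requested, everything requested, or a leading slice. A minimal sketch with an Array of Arrays in place of SubFrames follows. The helper `take_groups` is hypothetical.

```ruby
# take_groups is an illustrative helper; an Array of Arrays stands in for SubFrames.
def take_groups(groups, num)
  if num.zero?
    []                # nothing requested: an empty collection
  elsif num >= groups.size
    groups            # everything requested: the same object, no copy
  else
    groups.first(num) # a new collection holding the leading groups
  end
end

groups = [[1, 2], [3, 4, 5], [6]]
none = take_groups(groups, 0)
some = take_groups(groups, 2)
all_groups = take_groups(groups, 5)
p none
p some
p all_groups.equal?(groups)
```

Returning the receiver itself in the second branch avoids a copy, at the cost that callers must not assume the result is always a fresh object.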

#to_s(limit: 5) ⇒ String

Return string representation of self.

Examples:

df

# =>
#<RedAmber::DataFrame : 6 x 3 Vectors, 0x000000000000caa8>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
2       3 B        false
3       4 B        (nil)
4       5 B        true
5       6 C        false

puts SubFrames.new(df, [[0, 1], [2, 3, 4], [5]])

# =>
        x y        z
  <uint8> <string> <boolean>
0       1 A        false
1       2 A        true
---
        x y        z
  <uint8> <string> <boolean>
0       3 B        false
1       4 B        (nil)
2       5 B        true
---
        x y        z
  <uint8> <string> <boolean>
0       6 C        false

Parameters:

  • limit (Integer) (defaults to: 5)

    maximum number of DataFrames to show.

Returns:

  • (String)

    return string representation of each sub DataFrame.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 1078

def to_s(limit: 5)
  _to_s(limit: limit)
end

#universal? ⇒ true, false

Test if self has only one subset and it is comprehensive.

Returns:

  • (true, false)

    true if the only member of self is equal to universal DataFrame.

Since:

  • 0.4.0



# File 'lib/red_amber/subframes.rb', line 1034

def universal?
  size == 1 && first == @baseframe
end