Module: RedAmber::DataFrameVariableOperation

Included in:
DataFrame
Defined in:
lib/red_amber/data_frame_variable_operation.rb

Overview

Mix-in for the class DataFrame

Instance Method Summary

Instance Method Details

#assign(key_value_pairs) ⇒ DataFrame
#assign {|self| ... } ⇒ DataFrame
#assign(keys) {|self| ... } ⇒ DataFrame

Assign new or updated variables (columns) and create an updated DataFrame.

  • Array-like variables with new keys are appended as new columns on the right.

  • Array-like variables with existing keys update the corresponding vectors.

  • A Symbol key and a String key with the same name are treated as the same key.

  • If the assigner is empty or nil, returns self.
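
The rules above can be illustrated without RedAmber itself. The following is a minimal plain-Ruby sketch, not the library's implementation: `assign_columns` is a hypothetical helper that models a table as an ordered Hash of column name to Array, and it assumes keys can be normalized with `to_sym`.

```ruby
# Hypothetical helper (not part of RedAmber): models a table as an ordered
# Hash of column name => Array and mimics the documented assign rules.
def assign_columns(columns, assigner)
  # An empty or nil assigner leaves the table untouched.
  return columns if assigner.nil? || assigner.empty?

  result = columns.dup
  assigner.each do |key, values|
    # Symbol and String keys are treated as the same key.
    # An existing key is updated in place; a new key is appended on the
    # right, because Ruby Hashes preserve insertion order.
    result[key.to_sym] = values
  end
  result
end

comecome = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }

updated = assign_columns(comecome, { 'age' => [97, 78, 57] })
appended = assign_columns(comecome, { brother: ['Santa', nil, 'Momotaro'] })
```

Here `updated` keeps the column order `name, age`, while `appended` gains `brother` as the last column. The original Hash is left unchanged, which mirrors the way #assign creates an updated DataFrame.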

Overloads:

  • #assign(key_value_pairs) ⇒ DataFrame

    Accepts key-value pairs as an Array or a Hash.

    Examples:

    Assign a new column

    comecome
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x00000000000280dc>
      name         age
      <string> <uint8>
    0 Yasuko        68
    1 Rui           49
    2 Hinata        28
    
    brothers = ['Santa', nil, 'Momotaro']
    comecome.assign(brother: brothers)
    # or
    comecome.assign({ brother: brothers })
    # or
    comecome.assign(:brother, brothers)
    # or
    comecome.assign([:brother, brothers])
    
    # =>
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000004077c>
      name         age brother
      <string> <uint8> <string>
    0 Yasuko        68 Santa
    1 Rui           49 (nil)
    2 Hinata        28 Momotaro

    Assign new data to an existing column

    comecome.assign(age: comecome[:age] + 29)
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000065860>
      name         age
      <string> <uint8>
    0 Yasuko        97
    1 Rui           78
    2 Hinata        57

    Parameters:

    • key_value_pairs (Array<key, array_like>, Hash{key => array_like})

      `key` must be a Symbol or a String. `array_like` is the column data to be assigned. It must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)

  • #assign {|self| ... } ⇒ DataFrame

    Accepts a block that yields key-value pairs.

    Examples:

    Assign new data to an existing column with a block

    comecome.assign { { age: age + 29 } }
    # or
    comecome.assign { [:age, age + 29] }
    # or
    comecome.assign { [[:age, age + 29]] }
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000007d640>
      name         age
      <string> <uint8>
    0 Yasuko        97
    1 Rui           78
    2 Hinata        57

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (Array<key, array_like>, Hash{key => array_like})

      `key` must be a Symbol or a String. `array_like` is the column data to be assigned. It must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)

  • #assign(keys) {|self| ... } ⇒ DataFrame

    Accepts keys as arguments and the corresponding column data from the block.

    Examples:

    Assign new data to an existing column with a block

    comecome.assign(:age) { age + 29 }
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000007af94>
      name         age
      <string> <uint8>
    0 Yasuko        97
    1 Rui           78
    2 Hinata        57

    Assign multiple columns

    comecome.assign(:age_in_1993, :brother) do
      [
        age + 29,
        ['Santa', nil, 'Momotaro'],
      ]
    end
    
    # =>
    #<RedAmber::DataFrame : 3 x 4 Vectors, 0x00000000000b363c>
      name         age age_in_1993 brother
      <string> <uint8>     <uint8> <string>
    0 Yasuko        68          97 Santa
    1 Rui           49          78 (nil)
    2 Hinata        28          57 Momotaro

    Parameters:

    • keys (Symbol, String)

      keys of columns to create or update.

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (Array<array_like>)

      Column data to be assigned. `array_like` must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)



# File 'lib/red_amber/data_frame_variable_operation.rb', line 515

def assign(...)
  assign_update(false, ...)
end
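
The block forms above rely on the block being run with instance_eval, so that a bare name such as `age` is resolved on the receiver. The following plain-Ruby sketch shows that mechanism in isolation. `MiniFrame` and its `evaluate` method are hypothetical stand-ins, and exposing columns through `method_missing` is an assumption made for this sketch, not a description of RedAmber internals.

```ruby
# Hypothetical stand-in for a DataFrame (not RedAmber code): it exposes each
# column as a method so that a block run with instance_eval can name columns
# directly.
class MiniFrame
  def initialize(columns)
    @columns = columns
  end

  # Evaluate the block with self set to this frame and return its value.
  def evaluate(&block)
    instance_eval(&block)
  end

  private

  # Resolve unknown method names as column lookups.
  def method_missing(name, *args)
    return @columns[name] if args.empty? && @columns.key?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

frame = MiniFrame.new({ name: %w[Yasuko Rui Hinata], age: [68, 49, 28] })

# `age` has no explicit receiver, so it is looked up on the frame.
# Local variables of the caller (here `offset`) stay visible in the block.
offset = 29
pairs = frame.evaluate { { age: age.map { |v| v + offset } } }
```

Because self changes inside the block, instance variables and private helpers of the calling object are not reachable there, while local variables still are.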

#assign_left(key_value_pairs) ⇒ DataFrame
#assign_left {|self| ... } ⇒ DataFrame
#assign_left(keys) {|self| ... } ⇒ DataFrame

Assign new or updated variables (columns) and create an updated DataFrame.

  • Array-like variables with new keys are added as new columns on the left.

  • Array-like variables with existing keys update the corresponding vectors.

  • A Symbol key and a String key with the same name are treated as the same key.

  • If the assigner is empty or nil, returns self.

Overloads:

  • #assign_left(key_value_pairs) ⇒ DataFrame

    Accepts key-value pairs as an Array or a Hash.

    Examples:

    Assign a new column on the left

    df
    
    # =>
    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000c10c>
        index    float string
      <uint8> <double> <string>
    0       0      0.0 A
    1       1      1.1 B
    2       2      2.2 C
    3       3      NaN D
    4   (nil)    (nil) (nil)
    
    df.assign_left(new_index: df.indices(1))
    
    # =>
    #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
      new_index   index    float string
        <uint8> <uint8> <double> <string>
    0         1       0      0.0 A
    1         2       1      1.1 B
    2         3       2      2.2 C
    3         4       3      NaN D
    4         5   (nil)    (nil) (nil)

    Parameters:

    • key_value_pairs (Array<key, array_like>, Hash{key => array_like})

      `key` must be a Symbol or a String. `array_like` is the column data to be assigned. It must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)

  • #assign_left {|self| ... } ⇒ DataFrame

    Accepts a block that yields key-value pairs.

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (Array<key, array_like>, Hash{key => array_like})

      `key` must be a Symbol or a String. `array_like` is the column data to be assigned. It must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)

  • #assign_left(keys) {|self| ... } ⇒ DataFrame

    Accepts keys as arguments and the corresponding column data from the block.

    Parameters:

    • keys (Symbol, String)

      keys of columns to create or update.

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (Array<array_like>)

      Column data to be assigned. `array_like` must be a `Vector`, an `Arrow::Array`, or an `Array`.

    Returns:

    • (DataFrame)



# File 'lib/red_amber/data_frame_variable_operation.rb', line 586

def assign_left(...)
  assign_update(true, ...)
end
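
Both #assign and #assign_left delegate to one method and differ only in a boolean flag. The sketch below models that design on a plain ordered Hash. `assign_update_sketch` is a hypothetical helper written for illustration; the real `assign_update` is not shown in this page, so its behavior here is an assumption based on the documented rules.

```ruby
# Hypothetical model (not RedAmber's assign_update): one helper handles both
# directions, and a boolean decides where columns with new keys are placed.
def assign_update_sketch(columns, append_to_left, assigner)
  pairs = assigner.transform_keys(&:to_sym)
  # Split the assigner into updates of existing keys and brand-new keys.
  updates, additions =
    pairs.partition { |key, _| columns.key?(key) }.map(&:to_h)

  updated = columns.merge(updates) # existing keys keep their position
  append_to_left ? additions.merge(updated) : updated.merge(additions)
end

df = { index: [0, 1, 2], string: %w[A B C] }
new_index = { new_index: [1, 2, 3] }

right = assign_update_sketch(df, false, new_index)
left = assign_update_sketch(df, true, new_index)
```

With these inputs `right` has the column order `index, string, new_index` and `left` has `new_index, index, string`. Keeping a single helper means the update rule for existing keys is written once and cannot drift between the two public methods.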

#drop(keys) ⇒ DataFrame
#drop(booleans) ⇒ DataFrame
#drop(indices) ⇒ DataFrame
#drop {|self| ... } ⇒ DataFrame

Note:

DataFrame#drop returns a DataFrame (not a Vector) even if only a single column remains.

Drop some variables (columns) and create a DataFrame from the remaining ones.

Overloads:

  • #drop(keys) ⇒ DataFrame

    Drop off variables by Symbol(s) or String(s).

    Examples:

    Drop off by a key

    languages
    
    # =>
    #<RedAmber::DataFrame : 4 x 3 Vectors, 0x00000000000cfd8c>
      Language Creator                         Released
      <string> <string>                        <uint16>
    0 Ruby     Yukihiro Matsumoto                  1995
    1 Python   Guido van Rossum                    1991
    2 R        Ross Ihaka and Robert Gentleman     1993
    3 Rust     Graydon Hoare                       2001
    
    languages.drop(:Language)
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x000000000005805c>
      Creator                         Released
      <string>                        <uint16>
    0 Yukihiro Matsumoto                  1995
    1 Guido van Rossum                    1991
    2 Ross Ihaka and Robert Gentleman     1993
    3 Graydon Hoare                       2001

    Parameters:

    • keys (Symbol, String, <Symbol, String>)

      key name(s) of variables to drop.

    Returns:

    • (DataFrame)

  • #drop(booleans) ⇒ DataFrame

    Drop off variables by booleans.

    Examples:

    Drop off by booleans

    is_numeric = languages.vectors.map(&:numeric?) # [nil, nil, true]
    languages.drop(is_numeric)
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000066a1c>
      Language Creator
      <string> <string>
    0 Ruby     Yukihiro Matsumoto
    1 Python   Guido van Rossum
    2 R        Ross Ihaka and Robert Gentleman
    3 Rust     Graydon Hoare

    Parameters:

    • booleans (<Booleans, nil>, Vector)

      Boolean array or Vector; variables at positions marked true are dropped.

    Returns:

    • (DataFrame)

  • #drop(indices) ⇒ DataFrame

    Drop off variables by column indices.

    Examples:

    Drop off by indices

    languages.drop(2)
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000066a1c>
      Language Creator
      <string> <string>
    0 Ruby     Yukihiro Matsumoto
    1 Python   Guido van Rossum
    2 R        Ross Ihaka and Robert Gentleman
    3 Rust     Graydon Hoare

    Parameters:

    • indices (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      Column index or indices of the variables to drop.

    Returns:

    • (DataFrame)

  • #drop {|self| ... } ⇒ DataFrame

    Note:

    Arguments and a block cannot be used simultaneously.

    Drop off variables by the yielded value from the block.

    Examples:

    Drop off by a block.

    # same as languages.drop { |df| df.vectors.map(&:numeric?) }
    languages.drop { vectors.map(&:numeric?) }
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000154104>
      Language Creator
      <string> <string>
    0 Ruby     Yukihiro Matsumoto
    1 Python   Guido van Rossum
    2 R        Ross Ihaka and Robert Gentleman
    3 Rust     Graydon Hoare

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (keys, booleans, indices)

      Returns keys, booleans, or indices, in the same form as the arguments.

    Returns:

    • (DataFrame)



# File 'lib/red_amber/data_frame_variable_operation.rb', line 251

def drop(*args, &block)
  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end
  return self if args.compact.empty? || empty?

  picker =
    if args.symbol?
      keys - args
    elsif args.boolean?
      keys.reject_by_booleans(args)
    elsif args.integer?
      keys.reject_by_indices(args)
    else
      dropper = parse_args(args, n_keys)
      if dropper.compact.empty?
        return self
      elsif dropper.boolean?
        keys.reject_by_booleans(dropper)
      elsif dropper.symbol?
        keys - dropper
      else
        dropper.compact!
        unless dropper.integer?
          raise DataFrameArgumentError, "Invalid argument #{args}"
        end

        keys.reject_by_indices(dropper)
      end
    end

  return DataFrame.new if picker.empty?

  DataFrame.create(@table.select_columns(*picker))
end
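
The three selector kinds accepted by #drop all reduce to "which keys remain". The following plain-Ruby sketch shows that reduction on a bare list of keys. `remaining_keys` is a hypothetical helper for illustration only; it does not reproduce RedAmber's argument parsing (for example, Ranges, Floats and Vectors are not handled).

```ruby
# Hypothetical helper (not RedAmber code): returns the keys that remain after
# dropping by names, by booleans, or by column indices.
def remaining_keys(keys, *selector)
  selector = selector.flatten
  if selector.all? { |s| s.is_a?(Symbol) || s.is_a?(String) }
    keys - selector.map(&:to_sym)
  elsif selector.all? { |s| [true, false, nil].include?(s) }
    # Drop the keys whose paired flag is true; nil counts as "keep".
    keys.zip(selector).reject { |_, flag| flag }.map(&:first)
  elsif selector.all?(Integer)
    keys.reject.with_index { |_, i| selector.include?(i) }
  else
    raise ArgumentError, "Invalid argument #{selector}"
  end
end

keys = %i[Language Creator Released]

by_key = remaining_keys(keys, :Language)
by_booleans = remaining_keys(keys, [nil, nil, true])
by_index = remaining_keys(keys, 2)
```

In this sketch `by_key` is `[:Creator, :Released]`, and both `by_booleans` and `by_index` are `[:Language, :Creator]`, matching the shapes of the documented examples.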

#pick(keys) ⇒ DataFrame
#pick(booleans) ⇒ DataFrame
#pick(indices) ⇒ DataFrame
#pick {|self| ... } ⇒ DataFrame

Note:

If a single key is specified, DataFrame#pick generates a DataFrame, whereas DataFrame#[] generates a Vector.

Select variables (columns) to create a new DataFrame.

Overloads:

  • #pick(keys) ⇒ DataFrame

    Pick up variables by Symbol(s) or String(s).

    Examples:

    Pick up by a key

    languages
    
    # =>
    #<RedAmber::DataFrame : 4 x 3 Vectors, 0x00000000000cfd8c>
      Language Creator                         Released
      <string> <string>                        <uint16>
    0 Ruby     Yukihiro Matsumoto                  1995
    1 Python   Guido van Rossum                    1991
    2 R        Ross Ihaka and Robert Gentleman     1993
    3 Rust     Graydon Hoare                       2001
    
    languages.pick(:Language)
    
    # =>
    #<RedAmber::DataFrame : 4 x 1 Vector, 0x0000000000113d20>
      Language
      <string>
    0 Ruby
    1 Python
    2 R
    3 Rust
    
    languages[:Language]
    
    # =>
    #<RedAmber::Vector(:string, size=4, chunked):0x000000000010359c>
    ["Ruby", "Python", "R", "Rust"]

    Parameters:

    • keys (Symbol, String, <Symbol, String>)

      key name(s) of variables to pick.

    Returns:

    • (DataFrame)

  • #pick(booleans) ⇒ DataFrame

    Pick up variables by booleans.

    Examples:

    Pick up by booleans

    languages.pick(true, true, false)
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000066a1c>
      Language Creator
      <string> <string>
    0 Ruby     Yukihiro Matsumoto
    1 Python   Guido van Rossum
    2 R        Ross Ihaka and Robert Gentleman
    3 Rust     Graydon Hoare
    
    is_string = languages.vectors.map(&:string?) # [true, true, false]
    languages.pick(is_string)
    # => (same as above)

    Parameters:

    • booleans (<Booleans, nil>, Vector)

      Boolean array or Vector; variables at positions marked true are picked.

    Returns:

    • (DataFrame)

  • #pick(indices) ⇒ DataFrame

    Pick up variables by column indices.

    Examples:

    Pick up by indices

    languages.pick(0, 2, 1)
    
    # =>
    #<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000011cfb0>
      Language Released Creator
      <string> <uint16> <string>
    0 Ruby         1995 Yukihiro Matsumoto
    1 Python       1991 Guido van Rossum
    2 R            1993 Ross Ihaka and Robert Gentleman
    3 Rust         2001 Graydon Hoare

    Parameters:

    • indices (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      Column index or indices of the variables to pick.

    Returns:

    • (DataFrame)

  • #pick {|self| ... } ⇒ DataFrame

    Note:

    Arguments and a block cannot be used simultaneously.

    Pick up variables by the yielded value from the block.

    Examples:

    Pick up by a block.

    # same as languages.pick { |df| df.vectors.map(&:string?) }
    languages.pick { vectors.map(&:string?) }
    
    # =>
    #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000154104>
      Language Creator
      <string> <string>
    0 Ruby     Yukihiro Matsumoto
    1 Python   Guido van Rossum
    2 R        Ross Ihaka and Robert Gentleman
    3 Rust     Graydon Hoare

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (keys, booleans, indices)

      Returns keys, booleans, or indices, in the same form as the arguments.

    Returns:

    • (DataFrame)

Raises:

  • (DataFrameArgumentError)



# File 'lib/red_amber/data_frame_variable_operation.rb', line 117

def pick(*args, &block)
  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end

  case args
  in [] | [nil]
    return DataFrame.new
  in [*] if args.symbol?
    return DataFrame.create(@table.select_columns(*args))
  in [*] if args.boolean?
    picker = keys.select_by_booleans(args)
    return DataFrame.create(@table.select_columns(*picker))
  in [(Vector | Arrow::Array | Arrow::ChunkedArray) => a]
    picker = a.to_a
  else
    picker = parse_args(args, n_keys)
  end

  return DataFrame.new if picker.compact.empty?

  if picker.boolean?
    picker = keys.select_by_booleans(picker)
    return DataFrame.create(@table.select_columns(*picker))
  end
  picker.compact!
  raise DataFrameArgumentError, "some keys are duplicated: #{args}" if picker.uniq!

  return self if picker == keys

  DataFrame.create(@table.select_columns(*picker))
end
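
Two details of the method above are easy to miss: picking by indices keeps the requested order, and the duplicate check depends on `Array#uniq!` returning nil when nothing was removed. The plain-Ruby sketch below isolates both. `picked_keys` is a hypothetical helper and uses ArgumentError in place of DataFrameArgumentError.

```ruby
# Hypothetical helper (not RedAmber code): resolves column indices to keys,
# keeping the requested order and rejecting duplicated selections.
def picked_keys(keys, *indices)
  picker = indices.map { |i| keys.fetch(i) }
  # Array#uniq! returns nil when nothing was removed, so a truthy result
  # means at least one key was requested more than once.
  raise ArgumentError, "some keys are duplicated: #{indices}" if picker.uniq!

  picker
end

keys = %i[Language Creator Released]

reordered = picked_keys(keys, 0, 2, 1)

duplicated =
  begin
    picked_keys(keys, 0, 0)
  rescue ArgumentError => e
    e.message
  end
```

Here `reordered` follows the order of the indices, as in the `languages.pick(0, 2, 1)` example, and `duplicated` holds the error message raised for the repeated index.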

#rename(key_pairs) ⇒ DataFrame
#rename(key_pairs) ⇒ DataFrame
#rename {|self| ... } ⇒ DataFrame

Rename keys (variable/column names) to create an updated DataFrame.

Overloads:

  • #rename(key_pairs) ⇒ DataFrame

    Rename by key pairs as a Hash.

    Examples:

    Rename by a Hash

    comecome
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x00000000000037b4>
      name         age
      <string> <uint8>
    0 Yasuko        68
    1 Rui           49
    2 Hinata        28
    
    comecome.rename(:age => :age_in_1993)
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x00000000000037c8>
      name     age_in_1993
      <string>     <uint8>
    0 Yasuko            68
    1 Rui               49
    2 Hinata            28

    Parameters:

    • key_pairs (Hash{existing_key => new_key})

      key pair(s) of existing name and new name.

    Returns:

    • (DataFrame)

  • #rename(key_pairs) ⇒ DataFrame

    Rename by key pairs as an Array of Array.

    Examples:

    Rename by an Array

    renamer = [[:name, :heroine], [:age, :age_in_1993]]
    comecome.rename(renamer)
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x00000000000037dc>
      heroine  age_in_1993
      <string>     <uint8>
    0 Yasuko            68
    1 Rui               49
    2 Hinata            28

    Parameters:

    • key_pairs (<Array[existing_key, new_key]>)

      key pair(s) of existing name and new name.

    Returns:

    • (DataFrame)

  • #rename {|self| ... } ⇒ DataFrame

    Rename by key pairs yielded from a block.

    Examples:

    Rename by block.

    df
    
    # =>
    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000c29c>
            X       Y       Z
      <uint8> <uint8> <uint8>
    0       1       3       5
    1       2       4       6
    
    df.rename { keys.zip(keys.map(&:downcase)) }
    # or
    df.rename { [keys, keys.map(&:downcase)].transpose }
    
    # =>
    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000c364>
            x       y       z
      <uint8> <uint8> <uint8>
    0       1       3       5
    1       2       4       6

    Yields:

    • (self)

      The block is called within the context of self (it is invoked with instance_eval(&block)).

    Yield Returns:

    • (<[existing_key, new_key]>, Hash)

      Returns an Array or a Hash, in the same form as the arguments.

    Returns:

    • (DataFrame)



# File 'lib/red_amber/data_frame_variable_operation.rb', line 370

def rename(*renamer, &block)
  if block
    unless renamer.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and a block'
    end

    renamer = [instance_eval(&block)]
  end
  case renamer
  in [] | [nil] | [{}] | [[]]
    return self
  in [Hash => key_pairs]
  # noop
  in [ (Symbol | String) => from, (Symbol | String) => to]
    key_pairs = { from => to }
  in [Array => array_in_array]
    key_pairs = try_convert_to_hash(array_in_array)
  in [Array, *] => array_in_array1
    key_pairs = try_convert_to_hash(array_in_array1)
  else
    raise DataFrameArgumentError, "Invalid argument #{renamer}"
  end
  rename_by_hash(key_pairs)
end
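
The pattern matching in the method above normalizes several argument shapes into one Hash before renaming. The following plain-Ruby sketch reproduces that idea on an ordered Hash of columns. `normalize_renamer` and `rename_columns` are hypothetical helpers; they handle fewer shapes than the real method and stand in for `try_convert_to_hash` and `rename_by_hash`, whose implementations are not shown in this page.

```ruby
# Hypothetical helper (not RedAmber code): normalizes the accepted argument
# shapes into a Hash of existing key => new key using pattern matching.
def normalize_renamer(*renamer)
  case renamer
  in [] | [nil] | [{}] | [[]]
    {}
  in [Hash => key_pairs]
    key_pairs
  in [(Symbol | String) => from, (Symbol | String) => to]
    { from => to }
  in [Array => pairs]
    # Expects an Array of [existing_key, new_key] pairs.
    pairs.to_h
  else
    raise ArgumentError, "Invalid argument #{renamer}"
  end
end

# Rebuild the ordered column Hash with renamed keys; positions are unchanged.
def rename_columns(columns, key_pairs)
  columns.transform_keys { |key| key_pairs.fetch(key, key) }
end

comecome = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }

by_hash = rename_columns(comecome, normalize_renamer({ age: :age_in_1993 }))
by_array = rename_columns(
  comecome,
  normalize_renamer([[:name, :heroine], [:age, :age_in_1993]])
)
```

In this sketch `by_hash` has the keys `name, age_in_1993` and `by_array` has `heroine, age_in_1993`, with the column data untouched in both cases.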