Module: RedAmber::DataFrameSelectable
- Included in:
- DataFrame
- Defined in:
- lib/red_amber/data_frame_selectable.rb
Overview
Mix-in for the class DataFrame
Instance Method Summary
-
#[](*args) ⇒ Object
Select variables or records.
-
#filter(*booleans, &block) ⇒ Object
Select records by filtering with booleans to create a DataFrame.
-
#first(n_obs = 1) ⇒ DataFrame
Select records from the top.
-
#head(n_obs = 5) ⇒ DataFrame
Select records from the top.
-
#last(n_obs = 1) ⇒ DataFrame
Select records from the end.
-
#remove(*args, &block) ⇒ Object
Select records and remove them, returning a DataFrame of the remaining records.
-
#remove_nil ⇒ DataFrame
(also: #drop_nil)
Remove records (rows) that contain any nil.
-
#sample(n_or_prop = nil) ⇒ Object
Select records randomly to create a DataFrame.
-
#shuffle ⇒ Object
Returns a DataFrame with shuffled rows.
-
#slice(*args, &block) ⇒ Object
Select records to create a DataFrame.
-
#slice_by(key, keep_key: false, &block) ⇒ Object
Select records using the column specified by key as labels. The block returns the labels (or positions) to select, as an Array or a Range.
-
#tail(n_obs = 5) ⇒ DataFrame
Select records from the end.
-
#take(index_array) ⇒ DataFrame
private
Select records by index Array to create a DataFrame.
-
#v(key) ⇒ Vector
Select a variable by String or Symbol and return as a Vector.
Instance Method Details
#[](key) ⇒ Vector
#[](keys) ⇒ DataFrame
#[](index) ⇒ DataFrame
#[](indices) ⇒ DataFrame
#[](booleans) ⇒ DataFrame
Select variables or records.
# File 'lib/red_amber/data_frame_selectable.rb', line 132

def [](*args)
  raise DataFrameArgumentError, 'self is an empty dataframe' if empty?

  case args
  in [] | [nil]
    return remove_all_values
  in [(Symbol | String) => k] if key? k
    return variables[k.to_sym]
  in [Integer => i]
    return take([i.negative? ? i + size : i])
  in [Vector => v]
    arrow_array = v.data
  in [(Arrow::Array | Arrow::ChunkedArray) => aa]
    arrow_array = aa
  else
    a = parse_args(args, size)
    return select_variables_by_keys(a) if a.symbol?
    return take(normalize_indices(Arrow::Array.new(a))) if a.integer?
    return remove_all_values if a.compact.empty?
    return filter_by_array(Arrow::BooleanArray.new(a)) if a.boolean?

    raise DataFrameArgumentError, "invalid arguments: #{args}"
  end

  return take(normalize_indices(arrow_array)) if arrow_array.numeric?
  return filter_by_array(arrow_array) if arrow_array.boolean?

  a = arrow_array.to_a
  return select_variables_by_keys(a) if a.symbol_or_string?

  raise DataFrameArgumentError, "invalid arguments: #{args}"
end
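The order of the checks in #[] may be easier to follow in a reduced form. The following is a minimal plain-Ruby sketch, not RedAmber code: TinyFrame is a hypothetical class that keeps its columns in a Hash, and it mirrors only the single-key, single-index, index-array and boolean-array branches.

```ruby
# Hypothetical stand-in for a DataFrame: a Hash of equal-length column Arrays.
# It is NOT RedAmber; it only imitates the order of the dispatch in #[].
class TinyFrame
  attr_reader :columns

  def initialize(columns)
    @columns = columns.transform_keys(&:to_sym)
  end

  def size
    @columns.values.first.to_a.size
  end

  def [](*args)
    case args
    in [(Symbol | String) => k] if @columns.key?(k.to_sym)
      @columns[k.to_sym]                               # one key -> one column
    in [Integer => i]
      take([i.negative? ? i + size : i])               # one index -> one-row frame
    in [Array => a] if a.all?(Integer)
      take(a.map { |i| i.negative? ? i + size : i })   # index array -> those rows
    in [Array => a] if a.all? { |b| [true, false, nil].include?(b) }
      take(a.each_index.select { |i| a[i] })           # boolean mask -> true rows
    else
      raise ArgumentError, "invalid arguments: #{args}"
    end
  end

  private

  # Pick rows by non-negative position from every column.
  def take(indices)
    TinyFrame.new(@columns.transform_values { |col| col.values_at(*indices) })
  end
end

df = TinyFrame.new({ x: [1, 2, 3], y: %w[a b c] })
df[:x]                        # => [1, 2, 3]
df[-1].columns[:y]            # => ["c"]
df[[true, false, true]].columns[:x]   # => [1, 3]
```

A key is tested before an index, so a String or Symbol that names a column always returns that column.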
#filter(booleans) ⇒ DataFrame
#filter {|self| ... } ⇒ DataFrame
Select records by filtering with booleans to create a DataFrame.
# File 'lib/red_amber/data_frame_selectable.rb', line 563

def filter(*booleans, &block)
  booleans.flatten!
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?

  if block
    unless booleans.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    booleans = [instance_eval(&block)]
  end

  case booleans
  in [] | [[]]
    return remove_all_values
  in [Vector => v] if v.boolean?
    filter_by_array(v.data)
  in [Arrow::ChunkedArray => ca] if ca.boolean?
    filter_by_array(ca)
  in [Arrow::BooleanArray => b]
    filter_by_array(b)
  else
    a = Arrow::Array.new(parse_args(booleans, size))
    unless a.boolean?
      raise DataFrameArgumentError, "not a boolean filter: #{booleans}"
    end

    filter_by_array(a)
  end
end
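Filtering by booleans amounts to keeping the rows whose mask entry is true. A minimal plain-Ruby sketch of that idea follows; filter_rows is a hypothetical helper working on a Hash of column Arrays, and treating nil as "drop the row" is an assumption of this sketch, not a statement about filter_by_array.

```ruby
# Hypothetical helper, not part of RedAmber: keep the rows whose mask entry is true.
def filter_rows(columns, mask)
  unless mask.all? { |b| [true, false, nil].include?(b) }
    raise ArgumentError, "not a boolean filter: #{mask}"
  end

  # Assumption of this sketch: a nil entry drops the row, like false.
  keep = mask.each_index.select { |i| mask[i] }
  columns.transform_values { |col| col.values_at(*keep) }
end

people = { name: %w[Ann Bob Cy], age: [31, 17, 45] }
adults = filter_rows(people, people[:age].map { |a| a >= 18 })
adults[:name]   # => ["Ann", "Cy"]
```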
#first(n_obs = 1) ⇒ DataFrame
Select records from the top.
# File 'lib/red_amber/data_frame_selectable.rb', line 825

def first(n_obs = 1)
  head(n_obs)
end
#head(n_obs = 5) ⇒ DataFrame
Select records from the top.
# File 'lib/red_amber/data_frame_selectable.rb', line 801

def head(n_obs = 5)
  raise DataFrameArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  self[0...[n_obs, size].min]
end
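The range built by #head is clamped to the number of records, so asking for more rows than exist returns everything. A small plain-Ruby sketch on an Array shows the same arithmetic; head_range is a hypothetical helper and ArgumentError stands in for DataFrameArgumentError.

```ruby
# Hypothetical helper: the range #head passes to #[], computed for a plain Array.
def head_range(n_obs, size)
  raise ArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  0...[n_obs, size].min
end

rows = %w[a b c]
rows[head_range(2, rows.size)]   # => ["a", "b"]
rows[head_range(5, rows.size)]   # => ["a", "b", "c"] (clamped to the size)
```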
#last(n_obs = 1) ⇒ DataFrame
Select records from the end.
# File 'lib/red_amber/data_frame_selectable.rb', line 835

def last(n_obs = 1)
  tail(n_obs)
end
#remove(row) ⇒ DataFrame
#remove(rows) ⇒ DataFrame
#remove {|self| ... } ⇒ DataFrame
#remove(booleans) ⇒ DataFrame
#remove {|self| ... } ⇒ DataFrame
Select records and remove them, returning a DataFrame of the remaining records.
# File 'lib/red_amber/data_frame_selectable.rb', line 731

def remove(*args, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?

  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end

  arrow_array =
    case args
    in [] | [[]] | [nil]
      return self
    in [Vector => v]
      v.data
    in [(Arrow::Array | Arrow::ChunkedArray) => aa]
      aa
    else
      Arrow::Array.new(parse_args(args, size))
    end

  if arrow_array.boolean?
    filter_by_array(arrow_array.primitive_invert)
  elsif arrow_array.numeric?
    remover = normalize_indices(arrow_array).to_a
    return self if remover.empty?

    slicer = indices.to_a - remover.map(&:to_i)
    return remove_all_values if slicer.empty?

    take(slicer)
  else
    raise DataFrameArgumentError, "Invalid argument #{args}"
  end
end
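The two branches of #remove reduce to a set difference on indices and an inverted mask. The sketch below models both with plain Arrays; remaining_indices and invert_mask are hypothetical helpers, and how primitive_invert treats nil is deliberately not modelled.

```ruby
# Hypothetical helper: positions left after removing `remover` from 0...size.
# Negative positions count from the end, as in normalize_indices.
def remaining_indices(size, remover)
  normalized = remover.map { |i| i.negative? ? i + size : i }
  (0...size).to_a - normalized
end

# Hypothetical helper: removing by a mask is filtering by its inverse.
# Only true/false entries are considered in this sketch.
def invert_mask(mask)
  mask.map { |b| !b }
end

remaining_indices(5, [0, -1])      # => [1, 2, 3]
invert_mask([true, false, false])  # => [false, true, true]
```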
#remove_nil ⇒ DataFrame Also known as: drop_nil
Remove records (rows) that contain any nil.
# File 'lib/red_amber/data_frame_selectable.rb', line 789

def remove_nil
  func = Arrow::Function.find(:drop_null)
  DataFrame.create(func.execute([table]).value)
end
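The effect of dropping nulls from a table can be pictured as removing every row in which at least one column holds nil. A minimal plain-Ruby sketch on a Hash of column Arrays follows; drop_rows_with_nil is a hypothetical helper and does not call Arrow.

```ruby
# Hypothetical helper: drop each row in which any column holds nil.
def drop_rows_with_nil(columns)
  n = columns.values.first.to_a.size
  keep = (0...n).reject { |i| columns.values.any? { |col| col[i].nil? } }
  columns.transform_values { |col| col.values_at(*keep) }
end

table = { a: [1, nil, 3], b: ['x', 'y', nil] }
drop_rows_with_nil(table)[:a]   # => [1]
```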
#sample ⇒ DataFrame
#sample(n) ⇒ DataFrame
#sample(prop) ⇒ DataFrame
This method requires the 'arrow-numo-narray' gem.
Select records randomly to create a DataFrame.
This method calls `indices.sample`.
It accepts the same arguments as `Vector#sample`.
# File 'lib/red_amber/data_frame_selectable.rb', line 873

def sample(n_or_prop = nil)
  slice { indices.sample(n_or_prop) }
end
#shuffle ⇒ Object
This method requires the 'arrow-numo-narray' gem.
Same behavior as `DataFrame#sample(1.0)`.
Returns a DataFrame with shuffled rows.
# File 'lib/red_amber/data_frame_selectable.rb', line 884

def shuffle
  sample(1.0)
end
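#sample picks row indices at random and #shuffle is the special case of a proportion of 1.0, which yields every index once in random order. The sketch below models this with Array#sample from the standard library instead of Vector#sample; sample_indices is a hypothetical helper, and the default of one index, the rounding of the proportion and sampling without replacement are all assumptions of the sketch.

```ruby
# Hypothetical model of index sampling with an Integer count or a Float proportion.
# Assumptions: nil means one index, a proportion is rounded, no replacement.
def sample_indices(size, n_or_prop = nil, random: Random.new)
  n =
    case n_or_prop
    when nil then 1
    when Integer then n_or_prop
    when Float then (size * n_or_prop).round
    else raise ArgumentError, "invalid argument: #{n_or_prop}"
    end
  (0...size).to_a.sample(n, random: random)
end

sample_indices(5, 2)     # two distinct indices out of 0..4
sample_indices(5, 1.0)   # a permutation of [0, 1, 2, 3, 4], as in #shuffle
```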
#slice(row) ⇒ DataFrame
#slice(rows) ⇒ DataFrame
#slice(enumerator) ⇒ DataFrame
#slice {|self| ... } ⇒ DataFrame
#slice(booleans) ⇒ DataFrame
#slice {|self| ... } ⇒ DataFrame
Select records to create a DataFrame.
# File 'lib/red_amber/data_frame_selectable.rb', line 351

def slice(*args, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?

  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end

  arrow_array =
    case args
    in [] | [[]]
      return remove_all_values
    in [Vector => v]
      v.data
    in [(Arrow::Array | Arrow::ChunkedArray) => aa]
      aa
    else
      Arrow::Array.new(parse_args(args, size))
    end

  if arrow_array.numeric?
    take(normalize_indices(arrow_array))
  elsif arrow_array.boolean?
    filter_by_array(arrow_array)
  elsif arrow_array.to_a.compact.empty?
    # Ruby 3.0.4 does not accept Arrow::Array#compact here. 2.7.6 and 3.1.2 is OK.
    remove_all_values
  else
    raise DataFrameArgumentError, "invalid arguments: #{args}"
  end
end
#slice_by(key) {|self| ... } ⇒ DataFrame
#slice_by(key) {|self| ... } ⇒ DataFrame
Select records using the column specified by key as labels. The block returns the labels (or positions) to select, as an Array or a Range. The key column is dropped from the result unless keep_key is true.
# File 'lib/red_amber/data_frame_selectable.rb', line 453

def slice_by(key, keep_key: false, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?
  raise DataFrameArgumentError, 'No block given' unless block
  raise DataFrameArgumentError, "#{key} is not a key of self" unless key?(key)
  return self if key.nil?

  slicer = instance_eval(&block)
  return DataFrame.new unless slicer

  if slicer.is_a?(Range)
    from = slicer.begin
    from =
      if from.is_a?(String)
        self[key].index(from)
      elsif from.nil?
        0
      elsif from < 0
        size + from
      else
        from
      end
    to = slicer.end
    to =
      if to.is_a?(String)
        self[key].index(to)
      elsif to.nil?
        size - 1
      elsif to < 0
        size + to
      else
        to
      end
    slicer = (from..to).to_a
  else
    slicer = slicer.map { |x| x.is_a?(String) ? self[key].index(x) : x }
  end

  taken = take(normalize_indices(Arrow::Array.new(slicer)))
  keep_key ? taken : taken.drop(key)
end
#tail(n_obs = 5) ⇒ DataFrame
Select records from the end.
# File 'lib/red_amber/data_frame_selectable.rb', line 813

def tail(n_obs = 5)
  raise DataFrameArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  self[-[n_obs, size].min..]
end
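#tail builds an endless range that starts n_obs rows before the end, clamped to the number of records. A plain-Ruby sketch on an Array shows the arithmetic; tail_range is a hypothetical helper and ArgumentError stands in for DataFrameArgumentError.

```ruby
# Hypothetical helper: the endless range #tail passes to #[], for a plain Array.
def tail_range(n_obs, size)
  raise ArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  (-[n_obs, size].min..)
end

rows = %w[a b c]
rows[tail_range(2, rows.size)]   # => ["b", "c"]
rows[tail_range(5, rows.size)]   # => ["a", "b", "c"] (clamped to the size)
rows[tail_range(0, rows.size)]   # => ["a", "b", "c"], because -0.. is 0..
```

The last line is a property of this Array model only; whether DataFrame#tail(0) behaves the same way is not verified here.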
#take(index_array) ⇒ DataFrame
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Select records by index Array to create a DataFrame.
-
TODO: support for the option `boundscheck: true`.
-
Supports indices in an Arrow::UInt8, UInt16, UInt32, UInt64 or an Array.
-
Negative indices are not supported.
# File 'lib/red_amber/data_frame_selectable.rb', line 900

def take(index_array)
  DataFrame.create(@table.take(index_array))
end
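Because #take expects non-negative positions, callers normalize indices first. The sketch below models a take on a Hash of column Arrays; take_rows is a hypothetical helper, and its explicit check for negative or out-of-range positions is an assumption of the sketch (the real method delegates to the Arrow table and has no such check yet, per the TODO above).

```ruby
# Hypothetical helper: pick rows by non-negative position from every column.
def take_rows(columns, index_array)
  n = columns.values.first.to_a.size
  # Assumption of this sketch: reject anything that is not a valid position.
  bad = index_array.reject { |i| i.is_a?(Integer) && i >= 0 && i < n }
  raise IndexError, "unsupported indices: #{bad}" unless bad.empty?

  columns.transform_values { |col| col.values_at(*index_array) }
end

cols = { x: [10, 20, 30] }
take_rows(cols, [2, 0])[:x]   # => [30, 10]
```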
#v(key) ⇒ Vector
#v(key) is faster than #[](key).
Select a variable by String or Symbol and return as a Vector.
# File 'lib/red_amber/data_frame_selectable.rb', line 179

def v(key)
  raise DataFrameArgumentError, "Key does not exist: [#{key}]" unless key?(key)

  variables[key.to_sym]
end
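#v does a single key check followed by a Hash lookup, which is presumably why it is faster than the multi-branch dispatch of #[]. A minimal plain-Ruby sketch of that lookup follows; fetch_variable is a hypothetical helper, variables is a plain Hash standing in for the DataFrame's variables, and KeyError stands in for DataFrameArgumentError.

```ruby
# Hypothetical helper: look up one column by String or Symbol key.
def fetch_variable(variables, key)
  sym = key.to_sym
  raise KeyError, "Key does not exist: [#{key}]" unless variables.key?(sym)

  variables[sym]
end

vars = { x: [1, 2, 3] }
fetch_variable(vars, 'x')   # => [1, 2, 3]
fetch_variable(vars, :x)    # => [1, 2, 3]
```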