Module: RedAmber::DataFrameCombinable
- Included in:
- DataFrame
- Defined in:
- lib/red_amber/data_frame_combinable.rb
Overview
Mix-in for the class DataFrame
Instance Method Summary
- #anti_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object
  Return records of self that do not have a match in other.
- #concatenate(*other) ⇒ DataFrame (also: #concat, #bind_rows)
  Concatenate other DataFrames or Tables onto the bottom of self.
- #difference(other) ⇒ DataFrame (also: #setdiff)
  Select records appearing in self but not in other.
- #full_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object (also: #outer_join)
  Join another DataFrame or Table, leaving all records.
- #inner_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object
  Join another DataFrame or Table, leaving only the matching records.
- #intersect(other) ⇒ DataFrame
  Select records appearing in both self and other.
- #join(other, join_keys = nil, type: :inner, suffix: '.1', force_order: false) ⇒ Object
- #left_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object
  Join matching values to self from other.
- #merge(*other) ⇒ DataFrame (also: #bind_cols)
  Merge other DataFrames or Tables.
- #right_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object
  Join matching values from self to other.
- #semi_join(other, join_keys = nil, suffix: '.1', force_order: true) ⇒ Object
  Return records of self that have a match in other.
- #set_operable?(other) ⇒ Boolean
  Check if a set operation with self and other is possible.
- #union(other) ⇒ DataFrame
  Select records appearing in self or other.
Instance Method Details
#anti_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#anti_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#anti_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Return records of self that do not have a match in other.
- Same as `#join` with `type: :left_anti`.
- A kind of filtering join.
# File 'lib/red_amber/data_frame_combinable.rb', line 620
def anti_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(other, join_keys, type: :left_anti, suffix: suffix, force_order: force_order)
end
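To make the idea of a filtering join concrete, here is a minimal plain-Ruby sketch that models records as an Array of Hashes. It only illustrates the semantics; it is not how RedAmber performs the join (the library delegates to Arrow), and the helper names `anti_join_rows` and `semi_join_rows` are hypothetical, not part of the RedAmber API.

```ruby
require 'set'

# Hypothetical helpers modelling filtering joins on Arrays of Hashes.
# A filtering join never adds columns; it only keeps or drops rows of `left`.

# Keep the rows of `left` whose key values have no match in `right`.
def anti_join_rows(left, right, keys)
  seen = right.map { |row| row.values_at(*keys) }.to_set
  left.reject { |row| seen.include?(row.values_at(*keys)) }
end

# Keep the rows of `left` whose key values have a match in `right`.
def semi_join_rows(left, right, keys)
  seen = right.map { |row| row.values_at(*keys) }.to_set
  left.select { |row| seen.include?(row.values_at(*keys)) }
end

left  = [{ id: 1, x: 'a' }, { id: 2, x: 'b' }, { id: 3, x: 'c' }]
right = [{ id: 2, y: true }, { id: 4, y: false }]

p anti_join_rows(left, right, [:id])
p semi_join_rows(left, right, [:id])
```

Together the two results partition `left`: every row ends up in exactly one of them.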
#concatenate(*other) ⇒ DataFrame
Also known as: concat, bind_rows
Note: The `#types` must be the same as `other#types`.
Concatenate other DataFrames or Tables onto the bottom of self.
# File 'lib/red_amber/data_frame_combinable.rb', line 36
def concatenate(*other)
  case other
  in [] | [nil] | [[]]
    return self
  in [Array => array]
    # Nop
  else
    array = other
  end

  table_array = array.map do |e|
    case e
    when Arrow::Table
      e
    when DataFrame
      e.table
    else
      raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
    end
  end

  DataFrame.create(table.concatenate(table_array))
end
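The `case ... in` at the top of the source above decides how the splat argument is interpreted: no argument, `nil`, or an empty Array means "nothing to add", a single Array is unwrapped, and anything else is used as given. The sketch below isolates that pattern in plain Ruby. The helper name `normalize_args` is hypothetical and exists only for illustration.

```ruby
# Hypothetical helper mirroring the argument handling shown above.
def normalize_args(*other)
  case other
  in [] | [nil] | [[]]
    []     # nothing to combine
  in [Array => array]
    array  # a single Array argument is unwrapped
  else
    other  # several positional arguments are used as given
  end
end

p normalize_args                # no argument
p normalize_args(nil)           # explicit nil
p normalize_args(%i[a b])       # one Array
p normalize_args(:a, :b)        # several arguments
```

Under this reading, passing the frames one by one and passing them as a single Array should be equivalent.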
#difference(other) ⇒ DataFrame
Also known as: setdiff
Select records appearing in self but not in other.
- Same as `#join` with `type: :left_anti` when the keys of self are the same as those of other.
- A kind of set operation.
# File 'lib/red_amber/data_frame_combinable.rb', line 724
def difference(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :left_anti)
end
#full_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#full_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#full_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Also known as: outer_join
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Join another DataFrame or Table, leaving all records.
- Same as `#join` with `type: :full_outer`.
- A kind of mutating join.
# File 'lib/red_amber/data_frame_combinable.rb', line 350
def full_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(other, join_keys, type: :full_outer, suffix: suffix, force_order: force_order)
end
#inner_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#inner_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#inner_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Join another DataFrame or Table, leaving only the matching records.
- Same as `#join` with `type: :inner`.
- A kind of mutating join.
# File 'lib/red_amber/data_frame_combinable.rb', line 280
def inner_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(other, join_keys, type: :inner, suffix: suffix, force_order: force_order)
end
#intersect(other) ⇒ DataFrame
Select records appearing in both self and other.
- Same as `#join` with `type: :inner` when the keys of self are the same as those of other.
- A kind of set operation.
# File 'lib/red_amber/data_frame_combinable.rb', line 659
def intersect(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :inner)
end
#join(other, type: :inner, suffix: '.1', force_order: false) ⇒ DataFrame
#join(other, join_keys, type: :inner, suffix: '.1', force_order: false) ⇒ DataFrame
#join ⇒ DataFrame
Note: The order of joined results may not be preserved by default. To preserve the order of the result, set the `force_order` option to `true`. This works by appending an index column and sorting on it after the join, so it costs some performance.
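The `join_keys` argument can be omitted, given as key names shared by both sides, or given as a Hash with `:left` and `:right` entries when the key names differ. The following plain-Ruby sketch models that resolution. The helper name `resolve_join_keys` is hypothetical, and wrapping a single key with `Array()` is an assumption made for this sketch rather than the library's own key normalization.

```ruby
# Hypothetical helper modelling how join keys are resolved.
# Returns a pair: [keys used on the left side, keys used on the right side].
def resolve_join_keys(left_columns, right_columns, join_keys = nil)
  # With no keys given, fall back to the columns common to both sides
  # ("natural keys").
  join_keys ||= left_columns.intersection(right_columns)

  if join_keys.is_a?(Hash)
    # Different key names on each side: { left: ..., right: ... }
    [Array(join_keys[:left]), Array(join_keys[:right])]
  else
    # Same key names on both sides.
    keys = Array(join_keys)
    [keys, keys]
  end
end

p resolve_join_keys(%i[id x], %i[id y])
p resolve_join_keys(%i[id x], %i[id y], :id)
p resolve_join_keys(%i[id x], %i[code y], { left: :id, right: :code })
```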
# File 'lib/red_amber/data_frame_combinable.rb', line 862
def join(other, join_keys = nil, type: :inner, suffix: '.1', force_order: false)
  left_table = table
  right_table =
    case other
    when DataFrame
      other.table
    when Arrow::Table
      other
    else
      raise DataFrameArgumentError, 'other must be a DataFrame or an Arrow::Table'
    end

  if force_order
    left_index = :__LEFT_INDEX__
    right_index = :__RIGHT_INDEX__
    left_table = assign(left_index) { indices }.table
    other = DataFrame.create(other) if other.is_a?(Arrow::Table)
    right_table = other.assign(right_index) { indices }.table
  end

  left_table_keys = ensure_keys(left_table.keys)
  right_table_keys = ensure_keys(right_table.keys)
  # natural keys (implicit common keys)
  join_keys ||= left_table_keys.intersection(right_table_keys)

  type = Arrow::JoinType.try_convert(type) || type
  type_nick = type.nick

  plan = Arrow::ExecutePlan.new
  left_node = plan.build_source_node(left_table)
  right_node = plan.build_source_node(right_table)

  if join_keys.is_a?(Hash)
    left_keys = ensure_keys(join_keys[:left])
    right_keys = ensure_keys(join_keys[:right])
  else
    left_keys = ensure_keys(join_keys)
    right_keys = left_keys
  end

  context =
    [type_nick, left_table_keys, right_table_keys, left_keys, right_keys, suffix]

  hash_join_node_options = Arrow::HashJoinNodeOptions.new(type, left_keys, right_keys)
  case type_nick
  when 'inner', 'left-outer'
    hash_join_node_options.left_outputs = left_table_keys
    hash_join_node_options.right_outputs = right_table_keys - right_keys
  when 'right-outer'
    hash_join_node_options.left_outputs = left_table_keys - left_keys
    hash_join_node_options.right_outputs = right_table_keys
  end

  hash_join_node =
    plan.build_hash_join_node(left_node, right_node, hash_join_node_options)

  merge_node = merge_keys(plan, hash_join_node, context)
  rename_node = rename_keys(plan, merge_node, context)
  joined_table = sink_and_start_plan(plan, rename_node)

  df = DataFrame.create(joined_table)
  if force_order
    sorter =
      case type_nick
      when 'right-semi', 'right-anti'
        [right_index]
      when 'left-semi', 'left-anti'
        [left_index]
      else
        [left_index, right_index]
      end
    df.sort(sorter)
      .drop(sorter)
  else
    df
  end
end
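The `force_order` mechanism in the source above follows a simple recipe: tag each row with its original position, join, sort on the tags, then drop them. Below is a minimal plain-Ruby sketch of that recipe on Arrays of Hashes. It assumes a single join key, the helper name `ordered_inner_join` is hypothetical, and the reversed iteration merely imitates a join engine that does not preserve order.

```ruby
# Hypothetical model of the force_order technique for an inner join.
INDEX_COLUMNS = %i[__left_index__ __right_index__].freeze

def ordered_inner_join(left, right, key)
  # 1. Tag every row with its original position.
  l = left.each_with_index.map { |row, i| row.merge(__left_index__: i) }
  r = right.each_with_index.map { |row, i| row.merge(__right_index__: i) }

  # 2. Join in an arbitrary order (reversed here on purpose).
  joined = l.reverse.flat_map do |lrow|
    r.reverse
     .select { |rrow| rrow[key] == lrow[key] }
     .map { |rrow| lrow.merge(rrow) }
  end

  # 3. Restore the original order, then drop the index columns.
  joined
    .sort_by { |row| [row[:__left_index__], row[:__right_index__]] }
    .map { |row| row.reject { |name, _| INDEX_COLUMNS.include?(name) } }
end

left  = [{ k: 'b', v: 1 }, { k: 'a', v: 2 }]
right = [{ k: 'a', w: 10 }, { k: 'b', w: 20 }, { k: 'a', w: 30 }]
p ordered_inner_join(left, right, :k)
```

The extra tagging, sorting, and dropping is the cost the note on `force_order` refers to.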
#left_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#left_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#left_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Join matching values to self from other.
- Same as `#join` with `type: :left_outer`.
- A kind of mutating join.
# File 'lib/red_amber/data_frame_combinable.rb', line 420
def left_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(other, join_keys, type: :left_outer, suffix: suffix, force_order: force_order)
end
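A mutating join adds columns from the other side, and the join type decides what happens to unmatched rows. The plain-Ruby sketch below contrasts an inner join with a left outer join on Arrays of Hashes. It assumes a single join key and no column-name clashes (so it ignores `suffix`), and the helper names are hypothetical rather than RedAmber API.

```ruby
# Hypothetical models of two mutating joins on Arrays of Hashes.

# Inner join: only rows with a match on both sides survive.
def inner_join_rows(left, right, key)
  left.flat_map do |lrow|
    right.select { |rrow| rrow[key] == lrow[key] }
         .map { |rrow| lrow.merge(rrow) }
  end
end

# Left outer join: every row of `left` survives; unmatched rows get nil
# in the columns that would have come from `right`.
def left_join_rows(left, right, key)
  right_columns = right.flat_map(&:keys).uniq - [key]
  blank = right_columns.to_h { |column| [column, nil] }

  left.flat_map do |lrow|
    matches = right.select { |rrow| rrow[key] == lrow[key] }
    matches.empty? ? [lrow.merge(blank)] : matches.map { |rrow| lrow.merge(rrow) }
  end
end

left  = [{ id: 1, x: 'a' }, { id: 2, x: 'b' }]
right = [{ id: 2, y: 10 }]
p inner_join_rows(left, right, :id)
p left_join_rows(left, right, :id)
```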
#merge(*other) ⇒ DataFrame
Also known as: bind_cols
Note: The `#size` must be the same as `other#size`.
Note: self and other must not share the same key.
Merge other DataFrames or Tables.
# File 'lib/red_amber/data_frame_combinable.rb', line 86
def merge(*other)
  case other
  in [] | [nil] | [[]]
    return self
  in [Array => array]
    # Nop
  else
    array = other
  end

  hash = array.each_with_object({}) do |e, h|
    df =
      case e
      when Arrow::Table
        DataFrame.create(e)
      when DataFrame
        e
      else
        raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
      end

    if size != df.size
      raise DataFrameArgumentError, "#{e} do not have same size as self"
    end

    k = keys.intersection(df.keys).any?
    raise DataFrameArgumentError, "There are some shared keys: #{k}" if k

    h.merge!(df.to_h)
  end

  assign(hash)
end
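The two preconditions of a column-wise merge (same number of rows, no shared column names) can be sketched in plain Ruby by modelling a frame as a Hash of column name to Array of values. The helper name `bind_columns` is hypothetical. Unlike the source above, this sketch checks each frame against everything merged so far and reports the clashing names, which is a deliberate simplification for illustration.

```ruby
# Hypothetical model: a "frame" is a Hash of column name => Array of values.
def bind_columns(base, *others)
  row_count = base.values.map(&:size).max.to_i

  others.each_with_object(base.dup) do |frame, merged|
    # Precondition 1: every incoming column has the same number of rows.
    unless frame.values.all? { |column| column.size == row_count }
      raise ArgumentError, 'frames must have the same number of rows'
    end

    # Precondition 2: no column name may already exist.
    shared = merged.keys & frame.keys
    raise ArgumentError, "shared keys: #{shared}" unless shared.empty?

    merged.merge!(frame)
  end
end

p bind_columns({ x: [1, 2] }, { y: [3, 4] }, { z: %w[a b] })
```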
#right_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#right_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#right_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Join matching values from self to other.
- Same as `#join` with `type: :right_outer`.
- A kind of mutating join.
# File 'lib/red_amber/data_frame_combinable.rb', line 487
def right_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(
    other,
    join_keys,
    type: :right_outer,
    suffix: suffix,
    force_order: force_order
  )
end
#semi_join(other, suffix: '.1', force_order: true) ⇒ DataFrame
#semi_join(other, join_keys, suffix: '.1', force_order: true) ⇒ DataFrame
#semi_join(other, join_key_pairs, suffix: '.1', force_order: true) ⇒ DataFrame
Note: The order of joined results is preserved by default. This is done by appending an index column and sorting on it after the join, which costs some performance. If the order of the result does not matter to you, set the `force_order` option to `false`.
Return records of self that have a match in other.
- Same as `#join` with `type: :left_semi`.
- A kind of filtering join.
# File 'lib/red_amber/data_frame_combinable.rb', line 559
def semi_join(other, join_keys = nil, suffix: '.1', force_order: true)
  join(other, join_keys, type: :left_semi, suffix: suffix, force_order: force_order)
end
#set_operable?(other) ⇒ Boolean
Check if a set operation with self and other is possible.
# File 'lib/red_amber/data_frame_combinable.rb', line 637
def set_operable?(other) # rubocop:disable Naming/AccessorMethodName
  keys == other.keys.map(&:to_sym)
end
#union(other) ⇒ DataFrame
Select records appearing in self or other.
- Same as `#join` with `type: :full_outer` when the keys of self are the same as those of other.
- A kind of set operation.
# File 'lib/red_amber/data_frame_combinable.rb', line 689
def union(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :full_outer, force_order: true)
end
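The three set operations (`#intersect`, `#union`, `#difference`) share one precondition: both sides must have the same keys. The plain-Ruby sketch below models them on Arrays of Hashes using Ruby's built-in Array set operators. It assumes the inputs contain no duplicate rows, the helper names are hypothetical, and it is not a description of how the join-based implementation treats duplicates.

```ruby
# Hypothetical models of set operations on Arrays of Hashes.

# Both sides must expose the same column names, in the same order.
def check_keys!(left, right)
  left_keys = left.flat_map(&:keys).uniq
  right_keys = right.flat_map(&:keys).uniq
  raise ArgumentError, 'keys differ between the two sides' unless left_keys == right_keys
end

# Rows appearing in both sides.
def intersect_rows(left, right)
  check_keys!(left, right)
  left & right
end

# Rows appearing in either side.
def union_rows(left, right)
  check_keys!(left, right)
  left | right
end

# Rows appearing in left but not in right.
def difference_rows(left, right)
  check_keys!(left, right)
  left - right
end

a = [{ x: 1, y: 'A' }, { x: 2, y: 'B' }, { x: 3, y: 'C' }]
b = [{ x: 2, y: 'B' }, { x: 4, y: 'D' }]
p intersect_rows(a, b)
p union_rows(a, b)
p difference_rows(a, b)
```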