DataFusion Join implementations
Modules
- utils - Join-related functionality used by both logical and physical plans
Structs
- CrossJoinExec - Cross join execution plan
- HashJoinExec - Join execution plan that evaluates equijoin predicates in parallel on multiple partitions using a hash table, with an optional filter list applied after the join.
- NestedLoopJoinExec - A build-probe join operator whose main task is to perform joins that have no equijoin conditions in the ON clause.
- SortMergeJoinExec - Join execution plan that executes equijoin predicates on multiple partitions using the sort-merge join algorithm and applies an optional filter after the join. Can be used to join arbitrarily large inputs where one or both of the inputs don’t fit in the available memory.
- SymmetricHashJoinExec - A symmetric hash join with range conditions hashes both streams on the join key and uses the resulting hash tables to join the streams. The join is symmetric because a hash table is built on the join keys of each stream, and rows are matched on the join-key values in both streams. This is efficient in a streaming context because it allows fast hash-table lookups instead of scanning one or both streams for matching rows. With range conditions, it also considers only the elements of each stream that fall within a sliding window, which makes it more efficient and less likely to retain stale data. This allows it to operate on unbounded streaming data without unbounded memory growth.
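
The symmetric hash join described in the last entry can be pictured with a small standard-library sketch. This is not DataFusion's implementation: `Row`, `Side`, `SymmetricJoin`, and the fixed `window` on a timestamp column are hypothetical names chosen for illustration, and the sketch assumes rows arrive with non-decreasing timestamps across both streams.

```rust
use std::collections::HashMap;

/// Hypothetical row layout: join key, timestamp used by the range condition, payload.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    key: u32,
    ts: i64,
    payload: &'static str,
}

/// One side of the join: a hash table from join key to buffered rows.
#[derive(Default)]
struct Side {
    table: HashMap<u32, Vec<Row>>,
}

impl Side {
    fn insert(&mut self, row: Row) {
        self.table.entry(row.key).or_default().push(row);
    }

    /// Drop buffered rows older than `min_ts`; they can no longer match.
    fn prune(&mut self, min_ts: i64) {
        self.table.retain(|_, rows| {
            rows.retain(|r| r.ts >= min_ts);
            !rows.is_empty()
        });
    }

    fn len(&self) -> usize {
        self.table.values().map(Vec::len).sum()
    }
}

/// Both sides keep a hash table; each arriving row probes the opposite table.
struct SymmetricJoin {
    left: Side,
    right: Side,
    window: i64,
}

impl SymmetricJoin {
    fn new(window: i64) -> Self {
        SymmetricJoin { left: Side::default(), right: Side::default(), window }
    }

    /// Number of rows currently buffered on both sides.
    fn buffered(&self) -> usize {
        self.left.len() + self.right.len()
    }

    /// Process one row from the left (`from_left == true`) or right stream
    /// and return the (left, right) pairs it produces.
    fn push(&mut self, row: Row, from_left: bool) -> Vec<(Row, Row)> {
        let window = self.window;
        // Assumes non-decreasing timestamps: anything older than this bound is stale.
        let min_ts = row.ts - window;
        self.left.prune(min_ts);
        self.right.prune(min_ts);

        let (own, other) = if from_left {
            (&mut self.left, &self.right)
        } else {
            (&mut self.right, &self.left)
        };

        let mut out = Vec::new();
        if let Some(candidates) = other.table.get(&row.key) {
            for c in candidates {
                // Range condition: timestamps within `window` of each other.
                if (row.ts - c.ts).abs() <= window {
                    out.push(if from_left {
                        (row.clone(), c.clone())
                    } else {
                        (c.clone(), row.clone())
                    });
                }
            }
        }
        own.insert(row);
        out
    }
}
```

Because stale rows are pruned on every arrival, the buffered state is bounded by the number of rows inside the window rather than by the length of the streams.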
Enums
- PartitionMode - Hash join partitioning mode
- StreamJoinPartitionMode - Partitioning mode to use for symmetric hash join
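
To make the idea of a partitioned hash join concrete, here is a minimal standard-library sketch of hash-partitioning both inputs on the join key, joining each partition pair with a hash table, and applying an optional filter after the join. It is only an illustration of the general technique: `hash_partition`, `hash_join`, `partitioned_join`, and the `(key, payload)` row layout are hypothetical, and nothing here reflects the variants of the enums above or DataFusion's internals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical row layout: (join key, payload).
type Row = (u32, &'static str);

/// Route each row to one of `n` partitions by hashing its join key.
fn hash_partition(rows: &[Row], n: usize) -> Vec<Vec<Row>> {
    let mut parts: Vec<Vec<Row>> = vec![Vec::new(); n];
    for &row in rows {
        let mut hasher = DefaultHasher::new();
        row.0.hash(&mut hasher);
        let idx = (hasher.finish() % n as u64) as usize;
        parts[idx].push(row);
    }
    parts
}

/// Build a hash table on `build`, probe it with `probe`,
/// then apply the optional post-join filter to each matched pair.
fn hash_join(
    build: &[Row],
    probe: &[Row],
    filter: Option<&dyn Fn(&Row, &Row) -> bool>,
) -> Vec<(Row, Row)> {
    let mut table: HashMap<u32, Vec<Row>> = HashMap::new();
    for &b in build {
        table.entry(b.0).or_default().push(b);
    }
    let mut out = Vec::new();
    for &p in probe {
        if let Some(matches) = table.get(&p.0) {
            for &b in matches {
                if filter.map_or(true, |f| f(&b, &p)) {
                    out.push((b, p));
                }
            }
        }
    }
    out
}

/// Join partition i of the left input with partition i of the right input.
/// Rows with equal keys always land in the same partition, so each pair is
/// independent and could run on its own thread; this sketch runs them in turn.
fn partitioned_join(
    left: &[Row],
    right: &[Row],
    n: usize,
    filter: Option<&dyn Fn(&Row, &Row) -> bool>,
) -> Vec<(Row, Row)> {
    let left_parts = hash_partition(left, n);
    let right_parts = hash_partition(right, n);
    let mut out = Vec::new();
    for (l, r) in left_parts.iter().zip(right_parts.iter()) {
        out.extend(hash_join(l, r, filter));
    }
    out.sort(); // deterministic order for comparison
    out
}
```

With the same inputs, `partitioned_join` yields the same pairs for any partition count `n >= 1`, which is what lets the partition count be chosen for parallelism alone.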