In this article let’s dive into the world of joins and how these can be tuned for performance.
Snowflake is renowned for its high-performance cloud data warehousing capabilities, enabling organisations to handle large volumes of structured and semi-structured data efficiently. A fundamental aspect of working with data in Snowflake involves using SQL joins, which combine data from multiple tables. Understanding joins—especially concepts such as flipped joins, logical join types, and the Search Optimization Service (SOS)—is essential for optimising queries and improving overall system performance.
Features change from time to time with new features being added regularly, it is recommended that you review the documentation for the latest on what specific features are included with any of the Editions.
Table of Contents
Open Table of Contents
Logical Join Types
Snowflake supports several logical join types, each serving a specific purpose:
Inner Join: Returns rows with matching values in both tables.
Left Outer Join: Returns all rows from the left table and matched rows from the right table; unmatched rows from the right table return NULL values.
Right Outer Join: Returns all rows from the right table and matched rows from the left table; unmatched rows from the left table return NULL values.
Full Outer Join: Combines results from both tables, returning NULL values for unmatched rows in either table.
What Is a Flipped Join?
A flipped join occurs when Snowflake mistakenly selects the larger table as the build side of the join, leading to excessive resource consumption. Ideally, the smaller table should always be chosen as the build side, as this reduces memory usage and speeds up processing. Recognising and preventing flipped joins significantly optimises query performance.
Joins In Practice
Performing Joins in Snowflake
A basic inner join between two tables in Snowflake looks like this:
SELECT a.column1, b.column2
FROM table_a AS a
INNER JOIN table_b AS b
ON a.id = b.id;
Avoiding Flipped Joins
To prevent flipped joins, consider these practical steps:
Check table sizes: Use Snowflake metadata queries to determine which table is smaller.
Explicitly control join order: Use the JOIN order explicitly to ensure the smaller table is the build side.
-- Smaller table is specified first
SELECT a.column1, b.column2
FROM small_table AS a
JOIN large_table AS b
ON a.id = b.id;
Leveraging Search Optimization Service (SOS)
Features change from time to time with new features being added regularly, it is recommended that you review the documentation for the latest on what specific features are included with any of the Editions.
Snowflake’s SOS improves query performance by automatically optimising table scans, especially beneficial for large tables frequently involved in joins.
Activate SOS with:
ALTER TABLE my_table ADD SEARCH OPTIMIZATION;
Deactivate SOS with:
ALTER TABLE my_table DROP SEARCH OPTIMIZATION;
Examples and Use Cases
Example of Logical Join Types
- Inner Join Example:
SELECT customer.name, orders.amount
FROM customer
INNER JOIN orders
ON customer.customer_id = orders.customer_id;
- Left Outer Join Example:
SELECT customer.name, orders.amount
FROM customer
LEFT JOIN orders
ON customer.customer_id = orders.customer_id;
Practical Scenario: Preventing Flipped Joins
If experiencing slow queries, first identify which side Snowflake selected as the build side:
EXPLAIN
SELECT * FROM large_table a
JOIN small_table b
ON a.id = b.id;
Review the execution plan output to ensure the smaller table is the build side.
Best Practices and Optimization Tips
Choose Appropriate Join Type
-
Use Inner Join for exact matches.
-
Use Left/Right Outer Joins when preserving all rows from one table.
-
Avoid unnecessary Full Outer Joins due to potential for larger datasets.
Monitor and Optimise Table Size
Regularly check your table size to maintain efficient join operations. Smaller tables should always be used as build tables.
Apply Search Optimization Service Wisely
Activate SOS strategically on tables frequently joined and scanned but consider associated storage costs and update overheads.
Use Join Filters and Dynamic Pruning
Features change from time to time with new features being added regularly, it is recommended that you review the documentation for the latest on what specific features are included with any of the Editions.
Leverage Snowflake’s built-in filters and dynamic pruning capabilities to eliminate irrelevant rows early in the query execution process, improving query speed.
Conclusion
Understanding basic joins in Snowflake is critical for efficient data querying and management. By focusing on logical join types, proactively avoiding flipped joins, and strategically employing the Search Optimization Service, you can significantly enhance query performance.