Loops in BigQuery Scripting
🎯 Introduction
Loops allow you to execute a block of SQL code repeatedly. Whether you need to process a list of tables, perform incremental data loads, or run a calculation until a goal is reached, loops provide the programmatic control necessary for complex automation in BigQuery.
Real-world use case: Iterating through a list of 100 country-specific tables to aggregate their data into a single global master table, rather than writing 100 identical `INSERT` statements.
🛠️ Prerequisites & IAM Roles
Before using loops, ensure you understand:
- DECLARE and SET statements for variables.
- Basic IF logic for loop termination.
💰 Cost Note
Warning: Loops are the most common cause of "Bill Shock" in BigQuery.
1. Multiplied Costs: Every query inside a loop is a **separate job**. If a loop runs 50 times and the query inside scans 20 GB, you will pay for **1,000 GB (1 TB)**.
2. Infinite Loops: A `WHILE` loop with a bug in its stopping condition can run until it hits the 24-hour limit, scanning data every few seconds.
3. Cost Protection: Always add a "safety counter" or a `LIMIT` to the source query of a `FOR` loop during testing.
🏗️ Syntax: Loop Types
BigQuery supports three primary loop constructs:
- WHILE: Runs while a condition is TRUE.
- FOR: Iterates over the result of a query.
- LOOP: Runs indefinitely until a `LEAVE` statement is hit.
💻 Examples
Example 1: The WHILE Loop (Counter-based)
SQL
BEGIN
DECLARE counter INT64 DEFAULT 1;
WHILE counter <= 5 DO
SELECT FORMAT("Processing iteration %d", counter) as status;
SET counter = counter + 1;
END WHILE;
END; Example 2: The FOR Loop (Query-based)
SQL
-- Iterating through existing tables in a dataset
FOR table_info IN (
SELECT table_name
FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'log_%'
)
DO
-- Use EXECUTE IMMEDIATE to run dynamic SQL for each table
EXECUTE IMMEDIATE FORMAT("SELECT COUNT(*) FROM `my-project.my_dataset.%s`", table_info.table_name);
END FOR; ⚠️ Common Pitfalls
- Row-by-Row vs. Set-based: Don't use a loop to update rows one by one. BigQuery is much faster and cheaper if you use a single `MERGE` or `UPDATE` statement with a `JOIN`.
- Missing LEAVE: If using the basic `LOOP` construct, forgetting `LEAVE` will result in an infinite loop.
- Variable Reset: Variables declared *outside* a loop maintain their state. If you need a fresh variable for each iteration, declare it *inside* the loop.
🚀 Best Practices
- ✅ Label Your Loops: Use labels (e.g., `process_tables: FOR ...`) to make your code more readable and to allow `LEAVE process_tables` in nested logic.
- ✅ Iterate Over Metadata: Use loops primarily for orchestrating tasks (like creating partitions or tables) rather than processing the data itself.
- ✅ Dry Run First: Check the "validator" in the BigQuery console; it will often warn you if a loop looks problematic.