Loops in BigQuery Scripting

🎯 Introduction

Loops allow you to execute a block of SQL code repeatedly. Whether you need to process a list of tables, perform incremental data loads, or run a calculation until a goal is reached, loops provide the programmatic control necessary for complex automation in BigQuery.

Real-world use case: Iterating through a list of 100 country-specific tables to aggregate their data into a single global master table, rather than writing 100 identical `INSERT` statements.

🛠️ Prerequisites & IAM Roles

Before using loops, ensure you understand:

💰 Cost Note

Warning: Loops are the most common cause of "Bill Shock" in BigQuery.

1. Multiplied Costs: Every query inside a loop is a **separate job**. If a loop runs 50 times and the query inside scans 20 GB, you will pay for **1,000 GB (1 TB)**.
2. Infinite Loops: A `WHILE` loop with a bug in its stopping condition can run until it hits the 24-hour limit, scanning data every few seconds.
3. Cost Protection: Always add a "safety counter" or a `LIMIT` to the source query of a `FOR` loop during testing.

🏗️ Syntax: Loop Types

BigQuery supports three primary loop constructs:

💻 Examples

Example 1: The WHILE Loop (Counter-based)

SQL

BEGIN
  DECLARE counter INT64 DEFAULT 1;

  WHILE counter <= 5 DO
    SELECT FORMAT("Processing iteration %d", counter) as status;
    SET counter = counter + 1;
  END WHILE;
END;

Example 2: The FOR Loop (Query-based)

SQL

-- Iterating through existing tables in a dataset
FOR table_info IN (
  SELECT table_name 
  FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
  WHERE table_name LIKE 'log_%'
)
DO
  -- Use EXECUTE IMMEDIATE to run dynamic SQL for each table
  EXECUTE IMMEDIATE FORMAT("SELECT COUNT(*) FROM `my-project.my_dataset.%s`", table_info.table_name);
END FOR;

⚠️ Common Pitfalls

🚀 Best Practices