Cluster columns in BigQuery

In BigQuery, a clustered table physically organizes data based on the values of one or more selected columns. Clustering helps BigQuery optimize how data is stored and scanned during query execution.

Important Points about Clustering

Note: When clustering a table using multiple columns, the column order matters. The order determines which columns take precedence when BigQuery sorts and groups the data into storage blocks.

Create an Empty Clustered Table (UI)

Follow the steps below to create an empty clustered table using the BigQuery Console:

Step 1: Go to the BigQuery page here in the Google Cloud Console.

Step 2: In the Explorer pane, expand the project and select a dataset.

Step 3: In the Dataset info section, click + Create table.

Clustered table in BigQuery

Step 4: In the Create table panel, specify the following details:

Clustered table in BigQuery

Step 5: Define the schema of the table.

Clustered table in BigQuery

Step 6: In the Clustering order field, enter between one and four comma-separated column names.

Clustered table in BigQuery

Click Create table. After creation, the clustered columns are visible in the table details.

Clustered table in BigQuery

Create a Clustered Table using SQL

We can also create a partitioned and clustered table using SQL.

Example: Create a sales table clustered by country and customer_id, and partitioned by order_date.

SQL

CREATE TABLE `ashishcoder.Coding_Dataset.clustered_table`
(
  order_id     INT64,
  customer_id  INT64,
  country      STRING,
  order_date   DATE,
  total_amount FLOAT64
)
PARTITION BY order_date
CLUSTER BY country, customer_id;    

What’s Happening Here

Verify Clustering on a Table

We can verify clustering information using the INFORMATION_SCHEMA.

SQL

SELECT
  table_name,
  column_name,
  ordinal_position,
  clustering_ordinal_position
FROM `ashishcoder.Coding_Dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'my_clustered_table'
ORDER BY clustering_ordinal_position;    

The output of the above query is shown below:

Clustered table in BigQuery