DISTINCT keyword in BigQuery

The DISTINCT keyword in BigQuery is used to remove duplicate rows from a result set. This is useful when we want to retrieve only unique values from a column or a combination of columns.

SQL Syntax

SELECT DISTINCT column_name1, column_name2, ...
FROM `project_name.dataset_name.table_name`
[WHERE condition]
[ORDER BY column_name];

This query returns only the unique values from the specified column or columns.

Example: DISTINCT with One Column

In this example, we retrieve the distinct values from a single column.

SQL

SELECT DISTINCT state
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;

The result of the above query is shown below:

DISTINCT with Multiple Columns

When using DISTINCT with multiple columns, the combination of all specified columns must be unique.

SQL Syntax

SELECT DISTINCT column1, column2
FROM `project_name.dataset_name.table_name`;

Example:

SQL

SELECT DISTINCT state, Gender
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;

The result of the above query is shown below:

Count Total Records and Unique Values

We can use COUNT along with DISTINCT to find total records and unique values in a column.

SQL

SELECT
  COUNT(*) AS Total_Records,
  COUNT(DISTINCT State) AS `Unique States`
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;

The output of the above query is shown below:

Remove Duplicate Rows

We can remove duplicate rows from the entire table by using DISTINCT *.

SQL

SELECT DISTINCT *
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;

This query removes duplicate entire rows. A row is considered duplicate only if all column values are identical.