DISTINCT keyword in BigQuery

The DISTINCT keyword in BigQuery is used to remove duplicate rows from a result set. This is useful when we want to retrieve only unique values from a column or a combination of columns.

SQL Syntax

SELECT DISTINCT column_name1, column_name2, ...
FROM `project_name.dataset_name.table_name`
[WHERE condition]
[ORDER BY column_name];    

This query returns only the unique values from the specified column or columns.

Example: DISTINCT with One Column

In this example, we retrieve the distinct values from a single column.

SQL

SELECT DISTINCT state
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;    

The result of the above query is shown below:

DISTINCT keyword with one column

DISTINCT with Multiple Columns

When using DISTINCT with multiple columns, the combination of all specified columns must be unique.

SQL Syntax

SELECT DISTINCT column1, column2
FROM `project_name.dataset_name.table_name`;    

Example:

SQL

SELECT DISTINCT state, Gender
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;    

The result of the above query is shown below:

DISTINCT keyword with multiple columns

Count Total Records and Unique Values

We can use COUNT along with DISTINCT to find total records and unique values in a column.

SQL

SELECT
  COUNT(*) AS Total_Records,
  COUNT(DISTINCT State) AS `Unique States`
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;   

The output of the above query is shown below:

COUNT DISTINCT example

Remove Duplicate Rows

We can remove duplicate rows from the entire table by using DISTINCT *.

SQL

SELECT DISTINCT *
FROM `data-to-insights.customer_insights.customer_web_data_cleaned`;    

This query removes duplicate entire rows. A row is considered duplicate only if all column values are identical.