Query BigQuery Tables Using Python
BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. In this tutorial, we will learn how to execute SQL queries on BigQuery tables using the google-cloud-bigquery Python library.
Prerequisites
- Install the library: pip install google-cloud-bigquery
- Authenticated environment (ADC or Service Account)
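If you go the service-account route (as the example below does), the code reads its configuration from a .env file. A minimal sketch of that file — both values are placeholders you must replace with your own key path and project ID:

```ini
# .env — placeholder values; substitute your own key path and project ID
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
project_id=my-project-id
```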
Python Code to Query BigQuery
The following example demonstrates how to initialize a BigQuery client, define a SQL query, and iterate through the results.
Python
# Import the packages
from dotenv import load_dotenv
import os
from google.oauth2 import service_account
from google.cloud import bigquery
def main():
    # Loads environment variables from a .env file
    load_dotenv()
    # Use environment variables from .env file
    credentials_path = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    project_name = os.getenv("project_id")
    # Get the service account credentials
    credentials = service_account.Credentials.from_service_account_file(credentials_path)
    # Create the BigQuery client object
    client = bigquery.Client(credentials=credentials, project=project_name)
    # Perform a query
    QUERY = """
        SELECT
            name,
            SUM(number) AS total_count
        FROM
            `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE
            state = 'TX'
        GROUP BY
            name
        ORDER BY
            total_count DESC
        LIMIT 5
    """
    print("Executing query...")
    # Execute the query as an asynchronous job
    query_job = client.query(QUERY)
    print("The results are:")
    # Waits for the query to finish
    rows = query_job.result()
    for row in rows:
        # Access column values by name
        print(f"Name: {row['name']}, Total: {row['total_count']}")

if __name__ == "__main__":
    main()

Key Steps Explained
- bigquery.Client(): Initializes the connection to BigQuery.
- client.query(QUERY): Sends the SQL query to BigQuery as an asynchronous job.
- query_job.result(): Waits for the query to finish and returns an iterator over the rows.
Note: When querying, wrap table names in backticks (`) if they are fully qualified (for example, project.dataset.table) or contain special characters such as hyphens, as in `bigquery-public-data.usa_names.usa_1910_2013`.
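As a small illustration, a hypothetical helper (quote_table is our own function, not part of the library) that backtick-quotes a fully qualified table ID before embedding it in a query:

```python
def quote_table(table_id: str) -> str:
    """Wrap a fully qualified table ID (project.dataset.table) in backticks."""
    return f"`{table_id}`"

table = quote_table("bigquery-public-data.usa_names.usa_1910_2013")
query = f"SELECT name FROM {table} LIMIT 5"
print(query)  # SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 5
```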