Create BigQuery Tables from Google Cloud Storage (Using SQL)
Creating BigQuery tables directly from files stored in Google Cloud Storage (GCS) is a common data engineering workflow. This method avoids manual downloads and enables scalable analytics pipelines.
Below is a step-by-step guide using BigQuery SQL.
Prerequisites
Before creating the table:
- A Google Cloud Storage bucket with files
- Supported file formats: CSV, JSON, Parquet, ORC, Avro
- BigQuery dataset created (a creation sketch follows this list)
- Required permissions:
  - BigQuery Data Editor (roles/bigquery.dataEditor)
  - Storage Object Viewer (roles/storage.objectViewer)
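If the dataset does not already exist, it can be created with SQL as well. A minimal sketch, reusing this article's `ashishcoder.Coding_Dataset` path; the US location is an assumption, so adjust it to your project's region:
SQL
-- Create the dataset (schema) if it does not already exist.
-- The location below is an assumption; match it to your region.
CREATE SCHEMA IF NOT EXISTS `ashishcoder.Coding_Dataset`
OPTIONS (location = 'US');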
🎯 Goal
Load a CSV file stored in:
gs://new-bucket-via-python-sdk/Sample Data.csv
into a BigQuery table:
ashishcoder.Coding_Dataset.gcstable
Step 1: Create BigQuery Table
SQL
CREATE OR REPLACE TABLE `ashishcoder.Coding_Dataset.gcstable` (
  EmployeeID INT64,
  Name STRING,
  Age INT64,
  Salary INT64,
  Department STRING,
  Gender STRING
);
Explanation
- EmployeeID (INT64) → Unique employee ID
- Name (STRING) → Employee name
- Age (INT64) → Age
- Salary (INT64) → Salary
- Department (STRING) → Department
- Gender (STRING) → Gender
- CREATE OR REPLACE overwrites the table if it already exists
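To confirm the table was created with the intended schema, you can query the dataset's INFORMATION_SCHEMA.COLUMNS view (standard BigQuery metadata; the path follows this article's example):
SQL
-- List the columns and types of the new table.
SELECT column_name, data_type
FROM `ashishcoder.Coding_Dataset`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'gcstable'
ORDER BY ordinal_position;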
Step 2: Load Data from GCS into BigQuery
The LOAD DATA statement loads one or more files from Cloud Storage into a BigQuery table.
SQL
LOAD DATA OVERWRITE `ashishcoder.Coding_Dataset.gcstable`
FROM FILES (
  format = 'CSV',
  uris = ['gs://new-bucket-via-python-sdk/Sample Data.csv'],
  skip_leading_rows = 1
);
Parameter Breakdown
LOAD DATA OVERWRITE
- Loads the files into the target table
- Replaces any existing rows; use LOAD DATA INTO to append instead (see the sketch below)
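For comparison, a minimal sketch of the append variant, reusing the same file from this example:
SQL
-- LOAD DATA INTO appends rows instead of replacing existing data.
LOAD DATA INTO `ashishcoder.Coding_Dataset.gcstable`
FROM FILES (
  format = 'CSV',
  uris = ['gs://new-bucket-via-python-sdk/Sample Data.csv'],
  skip_leading_rows = 1
);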
a) format → Specifies the file type (see the Parquet sketch after this list). Supported formats:
- CSV
- NEWLINE_DELIMITED_JSON
- PARQUET
- AVRO
- ORC
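As an illustration, switching formats only changes the FROM FILES options. A sketch for Parquet; the file path is hypothetical, and skip_leading_rows does not apply since Parquet files carry their own schema:
SQL
-- Hypothetical Parquet source file; adjust the URI to a real object.
LOAD DATA OVERWRITE `ashishcoder.Coding_Dataset.gcstable`
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://new-bucket-via-python-sdk/sample_data.parquet']
);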
b) uris → Specifies the location of files.
Examples:
# Single file
uris = ['gs://bucket/file.csv']
# Multiple files
uris = ['gs://bucket/*.csv']
c) skip_leading_rows → Number of header rows to skip at the start of each file; set it to 1 to skip a single header row.
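Once the load completes, a quick query confirms the rows arrived:
SQL
-- Spot-check the first few loaded rows.
SELECT *
FROM `ashishcoder.Coding_Dataset.gcstable`
LIMIT 10;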