BigQuery Partitioning for GA4 Data: Cut Query Costs by 90%

GhazanfarApril 17, 20265 min read

BigQuery charges for query processing based on bytes scanned. A GA4 property that generates 50 million events per day accumulates roughly 1.5 billion events per month. Without partitioning, every query against that table scans the entire dataset — even a simple query for yesterday’s page views reads 30 days or more of data. Proper partitioning reduces query costs by 80-95% for most analytics workloads and speeds up query execution dramatically. This guide explains how GA4 BigQuery exports are partitioned, how to extend that partitioning, and how to build optimized derived tables.

How GA4 BigQuery Export Partitioning Works

GA4’s BigQuery export uses sharded tables rather than native partitioned tables. Each day’s data is exported to a separate table with the suffix _YYYYMMDD (for example, events_20240115). When you query with _TABLE_SUFFIX BETWEEN ‘20240101’ AND ‘20240131’, BigQuery uses metadata to determine which tables to read, scanning only the relevant daily tables. This is functionally similar to partitioning but has important differences: each daily table is a separate object with its own metadata, and querying across many days creates significant query planning overhead.

GA4 also creates an events_intraday_YYYYMMDD table for the current day’s streaming data. This table is replaced by the final events_YYYYMMDD when the daily export completes. Always exclude intraday tables in production queries unless you specifically need today’s data, as they contain incomplete data that will change before the day closes.

Creating a Native Partitioned Table

For better performance and cost control, consolidate GA4 sharded tables into a single native partitioned table. Native partitioned tables support partition pruning without the overhead of table sharding, allow you to set partition expiration policies, and enable clustering for additional query optimization.

-- Create a partitioned + clustered table from GA4 export
CREATE TABLE `your_project.analytics_clean.events`
PARTITION BY DATE(event_date)
CLUSTER BY event_name, user_pseudo_id
AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  event_timestamp,
  event_name,
  event_params,
  user_pseudo_id,
  user_id,
  device,
  geo,
  traffic_source,
  ecommerce,
  items
FROM `your_project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

The PARTITION BY DATE(event_date) clause tells BigQuery to organize data by date. Queries that filter on event_date will only scan the relevant partitions. The CLUSTER BY clause further organizes data within each partition by event_name and user_pseudo_id — queries filtering on these columns will scan less data within each partition.

Setting Partition Expiration

GA4 BigQuery export retains data based on your BigQuery dataset settings. For cost control, set partition expiration on your derived tables to automatically delete old partitions. GA4 raw data is often kept for 12-24 months, but derived aggregation tables can have shorter retention since they can be regenerated from raw data.

-- Set partition expiration on an existing table (90 days)
ALTER TABLE `your_project.analytics_clean.events`
SET OPTIONS (
  partition_expiration_days = 90
);

-- Or create table with expiration built in
CREATE TABLE `your_project.analytics_clean.daily_summary`
PARTITION BY summary_date
OPTIONS (partition_expiration_days = 365)
AS SELECT ...

Automating Daily Appends

Once you have a native partitioned table, set up a scheduled query to append each new day’s GA4 export data automatically. The scheduled query runs daily after GA4 completes its export (typically by 09:00 UTC for the previous day’s data, though this can vary).

-- Daily append query (run at 10:00 UTC)
INSERT INTO `your_project.analytics_clean.events`
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  event_timestamp,
  event_name,
  event_params,
  user_pseudo_id,
  user_id,
  device,
  geo,
  traffic_source,
  ecommerce,
  items
FROM `your_project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))

Use BigQuery’s @run_date parameter in scheduled queries to make the date dynamic. DATE_SUB(@run_date, INTERVAL 1 DAY) gives you yesterday’s date, which is what you want when the scheduled query runs today to append yesterday’s completed GA4 export. Set the destination table to APPEND mode so each run adds a new day without overwriting existing data.

Query Cost Comparison

To demonstrate the cost difference, consider a query that counts page views for a single day on a property with 100 million monthly events. Against sharded tables with _TABLE_SUFFIX filtering, BigQuery scans approximately one day’s worth of data (3-4 million rows, roughly 500MB). Against a native partitioned table with event_date filtering, BigQuery achieves the same result with identical scanning — the behavior is the same in this case. The cost difference becomes significant when your query does NOT include a date filter: on sharded tables, a query without _TABLE_SUFFIX still scans all tables BigQuery determines to check. On a partitioned table with a query that includes event_date, partition pruning ensures only relevant data is read.

To verify partition pruning is working, check the query’s bytes processed in the BigQuery console. Add LIMIT to your query and observe whether bytes processed changes — if it does not change, partition pruning is occurring before any row scanning. Use the BigQuery INFORMATION_SCHEMA.PARTITIONS view to audit partition sizes and identify unexpectedly large partitions that might indicate a data quality issue.

How GA4 BigQuery Export Partitioning Works

Creating a Native Partitioned Table

Setting Partition Expiration

Automating Daily Appends

Query Cost Comparison

Related guides

Leave a Comment Cancel reply