GA4’s Path Exploration report is powerful but limited: it maxes out at a few hundred thousand sessions, does not let you filter by specific user segments mid-path, and cannot be automated for recurring reports. BigQuery gives you the raw event data to build path analyses that scale to billions of events, support arbitrary filters, and can be scheduled to refresh automatically. This guide shows you how to replicate and extend GA4’s path exploration entirely in SQL.

The Data Model: How GA4 Stores Event Sequences

In the BigQuery GA4 export, every event row includes user_pseudo_id, ga_session_id (nested inside event_params), event_name, and event_timestamp in microseconds. To reconstruct a user’s path within a session, you need to order events by event_timestamp within each (user_pseudo_id, ga_session_id) combination. The first challenge is extracting ga_session_id from the event_params array using UNNEST and filtering by key = ‘ga_session_id’.

WITH session_events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    event_name,
    event_timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY user_pseudo_id,
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
      ORDER BY event_timestamp
    ) AS step_number
  FROM your_project.analytics_XXXXXX.events_20240101
  WHERE event_name NOT IN ('user_engagement', 'scroll')
)
SELECT * FROM session_events ORDER BY user_pseudo_id, session_id, step_number
LIMIT 1000

Building a Two-Step Path Frequency Table

The most common path analysis task is finding which events most frequently follow a given event. In SQL, you accomplish this with a self-join on the session_events CTE where the second event has step_number = first event’s step_number + 1.

SELECT
  e1.event_name AS step_1,
  e2.event_name AS step_2,
  COUNT(*) AS path_count
FROM session_events e1
JOIN session_events e2
  ON e1.user_pseudo_id = e2.user_pseudo_id
  AND e1.session_id = e2.session_id
  AND e2.step_num = e1.step_num + 1
WHERE e1.event_name = 'view_item'
GROUP BY 1, 2
ORDER BY path_count DESC
LIMIT 20

Multi-Step Path Analysis with LEAD

For deeper path analysis, the LEAD window function is more efficient than self-joins because it avoids the cartesian product. LEAD(event_name, 1) gives you the next event, LEAD(event_name, 2) gives the event after that. This lets you analyze 4-step sequences efficiently without multiple joins.

SELECT
  event_name AS step_1,
  LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_2,
  LEAD(event_name, 2) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_3,
  LEAD(event_name, 3) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_4,
  COUNT(*) AS occurrences
FROM session_events
WHERE step_1 = 'page_view'
GROUP BY 1, 2, 3, 4
ORDER BY occurrences DESC
LIMIT 50
BigQuery GA4 Path Exploration Alternative: Build Funnel Paths in SQL

Filtering Paths by User Segment

One of the biggest advantages of SQL path analysis over GA4’s built-in explorer is the ability to filter by any combination of user attributes mid-query. You can restrict paths to users who eventually converted, users in a specific country, users who arrived via paid search, or users who had more than three sessions before converting — none of which are possible in GA4’s Explore interface without creating pre-defined segments.

-- Find paths for users who eventually purchased
WITH converters AS (
  SELECT DISTINCT user_pseudo_id
  FROM your_project.analytics_XXXXXX.events_*
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND event_name = 'purchase'
)
SELECT
  event_name AS step,
  LEAD(event_name, 1) OVER (PARTITION BY e.user_pseudo_id, session_id ORDER BY event_timestamp) AS next_step,
  COUNT(*) AS transitions
FROM session_events e
INNER JOIN converters c ON e.user_pseudo_id = c.user_pseudo_id
GROUP BY 1, 2
ORDER BY transitions DESC
LIMIT 30

Page Path Analysis Using page_location

In BigQuery, you can use the page_location parameter from event_params to build URL-level path analysis. This is more granular than event names because you can distinguish between different product pages, category pages, and landing pages. Strip query parameters before grouping using REGEXP_REPLACE to avoid fragmenting data by query string variations.

Exit Point Analysis

To find where users exit your site (the last event in each session), identify rows where LEAD returns NULL, meaning no subsequent event exists in that session. This gives you a ranked list of exit pages that can be compared against your funnel steps to identify where users are dropping off.

Scheduling Path Analysis in Looker Studio

Once you have your path analysis SQL working in BigQuery, connect it to Looker Studio as a custom query data source. Create a Looker Studio report with a table showing step_1, step_2, and path_count. Users can filter by step_1 using a filter control, giving them a self-service path explorer that is not limited by GA4’s sampling or row count limits. For automated daily refresh, save the SQL as a BigQuery scheduled query that writes results to a destination table with WRITE_TRUNCATE disposition and point Looker Studio at the destination table.

Guide

Leave a Comment