GA4’s Path Exploration report is powerful but limited: it maxes out at a few hundred thousand sessions, does not let you filter by specific user segments mid-path, and cannot be automated for recurring reports. BigQuery gives you the raw event data to build path analyses that scale to billions of events, support arbitrary filters, and can be scheduled to refresh automatically. This guide shows you how to replicate and extend GA4’s path exploration entirely in SQL.
The Data Model: How GA4 Stores Event Sequences
In the BigQuery GA4 export, every event row includes user_pseudo_id, ga_session_id (nested inside event_params), event_name, and event_timestamp in microseconds. To reconstruct a user’s path within a session, you need to order events by event_timestamp within each (user_pseudo_id, ga_session_id) combination. The first challenge is extracting ga_session_id from the event_params array using UNNEST and filtering by key = ‘ga_session_id’.
WITH session_events AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
event_name,
event_timestamp,
ROW_NUMBER() OVER (
PARTITION BY user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
ORDER BY event_timestamp
) AS step_number
FROM your_project.analytics_XXXXXX.events_20240101
WHERE event_name NOT IN ('user_engagement', 'scroll')
)
SELECT * FROM session_events ORDER BY user_pseudo_id, session_id, step_number
LIMIT 1000
Building a Two-Step Path Frequency Table
The most common path analysis task is finding which events most frequently follow a given event. In SQL, you accomplish this with a self-join on the session_events CTE where the second event has step_number = first event’s step_number + 1.
SELECT
e1.event_name AS step_1,
e2.event_name AS step_2,
COUNT(*) AS path_count
FROM session_events e1
JOIN session_events e2
ON e1.user_pseudo_id = e2.user_pseudo_id
AND e1.session_id = e2.session_id
AND e2.step_num = e1.step_num + 1
WHERE e1.event_name = 'view_item'
GROUP BY 1, 2
ORDER BY path_count DESC
LIMIT 20
Multi-Step Path Analysis with LEAD
For deeper path analysis, the LEAD window function is more efficient than self-joins because it avoids the cartesian product. LEAD(event_name, 1) gives you the next event, LEAD(event_name, 2) gives the event after that. This lets you analyze 4-step sequences efficiently without multiple joins.
SELECT
event_name AS step_1,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_2,
LEAD(event_name, 2) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_3,
LEAD(event_name, 3) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY event_timestamp) AS step_4,
COUNT(*) AS occurrences
FROM session_events
WHERE step_1 = 'page_view'
GROUP BY 1, 2, 3, 4
ORDER BY occurrences DESC
LIMIT 50

Filtering Paths by User Segment
One of the biggest advantages of SQL path analysis over GA4’s built-in explorer is the ability to filter by any combination of user attributes mid-query. You can restrict paths to users who eventually converted, users in a specific country, users who arrived via paid search, or users who had more than three sessions before converting — none of which are possible in GA4’s Explore interface without creating pre-defined segments.
-- Find paths for users who eventually purchased
WITH converters AS (
SELECT DISTINCT user_pseudo_id
FROM your_project.analytics_XXXXXX.events_*
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
AND event_name = 'purchase'
)
SELECT
event_name AS step,
LEAD(event_name, 1) OVER (PARTITION BY e.user_pseudo_id, session_id ORDER BY event_timestamp) AS next_step,
COUNT(*) AS transitions
FROM session_events e
INNER JOIN converters c ON e.user_pseudo_id = c.user_pseudo_id
GROUP BY 1, 2
ORDER BY transitions DESC
LIMIT 30
Page Path Analysis Using page_location
In BigQuery, you can use the page_location parameter from event_params to build URL-level path analysis. This is more granular than event names because you can distinguish between different product pages, category pages, and landing pages. Strip query parameters before grouping using REGEXP_REPLACE to avoid fragmenting data by query string variations.
Exit Point Analysis
To find where users exit your site (the last event in each session), identify rows where LEAD returns NULL, meaning no subsequent event exists in that session. This gives you a ranked list of exit pages that can be compared against your funnel steps to identify where users are dropping off.
Scheduling Path Analysis in Looker Studio
Once you have your path analysis SQL working in BigQuery, connect it to Looker Studio as a custom query data source. Create a Looker Studio report with a table showing step_1, step_2, and path_count. Users can filter by step_1 using a filter control, giving them a self-service path explorer that is not limited by GA4’s sampling or row count limits. For automated daily refresh, save the SQL as a BigQuery scheduled query that writes results to a destination table with WRITE_TRUNCATE disposition and point Looker Studio at the destination table.
