Why GA4 Sampling Hurts Your Data

If you have ever pulled a report in Google Analytics 4 and noticed a small yellow shield icon at the top of the screen, you have encountered data sampling. GA4 applies sampling when an Exploration query processes more than 10 million events (the threshold for standard properties; Analytics 360 raises it substantially). Once sampling kicks in, GA4 no longer shows you real numbers—it shows you an estimate extrapolated from a subset of your actual traffic. For businesses running large campaigns or high-traffic ecommerce stores, sampling can distort conversion rates by 15–30% or more.

The solution is to send your GA4 events directly to BigQuery—Google’s fully managed, serverless data warehouse—where every single event is stored raw and unsampled, forever. BigQuery never samples your data. Every query you run hits the full dataset, down to the individual event level. And with GA4’s native BigQuery Export, setting this up requires zero engineering work on your part. In this guide you will learn exactly how to enable the export, understand the data schema, and run your first unsampled query.

Prerequisites Before You Start

Before enabling the BigQuery export from GA4, you need three things in place. First, a Google Cloud Platform project—go to console.cloud.google.com and create one if needed. The project needs billing enabled. The BigQuery sandbox technically works without a billing account, but sandbox tables expire after 60 days, so attach billing even if you expect to stay within the free tier (10 GB storage and 1 TB query processing per month). For most websites you will never pay anything. Second, you need Editor or Owner access to the Cloud project. Viewer access is not sufficient because GA4 needs to write data into your project. Third, you need Admin access to your GA4 property—only property admins can configure the BigQuery link.

Step 1: Create a BigQuery Dataset

Log into the Google Cloud Console and navigate to BigQuery. In the Explorer panel on the left, click the three-dot menu next to your project name and select “Create dataset.” Give the Dataset ID a clear name like ga4_export or analytics_raw. For data location, choose the region closest to your audience—US is the default and works for most businesses. Leave the default table expiration blank unless you have a specific data retention policy. Click “Create dataset.”
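If you prefer a scripted setup, the same dataset can be created with a single DDL statement run in the BigQuery console. This is a sketch—the project ID, dataset name, and region are placeholders to replace with your own:

CREATE SCHEMA `your_project.ga4_export`
OPTIONS (
  location = 'US',  -- permanent: a dataset's region cannot be changed later
  description = 'Raw GA4 event export'
);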

One critical note: the data location you choose here is permanent. You cannot move a BigQuery dataset to a different region after it is created. Choose carefully the first time to avoid a painful migration later.

Step 2: Link GA4 to BigQuery

Open Google Analytics 4 and navigate to Admin (gear icon). Under the Property column, click “BigQuery Links,” then “Link.” GA4 will prompt you to select your Google Cloud project. Next, choose the dataset location and export frequency. You have two export options: Daily and Streaming.

Daily export runs once per day and creates a table named events_YYYYMMDD. Streaming export sends events to BigQuery in near real-time (within minutes) into an events_intraday_YYYYMMDD table. Daily is free beyond storage costs. Streaming incurs additional insert costs of roughly $0.01 per 200 MB, which adds up quickly at high volumes. Start with daily export and upgrade to streaming only if you need real-time dashboards. Select which data streams to include (web, iOS, Android), then click “Submit.”
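If you do enable streaming, note that today's events land in the events_intraday_YYYYMMDD table rather than events_YYYYMMDD, so a real-time query must target the intraday table. A sketch, with placeholder project and date:

SELECT
  event_name,
  COUNT(*) AS events
FROM
  `your_project.ga4_export.events_intraday_20260417`
GROUP BY
  event_name
ORDER BY
  events DESC;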

Step 3: Understanding the BigQuery Schema

Once data starts flowing, you will see tables named events_YYYYMMDD. Each row represents one event. The schema is nested and repeated—different from a flat SQL table. Top-level fields include event_date, event_timestamp, event_name, event_params, user_pseudo_id, device, geo, and traffic_source.

The trickiest field for newcomers is event_params—a REPEATED RECORD where each event row stores multiple parameter key-value pairs as an array. To extract a specific parameter like page_location, you must unnest this array:

SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
  `your_project.ga4_export.events_20260417`
WHERE
  event_name = 'page_view'
LIMIT 1000;

You must specify whether you are extracting a string_value, int_value, float_value, or double_value. String parameters (page URL, campaign name) use string_value. Numeric parameters (the session identifier, engagement time in milliseconds) use int_value. Watch the exact key names, too: the session identifier is exported as ga_session_id, not session_id.
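A quick way to get comfortable with the value fields is to extract one parameter of each type in a single query. This sketch uses standard GA4 parameter keys (page_title and engagement_time_msec) against a placeholder table:

SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_ms
FROM
  `your_project.ga4_export.events_20260417`
WHERE
  event_name = 'user_engagement'
LIMIT 100;

If a column comes back NULL for every row, you are almost certainly reading the wrong value field for that parameter's type.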

Step 4: Your First Unsampled Report — Sessions by Source/Medium

Here is a practical report that would get sampled in the GA4 UI but returns 100% accurate data in BigQuery—total sessions by traffic source and medium:

SELECT
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
  ))) AS sessions
FROM
  `your_project.ga4_export.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260401' AND '20260417'
  AND event_name = 'session_start'
GROUP BY
  source, medium
ORDER BY
  sessions DESC;

This query uses a wildcard table (events_*) to query across multiple days at once. The _TABLE_SUFFIX BETWEEN clause restricts which daily tables are scanned, keeping your query costs low. Sessions are counted by concatenating user_pseudo_id and ga_session_id, because ga_session_id is only unique within a single user and GA4 does not expose a globally unique session identifier at the top level of the row.
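Hard-coded date suffixes go stale quickly. For a rolling 28-day window, the WHERE clause above can compute its own bounds with standard BigQuery date functions—a sketch:

WHERE
  _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND event_name = 'session_start'

Computed suffix bounds still prune which tables are scanned, so the cost benefit of the suffix filter is preserved.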

Step 5: Querying Conversion Events Without Sampling

Pulling unsampled conversion data is one of the most valuable use cases. If you track purchases as GA4 conversion events, you can get the exact count per campaign with this query:

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
  COUNT(*) AS conversions,
  -- the value param's numeric type can vary with how it was sent, so check all numeric fields
  SUM((SELECT COALESCE(value.double_value, value.float_value, CAST(value.int_value AS FLOAT64))
       FROM UNNEST(event_params) WHERE key = 'value')) AS revenue
FROM
  `your_project.ga4_export.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260401' AND '20260417'
  AND event_name = 'purchase'
GROUP BY
  campaign, source
ORDER BY
  revenue DESC;

This returns exact purchase counts and revenue totals pulled from 100% of your raw event data. No sampling, no threshold, no extrapolation—every single transaction that fired a purchase event during the specified period.

Managing BigQuery Costs

BigQuery bills by the amount of data your queries scan, not by the number of queries. The free tier covers 1 TB of query processing per month. A typical GA4 daily export for a medium-traffic website (50,000–200,000 events per day) produces roughly 5–20 MB per day. At that rate you could query an entire year of data repeatedly and never leave the free tier.

To keep costs predictable as your data grows, always filter by _TABLE_SUFFIX when using wildcard tables. Select only the columns you need rather than using SELECT *—BigQuery is a columnar database and only charges for the bytes in the columns you query. Create BigQuery views to pre-aggregate common dimensions so your Looker Studio dashboards read from lightweight summaries rather than scanning gigabytes of raw events on each refresh.

Connecting BigQuery to Looker Studio for Live Dashboards

Once your unsampled data is in BigQuery, connect it to Looker Studio for visualization. Go to lookerstudio.google.com, create a new report, add a data source, and choose “BigQuery” as the connector. Select your project, dataset, and either a specific table or a custom query. Custom queries are powerful because you can pre-aggregate data in BigQuery—reducing how much data Looker Studio processes on each dashboard load and making your reports significantly faster and cheaper to run.

A recommended pattern: create a BigQuery view called daily_sessions_by_channel that wraps your sessions-by-source query—or, for larger datasets, a scheduled query that writes the results to a summary table, since a plain view still re-runs its query on every read. Your Looker Studio line chart then reads from this lightweight summary—fast, cheap, and always showing 100% unsampled data.
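As a sketch of that pattern (view name from above; project, dataset, and aggregation are placeholders to adapt), such a view is defined once and then selected in Looker Studio like any table:

CREATE OR REPLACE VIEW `your_project.ga4_export.daily_sessions_by_channel` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS day,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
  ))) AS sessions
FROM `your_project.ga4_export.events_*`
WHERE event_name = 'session_start'
GROUP BY day, source, medium;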

Troubleshooting Common Issues

The most common issue is that data does not appear in BigQuery on the first day. This is expected—the daily export runs once per day in the early morning UTC hours. If you linked GA4 to BigQuery on April 17, your first table appears on April 18. You cannot manually trigger the export. Also note that the export is forward-only: GA4 does not backfill historical data into BigQuery, so events from before the link date are never exported. This is the strongest argument for enabling the link as early as possible.

The second common issue is permission errors. If the BigQuery link shows an error in GA4, verify that the GA4 service account has been granted the BigQuery Data Editor role on your Cloud project. This is usually granted automatically during linking, but in enterprise environments with restricted IAM policies, a Cloud project administrator may need to add the permission manually.
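In that case the grant can be added from the command line. This is a sketch: the project ID is a placeholder, and it assumes firebase-measurement@system.gserviceaccount.com, the Google-managed service account GA4 uses for the export, is the account reporting the error:

gcloud projects add-iam-policy-binding your-project-id \
  --member='serviceAccount:firebase-measurement@system.gserviceaccount.com' \
  --role='roles/bigquery.dataEditor'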

Conclusion

Sending GA4 events to BigQuery is the single most impactful upgrade you can make to your analytics stack. It eliminates sampling entirely, gives you permanent data retention beyond GA4’s 14-month limit, and unlocks SQL-powered analysis that far exceeds what the GA4 interface offers. The setup takes under 30 minutes, fits within Google’s free tier for most websites, and immediately pays dividends the next time you need a campaign’s true conversion rate or want to build a custom attribution model. Start today—and you will quickly find it unthinkable to go back to sampling.
