GA4 Predictive Audiences and BigQuery ML Integration: Build a Purchase Propensity Model

Ghazanfar | April 17, 2026 | 6 min read

What Are GA4 Predictive Audiences and How Do They Connect to BigQuery ML?

GA4’s predictive audiences use Google’s machine learning models to identify users who are likely to perform a specific action in the future—most commonly, users likely to purchase in the next seven days or users likely to churn (stop visiting) in the next seven days. These predictions are generated from behavioral patterns GA4 observes across your property: how users navigate, what they add to their cart, how many sessions they have, and dozens of other signals. GA4 then lets you use these predicted audiences for Google Ads targeting, retargeting campaigns, and bid strategy adjustments.

BigQuery ML extends this concept by letting you build your own predictive models on your GA4 BigQuery Export data using SQL, with no Python or separate data science tooling required. With BigQuery ML, you can train a purchase propensity model on features you choose from your own customer data rather than relying on GA4’s built-in model, whose inputs you cannot inspect or adjust. The result is a propensity score tailored to your customers’ actual behavior patterns. It can outperform the built-in prediction when your features capture signals that prediction misses, and you can export it to any downstream system.

GA4 Native Predictive Audiences: Setup and Requirements

Before GA4’s predictive capabilities activate for your property, you must meet minimum data requirements. Over a 7-day period within the last 28 days, at least 1,000 returning users must have triggered the relevant condition (a purchase, for purchase predictions) and at least 1,000 returning users must not have triggered it. These thresholds count users, not sessions, so for the “Likely 7-day purchasers” audience your property needs at least 1,000 returning users who purchased and 1,000 returning users who did not. The property must also collect the purchase event (or in_app_purchase for apps). Properties that do not meet these requirements will not see predictive audiences in the GA4 interface.

If your property meets the requirements, predictive audiences appear automatically in GA4 under Advertising → Audiences. Click “New audience” and you will see a “Predictive” section with options including “Likely 7-day purchasers,” “Likely 7-day churning users,” and “Predicted 28-day top spenders.” Select “Likely 7-day purchasers,” give the audience a name, and save it. A Google Ads account linked to your GA4 property can then use this audience for campaign targeting, bid adjustments, or remarketing exclusions within 24–48 hours.

Building a Custom Purchase Propensity Model in BigQuery ML

If your property does not meet GA4’s data requirements, or if you want a model tailored to your specific data, BigQuery ML lets you build one with SQL. The following example trains a logistic regression model to predict whether a user will make a purchase within a 7-day window, based on per-user counts of sessions, item views, add-to-cart events, and checkout starts aggregated from the export tables:

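-- Step 1: Create training data
-- Features come from an observation window (2026-01-01 to 2026-04-03); the label
-- comes from the 7 days that follow it (2026-04-04 to 2026-04-10), so the model
-- never sees the outcome it is asked to predict.
CREATE OR REPLACE TABLE `your_project.your_dataset.propensity_training` AS
WITH events AS (
  SELECT
    user_pseudo_id,
    event_name,
    PARSE_DATE('%Y%m%d', event_date) AS event_dt,
    -- GA4 stores the session identifier under the 'ga_session_id' key
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
  FROM `your_project.ga4_export.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260410'
),
user_features AS (
  SELECT
    user_pseudo_id,
    -- Features: behavior up to and including 2026-04-03
    COUNT(DISTINCT IF(event_dt <= DATE '2026-04-03', session_id, NULL)) AS session_count,
    COUNTIF(event_name = 'view_item' AND event_dt <= DATE '2026-04-03') AS item_views,
    COUNTIF(event_name = 'add_to_cart' AND event_dt <= DATE '2026-04-03') AS add_to_cart_count,
    COUNTIF(event_name = 'begin_checkout' AND event_dt <= DATE '2026-04-03') AS checkout_starts,
    -- Label: 1 if the user purchased in the 7 days after the observation window
    MAX(IF(event_name = 'purchase' AND event_dt > DATE '2026-04-03', 1, 0)) AS purchased_in_7_days
  FROM events
  GROUP BY user_pseudo_id
)
-- Users first seen only in the label window have no features and are dropped
SELECT * FROM user_features WHERE session_count > 0;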

-- Step 2: Train the model
CREATE OR REPLACE MODEL `your_project.your_dataset.purchase_propensity_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['purchased_in_7_days'],
  auto_class_weights = TRUE
) AS
SELECT
  session_count,
  item_views,
  add_to_cart_count,
  checkout_starts,
  purchased_in_7_days
FROM `your_project.your_dataset.propensity_training`;
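To make the model type concrete, here is a minimal Python sketch of how a logistic regression turns the four feature counts into a probability. The weights and intercept are hypothetical values chosen for illustration, not output from BigQuery ML, and the sketch ignores any feature preprocessing that BigQuery ML applies automatically.

```python
import math

def purchase_probability(features, weights, intercept):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (assumed for illustration only).
weights = {
    "session_count": 0.10,
    "item_views": 0.05,
    "add_to_cart_count": 0.60,
    "checkout_starts": 1.20,
}
intercept = -4.0

# Two hypothetical users: a casual browser and a user close to buying.
browser = {"session_count": 2, "item_views": 3, "add_to_cart_count": 0, "checkout_starts": 0}
near_buyer = {"session_count": 4, "item_views": 10, "add_to_cart_count": 2, "checkout_starts": 1}

p_browser = purchase_probability(browser, weights, intercept)
p_near_buyer = purchase_probability(near_buyer, weights, intercept)
print(f"browser: {p_browser:.3f}, near buyer: {p_near_buyer:.3f}")
```

With these assumed weights, cart and checkout activity move the score far more than page views do, which is the kind of pattern the trained model is meant to learn from your own data.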

The auto_class_weights = TRUE option is important because your dataset is likely highly imbalanced—most users do not purchase. This option tells BigQuery ML to automatically weight the minority class (purchasers) more heavily so the model does not simply predict “no purchase” for everyone and achieve high accuracy while being useless.
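The effect of class weighting can be illustrated with a short Python sketch. It uses one common inverse-frequency scheme (total rows divided by the number of classes times the class count); this is an assumption for illustration, and BigQuery ML’s exact formula may differ. The 2% purchase rate is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical training set: 20 purchasers among 1,000 users.
labels = [1] * 20 + [0] * 980
weights = balanced_class_weights(labels)

# Total weighted mass per class: after weighting, both classes contribute equally.
purchaser_mass = 20 * weights[1]
non_purchaser_mass = 980 * weights[0]
print(weights[1], round(weights[0], 4), purchaser_mass, round(non_purchaser_mass, 6))
```

Each purchaser row counts 25 times in the loss while each non-purchaser row counts about half a time, so misclassifying the rare class becomes as costly overall as misclassifying the common one.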

Scoring Current Users with the Model

After training, use the model to score users from the most recent 7 days who have not yet purchased:

SELECT
  user_pseudo_id,
  -- Look up the probability for label 1 by value rather than relying on array order
  (SELECT p.prob FROM UNNEST(predicted_purchased_in_7_days_probs) AS p WHERE p.label = 1) AS purchase_probability
FROM
  ML.PREDICT(
    MODEL `your_project.your_dataset.purchase_propensity_model`,
    (
      SELECT
        user_pseudo_id,
        -- GA4 stores the session identifier under the 'ga_session_id' key
        COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
        COUNT(CASE WHEN event_name = 'view_item' THEN 1 END) AS item_views,
        COUNT(CASE WHEN event_name = 'add_to_cart' THEN 1 END) AS add_to_cart_count,
        COUNT(CASE WHEN event_name = 'begin_checkout' THEN 1 END) AS checkout_starts
      FROM `your_project.ga4_export.events_*`
      WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
        AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
      GROUP BY user_pseudo_id
      -- Keep only users with no purchase in the scoring window
      HAVING COUNTIF(event_name = 'purchase') = 0
    )
  )
ORDER BY purchase_probability DESC
LIMIT 1000;

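This query returns the top 1,000 users active in the last 7 days, ranked by purchase probability. Users at the top of this list are your highest-value remarketing targets: they have demonstrated behaviors strongly correlated with purchasing and have not yet converted. Because the feature counts here cover only 7 days while the training counts cover a longer window, treat the scores as a ranking rather than as calibrated probabilities, or align the two windows. Prioritize your remarketing budget toward the highest-ranked users.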

Evaluating Model Performance

After training, evaluate the model’s accuracy using BigQuery ML’s built-in evaluation function:

SELECT *
FROM ML.EVALUATE(MODEL `your_project.your_dataset.purchase_propensity_model`);

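The evaluation returns precision, recall, accuracy, F1 score, log loss, and ROC AUC. When you call ML.EVALUATE without passing a table or query, BigQuery ML computes these metrics on the evaluation data it held out during training, if it performed a split. For a purchase propensity model, focus on ROC AUC (area under the receiver operating characteristic curve), where 0.5 corresponds to random ranking and 1.0 to perfect ranking. As a rough rule of thumb, a value above 0.75 indicates the model is meaningfully better than random guessing, and above 0.85 indicates a strong model. Accuracy is a poor guide here, because a model that predicts “no purchase” for everyone is already highly accurate on imbalanced data. If your AUC is low, add more features to the training data: days since last visit, total sessions in the past 30 days, average session duration, device category, or first acquisition channel. If it is suspiciously close to 1.0, check that no feature includes events from the label window.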
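To build intuition for what ROC AUC measures, the Python sketch below computes it directly from its definition: the probability that a randomly chosen purchaser receives a higher score than a randomly chosen non-purchaser, with ties counted as half. The labels and scores are hypothetical, and the pairwise loop is meant for illustration on small inputs, not for production-sized data.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for six users; 1 = purchased, 0 = did not purchase.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

In this toy example, 8 of the 9 purchaser and non-purchaser pairs are ordered correctly. The single purchaser scored at 0.4, below a non-purchaser at 0.6, is what keeps the value under 1.0.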

Exporting Scores to GA4 for Audience Creation

The most powerful use of your BigQuery ML propensity scores is feeding them back into GA4 as a custom audience for Google Ads targeting. This requires sending a GA4 event for each high-propensity user that GA4 can use to build an audience. The most practical implementation sends a server-side event via the GA4 Measurement Protocol for each user in your top-scoring bucket (e.g., purchase_probability > 0.7). These users then appear in a GA4 custom audience segment that Google Ads can target with higher bids or specific creatives.
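A minimal Python sketch of the Measurement Protocol step follows. It only builds the request URL and JSON body for each user above the threshold, without sending anything. The event name high_purchase_propensity, the purchase_probability parameter, and the placeholder measurement ID and API secret are all hypothetical. The sketch also assumes a web data stream, where user_pseudo_id in the export corresponds to the client_id the Measurement Protocol expects; app streams use app_instance_id instead.

```python
import json
from urllib.parse import urlencode

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"

def build_mp_request(measurement_id, api_secret, client_id, probability,
                     event_name="high_purchase_propensity"):
    """Return (url, json_body) for one Measurement Protocol event."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    body = {
        "client_id": client_id,  # assumed equal to user_pseudo_id (web streams)
        "events": [
            {
                "name": event_name,  # hypothetical event name
                "params": {"purchase_probability": round(probability, 4)},
            }
        ],
    }
    return f"{MP_ENDPOINT}?{query}", json.dumps(body)

def requests_for_top_bucket(scored_users, measurement_id, api_secret, threshold=0.7):
    """Build one request per user whose score exceeds the threshold."""
    return [
        build_mp_request(measurement_id, api_secret, user_id, prob)
        for user_id, prob in scored_users
        if prob > threshold
    ]

# Hypothetical rows from the scoring query: (user_pseudo_id, purchase_probability).
scored_users = [("1234567890.1700000000", 0.91), ("2222222222.1700000001", 0.42)]
requests_to_send = requests_for_top_bucket(scored_users, "G-XXXXXXXXXX", "your_api_secret")

for url, body in requests_to_send:
    print(url)
    print(body)  # POST this body to the URL with an HTTP client of your choice
```

In GA4 you would then define an audience on the chosen event name. Keeping the threshold as a parameter makes it easy to test different bucket sizes against campaign results.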

Conclusion

GA4 predictive audiences offer a quick path to machine-learning-powered remarketing for properties that meet the data volume requirements. For properties that do not qualify, or for teams that want full control over their predictive models, BigQuery ML provides a SQL-accessible alternative that trains on your specific customer data and produces propensity scores you can use anywhere in your marketing stack. Both approaches answer the same question: who is most likely to buy next week? Focusing your remarketing budget on those users is typically a more efficient use of spend than broad retargeting, though you should confirm the gain against your own campaign data.