Looker Studio’s native data blending feature has severe limitations: it only supports left outer joins, requires exact field name matches for join keys, and breaks when either source has more than one row per join key dimension combination. Most real-world analytics scenarios involve multiple data sources with mismatched schemas and one-to-many relationships. The solution is to do your blending in BigQuery using proper SQL joins, then connect Looker Studio to the pre-blended BigQuery result. This guide covers the architecture for BigQuery-first blending, common join patterns, and how to build refreshing Looker Studio reports on top.

Why Looker Studio Native Blending Fails

Looker Studio blending evaluates joins at query time, pulling data from each source separately and joining the results in Looker Studio’s own processing layer. This creates several problems: data from each source is aggregated before joining, so you lose row-level detail. If your GA4 source returns 1000 rows and your CRM source returns 50 rows, the join operates on aggregated data rather than individual records. Any many-to-one or many-to-many relationship produces unexpected results or errors. Large datasets cause timeouts because the join happens in Looker Studio’s processing layer, not in BigQuery’s optimized query engine.

The BigQuery View Architecture

Create a BigQuery view (or scheduled query destination table) that performs the join logic. Connect Looker Studio to this single pre-joined source. The view is free to create and query costs only incur when Looker Studio fetches data. For frequently-accessed dashboards, a scheduled query that writes to a destination table is more cost-effective than a view that re-runs the full join on every dashboard load.

-- BigQuery view blending GA4 sessions with CRM revenue data
CREATE OR REPLACE VIEW `your_project.reporting.ga4_crm_blend` AS
SELECT
  g.session_date,
  g.traffic_source,
  g.campaign,
  g.sessions,
  g.conversions,
  COALESCE(c.crm_revenue, 0) AS crm_revenue,
  COALESCE(c.qualified_leads, 0) AS qualified_leads
FROM (
  SELECT
    DATE(TIMESTAMP_MICROS(event_timestamp)) AS session_date,
    traffic_source.source AS traffic_source,
    traffic_source.name AS campaign,
    COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
    ))) AS sessions,
    COUNTIF(event_name = 'purchase') AS conversions
  FROM `your_project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  GROUP BY session_date, traffic_source, campaign
) g
LEFT JOIN (
  SELECT
    date,
    utm_source,
    utm_campaign,
    SUM(revenue) AS crm_revenue,
    COUNT(*) AS qualified_leads
  FROM `your_project.crm.opportunities`
  WHERE date BETWEEN '2024-01-01' AND '2024-01-31'
  GROUP BY date, utm_source, utm_campaign
) c
ON g.session_date = c.date
  AND g.traffic_source = c.utm_source
  AND g.campaign = c.utm_campaign
img

Handling Date Granularity Mismatches

GA4 data is available by the minute in BigQuery. CRM data may be available daily, weekly, or by deal close date. Mismatched granularity requires explicit aggregation before joining. Always aggregate to the coarser granularity before the join — if your CRM closes deals weekly, aggregate GA4 sessions to weekly totals before joining on week. Joining daily GA4 data to weekly CRM data and then aggregating in Looker Studio produces incorrect results because Looker Studio aggregates the joined records, potentially double-counting.

Refreshing Blended Data Automatically

For production dashboards, create a BigQuery scheduled query that writes the blended result to a destination table daily. This approach eliminates view computation costs, makes dashboard loads instant (reading from a pre-computed table instead of running a join), and lets you add data freshness indicators to Looker Studio by including a last_updated timestamp in the destination table.

-- Scheduled query: daily blend refresh
DELETE FROM `your_project.reporting.ga4_crm_daily`
WHERE session_date = DATE_SUB(@run_date, INTERVAL 1 DAY);

INSERT INTO `your_project.reporting.ga4_crm_daily`
SELECT
  [same join query as view above, filtered to yesterday],
  CURRENT_TIMESTAMP() AS last_updated
FROM ...
WHERE session_date = DATE_SUB(@run_date, INTERVAL 1 DAY)

In Looker Studio, connect to the destination table rather than the view. The report loads instantly because the data is pre-computed. Add a scorecard showing the maximum last_updated value from the table so stakeholders can verify data freshness. Set the Looker Studio report cache to 1 hour to prevent redundant BigQuery queries when multiple users view the dashboard simultaneously — the data only updates once daily anyway, so there is no value in hitting BigQuery more than once per hour.

Multi-Source Attribution Blending

The most powerful application of BigQuery-first blending is combining GA4 attribution data with spend data from Google Ads, Meta, and TikTok. Pull spend data from each platform’s API into BigQuery daily (or use a data pipeline tool like Fivetran or Stitch), then join it to GA4 conversion data on campaign name and date. The resulting table shows ROAS by campaign across all platforms in a single Looker Studio report, without the walled-garden limitations of each platform’s native attribution.

guide

Leave a Comment