#46 · 🤖 AI & Agents · Painful

Turning On Einstein Prediction Builder with Dirty Data

AI trained on garbage data gives you garbage predictions with confidence scores.

What Happened

The client was excited about Einstein Prediction Builder, so we turned it on to predict Opportunity close probability. The problem: their historical data was a mess. Won deals were never updated from the 'Prospecting' stage. Lost deals sat in 'Negotiation' forever. The AI learned that 'Prospecting' was the best stage to be in, because that's where most won deals lived, and it gave a 90% close probability to brand-new opportunities. The sales VP started making revenue forecasts based on these predictions.

The Wrong Way

Einstein Prediction Builder Setup:
  Object: Opportunity
  Predict: IsWon (Boolean)

  Data Quality:
  → 40% of Won opps never moved past "Prospecting" stage
  → 30% of Lost opps still show "Negotiation" stage
  → CloseDate is the creation date on 60% of records (never updated)
  → Amount is $0 on 25% of Won opps

  Result: "Prospecting" stage = 90% win probability
  Reality: Garbage in, garbage out with a confidence score
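
The four data quality figures above can be measured before any model is trained. Here is a minimal sketch in Python, assuming the Opportunity records have been exported as dicts and that the true result of each deal is known from somewhere other than the stage field. The `actual_outcome` key is hypothetical (for example, filled from billing records), and `audit` and `pct` are illustrative names, not part of any Salesforce API.

```python
from datetime import date

def pct(part, whole):
    """Percentage of part in whole; 0.0 when whole is empty."""
    return round(100.0 * part / whole, 1) if whole else 0.0

def audit(records):
    """Measure the four problems listed above on exported Opportunity rows.

    Each record is a dict. 'actual_outcome' ("won", "lost", or None) is a
    hypothetical field holding the true result from a trusted source.
    CloseDate and CreatedDate are assumed to be exported as date objects.
    """
    won = [r for r in records if r["actual_outcome"] == "won"]
    lost = [r for r in records if r["actual_outcome"] == "lost"]
    return {
        # Won deals whose stage was never moved to Closed Won
        "won_not_closed_won": pct(
            sum(r["StageName"] != "Closed Won" for r in won), len(won)),
        # Lost deals whose stage was never moved to Closed Lost
        "lost_not_closed_lost": pct(
            sum(r["StageName"] != "Closed Lost" for r in lost), len(lost)),
        # CloseDate still equal to the creation date (never updated)
        "close_date_is_created_date": pct(
            sum(r["CloseDate"] == r["CreatedDate"] for r in records), len(records)),
        # Won deals with a missing or zero Amount
        "won_zero_amount": pct(
            sum(not r.get("Amount") for r in won), len(won)),
    }
```

If any of these numbers is far from zero, the stage, date, and amount fields are not yet trustworthy as training inputs.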

The Right Way

// Step 1: Clean your data BEFORE enabling Einstein
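// Run a data quality report (>= also includes the current year;
// = LAST_N_YEARS:2 would cover only the two previous calendar years):
SELECT StageName, IsWon, IsClosed, COUNT(Id) ct
FROM Opportunity
WHERE CreatedDate >= LAST_N_YEARS:2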
GROUP BY StageName, IsWon, IsClosed
ORDER BY StageName

// Step 2: Fix historical data
// - Update Won opps to Closed Won stage
// - Update Lost opps to Closed Lost stage
// - Populate missing Amount values
// - Correct CloseDate to actual close dates
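
The Step 2 fixes can be prepared offline and reviewed before anything is written back. This is a rough sketch in Python, assuming each exported Opportunity can be matched to a trusted record of what really happened. The `truth` dict and its keys are hypothetical, and `build_fix` is an illustrative helper, not a Salesforce function.

```python
from datetime import date

def build_fix(record, truth):
    """Return only the fields that need updating for one Opportunity.

    record: exported Opportunity row (dict).
    truth:  hypothetical dict from a trusted source with keys
            'outcome' ("won" or "lost"), 'closed_on' (date),
            and 'amount' (number or None).
    """
    updates = {}
    target_stage = "Closed Won" if truth["outcome"] == "won" else "Closed Lost"
    if record["StageName"] != target_stage:
        updates["StageName"] = target_stage
    if record["CloseDate"] != truth["closed_on"]:
        updates["CloseDate"] = truth["closed_on"]
    # Fill Amount only when it is missing or zero and a trusted value exists;
    # never overwrite an amount that is already populated.
    if not record.get("Amount") and truth.get("amount"):
        updates["Amount"] = truth["amount"]
    return updates
```

Returning only the changed fields keeps the correction set small and easy to spot-check before a bulk update.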

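// Step 3: Add validation rules and automation to prevent future dirty data
// - Require Amount before moving past Qualification (validation rule)
// - Auto-update CloseDate when stage = Closed Won/Lost (automation, e.g. a record-triggered flow; validation rules can only block a save)
// - Require Close Reason on Closed Lost (validation rule)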
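
The Step 3 rules are configured in Salesforce itself, but the logic can be prototyped against exported records first to see how many existing rows would fail. A minimal sketch in Python, under stated assumptions: the stage order in `STAGE_ORDER` is a hypothetical pipeline, `Close_Reason__c` is a hypothetical custom field name, and the function names are illustrative.

```python
from datetime import date

# Hypothetical open-stage order; substitute the org's real stage list.
STAGE_ORDER = ["Prospecting", "Qualification", "Proposal", "Negotiation"]
CLOSED_STAGES = {"Closed Won", "Closed Lost"}

def is_past_qualification(stage):
    """True for closed stages and for open stages after Qualification."""
    if stage in CLOSED_STAGES:
        return True
    return (stage in STAGE_ORDER
            and STAGE_ORDER.index(stage) > STAGE_ORDER.index("Qualification"))

def validate(opp):
    """Return the list of rule violations for one Opportunity dict."""
    errors = []
    if is_past_qualification(opp["StageName"]) and not opp.get("Amount"):
        errors.append("Amount is required past Qualification")
    if opp["StageName"] == "Closed Lost" and not opp.get("Close_Reason__c"):
        errors.append("Close Reason is required on Closed Lost")
    return errors

def stamp_close_date(opp, today):
    """Return a copy with CloseDate set to today if the stage is closed.

    Intended to be called at the moment the stage changes to closed.
    """
    if opp["StageName"] in CLOSED_STAGES:
        return {**opp, "CloseDate": today}
    return dict(opp)
```

Running `validate` over the historical export gives a count of records that the new rules would have rejected, which is a useful sanity check before switching them on.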

// Step 4: Wait for 2 sales cycles of clean data (6-12 months)

// Step 5: THEN enable Einstein with clean training data
// Step 6: Validate predictions against actual outcomes monthly
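
One way to run the monthly check in Step 6 is a calibration report: group closed opportunities by their predicted probability and compare each group's average prediction with its actual win rate. The sketch below is plain Python and assumes the predictions and outcomes have been exported as `(predicted_probability, won)` pairs. The function names and the five-bucket default are illustrative choices, not anything Einstein provides.

```python
def calibration_report(scored, n_buckets=5):
    """Compare predicted win probability with actual win rate per bucket.

    scored: list of (predicted_probability, won) pairs for opportunities
            that have since closed; probability in [0, 1], won is a bool.
    """
    buckets = {}
    for prob, won in scored:
        # A probability of exactly 1.0 falls into the top bucket.
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets.setdefault(idx, []).append((prob, won))
    report = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        predicted = sum(p for p, _ in rows) / len(rows)
        actual = sum(1 for _, w in rows if w) / len(rows)
        report.append({
            "bucket": idx,
            "n": len(rows),
            "predicted": round(predicted, 3),
            "actual": round(actual, 3),
        })
    return report

def brier_score(scored):
    """Mean squared error between predicted probability and outcome (0 is perfect)."""
    return sum((p - (1.0 if w else 0.0)) ** 2 for p, w in scored) / len(scored)
```

A bucket where `predicted` sits near 0.9 while `actual` sits near 0.25 is the numeric signature of the failure described in this post: high confidence, wrong answer.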

The Lesson

AI amplifies your data quality problems. If your data is dirty, Einstein will confidently make terrible predictions. Clean first, predict later.
