Merlin Schema Evolution Engine - Forge

The Schema Time Machine: Work with tomorrow's data model, even if your application is stuck in yesterday's API

The Future of Forge: This flagship feature eliminates the need for costly API migrations while delivering modern analytics.

What is Merlin?

Merlin is Forge's schema normalization and evolution engine that automatically translates data from ANY version of an API into a modern, canonical "golden schema."

Whether your application is stuck on Stripe API version 2019-12-01 or a Salesforce schema from 2015, Merlin maps legacy fields to their modern equivalents, so you can run modern analytics without touching a single line of production code.

Average ROI: Companies save $200K-$500K by avoiding costly application upgrades while immediately accessing modern data models for analytics.

The Problem Merlin Solves

Every company with legacy applications faces the same painful dilemma:

  1. Legacy Application: Built on Stripe API v2019, Salesforce from 2015, or internal microservice v1
  2. Cannot Upgrade: Breaking changes would risk production systems and require 6-12 months of engineering
  3. Analytics Stuck: Data team forced to work with outdated schemas, with manual transformation code everywhere
  4. Technical Debt Compounds: Years of brittle mapping code, ongoing maintenance burden, talent drain

Merlin's Breakthrough: Automatically normalize ALL versions of an API to a single golden schema, eliminating the upgrade requirement entirely.

How Merlin Works

graph LR
    A["Legacy Data<br/>(Any API Version)"]:::input --> B["Version Detection"]:::process
    B --> C["Field Mapping"]:::process
    C --> D["Type Conversion"]:::process
    D --> E["Enrichment"]:::process
    E --> F["Validation"]:::process
    F --> G["Golden Schema<br/>(Modern Tables)"]:::output
    H["Knowledge Base<br/>2000+ APIs"]:::knowledge -.-> C
    H -.-> D
    H -.-> E
    classDef input fill:#dc3545,stroke:#fff,color:#fff
    classDef process fill:#6c757d,stroke:#fff,color:#fff
    classDef output fill:#28a745,stroke:#fff,color:#fff
    classDef knowledge fill:#17a2b8,stroke:#fff,color:#fff

Figure 1: Merlin Pipeline - Legacy data to modern golden schema in milliseconds

The Transformation Process:
Version Detection

Analyzes field structure and presence/absence patterns to identify the exact API version, even without explicit version headers.
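
As a rough illustration of presence/absence matching, the sketch below checks a record's field names against a small signature table. The table contents, the function name `detect_version`, and the idea of a "must be present / must be absent" rule are assumptions for illustration (the version labels are borrowed from the Stripe example further down this page); Merlin's actual detection rules are not described here.

```python
from typing import Optional

# Hypothetical signatures: (version label, fields that must be present,
# fields that must be absent). Not Merlin's real rule set.
VERSION_SIGNATURES = [
    ("2025-01-16", {"payment_method", "payment_method_types"}, {"source"}),
    ("2019-12-01", {"source"}, {"payment_method"}),
]


def detect_version(record: dict) -> Optional[str]:
    """Return the first version whose presence/absence pattern matches."""
    fields = set(record)
    for version, required, forbidden in VERSION_SIGNATURES:
        if required <= fields and not (forbidden & fields):
            return version
    return None  # unknown shape: no signature matched


legacy = {"id": "pi_123", "amount": 1000, "currency": "usd", "source": "card_456"}
print(detect_version(legacy))
```

A record that matches no signature returns `None`, which a real pipeline would presumably route to manual review rather than guess.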

Intelligent Mapping

Maps legacy fields to modern equivalents (e.g., Stripe's source → payment_method) using pre-built transformation rules.
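
A rename rule of this kind can be pictured as a lookup table from legacy field name to modern field name. The sketch below is a minimal illustration under that assumption; `FIELD_RENAMES` and `map_fields` are hypothetical names, and the value is carried over unchanged rather than rewritten.

```python
# Hypothetical rule table: legacy field name -> modern field name.
FIELD_RENAMES = {"source": "payment_method"}


def map_fields(record: dict, renames: dict = FIELD_RENAMES) -> dict:
    """Return a copy of the record with legacy keys renamed; other keys pass through."""
    return {renames.get(key, key): value for key, value in record.items()}


legacy = {"id": "pi_123", "amount": 1000, "source": "card_456"}
print(map_fields(legacy))
```

Because the function builds a new dictionary, the original record is left untouched.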

Type Conversion

Handles breaking type changes (string → object, arrays → nested structures) while preserving data integrity.
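
To make the string → object case concrete, here is a small sketch that wraps a bare string in an object while leaving already-converted values alone. The field name `customer`, the wrapper key `id`, and both function names are assumptions chosen for illustration, not fields or APIs documented by Merlin.

```python
from typing import Any, Callable, Dict


def string_to_object(value: Any, key: str = "id") -> Any:
    """Wrap a bare string as {key: value}; leave any other type unchanged."""
    if isinstance(value, str):
        return {key: value}
    return value


def convert_types(record: dict, converters: Dict[str, Callable[[Any], Any]]) -> dict:
    """Apply a per-field converter to a copy of the record."""
    out = dict(record)
    for field, convert in converters.items():
        if field in out:
            out[field] = convert(out[field])
    return out


old = {"customer": "cus_1", "amount": 1000}
print(convert_types(old, {"customer": string_to_object}))
```

Running the converter twice gives the same result as running it once, which keeps the step safe to re-apply to data that is already in the newer shape.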

Smart Enrichment

Infers new required fields from context, adds metadata about transformation confidence, and preserves audit trails.
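
One way to picture enrichment: infer a missing field from a value that is present, then attach the metadata fields. In the sketch below, the `card_` prefix heuristic and the `enrich` function are assumptions; only the `_merlin_source_version` and `_merlin_confidence` field names come from this page.

```python
def enrich(record: dict, source_version: str, confidence: float) -> dict:
    """Add an inferred field (when possible) plus transformation metadata."""
    out = dict(record)
    payment_method = out.get("payment_method")
    # Assumed heuristic: an identifier starting with "card_" implies a card payment.
    if (
        "payment_method_types" not in out
        and isinstance(payment_method, str)
        and payment_method.startswith("card_")
    ):
        out["payment_method_types"] = ["card"]
    out["_merlin_source_version"] = source_version
    out["_merlin_confidence"] = confidence
    return out


mapped = {"id": "pi_123", "payment_method": "card_456"}
print(enrich(mapped, source_version="2019-12-01", confidence=0.95))
```

The confidence value is passed in here; how Merlin actually computes it is not specified on this page.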

Multi-Version Support

Process data from Stripe API v2019, v2020, v2021, v2022, v2023, v2024, v2025—all normalized to the latest golden schema automatically. No code changes required.

40+ Versions · Auto-Detect · Single Output

Confidence Scoring

Every field mapping includes a confidence score between 0 and 1.0. Low-confidence transformations are flagged for manual review, ensuring data quality while maximizing automation.

  • Automated audit trails
  • Manual review workflows
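
The flagging step can be sketched as a simple threshold split. The threshold value of 0.8 and the shape of a mapping entry are assumptions; this page does not state where Merlin draws the line.

```python
from typing import List, Tuple

REVIEW_THRESHOLD = 0.8  # assumed cutoff, not a documented Merlin default


def split_for_review(
    mappings: List[dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[dict], List[dict]]:
    """Separate mappings that can be auto-applied from those needing review."""
    auto_applied = [m for m in mappings if m["confidence"] >= threshold]
    needs_review = [m for m in mappings if m["confidence"] < threshold]
    return auto_applied, needs_review


mappings = [
    {"from": "source", "to": "payment_method", "confidence": 0.95},
    {"from": "cust_ref", "to": "customer", "confidence": 0.55},
]
auto_applied, needs_review = split_for_review(mappings)
print(len(auto_applied), "auto-applied;", len(needs_review), "flagged for review")
```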

Bidirectional Translation

Translate modern schemas back to legacy formats for backward compatibility. Enable zero-downtime migrations where applications continue using old APIs while warehouses use modern schemas.

Legacy → Modern · Modern → Legacy
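
For simple renames, the reverse direction amounts to inverting the rename table and dropping anything the forward pass added. The sketch below assumes a one-to-one rename table and treats `_merlin_`-prefixed keys and caller-listed inferred fields as additions to strip; the function names are hypothetical.

```python
from typing import Iterable


def invert_renames(renames: dict) -> dict:
    """Invert a legacy->modern rename table, refusing ambiguous mappings."""
    inverse = {new: old for old, new in renames.items()}
    if len(inverse) != len(renames):
        raise ValueError("renames are not one-to-one; reverse mapping is ambiguous")
    return inverse


def to_legacy(record: dict, renames: dict, inferred_fields: Iterable[str] = ()) -> dict:
    """Translate a modern record back to its legacy field names."""
    inverse = invert_renames(renames)
    skip = set(inferred_fields)
    legacy = {}
    for key, value in record.items():
        if key.startswith("_merlin_") or key in skip:
            continue  # metadata and inferred fields did not exist in the legacy shape
        legacy[inverse.get(key, key)] = value
    return legacy


modern = {
    "id": "pi_123",
    "payment_method": "card_456",
    "payment_method_types": ["card"],
    "_merlin_confidence": 0.95,
}
print(to_legacy(modern, {"source": "payment_method"}, inferred_fields={"payment_method_types"}))
```

Type changes and many-to-one mappings cannot be reversed this mechanically, which is why the sketch raises an error instead of picking one legacy name arbitrarily.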

Golden Schema Library

Pre-built transformation rules for 2,000+ public APIs from APIs.guru, plus community-contributed mappings. Define custom golden schemas for proprietary internal APIs.

  • Stripe, Salesforce, HubSpot, Shopify
  • Custom schema support
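
A custom golden schema can be thought of as a declaration of the fields and types every normalized record must carry. The sketch below uses a plain dictionary of field name to Python type and a validator that reports mismatches; this representation and the names `GOLDEN_PAYMENT_INTENT` and `validate` are assumptions, not Merlin's schema format.

```python
from typing import List

# Hypothetical golden schema: field name -> expected Python type.
GOLDEN_PAYMENT_INTENT = {
    "id": str,
    "amount": int,
    "currency": str,
    "payment_method": str,
    "payment_method_types": list,
}


def validate(record: dict, schema: dict) -> List[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


legacy = {"id": "pi_123", "amount": 1000, "currency": "usd", "source": "card_456"}
print(validate(legacy, GOLDEN_PAYMENT_INTENT))
```

Run against an un-normalized legacy record, the validator lists exactly the fields the mapping and enrichment steps would need to supply.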

Example: Stripe Payment Intent Evolution

Legacy Schema (2019-12-01)

{
  "id": "pi_123",
  "amount": 1000,
  "currency": "usd",
  "source": "card_456"  // DEPRECATED
}

Modern Schema (2025-01-16)

{
  "id": "pi_123",
  "amount": 1000,
  "currency": "usd",
  "payment_method": "pm_789",  // NEW
  "payment_method_types": ["card"]
}

Merlin Transformation

{
  "id": "pi_123",
  "amount": 1000,
  "currency": "usd",
  "payment_method": "MAPPED_FROM_SOURCE_card_456",  // Auto-mapped!
  "payment_method_types": ["card"],  // Inferred
  "_merlin_source_version": "2019-12-01",  // Metadata
  "_merlin_confidence": 0.95  // Quality score
}

Result: Your analytics queries use modern field names (payment_method) while your application continues using the 2019 API (source). Zero production risk.

When to Use Merlin

Scenario: E-commerce platform built on Stripe API v2018

Challenge: Need modern analytics (MRR, churn, cohorts) but upgrading would cost $300K+ and risk production payments

Merlin Solution: Point Merlin at legacy webhook data → auto-normalize to 2025 golden schema → analytics team works with modern model

ROI: Save $150K/year in engineering time, eliminate technical debt

Scenario: B2B SaaS with 500 customers each on different API versions (v1, v2, v3)

Challenge: Supporting 3 parallel pipelines, can't deprecate old versions, cross-customer analytics impossible

Merlin Solution: Normalize ALL customer data to v3 golden schema → single analytics pipeline → finally sunset v1/v2

Value: Unlock multi-tenant benchmarking, reduce infrastructure costs by 66%
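
One common way to collapse several versions into a single pipeline is to chain per-version upgrade steps until a record reaches the latest version. The sketch below illustrates that pattern only; the `api_version` field, the renamed and converted fields, and the upgrade functions are all invented for the example and do not describe any real customer API or Merlin's internals.

```python
LATEST_VERSION = 3


def v1_to_v2(record: dict) -> dict:
    out = dict(record)
    out["customer_id"] = out.pop("cust")  # hypothetical rename introduced in v2
    out["api_version"] = 2
    return out


def v2_to_v3(record: dict) -> dict:
    out = dict(record)
    out["amount_cents"] = round(out.pop("amount") * 100)  # hypothetical unit change in v3
    out["api_version"] = 3
    return out


UPGRADERS = {1: v1_to_v2, 2: v2_to_v3}


def normalize(record: dict) -> dict:
    """Apply upgrade steps in order until the record is at LATEST_VERSION."""
    current = record
    while current["api_version"] < LATEST_VERSION:
        current = UPGRADERS[current["api_version"]](current)
    return current


print(normalize({"api_version": 1, "cust": "c1", "amount": 12.5}))
```

With this layout, adding a v4 means writing one new step rather than revisiting every older version's mapping.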

Scenario: Company acquires competitor with different Salesforce schema (custom fields, relationships)

Challenge: Need unified board reporting, traditional migration costs $500K-$2M and takes 6-12 months

Merlin Solution: Define golden "Customer" schema → Merlin maps both companies' data automatically → unified analytics in 2 weeks

ROI: Save $450K-$1.95M, accelerate integration timeline by more than 10x

Scenario: Implementing Palantir Foundry with 50+ data sources spanning 10 years

Challenge: Each source evolved independently, ontology mapping requires normalized schemas

Merlin Solution: Auto-normalize all 50 sources to golden schemas → consistent field naming → straightforward ontology mapping

Impact: Cut implementation time from 9 months to 4 months (roughly 55% shorter)

The Complete Forge Intelligence Stack

graph TB
    A["Data Sources<br/>(Any API Version)"]:::input --> B["Forge Core<br/>Parse JSON → Tables"]:::layer1
    B --> C["Excalibur<br/>Pattern Recognition"]:::layer2
    C --> D["Pridwen<br/>PII Classification"]:::layer3
    D --> E["MERLIN<br/>Schema Normalization"]:::layer4
    E --> F["Modern Warehouse<br/>(BigQuery, Snowflake, Databricks)"]:::output
    G["Golden Schemas<br/>2000+ APIs"]:::knowledge -.-> E
    classDef input fill:#6c757d,stroke:#fff,color:#fff
    classDef layer1 fill:#17a2b8,stroke:#fff,color:#fff
    classDef layer2 fill:#ffc107,stroke:#fff,color:#000
    classDef layer3 fill:#e83e8c,stroke:#fff,color:#fff
    classDef layer4 fill:#6610f2,stroke:#fff,color:#fff
    classDef output fill:#28a745,stroke:#fff,color:#fff
    classDef knowledge fill:#fd7e14,stroke:#fff,color:#fff

Figure 2: Merlin as the capstone of the Forge intelligence stack

Merlin works seamlessly with the entire Forge ecosystem:

  • Forge Core: Parses any JSON structure into relational tables
  • Excalibur: Detects API patterns and suggests field mappings for Merlin
  • Pridwen: Classifies PII after schema normalization for compliance
  • Merlin: Translates all versions to golden schemas for consistent analytics
Together: The only data platform that handles parsing, intelligence, governance, AND schema evolution in a single unified workflow.

Technical Implementation

Transformation Approach:
  • Rule-Based (Phase 1, 80% coverage): Hand-crafted transformation rules for top 50 APIs (Stripe, Salesforce, etc.)
  • ML-Powered (Phase 2, 15% coverage): Semantic similarity and pattern matching for unknown proprietary schemas
  • Community (Phase 3, 5% coverage): User-contributed mappings and corrections improve accuracy over time
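
To give a feel for similarity-based mapping suggestions on unknown schemas, the sketch below scores field names with character-level string similarity from Python's standard `difflib` module. This is a deliberately simple stand-in, not a semantic ML model and not Merlin's engine; the function name `suggest_mapping` and the 0.6 cutoff are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def suggest_mapping(
    legacy_field: str, golden_fields: Iterable[str], cutoff: float = 0.6
) -> Optional[Tuple[str, float]]:
    """Return (best matching golden field, similarity) or None if below the cutoff."""
    best_name, best_score = None, 0.0
    for candidate in golden_fields:
        score = SequenceMatcher(None, legacy_field.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is None or best_score < cutoff:
        return None
    return best_name, round(best_score, 2)


golden = ["customer_email", "amount", "currency"]
print(suggest_mapping("cust_email", golden))
```

The returned similarity could serve as a starting point for a confidence score, with weak matches returning `None` so they fall through to manual mapping.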
Golden Schema Sources:
  • APIs.guru: 2,000+ public API specifications with version history
  • Community Library: User-defined golden schemas (privacy-preserving)
  • Auto-Learning: ML models trained on millions of transformations
Metadata Preservation: Merlin always includes _merlin_source_version, _merlin_confidence, and _merlin_mapping_rules fields for full auditability.

Pricing

Standard Schemas: Included

50+ pre-built API golden schemas (Stripe, Salesforce, HubSpot, Shopify, etc.)

  • Automatic version detection
  • Bidirectional translation
  • Confidence scoring

Custom Schemas: $2K/mo

Per custom golden schema for proprietary internal APIs

  • Define your own golden schema
  • ML-powered mapping suggestions
  • Manual review workflows

Enterprise: $10K/mo

Multi-team governance with SLA guarantees

  • Unlimited custom schemas
  • Dedicated schema architect
  • 99.9% transformation SLA
ROI Comparison: Traditional schema migration costs $200K-$500K. Merlin targets the same outcome (modern analytics) for $24K/year per custom schema ($2K/mo), which is roughly 8x-21x less in the first year.

Getting Started

Ready to eliminate your schema migration backlog?

  1. Identify Legacy Sources: Find applications stuck on old API versions
  2. Select Golden Schema: Choose from 50+ pre-built schemas or define your own
  3. Connect Data: Point Forge at your legacy webhooks, database exports, or API responses
  4. Validate Transformations: Review confidence scores and approve mappings
  5. Query Modern Schema: Start using modern field names in your analytics immediately

Frequently Asked Questions

What happens if Merlin maps a field incorrectly?

Merlin always preserves your original data, so transformations are non-destructive. Low-confidence mappings are flagged for manual review. You can override any automatic mapping, and Merlin learns from your corrections to improve future transformations.

Can I use Merlin with proprietary internal APIs?

Yes. While we have pre-built golden schemas for 50+ public APIs, you can define custom golden schemas for your internal microservices. Merlin's ML engine will suggest mappings based on field naming and type patterns.

Does Merlin work with my data warehouse?

Yes. Merlin transformations happen at the Forge layer, before warehouse deployment. The normalized golden schemas work with BigQuery, Snowflake, Databricks, Redshift, and any other target supported by Forge.

How is Merlin different from dbt?

dbt is excellent for transforming data that's already in your warehouse, but you still need to write manual transformation code for every API version. Merlin automates this step, with no SQL or Python required. Think of Merlin as "dbt for schema evolution," with 2,000+ pre-built transformations out of the box.