Merlin Schema Evolution Engine
The Schema Time Machine: Work with tomorrow's data model, even if your application is stuck in yesterday's API
What is Merlin?
Merlin is Forge's schema normalization and evolution engine that automatically translates data from ANY version of an API into a modern, canonical "golden schema."
Whether your application is stuck on Stripe API v2019.12.01 or Salesforce from 2015, Merlin intelligently maps legacy fields to their modern equivalents—enabling you to run cutting-edge analytics without touching a single line of production code.
The Problem Merlin Solves
Every company with legacy applications faces the same painful dilemma:
- Applications are built on Stripe API v2019, Salesforce from 2015, or an internal microservice v1
- Breaking changes would risk production systems and require 6-12 months of engineering
- The data team is forced to work with outdated schemas, with manual transformation code everywhere
- The result is years of brittle mapping code, an ongoing maintenance burden, and talent drain
How Merlin Works
[Diagram: Legacy data (any API version) → Version Detection → Field Mapping → Type Conversion → Enrichment → Validation → Golden Schema (Modern Tables). A Knowledge Base of 2000+ APIs feeds Field Mapping, Type Conversion, and Enrichment.]
Figure 1: Merlin Pipeline - Legacy data to modern golden schema in milliseconds
The Transformation Process:
Version Detection
Analyzes field structure, presence/absence patterns to identify the exact API version—even without explicit version headers.
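As a rough illustration of detecting a version from field presence and absence alone, the sketch below checks a record against a small table of signatures. The signature table, the function name `detect_version`, and the choice of fields are assumptions made for this example, based on the Stripe payment intent fields used elsewhere on this page; it is not Merlin's actual detection logic.

```python
from typing import Any, Dict, Optional

# Hypothetical signatures: fields expected to be present or absent per version.
# Entries are checked in order, so list the most specific signatures first.
VERSION_SIGNATURES = [
    ("2025-01-16", {"present": {"payment_method", "payment_method_types"},
                    "absent": {"source"}}),
    ("2019-12-01", {"present": {"source"},
                    "absent": {"payment_method"}}),
]


def detect_version(record: Dict[str, Any]) -> Optional[str]:
    """Return the first version whose signature matches the record's keys."""
    keys = set(record)
    for version, signature in VERSION_SIGNATURES:
        has_required = signature["present"] <= keys
        has_forbidden = bool(signature["absent"] & keys)
        if has_required and not has_forbidden:
            return version
    return None  # unknown shape: leave for manual classification


legacy = {"id": "pi_123", "amount": 1000, "currency": "usd", "source": "card_456"}
print(detect_version(legacy))
```

A production detector would likely weigh many more fields and return a confidence rather than a single hard match.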
Intelligent Mapping
Maps legacy fields to modern equivalents (e.g., Stripe's source → payment_method) using pre-built transformation rules.
Type Conversion
Handles breaking type changes (string → object, arrays → nested structures) while preserving data integrity.
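A minimal sketch of the two conversions named above, assuming a legacy scalar should be wrapped under an `id` key and a flat array should be wrapped in an envelope with `data` and `count` keys. Both target shapes and both function names are hypothetical; a real rule would take the target shape from the golden schema.

```python
from typing import Any, Dict, List, Optional


def string_to_object(value: Any, key: str = "id") -> Optional[Dict[str, Any]]:
    """Wrap a legacy scalar in an object, keeping the original value intact."""
    if value is None:
        return None
    if isinstance(value, dict):
        return value  # already in the modern shape, pass through unchanged
    return {key: value}


def array_to_nested(items: List[Any]) -> Dict[str, Any]:
    """Wrap a flat array in a nested envelope (hypothetical target shape)."""
    return {"data": list(items), "count": len(items)}


print(string_to_object("cus_123"))
print(array_to_nested(["card"]))
```

Both helpers only add structure around the original values, which is one simple way to keep a conversion lossless.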
Smart Enrichment
Infers new required fields from context, adds metadata about transformation confidence, and preserves audit trails.
Multi-Version Support
Process data from Stripe API v2019, v2020, v2021, v2022, v2023, v2024, v2025—all normalized to the latest golden schema automatically. No code changes required.
Confidence Scoring
Every field mapping includes a confidence score between 0.0 and 1.0. Low-confidence transformations are flagged for manual review, ensuring data quality while maximizing automation.
Manual review workflows
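The routing described above could look something like the following sketch, which splits mappings into auto-approved and review queues. The 0.8 threshold, the dictionary layout of a mapping, and the second sample mapping are assumptions for illustration; the page does not state what Merlin treats as low confidence.

```python
from typing import Any, Dict, List, Tuple

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off, not a documented Merlin default

Mapping = Dict[str, Any]


def split_by_confidence(
    mappings: List[Mapping], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Mapping], List[Mapping]]:
    """Return (auto_approved, needs_review) based on each mapping's confidence."""
    approved: List[Mapping] = []
    needs_review: List[Mapping] = []
    for mapping in mappings:
        target = approved if mapping["confidence"] >= threshold else needs_review
        target.append(mapping)
    return approved, needs_review


mappings = [
    {"from": "source", "to": "payment_method", "confidence": 0.95},
    {"from": "cust_ref", "to": "customer", "confidence": 0.55},  # made-up field
]
approved, needs_review = split_by_confidence(mappings)
print(len(approved), len(needs_review))
```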
Bidirectional Translation
Translate modern schemas back to legacy formats for backward compatibility. Enable zero-downtime migrations where applications continue using old APIs while warehouses use modern schemas.
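For pure field renames, translating back to a legacy format amounts to inverting the forward mapping table. The sketch below assumes a one-to-one rename table and uses hypothetical helper names (`rename_fields`, `invert`); mappings that merge or infer fields cannot be reversed this simply.

```python
from typing import Any, Dict

# Forward rename table (legacy name → modern name), taken from the Stripe example.
FORWARD = {"source": "payment_method"}


def invert(table: Dict[str, str]) -> Dict[str, str]:
    """Build the reverse table; refuse tables that are not one-to-one."""
    if len(set(table.values())) != len(table):
        raise ValueError("rename table is not one-to-one and cannot be inverted")
    return {modern: legacy for legacy, modern in table.items()}


def rename_fields(record: Dict[str, Any], table: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys found in the table; leave all other keys untouched."""
    return {table.get(key, key): value for key, value in record.items()}


legacy = {"id": "pi_123", "amount": 1000, "source": "card_456"}
modern = rename_fields(legacy, FORWARD)
round_trip = rename_fields(modern, invert(FORWARD))
print(round_trip == legacy)
```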
Golden Schema Library
Pre-built transformation rules for 2,000+ public APIs from APIGurus, plus community-contributed mappings. Define custom golden schemas for proprietary internal APIs.
Custom schema support
Example: Stripe Payment Intent Evolution
Legacy Schema (2019-12-01)
{
"id": "pi_123",
"amount": 1000,
"currency": "usd",
"source": "card_456" // DEPRECATED
}
Modern Schema (2025-01-16)
{
"id": "pi_123",
"amount": 1000,
"currency": "usd",
"payment_method": "pm_789", // NEW
"payment_method_types": ["card"]
}
Merlin Transformation
{
"id": "pi_123",
"amount": 1000,
"currency": "usd",
"payment_method": "MAPPED_FROM_SOURCE_card_456", // Auto-mapped!
"payment_method_types": ["card"], // Inferred
"_merlin_source_version": "2019-12-01", // Metadata
"_merlin_confidence": 0.95 // Quality score
}
Your analytics can query the modern field (payment_method) while your application continues using the 2019 API (source). Zero production risk.
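To make the example above concrete, here is a small sketch that reproduces the "Merlin Transformation" record from the legacy record. The function name `to_golden` and the rule that infers `payment_method_types` from the `card_` prefix are assumptions for this illustration, and the version and confidence values are hard-coded to mirror the example rather than computed.

```python
from typing import Any, Dict


def to_golden(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Map a 2019-12-01 payment intent to the golden shape shown above."""
    # Copy every field except the deprecated one.
    golden = {key: value for key, value in legacy.items() if key != "source"}

    source = legacy.get("source")
    if source is not None:
        # Mirror the placeholder format used in the example output.
        golden["payment_method"] = f"MAPPED_FROM_SOURCE_{source}"
        # Hypothetical inference rule: a "card_" prefix implies a card payment.
        golden["payment_method_types"] = ["card"] if source.startswith("card_") else []

    # Audit metadata; values fixed here to match the example, not computed.
    golden["_merlin_source_version"] = "2019-12-01"
    golden["_merlin_confidence"] = 0.95
    return golden


legacy = {"id": "pi_123", "amount": 1000, "currency": "usd", "source": "card_456"}
print(to_golden(legacy))
```

The input record is left unmodified, so the legacy payload stays available alongside the normalized one.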
When to Use Merlin
Scenario: E-commerce platform built on Stripe API v2018
Challenge: Need modern analytics (MRR, churn, cohorts) but upgrading would cost $300K+ and risk production payments
Merlin Solution: Point Merlin at legacy webhook data → auto-normalize to 2025 golden schema → analytics team works with modern model
Scenario: B2B SaaS with 500 customers each on different API versions (v1, v2, v3)
Challenge: Supporting 3 parallel pipelines, can't deprecate old versions, cross-customer analytics impossible
Merlin Solution: Normalize ALL customer data to v3 golden schema → single analytics pipeline → finally sunset v1/v2
Scenario: Company acquires competitor with different Salesforce schema (custom fields, relationships)
Challenge: Need unified board reporting, traditional migration costs $500K-$2M and takes 6-12 months
Merlin Solution: Define golden "Customer" schema → Merlin maps both companies' data automatically → unified analytics in 2 weeks
Scenario: Implementing Palantir Foundry with 50+ data sources spanning 10 years
Challenge: Each source evolved independently, ontology mapping requires normalized schemas
Merlin Solution: Auto-normalize all 50 sources to golden schemas → consistent field naming → straightforward ontology mapping
The Complete Forge Intelligence Stack
[Diagram: Input data (any API version) → Forge Core (Parse JSON → Tables) → Excalibur (Pattern Recognition) → Pridwen (PII Classification) → MERLIN (Schema Normalization) → Modern Warehouse (BigQuery, Snowflake, Databricks). Golden Schemas for 2000+ APIs feed MERLIN.]
Figure 2: Merlin as the capstone of the Forge intelligence stack
Merlin works seamlessly with the entire Forge ecosystem:
- Forge Core: Parses any JSON structure into relational tables
- Excalibur: Detects API patterns and suggests field mappings for Merlin
- Pridwen: Classifies PII after schema normalization for compliance
- Merlin: Translates all versions to golden schemas for consistent analytics
Technical Implementation
Transformation Approach:
- Rule-Based (Phase 1, 80% coverage): Hand-crafted transformation rules for top 50 APIs (Stripe, Salesforce, etc.)
- ML-Powered (Phase 2, 15% coverage): Semantic similarity and pattern matching for unknown proprietary schemas
- Community (Phase 3, 5% coverage): User-contributed mappings and corrections improve accuracy over time
Golden Schema Sources:
- APIGurus: 2,000+ public API specifications with version history
- Community Library: User-defined golden schemas (privacy-preserving)
- Auto-Learning: ML models trained on millions of transformations
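As a stand-in for the similarity-based matching mentioned under the ML-powered approach, the sketch below suggests a golden field for an unknown field name using plain string similarity from Python's standard `difflib`. This is not a trained model and not Merlin's method; the function names, the normalization step, and the 0.6 minimum score are all assumptions chosen to show the general idea of scoring candidate mappings.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase and drop separators so customerEmail and customer_email compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mapping(
    field: str, golden_fields: Iterable[str], min_score: float = 0.6
) -> Optional[Tuple[str, float]]:
    """Return (best_golden_field, score), or None if nothing scores high enough."""
    scored = (
        (candidate, SequenceMatcher(None, normalize(field), normalize(candidate)).ratio())
        for candidate in golden_fields
    )
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_score:
        return None
    return best


golden = ["customer_email", "amount", "currency"]
print(suggest_mapping("customerEmail", golden))
```

The score could serve as the confidence attached to a suggested mapping, with low scores routed to manual review.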
Every transformed record carries _merlin_source_version, _merlin_confidence, and _merlin_mapping_rules fields for full auditability.
Pricing
Standard Schemas
50+ pre-built API golden schemas (Stripe, Salesforce, HubSpot, Shopify, etc.)
- Automatic version detection
- Bidirectional translation
- Confidence scoring
Custom Schemas
Per custom golden schema for proprietary internal APIs
- Define your own golden schema
- ML-powered mapping suggestions
- Manual review workflows
Enterprise
Multi-team governance with SLA guarantees
- Unlimited custom schemas
- Dedicated schema architect
- 99.9% transformation SLA
Getting Started
Ready to eliminate your schema migration backlog?
- Identify Legacy Sources: Find applications stuck on old API versions
- Select Golden Schema: Choose from 50+ pre-built schemas or define your own
- Connect Data: Point Forge at your legacy webhooks, database exports, or API responses
- Validate Transformations: Review confidence scores and approve mappings
- Query Modern Schema: Start using modern field names in your analytics immediately