Data Infrastructure Automation

Forge - Your Data Engineer in a Box. Automate schema management, governance, and pipeline maintenance.

Your data team spends 30% of their time maintaining brittle ETL pipelines and chasing schema changes.

Forge uses AI-powered automation to handle schema classification, PII governance, and multi-warehouse deployment—so your engineers can focus on insights, not infrastructure.

300+ Hours Saved/Year
10 min Time to First Job
4 Warehouses Supported
Start Free Trial
30 days free • No credit card • Deploy in minutes
Source: High-Volume Event Stream
{
  "event_id": "evt_123",
  "user": {
    "id": 42,
    "traits": { "plan": "pro" }
  },
  "items": [
    { "sku": "A-1", "price": 10.00 },
    { "sku": "B-2", "price": 5.50 }
  ]
}
Output: AI-Governed Schema (Auto-Classified & Protected)
-- Excalibur: Classified as "customer" data
TABLE `events_v1` (
  event_id STRING,
  _ingested_at TIMESTAMP
);

-- Pridwen: PII detected → hash applied
TABLE `events_v1__user` (
  _parent_id STRING,
  id_hash STRING, -- Protected
  traits_plan STRING
);

-- Auto-normalized child table
TABLE `events_v1__items` (
  _parent_id STRING,
  sku STRING,
  price FLOAT64
);
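To make the mapping from the sample event to the three tables concrete, here is a minimal Python sketch of the same idea: nested objects are flattened into prefixed columns, arrays become child rows linked by `_parent_id`, and flagged fields are replaced with a hash. This is an illustration only, not Forge's implementation. The function name `flatten_event`, the `pii_fields` parameter, and the choice of SHA-256 are assumptions, and the `_ingested_at` column is left out.

```python
import hashlib
from typing import Any

# Sample event copied from the page above.
EVENT = {
    "event_id": "evt_123",
    "user": {"id": 42, "traits": {"plan": "pro"}},
    "items": [{"sku": "A-1", "price": 10.00}, {"sku": "B-2", "price": 5.50}],
}


def flatten_event(
    event: dict[str, Any],
    root: str = "events_v1",
    key: str = "event_id",
    pii_fields: frozenset = frozenset({"id"}),  # hypothetical PII flag list
) -> dict[str, list[dict]]:
    """Split one nested event into a parent row plus child-table rows."""
    parent_id = event[key]
    tables: dict[str, list[dict]] = {root: [{key: parent_id}]}

    def flatten_obj(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
        row: dict[str, Any] = {}
        for name, value in obj.items():
            col = f"{prefix}{name}"
            if isinstance(value, dict):
                # Nested object: fold into the same row with a column prefix.
                row.update(flatten_obj(value, prefix=f"{col}_"))
            elif col in pii_fields:
                # Flagged field: keep only a SHA-256 hash of the value.
                row[f"{col}_hash"] = hashlib.sha256(str(value).encode()).hexdigest()
            else:
                row[col] = value
        return row

    for name, value in event.items():
        if name == key:
            continue
        if isinstance(value, dict):
            tables[f"{root}__{name}"] = [{"_parent_id": parent_id, **flatten_obj(value)}]
        elif isinstance(value, list):
            # Assumes every array element is an object, as in the sample.
            tables[f"{root}__{name}"] = [
                {"_parent_id": parent_id, **flatten_obj(item)} for item in value
            ]
        else:
            tables[root][0][name] = value
    return tables


if __name__ == "__main__":
    for table, rows in flatten_event(EVENT).items():
        print(table, rows)
```

Running it on the sample event yields one `events_v1` row, one `events_v1__user` row with `id_hash` and `traits_plan`, and two `events_v1__items` rows.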
"Data infrastructure that classifies, governs, and evolves itself."

Our Data Advantage

Platform Intelligence Network

Every schema processed makes Excalibur smarter. Our growing pattern library powers industry-leading classification accuracy.

40K+ Training Patterns: privacy-preserving fingerprints
475+ API Categories: Stripe, Salesforce, AWS, and more
2.4K+ APIs Analyzed: growing daily via Bellows

Updated in real-time

View Full Stats →

Interactive Demo

Explore 40,000+ API Patterns in 3D

See how Excalibur classifies real-world API schemas. Toggle between structural similarity (nesting, field types) and field name patterns (domain vocabulary). Search by schema name and find similar patterns across 475+ API categories.

  • Real training data from Stripe, Salesforce, AWS & more
  • Dual projection modes: Structure vs Field Names
  • Find similar schemas with one click
Launch Schema Explorer →

Under The Hood

The AI That Powers Your Savings

Forge's 4-layer intelligence stack eliminates manual work at every stage of your data pipeline—from schema classification to governance to evolution.

⚒️ Forge Core

Production

Multi-Warehouse Compiler

The foundation. Transforms deeply nested JSON into production-ready dbt models with automatic normalization and type inference.

  • BigQuery, Snowflake, Databricks, Redshift
  • Unnests 5+ levels deep
  • Automatic dbt docs & lineage

⚔️ Excalibur

Production

Schema Intelligence & Embeddings

Dual ML models (GNN + RandomForest) classify schemas and generate 128-dim embeddings for similarity search. Privacy-preserving—field names never leave your environment.

  • Dual GNN + RandomForest with A/B testing
  • Embedding API for similarity search
  • Privacy-first fingerprinting
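One way to picture a fingerprint that carries structure but no field names is the sketch below. It reduces a JSON document to its types and nesting, then hashes the result, so two schemas with different vocabulary but the same shape collide on purpose. This is an assumed illustration of the general idea, not Excalibur's GNN, its RandomForest model, or its 128-dim embeddings. The names `shape` and `fingerprint` are hypothetical.

```python
import hashlib
import json
from typing import Any


def shape(value: Any) -> Any:
    """Reduce a JSON value to types and nesting only; field names are dropped."""
    if isinstance(value, dict):
        # Keep only child shapes, sorted so key names and key order do not matter.
        children = [shape(v) for v in value.values()]
        return {"object": sorted(children, key=lambda s: json.dumps(s, sort_keys=True))}
    if isinstance(value, list):
        # Keep the distinct element shapes, in a stable order.
        unique = {json.dumps(shape(v), sort_keys=True) for v in value}
        return {"array": [json.loads(s) for s in sorted(unique)]}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


def fingerprint(document: Any) -> str:
    """Hash the structural shape so only the digest needs to be shared."""
    canonical = json.dumps(shape(document), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


if __name__ == "__main__":
    a = {"event_id": "evt_123", "user": {"id": 42}}
    b = {"order_ref": "ord_9", "buyer": {"number": 7}}
    print(fingerprint(a) == fingerprint(b))
```

A real classifier would need far richer features than this hash, but the sketch shows why structure alone can be shared without exposing field names.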

🛡️ Pridwen

Production

Governance ML

Hybrid 3-layer system (Rules + ML + Crowd) that detects PII and recommends transformations like hash, mask, and encrypt.

  • 15 SQL transformation templates
  • Day-1 intelligence out of the box
  • Gets smarter with every customer
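The rules layer of a system like this can be sketched as a list of field-name patterns, each mapped to a recommended transformation, plus a SQL template per transformation. The patterns, the `recommend` and `render` helpers, and the two BigQuery-flavored templates below are assumptions for illustration; they are not Pridwen's rules or its 15 templates. Encryption is omitted because it depends on key management.

```python
import re

# Hypothetical name-based rules: (pattern, recommended transformation).
RULES = [
    (re.compile(r"(^|_)(email|e_mail)($|_)"), "hash"),
    (re.compile(r"(^|_)(phone|mobile)($|_)"), "mask"),
    (re.compile(r"(^|_)(ssn|passport|card_number)($|_)"), "encrypt"),
]

# Hypothetical SQL templates using BigQuery functions.
TEMPLATES = {
    "hash": "TO_HEX(SHA256(CAST({col} AS STRING)))",
    "mask": "CONCAT('***', SUBSTR(CAST({col} AS STRING), -4))",
}


def recommend(field_names: list[str]) -> dict[str, str]:
    """Return a transformation recommendation for each field a rule matches."""
    result: dict[str, str] = {}
    for name in field_names:
        normalized = name.lower()
        for pattern, action in RULES:
            if pattern.search(normalized):
                result[name] = action
                break  # first matching rule wins
    return result


def render(column: str, action: str) -> str:
    """Fill the SQL template for one column; raises KeyError for 'encrypt'."""
    return TEMPLATES[action].format(col=column)


if __name__ == "__main__":
    print(recommend(["email", "user_phone", "sku", "SSN"]))
    print(render("email", "hash"))
```

In a hybrid design, an ML layer and crowd feedback would cover the fields that name-based rules like these miss.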

🐴 Llamrei

Q3 2026

Schema Evolution

Automatically detects legacy API versions and normalizes them to modern golden schemas. Save $200K-$500K per avoided migration.

  • 50+ API golden schemas
  • Stripe, Salesforce, Shopify support
  • Non-destructive transformations

🧙 Merlin

Q4 2026

Autonomous Agent

"Set up my data pipelines and maintain them." LLM-powered agent that understands goals, plans workflows, and self-heals failures.

  • Natural language commands
  • Multi-step planning & execution
  • Learns from outcomes
Learn more about our AI stack →

The Hidden Cost of Manual Data Infrastructure

Most data teams don't realize how much time and money they're losing to preventable infrastructure problems.

💸

$150K+ per Data Engineer

Average all-in annual cost. If 30% of an engineer's time goes to maintenance, that is $45K/year per engineer spent on upkeep.

2-4 Weeks per Migration

Every warehouse change or API version update requires manual schema translation.

⚠️

Compliance Risk

Manual PII governance is error-prone. One missed field can mean GDPR/CCPA fines.

How Forge Solves This

🤖

AI-Powered Automation

Excalibur classifies schemas instantly. Pridwen auto-detects PII. No rules for you to write.

🔄

One Codebase, 4 Warehouses

Write once, deploy to BigQuery, Snowflake, Databricks, or Redshift. No vendor lock-in.

10-Minute Setup

First production job in under 10 minutes. No infrastructure to manage.

📊 Calculate Your Savings

See how much Forge could save your team based on headcount and current pipeline complexity.

5 Engineers → Save ~$225K/year
10 Engineers → Save ~$450K/year
20+ Engineers → Save ~$900K+/year
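The figures above follow from the page's own inputs: $150K all-in cost per engineer and 30% of time spent on maintenance. The sketch below reproduces that arithmetic. It assumes all maintenance time is recovered, which makes each figure an upper bound rather than a forecast; the function name is hypothetical.

```python
def annual_maintenance_cost(
    engineers: int,
    cost_per_engineer: int = 150_000,  # all-in annual cost from the page
    maintenance_percent: int = 30,     # share of time spent on maintenance
) -> int:
    """Dollars per year spent on pipeline maintenance across the team."""
    return engineers * cost_per_engineer * maintenance_percent // 100


if __name__ == "__main__":
    for team_size in (5, 10, 20):
        print(team_size, annual_maintenance_cost(team_size))
```

Adjust `cost_per_engineer` and `maintenance_percent` to match your own team before comparing against a subscription price.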
Talk to Sales About ROI

🧠 AI Classification

Excalibur's Graph Neural Network instantly understands your data patterns. Privacy-preserving design means field names never leave your environment.

🛡️ Automatic Governance

Pridwen detects PII and recommends transformations like hash, mask, and encrypt. Day-1 intelligence that gets smarter with every customer.

🔄 Multi-Warehouse

One JSON source compiles to optimized SQL for BigQuery, Snowflake, Databricks, and Redshift. Write once, deploy anywhere.
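As a rough picture of what "write once, deploy anywhere" involves, the sketch below renders one logical table definition into DDL for each of the four warehouses by swapping in each dialect's native type names. Forge itself emits dbt models, so this is only an illustration of the type-mapping step. `TYPE_MAP`, `render_ddl`, and the logical type names are assumptions; the warehouse type names are the standard ones for each platform.

```python
# Logical type -> native type, per warehouse dialect.
TYPE_MAP = {
    "bigquery": {"string": "STRING", "float": "FLOAT64", "timestamp": "TIMESTAMP"},
    "snowflake": {"string": "VARCHAR", "float": "FLOAT", "timestamp": "TIMESTAMP_NTZ"},
    "databricks": {"string": "STRING", "float": "DOUBLE", "timestamp": "TIMESTAMP"},
    "redshift": {"string": "VARCHAR", "float": "DOUBLE PRECISION", "timestamp": "TIMESTAMP"},
}

# The child table from the sample output, described once in logical types.
ITEMS = [("_parent_id", "string"), ("sku", "string"), ("price", "float")]


def render_ddl(table: str, columns: list[tuple[str, str]], dialect: str) -> str:
    """Render a CREATE TABLE statement using the dialect's native types."""
    types = TYPE_MAP[dialect]
    body = ",\n".join(f"  {name} {types[logical]}" for name, logical in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


if __name__ == "__main__":
    for dialect in TYPE_MAP:
        print(f"-- {dialect}")
        print(render_ddl("events_v1__items", ITEMS, dialect))
```

A production compiler also has to handle identifier quoting, nested types, and array syntax, which differ more between dialects than scalar type names do.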

One JSON Source → Four Optimized Warehouses

Forge transforms nested JSON into query-ready native structs—ready to traverse with dot notation.
60-90% cost savings on repeated queries vs parsing raw JSON every time.

-- Raw JSON (parsed on every query = expensive)
get_json_object(get_json_object(root, '$.patient[0]'), '$.name')
⬇️ Forge Rollup (native struct traversal)
-- Query any warehouse with dot notation
frg.root.patient[0].name
BigQuery, Snowflake, Databricks, Redshift
Compute Costs: 60-90% savings
Query Speed: 15x faster
Nesting Depth: 5+ levels, automatic
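The difference between parsing raw JSON on every read and traversing an already-parsed structure can be shown in a few lines of Python. This is an analogy for the `patient[0].name` lookup above, not a benchmark and not how any warehouse executes queries; the sample rows and function names are made up for the example.

```python
import json

# Hypothetical raw rows, stored as JSON text.
RAW_ROWS = [
    '{"patient": [{"name": "Ada"}]}',
    '{"patient": [{"name": "Grace"}]}',
]


def name_from_raw(raw: str) -> str:
    """Parse the whole document on every lookup, like querying a raw JSON column."""
    return json.loads(raw)["patient"][0]["name"]


# Parse once at load time, like materializing a native struct.
PARSED_ROWS = [json.loads(raw) for raw in RAW_ROWS]


def name_from_struct(row: dict) -> str:
    """Traverse the already-parsed structure; no parsing cost per lookup."""
    return row["patient"][0]["name"]


if __name__ == "__main__":
    print([name_from_raw(r) for r in RAW_ROWS])
    print([name_from_struct(r) for r in PARSED_ROWS])
```

Both paths return the same values; the second pays the parsing cost once instead of once per query.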

How is Forge Different?

Traditional ETL tools replicate structured data. Forge understands, governs, and transforms any data—with AI intelligence built in.

Capability | Traditional ETL (Fivetran, Stitch) | Forge
JSON Handling | Loads raw JSON into a single `VARIANT` column. Requires manual parsing. | Automatically unnests nested objects and arrays into clean, queryable tables with proper keys.
Schema Intelligence | No understanding of data semantics. Schemas are just column names. | Excalibur GNN classifies data patterns (payment, customer, inventory) to enable smart defaults.
Data Governance | Manual PII detection. Compliance is your problem. | Pridwen automatically detects PII and recommends hash, mask, or encrypt transformations.
Multi-Warehouse | Separate connectors/config per warehouse. Limited dialect support. | One source → simultaneous output for BigQuery, Snowflake, Databricks, and Redshift with optimized SQL.
Schema Evolution | Pipelines break when fields change. Manual intervention required. | Automatically detects and adapts to schema changes with no downtime (Llamrei, coming Q3 2026).

Focus on Analytics, Not Engineering

Building a robust JSON processing pipeline is a significant engineering effort. Here's how Forge accelerates your time to insight.

Data Engineering Task | Manual Process (Time Estimate) | With Forge
Initial Schema Discovery | Write scripts to scan data, identify fields, and determine data types. (2-4 hours) | Automatic. Done in minutes during the first run.
Data Classification | Manually review schemas to understand data semantics. (1-2 hours) | Excalibur AI. Instant classification.
PII Detection & Governance | Security review, manual tagging, compliance documentation. (1-3 days) | Pridwen AI. Automatic detection and transformation recommendations.
Write Parsing & Unnesting Logic | Develop and debug complex SQL or Python code. (1-3 days) | Automatic. Forge handles all unnesting logic.
Total Time to Value | Days to Weeks | Under 15 Minutes

Security & Compliance: Your Cloud, Your Rules

For organizations with strict data residency and security requirements, Forge Enterprise offers a unique deployment model that keeps you in control.

Requirement | Traditional SaaS | Forge Enterprise
Data Location | Sent to vendor's cloud | Stays in YOUR cloud
AI/ML Privacy | Your data trains their models | Privacy-first: only structural fingerprints
Security Review Time | 3-6 months | 2-4 weeks
Works with PII/PHI | Requires extensive compliance | Yes, by architecture

From Black Box to Glass Box: Complete Transparency

Forge isn't just a parser; it's a fully managed, transparent data cataloging system with AI intelligence. Every run generates metadata that shows what was processed, how it was classified, and which SQL produced each table.

📜

Automatic dbt Docs

Forge uses dbt Core to power its transformations. After every run, it generates and hosts a full dbt documentation site for your project.

🔍

Complete Code Transparency

Ever wonder what a transformation tool is actually doing? With Forge, you can inspect the exact SQL code used to generate every table.

🗺️

End-to-End Data Lineage

The generated dbt docs include a complete, interactive DAG showing how data flows from your raw JSON sources to the final tables.

🌐

Write Once, Deploy Anywhere

Forge generates native dbt models for BigQuery, Snowflake, Redshift, and Databricks simultaneously from one JSON source.

🛠️

Full dbt Infrastructure

Get production-ready dbt models executed automatically—no separate dbt Cloud subscription needed (saves $300-500/month).

Get Started with Forge

💰 Transform JSON into AI-governed, query-ready tables for BigQuery, Snowflake, Databricks, or Redshift. Most customers see 3-5x ROI in the first month through compute savings alone.

Start Your Free Trial

🎁

30-Day Free Trial

Full access to test Forge

🧠

AI Features Included

Excalibur + Pridwen in trial

💳

No Credit Card

Start immediately


Pricing Options

Basic

$99/month

5 tables, 1 user, basic transforms.

Starter

$499/month

25 tables, 3 users, community support.

Professional

$1,999/month

100 tables, 10 users, API + scheduled jobs.

Enterprise

Custom pricing

Unlimited, SSO, SLA. Contact sales.

View full pricing details →