Forge SaaS Architecture


Serverless, Scalable, Multi-Cloud Processing

Cloud Run Serverless Model

```mermaid
graph TD
    subgraph Customer [Customer GCP Project]
        direction TB
        BQ[BigQuery Datasets]
        Storage[Cloud Storage - Archives]
    end
    subgraph Foxtrot [Foxtrot Communications - SaaS Infrastructure]
        direction TB
        Anvil[Anvil UI - Cloud Run]
        Hammer[Hammer Scheduler - Cloud Run]
        Forge[Forge Processor - Cloud Run Jobs]
        PubSub[Cloud Pub/Sub - Event Bus]
        Firestore[Firestore - Metadata]
    end
    %% Flow
    Anvil -- Trigger --> PubSub
    Hammer -- Schedule --> PubSub
    PubSub -- Execute --> Forge
    Forge -- Read/Write --> BQ
    Forge -- Archive --> Storage
    Forge -- Metadata --> Firestore
    Anvil -- Read --> Firestore
```

Figure 1: Data stays in your BigQuery. Control plane runs in Foxtrot's infrastructure.

Forge SaaS runs on Google Cloud Run for serverless, auto-scaling execution. Your data never leaves your BigQuery datasets: Forge connects with service account credentials you provide and performs transformations in place. This minimizes data egress costs while keeping your data plane separate from Foxtrot's control plane.
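
As a minimal sketch of that connection pattern (all project, dataset, and key-file names below are placeholders, not Forge defaults):

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Credentials for the service account the customer grants to Forge.
credentials = service_account.Credentials.from_service_account_file(
    "forge-sa-key.json"
)
client = bigquery.Client(project="customer-project", credentials=credentials)

# The transformation runs in place: results land in the customer's own
# dataset, so row-level data never leaves their project.
query_job = client.query(
    """
    CREATE OR REPLACE TABLE `customer-project.raw_events.events_flattened` AS
    SELECT event_id, payload
    FROM `customer-project.raw_events.events`
    """
)
query_job.result()  # Forge keeps only job metadata (status, timings, errors).
```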

Serverless Scale

Auto-scales from 0 to 1000+ concurrent jobs. Pay only for actual execution time, not idle infrastructure.

Data Stays in Place

Forge reads and writes directly to your BigQuery. Only control plane metadata is stored in Foxtrot's infrastructure.

Multi-Cloud Ready

Generate dbt models for BigQuery, Snowflake, Redshift, and Databricks from a single parse.

How Data Flows

  1. User triggers job via Anvil UI or API
  2. Pub/Sub message enqueued with job profile details (see the sketch after this list)
  3. Cloud Run Job starts - Forge container spins up
  4. Forge connects to BigQuery using customer's service account
  5. Parses JSON schema and builds dbt models (in-memory)
  6. Generates cross-warehouse models for Snowflake, Redshift, Databricks
  7. Executes dbt build to transform data in BigQuery
  8. Archives artifacts (models, docs, manifest) to customer's GCS bucket
  9. Job completes - container shuts down, user sees results in Anvil
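
To make step 2 concrete, here is a minimal sketch of enqueueing a job-profile message on Cloud Pub/Sub with the Python client. The topic name and payload fields are illustrative placeholders, not Forge's actual message schema:

```python
import json

from google.cloud import pubsub_v1

# Placeholder names: the real topic and job-profile schema live in
# Foxtrot's control plane and may differ.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("foxtrot-saas-project", "forge-job-requests")

job_profile = {
    "job_profile_id": "jp-1234",
    "customer_project": "customer-project",
    "source_dataset": "raw_events",
    "archive_bucket": "customer-forge-artifacts",
}

# Anvil or Hammer publishes an equivalent message; a subscriber in
# Foxtrot's infrastructure then launches the matching Cloud Run Job.
future = publisher.publish(topic_path, json.dumps(job_profile).encode("utf-8"))
print("Published job request:", future.result())  # message ID
```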

One Parse, Four Warehouses

Forge doesn't just support multiple warehouses—it generates native dbt models for all four simultaneously from a single parsing job. This is cross-warehouse compilation.

Example: Run one job on your BigQuery data → Download dbt projects for BigQuery, Snowflake, Databricks, and Redshift.
What You Get Per Job Run:

  • BigQuery models: Native Standard SQL with struct/array handling
  • Snowflake models: VARIANT types with FLATTEN operations
  • Databricks models: Delta Lake optimized with explode/posexplode
  • Redshift models: SUPER type support with JSON path queries

Migration Flexibility: This means you can migrate between warehouses without re-running jobs. All models are already generated and ready to deploy to your target platform.
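
To illustrate what cross-warehouse compilation means in practice, the sketch below shows how a single nested array field might be unpacked in each warehouse's native dialect. These snippets are simplified examples of the idioms listed above, not the exact models Forge emits:

```python
# Simplified illustration of one parse producing four dialect-specific
# dbt models. Table and column names (orders, items, sku) are invented.
GENERATED_MODELS = {
    # BigQuery: arrays of structs unpacked with UNNEST
    "bigquery": """
        SELECT order_id, item.sku
        FROM {{ source('raw', 'orders') }}, UNNEST(items) AS item
    """,
    # Snowflake: VARIANT column unpacked with LATERAL FLATTEN
    "snowflake": """
        SELECT o.order_id, item.value:sku::string AS sku
        FROM {{ source('raw', 'orders') }} o,
             LATERAL FLATTEN(input => o.items) item
    """,
    # Databricks: Delta table unpacked with explode()
    "databricks": """
        SELECT order_id, item.sku
        FROM {{ source('raw', 'orders') }}
        LATERAL VIEW explode(items) exploded AS item
    """,
    # Redshift: SUPER column unnested via PartiQL navigation
    "redshift": """
        SELECT o.order_id, item.sku
        FROM {{ source('raw', 'orders') }} o, o.items AS item
    """,
}

for warehouse, sql in GENERATED_MODELS.items():
    print(f"-- {warehouse} model{sql}")
```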

Security & Access Model

What Foxtrot Can Access:

  • BigQuery datasets (read/write) via service account you provide
  • Cloud Storage buckets (write) for archiving dbt artifacts
  • Job execution metadata (run history, errors) stored in Firestore

What Foxtrot Cannot Access:

  • Your production data outside of specified BigQuery datasets
  • Other GCP resources (e.g., Compute Engine VMs, App Engine)
  • PII/sensitive data - only schema structure is analyzed

Least Privilege Principle:

The service account you provide to Forge should have the BigQuery Data Editor and Storage Object Creator roles, scoped only to the datasets and buckets Forge needs. Scoping access this narrowly minimizes lateral movement risk if the credential is ever compromised.
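
A hedged sketch of that scoping with the Python client libraries is shown below; the project, dataset, bucket, and service account names are placeholders:

```python
from google.cloud import bigquery, storage

# Placeholder identity for the Forge service account you provide.
FORGE_SA_EMAIL = "forge@foxtrot-saas.iam.gserviceaccount.com"

# Dataset-scoped write access (the dataset-level analogue of
# roles/bigquery.dataEditor) instead of a project-wide grant.
bq = bigquery.Client(project="customer-project")
dataset = bq.get_dataset("customer-project.raw_events")
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry("WRITER", "userByEmail", FORGE_SA_EMAIL))
dataset.access_entries = entries
bq.update_dataset(dataset, ["access_entries"])

# Bucket-scoped objectCreator: Forge can archive dbt artifacts but
# cannot read or delete existing objects in the bucket.
gcs = storage.Client(project="customer-project")
bucket = gcs.bucket("customer-forge-artifacts")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectCreator",
    "members": {f"serviceAccount:{FORGE_SA_EMAIL}"},
})
bucket.set_iam_policy(policy)
```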

SaaS vs. Enterprise

| Feature | Forge SaaS (Team) | Forge Enterprise |
| --- | --- | --- |
| Deployment | Hosted by Foxtrot Communications | Your GCP Project (GKE) |
| Data Egress | Control Plane Access (Data stays in BigQuery) | Zero Egress (In-situ) |
| Execution Engine | Cloud Run Jobs | Kubernetes Jobs (Long-running) |
| Multi-Cloud Compilation | ✅ 4 Warehouses | ✅ 4 Warehouses |
| VPC Peering | N/A | Supported |
| Billing | GCP Marketplace (Usage-based) | GCP Marketplace (License Fee) |
| Setup Time | 5 minutes | 2-4 weeks |