Pridwen Data Governance - Forge

Pridwen Data Governance

Privacy-preserving collective intelligence for data classification and transformations

Coming Soon: This feature is currently in development.

The Shield That Learns

Imagine if every data team's knowledge about PII detection, field classifications, and transformation best practices could be shared across the entire Forge community, without compromising anyone's privacy or revealing sensitive schema details.

That's Pridwen. Like King Arthur's legendary shield, which protected him in battle, Pridwen is a privacy-preserving collective intelligence layer: it shields your sensitive data while learning from the community to provide smarter defaults, better recommendations, and automated compliance checks for everyone.

The Paradox: Full transparency within your organization, strong privacy protections when sharing externally.

Field-Level Annotations

Mark fields as PII, sensitive, or public, and classify data by regulatory requirement (GDPR, HIPAA, CCPA). Your annotations are aggregated with the rest of the community's to provide consensus-based recommendations.

Example: 342 users marked "email-like fields" as PII → Forge auto-suggests masking for new email fields
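As a rough illustration of how annotations could be turned into a consensus label, here is a minimal sketch. The `FieldAnnotation` record, the `consensus` helper, and the 50% `min_share` cutoff are all hypothetical; the sketch deliberately leaves out the k-anonymity threshold and noise described under Privacy-First Architecture.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical label set mirroring the classifications named above.
LABELS = {"pii", "sensitive", "public"}

@dataclass
class FieldAnnotation:
    pattern: str  # semantic pattern such as "email_pattern", never the raw field name
    label: str    # one of LABELS
    regulations: frozenset = field(default_factory=frozenset)  # e.g. {"GDPR", "CCPA"}

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")

def consensus(annotations, pattern, min_share=0.5):
    """Return (label, share) for the majority label of a pattern, or None.

    min_share is an illustrative cutoff, not a documented Forge setting.
    """
    votes = Counter(a.label for a in annotations if a.pattern == pattern)
    total = sum(votes.values())
    if total == 0:
        return None
    label, count = votes.most_common(1)[0]
    share = count / total
    return (label, share) if share >= min_share else None
```

With three "pii" votes and one "sensitive" vote for `email_pattern`, `consensus` returns `("pii", 0.75)`, which is the kind of signal that would drive a "suggest masking" default.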

Transformation Library

Community-contributed dbt transformations for common patterns: hashing, encryption, masking, tokenization. Vote on the best approaches and share your own solutions.

Example: "Hash email addresses (SHA-256)" has 234 votes → Auto-generate a dbt macro with one click
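The logic behind a transformation like "Hash email addresses (SHA-256)" can be sketched in Python rather than dbt SQL. This is not the community macro itself: the normalization step (trim and lowercase) and the optional keyed HMAC variant are assumptions added for illustration, and `hash_email` is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional

def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase so " Ann@Example.com " and "ann@example.com" hash identically.
    return email.strip().lower()

def hash_email(email: str, key: Optional[bytes] = None) -> str:
    """Hex SHA-256 digest of a normalized email address.

    If a secret key is given, HMAC-SHA-256 is used instead; a keyed hash is harder
    to reverse by hashing lists of guessable addresses.
    """
    data = normalize_email(email).encode("utf-8")
    if key is None:
        return hashlib.sha256(data).hexdigest()
    return hmac.new(key, data, hashlib.sha256).hexdigest()
```

Normalizing before hashing matters for joins: two systems that store the same address with different casing still produce the same digest.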

Smart Query Suggestions

Based on your schema patterns, get recommended analytical queries that thousands of other users have found valuable. Pre-built metrics for MRR, LTV, churn, and more.

Example: Your schema has subscription data → "1,247 users run this MRR query weekly"
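One plausible way to match a schema to community queries is to treat each query as requiring a set of semantic patterns and suggest those whose requirements the schema satisfies, ranked by usage. The sketch below assumes exactly that; `QueryTemplate`, `suggest_queries`, and the pattern names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryTemplate:
    name: str
    required_patterns: frozenset  # semantic patterns the query needs, e.g. {"subscription_pattern"}
    weekly_users: int             # community usage count (assumed privacy-filtered upstream)

def suggest_queries(schema_patterns, library, limit=3):
    """Return templates whose required patterns all appear in the schema, most used first."""
    available = set(schema_patterns)
    matches = [t for t in library if t.required_patterns <= available]
    matches.sort(key=lambda t: t.weekly_users, reverse=True)
    return matches[:limit]
```

A schema exposing `subscription_pattern` and `currency_pattern` would match an MRR template that needs both, while a template needing `cart_pattern` would be skipped.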

Network Effects

The more users contribute, the smarter the system becomes. Every annotation, transformation, and query strengthens recommendations for the entire community.

Result: New users get instant best practices from thousands of experienced data teams

Privacy-First Architecture

We use three complementary privacy-preserving techniques to protect your data while enabling collective intelligence:

1. Semantic Pattern Classification

Instead of storing exact field names (like "customer_email" or "user_contact_address"), we classify fields into semantic patterns like "email_pattern" or "address_pattern". This means:

  • Nobody knows your actual field names
  • Recommendations work across similarly-structured data
  • No field names are stored, hashed or otherwise, so there is nothing for a rainbow table attack to reverse

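Forge's actual classifier is not described here, so the following is only a rule-based stand-in showing the idea: split a field name into tokens and map known tokens to a semantic pattern, so that only the pattern ever leaves your environment. The token table, `classify_field`, and the `"unknown_pattern"` fallback are hypothetical.

```python
import re

# Hypothetical token -> semantic pattern table (a real classifier would be far richer).
TOKEN_PATTERNS = {
    "email": "email_pattern", "mail": "email_pattern",
    "phone": "phone_pattern", "mobile": "phone_pattern",
    "address": "address_pattern", "street": "address_pattern", "zip": "address_pattern",
    "date": "temporal_pattern", "timestamp": "temporal_pattern", "at": "temporal_pattern",
}

def tokenize(field_name):
    # Split camelCase ("emailAddress" -> "email_Address"), lowercase, then split on non-alphanumerics.
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", field_name).lower()
    return [t for t in re.split(r"[^a-z0-9]+", snake) if t]

def classify_field(field_name):
    """Map a raw field name to a semantic pattern; the raw name itself is not returned."""
    for token in tokenize(field_name):
        if token in TOKEN_PATTERNS:
            return TOKEN_PATTERNS[token]
    return "unknown_pattern"
```

Under these rules `customer_email` and `emailAddress` both become `email_pattern`, and `user_contact_address` becomes `address_pattern`. The first matching token wins, so token order in the name decides ties.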
2. k-Anonymity (k=5)

Recommendations are only shown when at least 5 different organizations have contributed annotations for a pattern. This ensures:

  • Individual contributions are hidden in the crowd
  • No single organization's practices can be identified
  • Consensus-based recommendations, not outliers
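The k = 5 rule can be sketched as a per-pattern set of hashed organization IDs that is only released once it holds five distinct entries. Plain SHA-256 follows the "What's NEVER Collected" list below; whether Forge also salts or keys that hash is not stated. `PatternContributions` and `hash_org_id` are hypothetical names.

```python
import hashlib
from collections import defaultdict

K = 5  # minimum distinct organizations before a pattern is shown (k = 5 above)

def hash_org_id(org_id):
    # Only the SHA-256 digest of the organization ID is kept, never the ID itself.
    return hashlib.sha256(org_id.encode("utf-8")).hexdigest()

class PatternContributions:
    """Stores every contribution but releases a pattern only after K distinct orgs contribute."""

    def __init__(self, k=K):
        self.k = k
        self._orgs = defaultdict(set)  # pattern -> set of hashed org IDs

    def add(self, pattern, org_id):
        self._orgs[pattern].add(hash_org_id(org_id))

    def releasable(self, pattern):
        # Counting distinct hashed orgs means repeat votes from one org cannot reach the threshold.
        return len(self._orgs.get(pattern, ())) >= self.k
```

Because the set holds organizations rather than votes, a single team annotating the same pattern a hundred times still counts once, matching the "Store, Don't Show" branch of the flow below until five organizations are present.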
3. Differential Privacy

Aggregated statistics (like "85% of users mark this as PII") include calibrated statistical noise using the Laplace mechanism. This provides:

  • Plausible deniability for individual votes
  • Mathematically provable privacy guarantees (ε = 1.0)
  • Prevents reverse-engineering of contributions
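The Laplace mechanism releases a count as the true value plus noise drawn from Laplace(0, Δ/ε), where Δ is the sensitivity (how much one contribution can change the count, 1 here, assuming one vote per organization per pattern) and ε = 1.0 as stated above. The sketch below samples Laplace noise with the standard library by subtracting two exponential draws; `noisy_share` and its even budget split are illustrative assumptions, and a production system would rely on a vetted differential privacy library, since naive floating-point sampling has known weaknesses.

```python
import random

EPSILON = 1.0  # privacy budget from the text above

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon=EPSILON, sensitivity=1.0, rng=random):
    """Laplace mechanism: true_count + Lap(sensitivity / epsilon)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def noisy_share(pii_votes, total_votes, epsilon=EPSILON, rng=random):
    """Noisy fraction such as "85% of users mark this as PII".

    Spends epsilon / 2 on each count (sequential composition gives epsilon in total);
    dividing and clamping afterwards is post-processing and costs no extra budget.
    """
    num = noisy_count(pii_votes, epsilon / 2, rng=rng)
    den = max(noisy_count(total_votes, epsilon / 2, rng=rng), 1.0)
    return min(max(num / den, 0.0), 1.0)
```

With ε = 1.0 split across two counts, each count gets noise of scale 2, which is large for a pattern with a handful of votes and negligible for one with thousands. That is one reason the k-anonymity threshold and the noise work well together.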

Technical Detail: Differential privacy is the same family of techniques used in Google's RAPPOR, Apple's on-device data collection, and the U.S. Census Bureau's 2020 Census disclosure avoidance system.

How Crowdsourced Governance Works

graph LR
    A["User Annotates Field"]:::user --> B["Classify Semantic Pattern"]:::process
    B --> C["Hash Organization ID"]:::security
    C --> D{"k ≥ 5?"}:::decision
    D -->|"No"| E["Store, Don't Show"]:::hidden
    D -->|"Yes"| F["Add Laplace Noise"]:::privacy
    F --> G["Show to Community"]:::community
    H["Future User Sees Field"]:::user --> I["Detect Semantic Pattern"]:::process
    I --> J{"Match in Library?"}:::decision
    J -->|"Yes"| G
    J -->|"No"| K["No Suggestion"]:::normal
    G --> L["User Applies Recommendation"]:::action
    L --> M["Vote/Usage Tracking"]:::feedback
    M --> F
    classDef user fill:#17a2b8,stroke:#fff,color:#fff
    classDef process fill:#6c757d,stroke:#fff,color:#fff
    classDef security fill:#fd7e14,stroke:#fff,color:#fff
    classDef decision fill:#ffc107,stroke:#fff,color:#000
    classDef hidden fill:#dc3545,stroke:#fff,color:#fff
    classDef privacy fill:#6610f2,stroke:#fff,color:#fff
    classDef community fill:#28a745,stroke:#fff,color:#fff
    classDef normal fill:#6c757d,stroke:#fff,color:#fff
    classDef action fill:#20c997,stroke:#fff,color:#fff
    classDef feedback fill:#e83e8c,stroke:#fff,color:#fff

Figure: Privacy-preserving contribution and recommendation flow

Real-World Use Cases

Automated PII Detection

Problem: Data teams spend hours manually identifying PII in new datasets.

Solution: When you ingest a new dataset with email-like fields, Forge automatically suggests: "89% of users classify this as PII. Recommended: Hash with SHA-256 (234 votes)"


GDPR Compliance Templates

Problem: Implementing data retention and deletion policies is complex.

Solution: Community-contributed dbt macros for "right to erasure" workflows, tested by 500+ GDPR-compliant organizations.


SaaS Metric Definitions

Problem: Every SaaS company reinvents MRR, churn, and LTV calculations.

Solution: Pre-built metric definitions from 2,000+ SaaS companies, auto-adapted to your schema patterns with one-click deployment.
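To make "metric definition" concrete, here is a minimal MRR sketch: sum the monthly-normalized price of every subscription active on a given date. The choices it encodes (annual and quarterly plans divided into months, end dates exclusive, discounts ignored) are assumptions, and differing choices like these are exactly why shared definitions help. `Subscription`, `INTERVAL_MONTHS`, and `mrr` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

# Months per billing interval, used to normalize prices to a monthly amount (assumed mapping).
INTERVAL_MONTHS = {"month": 1, "quarter": 3, "year": 12}

@dataclass
class Subscription:
    amount: Decimal             # price charged per billing interval
    interval: str               # "month", "quarter", or "year"
    start: date
    end: Optional[date] = None  # None while the subscription is still active

    def active_on(self, day):
        # Active from start (inclusive) until end (exclusive).
        return self.start <= day and (self.end is None or day < self.end)

def mrr(subscriptions, as_of):
    """Monthly recurring revenue: sum of monthly-normalized amounts active on `as_of`."""
    return sum(
        (s.amount / INTERVAL_MONTHS[s.interval] for s in subscriptions if s.active_on(as_of)),
        Decimal("0"),
    )
```

For a $50/month plan plus a $1,200/year plan, `mrr` returns 150 once both are active; `Decimal` avoids floating-point rounding in currency sums.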


Data Quality Tests

Problem: Knowing which dbt tests to write requires deep domain expertise.

Solution: "79% of users apply a NOT NULL constraint to email fields. 76% add regex validation for email format." → Auto-generate a test suite.
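A test suite could be generated by including every test whose community adoption clears a threshold and emitting it in the shape of a dbt `schema.yml` entry. In this sketch the 75% threshold, the rule names, and all helper names are hypothetical; `not_null` is a dbt built-in test, while the regex entry assumes the dbt-expectations package (its argument layout may vary across dbt versions, and newer dbt releases also accept `data_tests:` in place of `tests:`). The column name stays in your own project and is never part of a contribution.

```python
# Adoption shares quoted in the example above, keyed by semantic pattern.
COMMUNITY_TEST_SHARES = {
    "email_pattern": {"not_null": 0.79, "email_regex": 0.76},
}

# How each rule renders as a dbt column test entry.
TEST_RENDERERS = {
    "not_null": lambda: "not_null",
    "email_regex": lambda: {
        "dbt_expectations.expect_column_values_to_match_regex": {
            "regex": "^[^@ ]+@[^@ ]+[.][^@ ]+$"
        }
    },
}

def suggest_tests(pattern, min_share=0.75, stats=COMMUNITY_TEST_SHARES):
    """Return dbt test entries adopted by at least `min_share` of contributors, most popular first."""
    shares = stats.get(pattern, {})
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return [TEST_RENDERERS[name]() for name, share in ranked if share >= min_share]

def schema_entry(model, column, pattern, min_share=0.75):
    """Build a dict shaped like a dbt schema.yml model entry; dump it with any YAML library."""
    return {
        "version": 2,
        "models": [
            {"name": model, "columns": [{"name": column, "tests": suggest_tests(pattern, min_share)}]}
        ],
    }
```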

Future Capabilities (Roadmap)

Pre-Built Dashboards

Auto-detect schema type (SaaS, e-commerce, IoT) and suggest complete dashboard templates used by your industry.

Anomaly Detection Rules

Crowdsourced anomaly detection patterns: "Flag duplicate emails with high transaction values" (456 votes).

Smart Join Suggestions

Detect common identifier patterns across tables and suggest how to join them based on community patterns.
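Since this capability is still on the roadmap, the following is only a sketch of one possible approach: once columns are classified into semantic patterns, any two columns in different tables that share an identifier-like pattern become join candidates. `suggest_joins` and the identifier pattern names are hypothetical.

```python
from itertools import combinations

def suggest_joins(tables, join_patterns=("customer_id_pattern", "order_id_pattern")):
    """tables: {table_name: {column_name: semantic_pattern}}.

    Returns (left_table, left_col, right_table, right_col, pattern) for column pairs in
    different tables that share an identifier-like semantic pattern.
    """
    suggestions = []
    for (t1, cols1), (t2, cols2) in combinations(sorted(tables.items()), 2):
        for c1, p1 in cols1.items():
            for c2, p2 in cols2.items():
                if p1 == p2 and p1 in join_patterns:
                    suggestions.append((t1, c1, t2, c2, p1))
    return suggestions
```

Matching on patterns rather than names is what lets `customers.customer_id` pair with a differently named `orders.cust` column.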

Query Optimization

Learn which queries are slow for similar patterns and suggest partitioning, clustering, or materialized views.

Control & Transparency

Enterprise Opt-Out

Enterprise customers can completely disable crowdsourced contributions if required for compliance or privacy policies. You'll still get all other Forge features, but won't contribute to or receive community recommendations.

What's Collected
  • Semantic patterns - "email_pattern", "temporal_pattern", etc.
  • Aggregated votes - "234 users recommend hashing"
  • Hashed organization IDs - For k-anonymity tracking
What's NEVER Collected
  • Exact field names - "customer_email", "user_id", etc.
  • Actual data values - No PII, no business data
  • Query results - Transformed data stays private
  • Organization identity - Only SHA-256 hashes stored
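The two lists above can be read as a payload schema. This hypothetical sketch shows a contribution record containing only a semantic pattern, a vote, and a hashed organization ID; field names and data values are not even parameters of `build_contribution`, so they cannot end up in the serialized output. `Contribution`, `build_contribution`, and `to_wire` are illustrative names, not Forge APIs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Contribution:
    """Hypothetical shape of what leaves your organization, per the lists above."""
    semantic_pattern: str  # e.g. "email_pattern"
    vote: str              # e.g. "pii" or a transformation such as "hash_sha256"
    org_hash: str          # SHA-256 of the organization ID

def build_contribution(org_id, semantic_pattern, vote):
    # Raw field names and data values are never accepted here, so they cannot be serialized.
    org_hash = hashlib.sha256(org_id.encode("utf-8")).hexdigest()
    return Contribution(semantic_pattern, vote, org_hash)

def to_wire(contribution):
    # Deterministic JSON so identical contributions serialize identically.
    return json.dumps(asdict(contribution), sort_keys=True)
```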

Help Shape Pridwen

We're currently refining Pridwen's privacy-preserving architecture.

Have feedback on this approach? Questions about privacy? Want early access to Pridwen?

Share Your Pridwen Feedback