Pridwen Data Governance
Privacy-preserving collective intelligence for data classification and transformations
The Shield That Learns
Imagine if every data team's knowledge about PII detection, field classifications, and transformation best practices could be shared across the entire Forge community—without compromising anyone's privacy or revealing sensitive schema details.
That's Pridwen. Like King Arthur's legendary shield, Pridwen is a privacy-preserving collective intelligence layer that protects your sensitive data while learning from the community to provide smarter defaults, better recommendations, and automated compliance checks for everyone.
Field-Level Annotations
Mark fields as PII, sensitive, or public. Classify data by regulatory requirements (GDPR, HIPAA, CCPA). Your annotations are aggregated with those of the rest of the community to provide consensus-based recommendations.
Transformation Library
Community-contributed dbt transformations for common patterns: hashing, encryption, masking, tokenization. Vote on the best approaches and share your own solutions.
Smart Query Suggestions
Based on your schema patterns, get recommended analytical queries that thousands of other users have found valuable. Pre-built metrics for MRR, LTV, churn, and more.
Network Effects
The more users contribute, the smarter the system becomes. Every annotation, transformation, and query strengthens recommendations for the entire community.
Privacy-First Architecture
We use three complementary privacy-preserving techniques to protect your data while enabling collective intelligence:
1. Semantic Pattern Classification
Instead of storing exact field names (like "customer_email" or "user_contact_address"), we classify fields into semantic patterns like "email_pattern" or "address_pattern". This means:
- Nobody knows your actual field names
- Recommendations work across similarly-structured data
- Pattern labels are not hashes of your field names, so rainbow table attacks do not apply
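To make the idea concrete, here is a minimal sketch of mapping raw field names to coarse pattern labels. The label names follow the examples in the text, but the regular expressions and the classify_field helper are illustrative assumptions, not Pridwen's actual rules.

```python
import re

# Hypothetical pattern table: labels mirror the examples in the text,
# the regexes are assumptions made for this sketch.
SEMANTIC_PATTERNS = [
    ("email_pattern", re.compile(r"e[-_]?mail")),
    ("address_pattern", re.compile(r"address|street|postcode|zip")),
    ("temporal_pattern", re.compile(r"(^|_)(date|time|timestamp)($|_)|_at$")),
]


def classify_field(field_name: str) -> str:
    """Map a raw field name to a coarse semantic pattern label."""
    normalized = field_name.strip().lower()
    for label, regex in SEMANTIC_PATTERNS:
        if regex.search(normalized):
            return label  # first matching pattern wins
    return "unknown_pattern"


if __name__ == "__main__":
    for name in ("customer_email", "user_contact_address", "created_at"):
        print(name, "->", classify_field(name))
```

Only the returned label would need to leave your environment in this sketch; the raw field name is used locally and then discarded.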
2. k-Anonymity (k=5)
Recommendations are only shown when at least 5 different organizations have contributed annotations for a pattern. This ensures:
- Individual contributions are hidden in the crowd
- No single organization's practices can be identified
- Consensus-based recommendations, not outliers
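The threshold check can be sketched in a few lines. The (org_hash, pattern) tuple layout and the releasable_patterns helper are hypothetical, chosen only to illustrate the k=5 rule described above.

```python
from collections import defaultdict

K_THRESHOLD = 5  # the k=5 stated in the text


def releasable_patterns(contributions, k=K_THRESHOLD):
    """Return the patterns backed by at least k distinct organizations.

    `contributions` is an iterable of (org_hash, pattern) pairs; this
    shape is an assumption made for the sketch.
    """
    orgs_per_pattern = defaultdict(set)
    for org_hash, pattern in contributions:
        # A set counts each organization once, however often it votes.
        orgs_per_pattern[pattern].add(org_hash)
    return {p for p, orgs in orgs_per_pattern.items() if len(orgs) >= k}
```

Counting distinct organizations rather than raw votes matters here: ten annotations from one organization still leave a pattern below the threshold.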
3. Differential Privacy
Aggregated statistics (like "85% of users mark this as PII") include calibrated statistical noise using the Laplace mechanism. This provides:
- Plausible deniability for individual votes
- Mathematically provable privacy guarantees (ε = 1.0)
- Protection against reverse-engineering of individual contributions
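As a rough illustration of the Laplace mechanism, the sketch below adds noise with scale sensitivity / ε to a vote count. It assumes each contributor changes a count by at most 1 (sensitivity = 1); the noisy_count and laplace_noise helpers are hypothetical names, not part of Pridwen.

```python
import random

EPSILON = 1.0  # privacy budget stated in the text


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon=EPSILON, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Assumed example: 234 "yes" votes; the printed value varies per run.
    print(round(noisy_count(234)))
```

A percentage such as "85% of users" would then be computed from noisy counts rather than exact ones. A smaller ε means more noise and stronger privacy.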
How Crowdsourced Governance Works
Figure: Privacy-preserving contribution and recommendation flow
Real-World Use Cases
Automated PII Detection
Problem: Data teams spend hours manually identifying PII in new datasets.
Solution: When you ingest a new dataset with email-like fields, Forge automatically suggests: "89% of users classify this as PII. Recommended: Hash with SHA-256 (234 votes)"
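For reference, the recommended transformation can be sketched with Python's standard library. The hash_email helper, the normalization step, and the optional salt parameter are assumptions for illustration, not a transformation shipped with Forge.

```python
import hashlib


def hash_email(email: str, salt: bytes = b"") -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    # Normalizing first means "Ada@Example.com" and "ada@example.com"
    # map to the same digest, which keeps joins on the hash working.
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()
```

Plain SHA-256 of an email address can be guessed by hashing candidate addresses, so a secret salt or a keyed hash may be worth considering where that is a concern.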
GDPR Compliance Templates
Problem: Implementing data retention and deletion policies is complex.
Solution: Community-contributed dbt macros for "right to erasure" workflows, tested by 500+ GDPR-compliant organizations.
SaaS Metric Definitions
Problem: Every SaaS company reinvents MRR, churn, and LTV calculations.
Solution: Pre-built metric definitions from 2,000+ SaaS companies, auto-adapted to your schema patterns with one-click deployment.
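As a rough idea of what such a definition computes, here is a sketch using common formulations of MRR and customer churn. The dictionary keys and function names are hypothetical, and real definitions vary between companies.

```python
def monthly_recurring_revenue(subscriptions):
    """Sum the monthly-normalized price of every active subscription."""
    total = 0.0
    for sub in subscriptions:
        if not sub["active"]:
            continue
        # Spread annual plans over 12 months so all plans are comparable.
        months = 12 if sub["billing_period"] == "annual" else 1
        total += sub["price"] / months
    return total


def customer_churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customers lost during the period."""
    if customers_at_start == 0:
        return 0.0
    return customers_lost / customers_at_start
```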
Data Quality Tests
Problem: Knowing which dbt tests to write requires deep domain expertise.
Solution: Forge surfaces community consensus, such as "79% of users apply a NOT NULL constraint to email fields" and "76% add regex validation for email format", then auto-generates a matching test suite.
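One way to picture the generation step: turn adoption statistics into a structure shaped like a dbt schema.yml file. The sketch assumes dbt's built-in not_null test and the regex test from the dbt-expectations package; the COMMUNITY_STATS table, the 50% cutoff, and the helper names are hypothetical.

```python
# Adoption rates quoted in the text, keyed by semantic pattern.
# Each entry is (dbt test name, adoption rate, test arguments or None).
COMMUNITY_STATS = {
    "email_pattern": [
        ("not_null", 0.79, None),
        (
            "dbt_expectations.expect_column_values_to_match_regex",
            0.76,
            {"regex": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},  # assumed regex
        ),
    ],
}


def suggest_tests(pattern, min_adoption=0.5):
    """List the tests whose community adoption meets the cutoff."""
    tests = []
    for name, adoption, args in COMMUNITY_STATS.get(pattern, []):
        if adoption < min_adoption:
            continue
        # Tests with arguments are written as {name: args}, others as a name.
        tests.append({name: args} if args else name)
    return tests


def build_schema(model, column, pattern):
    """Build a dict mirroring the layout of a dbt schema.yml file."""
    return {
        "version": 2,
        "models": [
            {
                "name": model,
                "columns": [{"name": column, "tests": suggest_tests(pattern)}],
            }
        ],
    }
```

Serializing the result of build_schema("customers", "customer_email", "email_pattern") to YAML would give a file that dbt can read, provided the dbt-expectations package is installed.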
Future Capabilities (Roadmap)
Pre-Built Dashboards
Auto-detect schema type (SaaS, e-commerce, IoT) and suggest complete dashboard templates used by your industry.
Anomaly Detection Rules
Crowdsourced anomaly detection patterns: "Flag duplicate emails with high transaction values" (456 votes).
Smart Join Suggestions
Detect common identifier patterns across tables and suggest how to join them based on community patterns.
Query Optimization
Learn which queries are slow for similar patterns and suggest partitioning, clustering, or materialized views.
Control & Transparency
Enterprise Opt-Out
Enterprise customers can completely disable crowdsourced contributions if required for compliance or privacy policies. You'll still get all other Forge features, but won't contribute to or receive community recommendations.
What's Collected
- Semantic patterns - "email_pattern", "temporal_pattern", etc.
- Aggregated votes - "234 users recommend hashing"
- Hashed organization IDs - For k-anonymity tracking
What's NEVER Collected
- Exact field names - "customer_email", "user_id", etc.
- Actual data values - No PII, no business data
- Query results - Transformed data stays private
- Organization identity - Only SHA-256 hashes stored
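Put together, a single contribution could look like the record below. The build_contribution helper and its field names are hypothetical; the point is that only a pattern label, an annotation, and a SHA-256 organization hash appear in it.

```python
import hashlib


def build_contribution(org_id: str, pattern: str, annotation: str) -> dict:
    """Assemble one contribution record for the community aggregate."""
    # The organization ID is hashed before it is placed in the record.
    org_hash = hashlib.sha256(org_id.encode("utf-8")).hexdigest()
    return {
        "org_hash": org_hash,      # used for k-anonymity counting
        "pattern": pattern,        # e.g. "email_pattern", never a field name
        "annotation": annotation,  # e.g. "pii"
    }
```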
Help Shape Pridwen
We're currently refining Pridwen's privacy-preserving architecture.
Have feedback on this approach? Questions about privacy? Want early access to Pridwen?
Share Your Pridwen Feedback