Goals:
- Prevent Data Corruption: Implement hard-blocking assertions that stop pipeline execution before bad data is written to target tables.
- Standardize Rules: Define a tiered library of DQ rules (Uniqueness, Nullability, Referential Integrity) applied to critical Intermediate tables.
- [Bonus] Observability: Implement automated alerting via Google Cloud Monitoring when DQ checks fail.
- [Bonus] Auditability: Establish a logging mechanism to track the history of assertion passes/failures for trend analysis.
- [Bonus] Visualization: Build a DQ Dashboard to visualize the health of the pipeline over time.
Plan:
Phase 1: Architecture & Definition
- Choose tooling (Dataform assertions vs. Great Expectations, or another DQ framework)
- Define a rule catalog (e.g. unique_key, not_null, value_in_set, freshness)
- Choose the scope of affected data objects (e.g. Salesforce and SAP sources in the intermediate layer)
- [Bonus] Introduce tiering for both rules and tables (mark tables that are more critical than others)
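One way tiering could work is via Dataform tags, so that blocking runs or CI steps can be scoped to tier-1 objects only. A minimal sketch in a `.sqlx` config block (the `dq_tier1` label, dataset, and table names are hypothetical conventions, not part of Dataform itself):

```sqlx
config {
  type: "table",
  schema: "intermediate",
  // Hypothetical tiering convention: "dq_tier1" marks tables whose
  // assertion failures should hard-block the pipeline.
  tags: ["dq_tier1", "source_salesforce"]
}

SELECT * FROM ${ref("stg_salesforce_accounts")}
```

Executions can then be scoped by tag, e.g. running only tier-1 models and their assertions in the blocking step while tier-2 checks run as non-blocking warnings.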
Phase 2: Implementation
- Implement inline assertions in .sqlx files
- [Bonus] Manual assertions in a separate file to check more complex custom logic
- [Bonus] Optimize assertions on big partitioned tables to run against the latest partitions only
- [Bonus] CI/CD Integration to ensure assertions run as part of PR validation
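For the inline case, Dataform's built-in config assertions (`uniqueKey`, `nonNull`, `rowConditions`) cover the basics. For the custom-logic and partition-pruning bonuses, a standalone assertion file might look like the sketch below; an assertion fails when its query returns any rows. Table, column, and partition-column names are hypothetical:

```sqlx
config {
  type: "assertion",
  tags: ["dq_tier1"]
}

-- Hypothetical referential-integrity check: every order must point at a
-- known customer. Restricting the check to the latest partition bounds
-- scan cost on large partitioned tables.
SELECT o.order_id
FROM ${ref("int_sap_orders")} AS o
LEFT JOIN ${ref("int_sap_customers")} AS c USING (customer_id)
WHERE c.customer_id IS NULL
  AND o.partition_date = (
    SELECT MAX(partition_date) FROM ${ref("int_sap_orders")}
  )
```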
Phase 3: Operations & Logging
- Insert assertion results into an audit log table
- [Bonus] Set up Slack alerts on assertion failure
- [Bonus] Create a Looker Studio dashboard to monitor DQ check failures over time
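For the audit log, one option (a sketch only; project, dataset, table, and assertion names are hypothetical) is a scheduled `operations` action that appends each assertion's current failure count, relying on Dataform materializing assertions as views in its assertions dataset (`dataform_assertions` by default):

```sqlx
config {
  type: "operations"
}

-- Hypothetical audit table: one row per assertion per run, so the
-- dashboard can plot failure counts over time.
INSERT INTO `my-project.dq_audit.assertion_log`
  (assertion_name, checked_at, failed_rows)
SELECT
  'int_sap_orders_referential_integrity',
  CURRENT_TIMESTAMP(),
  COUNT(*)
FROM `my-project.dataform_assertions.int_sap_orders_referential_integrity`
```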
This project is part of: Hack Week 25