Goals:

  • Prevent Data Corruption: Implement hard-blocking assertions that halt data model execution before bad data is written to target tables.
  • Standardize Rules: Define a tiered library of DQ rules (Uniqueness, Nullability, Referential Integrity) applied to critical Intermediate tables.
  • [Bonus] Observability: Implement automated alerting via Google Cloud Monitoring when DQ checks fail.
  • [Bonus] Auditability: Establish a logging mechanism to track the history of assertion passes/failures for trend analysis.
  • [Bonus] Visualization: Build a DQ Dashboard to visualize the health of the pipeline over time.

Plan:

Phase 1: Architecture & Definition

  • Choose tooling (Dataform assertions vs. Great Expectations, or another DQ framework)
  • Define the rule catalog (e.g. unique key, not null, value in set, freshness)
  • Choose the scope of data objects to be covered (e.g. Salesforce and SAP sources in the intermediate layer)
  • [Bonus] Introduce tiering for both rules and tables, marking business-critical tables so stricter rules apply to them (see the sketch after this list)
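
A minimal sketch of how tiering could look in a Dataform project, assuming tags are used to mark business-critical intermediate tables and the strictest catalog rules are attached inline; the table, column and tag names are hypothetical:

    -- definitions/intermediate/int_salesforce_account.sqlx
    -- Sketch: a tier-1 Salesforce intermediate table. The "tier1" tag marks it
    -- as business-critical, and the inline assertions enforce the uniqueness
    -- and nullability rules from the catalog. All names are hypothetical.
    config {
      type: "table",
      schema: "intermediate",
      tags: ["tier1", "salesforce"],
      assertions: {
        uniqueKey: ["account_id"],
        nonNull: ["account_id", "updated_at"]
      }
    }

    SELECT
      account_id,
      account_name,
      updated_at
    FROM ${ref("stg_salesforce_account")}

Downstream models can be made to depend on these assertions so that a failure blocks them, which is one way to get the hard-blocking behaviour described in the goals.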

Phase 2: Implementation

  • Implement inline assertions in .sqlx files (as in the Phase 1 sketch above)
  • [Bonus] Add manual assertions in separate files for more complex custom logic (see the sketch after this list)
  • [Bonus] Optimize assertions on large partitioned tables to run against the latest partitions only (also covered in the sketch below)
  • [Bonus] Integrate with CI/CD so assertions run as part of PR validation
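
Inline assertions would look like the assertions block in the Phase 1 sketch. Below is a minimal sketch of a manual assertion in its own file, checking custom referential-integrity logic and restricting the scan to recent partitions of a large table; in Dataform the check passes only when the query returns zero rows. The table, column and partition names, and the two-day lookback window, are hypothetical.

    -- definitions/assertions/assert_sap_orders_have_known_account.sqlx
    -- Sketch of a manual assertion: flags orders in recent partitions that
    -- reference an account missing from the Salesforce intermediate table.
    -- Dataform treats any returned row as a failure. Names are hypothetical.
    config {
      type: "assertion",
      tags: ["tier1", "dq"]
    }

    SELECT
      o.order_id,
      o.account_id
    FROM ${ref("int_sap_order")} AS o
    LEFT JOIN ${ref("int_salesforce_account")} AS a
      ON o.account_id = a.account_id
    WHERE
      -- A constant-foldable date filter lets BigQuery prune old partitions,
      -- keeping the check cheap on a large partitioned table.
      o.ingestion_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND a.account_id IS NULL

For the PR validation item, one option is running the Dataform CLI's compile step in CI so broken definitions and assertions fail before merge.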

Phase 3: Operations & Logging

  • Insert assertion results into an audit log table (see the sketch after this list)
  • [Bonus] Set up Slack alerts on assertion failure
  • [Bonus] Create a Looker Studio dashboard to monitor DQ check failures over time
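
A minimal sketch of the audit log step, written as a Dataform operations file that appends one row per assertion run; it assumes the default dataform_assertions schema, where Dataform materializes each assertion as a view, and all project, dataset and table names are hypothetical:

    -- definitions/ops/log_assertion_results.sqlx
    -- Sketch: append one row recording how many rows the assertion flagged.
    -- Assumes the default "dataform_assertions" schema and a pre-created
    -- dq_audit.assertion_log table; all names are hypothetical.
    config {
      type: "operations",
      tags: ["dq_audit"],
      dependencies: ["assert_sap_orders_have_known_account"]
    }

    INSERT INTO `my-gcp-project.dq_audit.assertion_log` (run_at, assertion_name, failing_rows)
    SELECT
      CURRENT_TIMESTAMP(),
      'assert_sap_orders_have_known_account',
      COUNT(*)
    FROM `my-gcp-project.dataform_assertions.assert_sap_orders_have_known_account`

The same table can back the Looker Studio dashboard (e.g. failing rows per assertion per day) and drive the Slack alerts, for example via a Cloud Monitoring log-based alert or a scheduled query posting to a webhook.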

This project is part of Hack Week 25.
