Data Vault Satellite Stored Procedure
Purpose
Section titled “Purpose”This code-generation template creates a SQL Server stored procedure that loads a Data Vault Satellite table from a Persistent Staging Area (PSA) source — the physical, table-based counterpart to the Satellite view template.
A Satellite stores the time-variant descriptive attributes attached to a Hub or Link. This procedure compacts incoming data so that only changed records are written, preserving full history without redundancy.
Motivation
Section titled “Motivation”In a physical Data Vault deployment, a Satellite must:
- Detect which incoming PSA rows represent a real change to the attributes (versus a re-arrival of the same values).
- Hash the parent Hub/Link business key into the Satellite’s surrogate key.
- Calculate a row checksum so subsequent loads can compare incoming values to the most recent stored values.
- Insert only the changed rows, with their inscription timestamps.
The generated procedure implements all of that against the PSA, using window functions (LAG) to compare each incoming row against the previous arrival.
Applicability
Section titled “Applicability”- SQL Server family databases; the template uses Microsoft SQL (T-SQL) syntax.
Design Pattern
Section titled “Design Pattern”Schema Type
Section titled “Schema Type”Output Type
Section titled “Output Type”Implementation guidelines
Section titled “Implementation guidelines”This template is intended to be assigned to Data Object Mappings whose target is a Satellite Data Object hanging off a Hub.
The generated procedure:
- Reads an audit-trail
@ModuleInstanceIdparameter and evaluates its load window viaCreateLoadWindow. - Builds the parent surrogate key by hashing the business-key columns.
- Builds a row checksum by hashing the change-data column together with every mapped descriptive attribute.
- Uses a
LAG-based window function (compacting mechanism) to compare each incoming row’s checksum against the previous arrival for the same parent key. - Writes only the rows where the checksum changed.
Considerations and consequences
Section titled “Considerations and consequences”- The compacting (LAG-based change detection) means the procedure can be replayed safely — re-running it for the same window inserts nothing new, because the previous-row comparison already excluded unchanged records.
- The procedure preserves the change-data column (typically
Cfor insert/update,Dfor delete) so downstream consumers can interpret the row’s intent. - MD5 is used for both the surrogate key and the row checksum.
- The Satellite is loaded by a single mapping at a time. A Satellite that aggregates data from multiple sources will need one procedure per source mapping.
Extensions
Section titled “Extensions”The conventions auditTrailIdColumn, inscriptionTimeStampColumn, inscriptionRecordIdColumn, changeDataColumn, checksumColumn, sourceTimestampColumn, and controlFramework* must all be defined on the project.