Data Vault Link Stored Procedure
Purpose
Section titled “Purpose”This code-generation template creates a SQL Server stored procedure that loads a Data Vault Link table from a Persistent Staging Area (PSA) source — the physical, table-based counterpart to the Link view template.
A Link records a unique relationship between two or more Hubs; this procedure inserts those relationships from the PSA into the materialised Link table.
Motivation
Section titled “Motivation”In a physical Data Vault, the Link table is the persistent record of which Hubs are related to which. The load procedure needs to:
- Build the Link hash key from the combined business keys that define the relationship.
- Build each referenced Hub hash key from its own business-key columns, so the Link rows can be joined back to the Hubs.
- Insert only previously unseen relationships, in time-of-first-arrival order.
Applicability
Section titled “Applicability”- SQL Server family databases; the template uses Microsoft SQL (T-SQL) syntax.
Design Pattern
Section titled “Design Pattern”Schema Type
Section titled “Schema Type”Output Type
Section titled “Output Type”Implementation guidelines
Section titled “Implementation guidelines”This template is intended to be assigned to Data Object Mappings whose target is a Link Data Object.
The generated procedure:
- Reads an audit-trail
@ModuleInstanceIdparameter. - Calls
CreateLoadWindowin the control framework to determine the date range to process. - Walks the mapping’s
businessKeyDefinitionsto construct two kinds of hash keys:- The first business-key definition is treated as the Link’s own surrogate key, hashed across all participating components.
- Subsequent definitions are the referenced Hub hash keys, each hashed independently.
- Reads candidate rows from the PSA, filters them by load window, and de-duplicates with
MINon the inscription timestamp. - Inserts the resulting rows into the Link table.
Considerations and consequences
Section titled “Considerations and consequences”- Business-key definitions whose
businessKeyClassificationindicates a non-hashed key (for example, a degenerate dimension carried directly into the Link) are emitted as raw column values rather than hash keys. - The de-duplication keeps the earliest inscription timestamp per business-key combination — consistent with the Data Vault rule that a relationship’s “discovery” timestamp is the first time it was observed.
- MD5 is used for all hash keys. Replace it consistently across the surrogate-key build and any downstream join logic if you need a stronger algorithm.
Extensions
Section titled “Extensions”The conventions auditTrailIdColumn, inscriptionTimeStampColumn, and controlFramework* must all be defined on the project. The template also reads datastore / location extensions on the source and target Data Connections.