Skip to content

Data Vault Link Stored Procedure

This code-generation template creates a SQL Server stored procedure that loads a Data Vault Link table from a Persistent Staging Area (PSA) source — the physical, table-based counterpart to the Link view template.

A Link records a unique relationship between two or more Hubs; this procedure inserts those relationships from the PSA into the materialised Link table.

In a physical Data Vault, the Link table is the persistent record of which Hubs are related to which. The load procedure needs to:

  • Build the Link hash key from the combined business keys that define the relationship.
  • Build each referenced Hub hash key from its own business-key columns, so the Link rows can be joined back to the Hubs.
  • Insert only previously unseen relationships, in time-of-first-arrival order.
  • SQL Server family databases; the template uses Microsoft SQL (T-SQL) syntax.

This template is intended to be assigned to Data Object Mappings whose target is a Link Data Object.

The generated procedure:

  1. Reads an audit-trail @ModuleInstanceId parameter.
  2. Calls CreateLoadWindow in the control framework to determine the date range to process.
  3. Walks the mapping’s businessKeyDefinitions to construct two kinds of hash keys:
    • The first business-key definition is treated as the Link’s own surrogate key, hashed across all participating components.
    • Subsequent definitions are the referenced Hub hash keys, each hashed independently.
  4. Reads candidate rows from the PSA, filters them by load window, and de-duplicates with MIN on the inscription timestamp.
  5. Inserts the resulting rows into the Link table.
  • Business-key definitions whose businessKeyClassification indicates a non-hashed key (for example, a degenerate dimension carried directly into the Link) are emitted as raw column values rather than hash keys.
  • The de-duplication keeps the earliest inscription timestamp per business-key combination — consistent with the Data Vault rule that a relationship’s “discovery” timestamp is the first time it was observed.
  • MD5 is used for all hash keys. Replace it consistently across the surrogate-key build and any downstream join logic if you need a stronger algorithm.

The conventions auditTrailIdColumn, inscriptionTimeStampColumn, and controlFramework* must all be defined on the project. The template also reads datastore / location extensions on the source and target Data Connections.