Skip to content

Data Vault Link View

This code-generation template creates a Data Vault Link object; the physical representation of a Natural Business Relationship.

The Link is created as a database view, for easy refactoring and quick prototyping. The view can be used the same way as a database table, and it shows the data as a Link table would be loaded by various data logistics (‘ETL’) processes.

The intent for this template is to be represent (parts of) the data solution as views, in line with the virtual data warehouse concept. This approach enables data teams to deliver the data solution exactly the same way as when database tables or files would be created and managed through data load processes.

  • SQL Server family databases; the template uses Microsoft SQL (T-SQL) syntax.

The design pattern, in summary, specifies that the Link is a distinct list of business key combinations to which (internal) data solution keys (‘surrogate key’) are assigned.

To create and maintain this distinct list of keys in a traditional load process, each incoming key combination (relationship) is ‘looked up’ in the target Link table to check if it already exists, or not. This is typically done individually for each data source that has been mapped to the Link, as an singular data logistics process.

If the key does not yet exist in the Link, the key is inserted and a data solution surrogate key is assigned. TBecause the lookup will always check if a new Link record can be inserted, this template is inherently idempotent and rerunnable.

In a view, this process is arguably even easier because only the unique combinations of keys across all mapped data sources are required to achieve the same result.

The template implements this as a collection of UNION statements, one for each mapped data source. The UNION statement will deduplicate any records that may appear more than once across the data sets.

Further deduplication is implemented through an overarching GROUP BY statement which selects the minimum inscription timestamp. This results in a distinct list of key values, and the first moment they were received by the data solution.

  • This template requires a Persistent Staging Area (PSA) or equivalent immutable transaction data set.
  • The view will be created directly on a PSA dataset, and therefore typically located in the same database as where the PSA resides. If the view is intended to be created in a different database, the template requires both the source and target databases to be accessible. This can be done either via a Linked Server, on-premise cross-database query feature.
  • Using views, performance is always a consideration. Each time the view is called the underlying query is executed. For large data sets, or with insufficient resources, the time it takes to return the data might not meet requirements.
  • The template uses a MD5 hashing algorithm to implement hash keys and record comparison checksums.

Not applicable.

CREATE OR ALTER VIEW [{{conventions.vdwSchemaName}}].[{{dataObjectMappings.0.targetDataObject.name}}]
AS
--
-- Link View definition for {{dataObjectMappings.0.targetDataObject.name}}.
--
-- This template represents a standard Data Vault style 'Link' table as a view.
-- The view shows the same data as would otherwise be the case if the table would be created and all data logistics processes run.
--
-- Generated from template '200 Data Vault Link View'.
--
SELECT link.*
FROM
( {{#each dataObjectMappings}}{{#if @first}}
SELECT
HASHBYTES('MD5', {{#each businessKeyDefinitions}} {{#unless @first}}{{!-- Create the Surrogate Key using the combined Business Keys --}}{{#each dataItemMappings}}
ISNULL(RTRIM(CONVERT(NVARCHAR(100), [{{targetDataItem.name}}], 'N/A') + '#~!'{{/each}}
{{#each businessKeyComponentMappings}}
ISNULL(RTRIM(CONVERT(NVARCHAR(100), {{targetDataItem.name}}{{@../index }})), 'N/A') + '#~!'{{#unless @last}} +{{/unless}}{{/each}}{{#unless @last}} +{{/unless}}{{/unless}}{{/each}}
{{#each dataItemMappings}} +
ISNULL(RTRIM(CONVERT(NVARCHAR(100), [{{targetDataItem.name}}])), 'N/A') + '#~!'{{/each}}{{#each businessKeyDefinitions}} {{#if @first}} ) AS {{surrogateKey}},{{/if}}{{/each}}
-1 AS {{../conventions.etlProcessAttribute}}, {{!-- List out the ETL Framework attributes --}}
MIN({{../conventions.loadDateTimeAttribute}}) AS {{../conventions.loadDateTimeAttribute}},
{{#each businessKeyDefinitions}}{{#unless @first}} {{!-- List out the separate (Hub) business keys, by skipping the first generic relationship key. --}}
{{#unless businessKeyClassification}}HASHBYTES('MD5',
{{#each businessKeyComponentMappings}}
ISNULL(RTRIM(CONVERT(NVARCHAR(100), {{targetDataItem.name}}{{@../index }})), 'N/A') + '#~!'{{#unless @last}} +{{/unless}}{{/each}}
) AS {{surrogateKey}},
{{/unless}}{{#if businessKeyClassification}}{{#each businessKeyComponentMappings}}{{targetDataItem.name}}{{@../index }}{{/each}} AS {{surrogateKey}},
{{/if}} {{/unless}} {{/each}}{{#each dataItemMappings}}
[{{targetDataItem.name}}],{{/each}}
ROW_NUMBER() OVER (PARTITION BY {{#each businessKeyDefinitions}} {{#unless @first}} {{#each businessKeyComponentMappings}} {{!-- Create a row number to enable selection only the earliest row, ordered by date/time --}}
{{targetDataItem.name}}{{@../index }}{{#unless @last}},{{/unless}}{{/each}}{{#unless @last}},{{/unless}}{{/unless}} {{/each}}
{{#each dataItemMappings}},[{{targetDataItem.name}}]{{/each}}
ORDER BY
MIN({{../conventions.loadDateTimeAttribute}})
) AS ROW_NR
FROM
({{/if}}{{/each}}
{{#each dataObjectMappings }}
SELECT {{#each businessKeyDefinitions}} {{#unless @first}} {{#each businessKeyComponentMappings}}
CAST({{sourceDataItems.0.name}} AS NVARCHAR(100)) AS {{targetDataItem.name}}{{@../index }},{{/each}}{{/unless}}{{/each}}{{#each dataItemMappings}}
[{{sourceDataItems.0.name}}] AS [{{targetDataItem.name}}],{{/each}}
MIN({{../conventions.loadDateTimeAttribute}}) AS {{../conventions.loadDateTimeAttribute}}
FROM [{{lookupExtension sourceDataObjects.0.extensions "datastore"}}].[{{lookupExtension sourceDataObjects.0.extensions "location"}}].[{{sourceDataObjects.0.name}}]
WHERE {{#each businessKeyDefinitions}} {{#unless @first}} {{#each businessKeyComponentMappings}}
{{sourceDataItems.0.name}} IS NOT NULL{{#unless @last}} AND{{/unless}}{{/each}}{{#unless @last}} AND{{/unless}}{{/unless}} {{/each}}
{{#each dataItemMappings}}
AND [{{sourceDataItems.0.name}}] IS NOT NULL
{{/each}}
{{#if filterCriterion}}{{#stringcompare filterCriterion ""}}{{else}}AND {{filterCriterion}}{{/stringcompare}}{{/if}}
GROUP BY
{{#each businessKeyDefinitions}}{{#unless @first}}{{#each businessKeyComponentMappings}}{{#each sourceDataItems}}{{!-- Exclude hard-coded values from the group by --}}{{#if extensions}}{{#each extensions}}{{#stringcompare key "isHardCodedValue"}}{{/stringcompare}}{{/each}}{{else}} {{name}}{{#unless @last}},{{/unless}}{{/if}}{{/each}}{{/each}}{{#unless @last}},{{/unless}}{{/unless}} {{/each}}
{{#each dataItemMappings}},[{{sourceDataItems.0.name}}] -- Degenerate column{{/each}}
{{#unless @last}}UNION{{/unless}}
) LINK_selection
GROUP BY {{#each businessKeyDefinitions}} {{#unless @first}} {{#each businessKeyComponentMappings}}
{{targetDataItem.name}}{{@../index }}{{#unless @last}},{{/unless}}{{/each}}{{#unless @last}},{{/unless}}{{/unless}}{{/each}}
{{#each dataItemMappings}}
,[{{targetDataItem.name}}]{{/each}}{{/each}}
) link
WHERE ROW_NR = 1