Skip to content

Data Vault Hub View

This code-generation template creates a Data Vault Hub object; the physical representation of a Core Business Entity.

The Hub is created as a database view, for easy refactoring and quick prototyping. The view can be used the same way as a database table, and it shows the data as a Hub table would be loaded by various data logistics (‘ETL’) processes.

The intent for this template is to be represent (parts of) the data solution as views, in line with the virtual data warehouse concept. This approach enables data teams to deliver the data solution exactly the same way as when database tables or files would be created and managed through data load processes.

  • SQL Server family databases; the template uses Microsoft SQL (T-SQL) syntax.

This template is inherently idempotent and rerunnable. The design pattern, in summary, specifies that the Hub is a distinct list of business key values to which (internal) data solution keys (‘surrogate key’) are assigned.

To create and maintain this distinct list of keys in a traditional load process, each incoming key is ‘looked up’ in the target Hub table to check if it already exists, or not. This is typically done individually for each data source that has been mapped to the Hub, as an singular data logistics process.

If the key does not yet exist in the Hub, the key is inserted and a data solution surrogate key is assigned. This pattern alone makes it possible to run the process at any time.

In a view, this process is arguably even easier because only the unique key values across all mapped data sources are required to achieve the same result.

The template implements this as a collection of UNION statements, one for each mapped data source. The UNION statement will deduplicate any records that may appear more than once across the data sets.

Further deduplication is implemented through an overarching GROUP BY statement which selects the minimum inscription timestamp. This results in a distinct list of key values, and the first moment they were received by the data solution.

  • This template requires a Persistent Staging Area (PSA) or equivalent immutable transaction data set.
  • The view will be created directly on a PSA dataset, and therefore typically located in the same database as where the PSA resides. If the view is intended to be created in a different database, the template requires both the source and target databases to be accessible. This can be done either via a Linked Server, on-premise cross-database query feature.
  • Using views, performance is always a consideration. Each time the view is called the underlying query is executed. For large data sets, or with insufficient resources, the time it takes to return the data might not meet requirements.
  • The template uses a MD5 hashing algorithm to implement hash keys and record comparison checksums.

Not applicable.

CREATE OR ALTER VIEW [{{conventions.vdwSchemaName}}].[{{dataObjectMappings.0.targetDataObject.name}}]
AS
--
-- Hub View definition for {{dataObjectMappings.0.targetDataObject.name}}.
--
-- This template represents a standard Data Vault style 'Hub' table as a view.
-- The view shows the same data as would otherwise be the case if the table would be created and all data logistics processes run.
--
-- Generated from template '200 Data Vault Hub View'.
--
SELECT hub.*
FROM
(
{{#each dataObjectMappings}}{{#if @first}}
SELECT
HASHBYTES('MD5', {{#each businessKeyDefinitions}} {{#if @first}} {{!-- Create the Surrogate Key using the Business Key and components --}}
{{#each businessKeyComponentMappings}}
ISNULL(RTRIM(CONVERT(NVARCHAR(100), {{targetDataItem.name}})), 'N/A') + '#~!'{{#unless @last}} +{{/unless}}
{{/each}}
) AS {{surrogateKey}},{{/if}}{{/each}}
-1 AS {{../conventions.etlProcessAttribute}}, {{!-- List out the ETL Framework attributes --}}
MIN({{../conventions.loadDateTimeAttribute}}) AS {{../conventions.loadDateTimeAttribute}},
--{{../conventions.recordSourceAttribute}},{{#each businessKeyDefinitions}} {{!-- The Hub business key --}}
{{#each businessKeyComponentMappings}}
{{targetDataItem.name}},{{/each}}{{/each}}
ROW_NUMBER() OVER (PARTITION BY {{#each businessKeyDefinitions}}{{#each businessKeyComponentMappings}} {{!-- Create a row number to enable selection only the earliest row, ordered by date/time --}}
{{targetDataItem.name}}{{#unless @last}},{{/unless}}{{/each}}
{{/each}}
ORDER BY
MIN({{../conventions.loadDateTimeAttribute}})
) AS ROW_NR
FROM
( {{/if}}{{/each}}
{{#each dataObjectMappings }}
{{#if enabled}}{{#stringcompare enabled "true"}}
SELECT {{#each ../businessKeyDefinitions}}{{#each businessKeyComponentMappings}}
CAST({{sourceDataItems.0.name}} AS NVARCHAR(100)) AS {{targetDataItem.name}},{{/each}}{{/each}}
MIN({{../../conventions.loadDateTimeAttribute}}) AS {{../../conventions.loadDateTimeAttribute}}
FROM [{{lookupExtension sourceDataObjects.0.extensions "datastore"}}].[{{lookupExtension sourceDataObjects.0.extensions "location"}}].[{{sourceDataObjects.0.name}}]
WHERE {{#each businessKeyDefinitions}}{{#each businessKeyComponentMappings}}
{{sourceDataItems.0.name}} IS NOT NULL{{#unless @last}} AND{{/unless}}{{/each}}{{/each}}{{#if filterCriterion}}{{#stringcompare filterCriterion ""}}{{else}}AND {{filterCriterion}}{{/stringcompare}}{{/if}}
GROUP BY
{{#each businessKeyDefinitions}}{{#each businessKeyComponentMappings}}{{#each sourceDataItems}}{{!-- Exclude hard-coded values from the group by --}}{{#if extensions}}{{#each extensions}}{{#stringcompare key "isHardCodedValue"}}{{/stringcompare}}{{/each}}{{else}} {{name}}{{#unless @last}},{{/unless}}{{/if}}{{/each}}{{/each}}{{/each}}
--
{{#unless @last}}UNION{{/unless}}
{{/stringcompare}}{{/if}}
{{/each}}
) HUB_selection {{#each dataObjectMappings}}{{#if @first}}
GROUP BY {{#each businessKeyDefinitions}}{{#each businessKeyComponentMappings}}
{{targetDataItem.name}}{{#unless @last}},{{/unless}}{{/each}}{{/each}}
--{{../conventions.recordSourceAttribute}} {{/if}}{{/each}}
) hub
WHERE ROW_NR = 1
UNION
SELECT
0x00000000000000000000000000000000,
- 1,
'1900-01-01',
--'Data Warehouse',{{#each dataObjectMappings}}{{#if @first}}{{#each businessKeyDefinitions}}{{#each businessKeyComponentMappings}}
'Unknown',{{/each}}{{/each}}{{/if}}{{/each}}
1 AS ROW_NR