Persistent Staging Area — Microsoft Fabric (PySpark)

[!NOTE] This sample is currently in preview. The metadata, conventions, and connection structure are set up for a Microsoft Fabric deployment, with PySpark notebook code generation as the target. Some bundled templates carry SQL Server-style names while the PySpark output templates are being finalised.

The Microsoft Fabric Persistent Staging Area (PSA) starter solution adapts the PSA pattern to Microsoft Fabric, running delta detection in PySpark notebooks rather than in stored procedures.

The PSA is a historized (time-variant) archive of all data changes that were presented to the data solution. It is a foundational part of many data solutions and ensures that downstream solutions can be modified without re-acquiring history from source systems.
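
As a rough illustration of what "historized" means in practice, the minimal pure-Python sketch below rebuilds the source picture as it stood at a chosen moment from PSA history. The row shape (`business_key`, `load_datetime`, and a `change_type` of I/U/D for insert, update, and logical delete) and the `state_as_of` helper are hypothetical assumptions, not the sample's actual metadata. In Fabric, an equivalent query would run against the PSA tables in a Lakehouse rather than over an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

# Hypothetical PSA row shape; the sample's real column names may differ.
@dataclass(frozen=True)
class PsaRow:
    business_key: str         # e.g. a customer or plan identifier
    load_datetime: datetime   # when this change was presented to the PSA
    change_type: str          # "I" insert, "U" update, "D" logical delete
    attributes: Dict[str, str]


def state_as_of(rows: List[PsaRow], as_of: datetime) -> Dict[str, Dict[str, str]]:
    """Rebuild the source state at `as_of` from historized PSA rows."""
    latest: Dict[str, PsaRow] = {}
    for row in rows:
        if row.load_datetime > as_of:
            continue  # this change had not been received yet at `as_of`
        current = latest.get(row.business_key)
        if current is None or row.load_datetime > current.load_datetime:
            latest[row.business_key] = row
    # Keys whose most recent version is a logical delete did not exist at `as_of`.
    return {
        key: dict(row.attributes)
        for key, row in latest.items()
        if row.change_type != "D"
    }
```

Because every version is retained, a downstream model can be rebuilt for any past date with this kind of query, which is what removes the need to re-extract history from the source.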

What’s in the sample

The sample is based on the same wealth-management case study (SaveMore) as the other PSA samples — a fictional company managing customer plans, offers, memberships, and personalised costings. It provides:

A Source connection representing the operational system.
A Landing Area connection — the intermediate landing layer where raw data is staged.
A Persistent Staging Area connection — the historized archive.
A Control Framework connection — the orchestration metadata store.
Data Objects for every layer (source, landing, PSA), with full data-item definitions, business keys, classifications, and relationships.
Data Object Mappings wiring source → landing → PSA.
Templates for table generation, stored procedures (initial SQL templates included), deployment, documentation, and sample-data seeding.

Why Microsoft Fabric?

Microsoft Fabric brings unified storage (OneLake), Spark compute, and SQL endpoints into a single platform. Implementing a PSA on Fabric typically uses:

OneLake / Lakehouse as the storage layer for the PSA tables.
PySpark notebooks for the delta-detection and load logic, scheduled via Fabric pipelines.
SQL endpoints for ad-hoc query and exposing the PSA to downstream consumers.
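
The delta-detection logic that the PySpark notebooks carry out can be sketched in plain Python. This minimal sketch assumes the landing area holds a full snapshot, that a hash of each key's most recent PSA version is available, and that changes are coded I/U/D; the helpers `row_hash` and `detect_deltas` are hypothetical and do not come from the sample's templates. In a PySpark notebook the same comparison is commonly written as a full outer join between the landing DataFrame and the current PSA versions, with the row hash built from `pyspark.sql.functions.sha2` over `concat_ws`.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# A landing-area record: column name -> value (hypothetical flat shape).
Record = Dict[str, str]


def row_hash(record: Record, attributes: List[str]) -> str:
    """Hash the tracked attributes of a record for change comparison."""
    # Sort attribute names so the caller's column order does not change the hash.
    payload = "|".join(f"{name}={record.get(name, '')}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_deltas(
    landing: List[Record],
    psa_current: Dict[str, str],
    key: str,
    attributes: List[str],
) -> List[Tuple[str, str, Record]]:
    """Compare a full landing snapshot with the latest PSA version per business key.

    `psa_current` maps each business key to the hash of its most recent,
    non-deleted PSA version. Returns (change_type, business_key, record)
    tuples that would be appended to the PSA.
    """
    deltas: List[Tuple[str, str, Record]] = []
    seen: Set[str] = set()
    for record in landing:
        bk = record[key]
        seen.add(bk)
        old_hash = psa_current.get(bk)
        if old_hash is None:
            deltas.append(("I", bk, record))  # key never seen, or previously deleted
        elif old_hash != row_hash(record, attributes):
            deltas.append(("U", bk, record))  # attributes changed since last load
        # Equal hashes mean no change, so nothing new is archived.
    # In a full snapshot, a key that disappeared is recorded as a logical delete.
    for bk in psa_current:
        if bk not in seen:
            deltas.append(("D", bk, {key: bk}))
    return deltas
```

Hashing the tracked attributes reduces change detection to one comparison per row. Two simplifications are worth noting: the `|`-joined payload can collide if values contain the delimiters, and treating missing keys as deletes is only valid for full snapshots, since incremental feeds usually carry an explicit delete indicator instead.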

Deploying the sample

This sample is in preview, so treat the generated output as a starting point to adapt to your Fabric workspace. The general flow is:

Pick the sample from the Marketplace on the Home screen.
Review the bundled metadata on the Data Objects, Connections, and Data Object Mappings screens.
Customize template mappings on the Data Objects screen to point at your preferred Fabric-targeted templates.
Generate output via the Code Generator, then move the resulting notebooks and definitions into your Fabric workspace.

Related

Persistent Staging Area (SQL Server) — The SQL Server variant.
Persistent Staging Area (Snowflake) — The Snowflake variant.
How ADL Works — The metadata-and-templates foundation.
