Lakehouse in Microsoft Fabric

A lakehouse is a modern data architecture that combines the strengths of data lakes and relational data warehouses into a single, unified platform for storing, managing, and analyzing structured, semi-structured, and unstructured data.

In other words, a lakehouse is an analytical store that combines the file storage flexibility of a data lake with the SQL-based query capabilities of a data warehouse.

Let’s create a Lakehouse in Microsoft Fabric:

Step 1: Go to the workspace in which we want to create the Lakehouse. Click on + New item and then select Lakehouse from the given options.

Let’s give a name to the lakehouse and click on Create.

In the workspace we can see that the Lakehouse is created. It has three components:

• Lakehouse is the lakehouse storage and metadata, where we add and interact with files, folders, and table data.
• Semantic model (default) is an automatically created data model based on the tables in the lakehouse. Power BI reports can be built from the semantic model.
• SQL analytics endpoint is a read-only SQL endpoint that enables us to use SQL to query the tables in the lakehouse and manage its relational data model.

Step 2: Click on the Lakehouse to ingest data into it. There are many ways to load data into a Fabric lakehouse, such as direct file upload, dataflows, and data pipelines. Here we will upload a file directly.

Click on Upload files. Select the file from the local computer by clicking on the folder icon, and then click Upload.

We can see that the file is loaded successfully.

Step 3: We can create a notebook in Fabric that uses the files stored in the lakehouse and performs transformations on them. Notebooks enable interactive Spark coding; the SQL analytics endpoint, by contrast, does not support interactive Spark code.

To create a notebook, click on “New notebook”.

We can see that a new notebook is created in the workspace, and we can add data sources or use the ones already in the Lakehouse and Warehouse of the workspace. As we uploaded one file to the lakehouse, we can see that one item has been added to the Lakehouse.

We can select the language that we want to use in the notebook.

Let’s say we have selected Python. Now click on Lakehouses to use their resources. In Files we can see our uploaded file; click on the more options menu next to the file name, then click Load data, and then Pandas. You will see that a block of code is added to the notebook.

The block of code imports the pandas package, reads the file, and displays it in the notebook output. Now it’s time to run the code. First click on Connect to start the session on which the code runs.
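The generated cell typically looks like the sketch below. The lakehouse path and the file name sales.csv, along with the OrderID/Amount columns, are assumptions for illustration; in a real Fabric notebook the path under /lakehouse/default/Files/ reflects your uploaded file.

```python
import io

import pandas as pd

# In a Fabric notebook, lakehouse files are mounted under /lakehouse/default/,
# so the generated cell would read something like:
#   df = pd.read_csv("/lakehouse/default/Files/sales.csv")
#   display(df)

# Self-contained stand-in: parse an equivalent small CSV from memory
# (the OrderID/Amount columns are made up for this demo).
csv_text = "OrderID,Amount\n1,250\n2,175\n3,320\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The `display(df)` call in the generated code renders a rich, interactive table in the notebook output; outside Fabric, `print(df)` shows the same data as plain text.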

After the session has started, the status changes to Connected. Now click on the run icon of the cell to execute it, and we can see the corresponding output.
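Once the data is loaded, transformations can be added in further cells. A minimal sketch, again assuming hypothetical OrderID/Amount columns standing in for the uploaded file:

```python
import io

import pandas as pd

# Hypothetical sales data standing in for a file loaded from the lakehouse.
df = pd.read_csv(io.StringIO("OrderID,Amount\n1,250\n2,175\n3,320\n"))

# Example transformations: filter rows and add a derived column.
big_orders = df[df["Amount"] > 200].copy()
big_orders["AmountWithTax"] = big_orders["Amount"] * 1.1
print(big_orders)
```

The `.copy()` avoids pandas' SettingWithCopy warning when assigning the new column to the filtered subset.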

We have the option to download the result in CSV, JSON, and XML format.
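The same formats can also be produced programmatically with pandas. A sketch with made-up data (CSV and JSON shown; `DataFrame.to_xml()` covers the XML option but additionally requires the lxml package):

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("OrderID,Amount\n1,250\n2,175\n"))

# Export the result in the formats the notebook UI offers for download.
csv_out = df.to_csv(index=False)
json_out = df.to_json(orient="records")
print(csv_out)
print(json_out)
# df.to_xml() would cover XML (needs the lxml package installed).
```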

We also have the option to execute the notebook on a specified schedule. To enable it, click on Run and then click on Schedule.

In the following window, we can schedule the run by the minute, hourly, daily, weekly, or monthly. We can also specify the start time and end date of the schedule, and select the time zone.

At the end click on Apply to save the changes.
