Managing Lakehouses in Microsoft Fabric Using Python

Problem

When working with Microsoft Fabric, you’ll likely want to automate the creation and management of various resources. Lakehouses are a key part of this, and you may need to create, update, or delete them using Python. Relax and grab a cup of tea: I’ll walk you through exactly how to do each of these from a Fabric notebook.

Solutions

To manage the lakehouse, we’ll use a combination of notebookutils and the semantic-link library (imported as sempy), both of which offer extensive functionality for working with Microsoft Fabric. For more details, see the documentation for semantic-link and notebookutils.

Before working with any Fabric items, we’ll need the workspace ID, which can be obtained using the following code:

import sempy.fabric as fabric
fabric.list_workspaces()

The code above returns a pandas DataFrame containing details of all available workspaces, including each workspace's Id and Name.
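
To pull a single ID out of that table by workspace name, a small lookup helper can be handy. The sketch below is only an illustration and not part of sempy: find_workspace_id is a hypothetical name, and it works on plain records (for example, the result of calling .to_dict("records") on the DataFrame), assuming the Id and Name columns that the class later in this post also relies on.

```python
def find_workspace_id(workspaces, name):
    """Return the Id of the workspace whose Name matches exactly.

    `workspaces` is an iterable of dict-like records with "Id" and "Name"
    keys (assumed column names of the fabric.list_workspaces() output).
    """
    matches = [ws["Id"] for ws in workspaces if ws["Name"] == name]
    if not matches:
        raise LookupError(f"No workspace named {name!r} was found.")
    if len(matches) > 1:
        raise LookupError(f"Workspace name {name!r} is ambiguous: {matches}")
    return matches[0]


# In a Fabric notebook (not executed here):
#   import sempy.fabric as fabric
#   records = fabric.list_workspaces().to_dict("records")
#   workspace_id = find_workspace_id(records, "My Workspace")
```

Raising an error for a missing or ambiguous name keeps a typo from silently turning into an operation on the wrong workspace.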

With the workspace ID in hand, we can proceed to work with Fabric items within the selected workspace.

Creating the lakehouse

To create the lakehouse, we can use the code below:

fabric.create_lakehouse(display_name="test", workspace="55024177-0fe6-45e6-9bf9-e7569053138f")

# Returns lakehouseID
'229358fa-e041-4371-a160-479805d4eff7'

The new lakehouse now appears in the workspace.

If the requested lakehouse already exists, an exception will be raised indicating that the lakehouse is already in use.

We can also use notebookutils to create the lakehouse.

# creating the lakehouse in the workspace using notebookutils
notebookutils.lakehouse.create(name='sales_raw',workspaceId='55024177-0fe6-45e6-9bf9-e7569053138f')
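
Since creating a lakehouse that already exists raises an exception, it can help to check the existing names first. The following is a minimal sketch under a few assumptions: ensure_lakehouse is a hypothetical helper, the records returned by notebookutils.lakehouse.list can be indexed with "displayName" (as the get output later in this post suggests), and the name comparison is exact and case-sensitive. The create call is passed in as a function, so either of the two creation methods above can be plugged in.

```python
def ensure_lakehouse(name, existing, create):
    """Create `name` unless it is already listed; return (name, created).

    existing: iterable of lakehouse records with a "displayName" key
    create:   callable that takes the name and creates the lakehouse
    """
    existing_names = {lh["displayName"] for lh in existing}
    if name in existing_names:
        return name, False
    create(name)
    return name, True


# In a Fabric notebook (not executed here), with workspace_id already set:
#   existing = notebookutils.lakehouse.list(workspaceId=workspace_id)
#   ensure_lakehouse(
#       "sales_raw",
#       existing,
#       lambda n: notebookutils.lakehouse.create(name=n, workspaceId=workspace_id),
#   )
```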

Updating the lakehouse

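We can update the lakehouse's name and description with notebookutils as well. The call below takes the current name, the new name, the new description, and the workspace ID, in that order: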

notebookutils.lakehouse.update("sales_raw", "sales_updated", "Updated description", '55024177-0fe6-45e6-9bf9-e7569053138f')

After the update, the lakehouse appears in the workspace as sales_updated, with the new description.
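
To confirm a rename from code rather than by looking at the workspace, we could read the lakehouse back right after updating it. This is a rough sketch, not an official pattern: rename_lakehouse is a hypothetical helper, and the update and get functions are passed in (in a notebook these would be notebookutils.lakehouse.update and notebookutils.lakehouse.get), assuming get returns a record with a "displayName" key.

```python
def rename_lakehouse(update, get, name, new_name, description, workspace_id):
    """Rename a lakehouse, then read it back to confirm the new name.

    update: callable taking (name, new_name, description, workspace_id)
    get:    callable taking name=... and workspaceId=... keyword arguments
    """
    update(name, new_name, description, workspace_id)
    info = get(name=new_name, workspaceId=workspace_id)
    if info["displayName"] != new_name:
        raise RuntimeError(f"Rename of {name!r} to {new_name!r} was not applied.")
    return info


# In a Fabric notebook (not executed here):
#   rename_lakehouse(
#       notebookutils.lakehouse.update,
#       notebookutils.lakehouse.get,
#       "sales_raw", "sales_updated", "Updated description", workspace_id,
#   )
```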

Deleting the lakehouse

Deleting a lakehouse with notebookutils is as straightforward as creating one. The delete call takes the lakehouse name and the workspace ID. If you also want the lakehouse's own ID, for logging or a sanity check before removal, you can fetch its details first:

Get the lakehouse ID:

notebookutils.lakehouse.get(name='sales_updated',workspaceId='55024177-0fe6-45e6-9bf9-e7569053138f')
# Returns
{'id': 'd33ae7b7-6aed-4899-a5ac-bae493a65b50',
 'type': 'Lakehouse',
 'displayName': 'sales_updated',
 'description': 'Updated description',
 'workspaceId': '55024177-0fe6-45e6-9bf9-e7569053138f',
 'properties': {'abfsPath': 'abfss://55024177-0fe6-45e6-9bf9-e7569053138f@onelake.dfs.fabric.microsoft.com/d33ae7b7-6aed-4899-a5ac-bae493a65b50'}}
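
The abfsPath in that record is the root of the lakehouse in OneLake, so it can be used to build paths for reading and writing data. The sketch below uses a hypothetical helper, lakehouse_paths, and assumes the record has the shape shown above; the Tables and Files folder names are the standard top-level folders of a lakehouse.

```python
def lakehouse_paths(info):
    """Derive the Tables and Files roots from a lakehouse record."""
    root = info["properties"]["abfsPath"].rstrip("/")
    return {"tables": f"{root}/Tables", "files": f"{root}/Files"}


# Record copied from the output above
info = {
    "id": "d33ae7b7-6aed-4899-a5ac-bae493a65b50",
    "displayName": "sales_updated",
    "properties": {
        "abfsPath": "abfss://55024177-0fe6-45e6-9bf9-e7569053138f"
        "@onelake.dfs.fabric.microsoft.com/d33ae7b7-6aed-4899-a5ac-bae493a65b50"
    },
}
paths = lakehouse_paths(info)
print(paths["tables"])
print(paths["files"])
```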

Delete the lakehouse:

notebookutils.lakehouse.delete(name='sales_updated',workspaceId='55024177-0fe6-45e6-9bf9-e7569053138f')
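
Because a deletion cannot be taken back from the notebook, it may be worth previewing what would be removed before actually removing it. Here is a minimal sketch of that idea, with a dry-run switch that is on by default. The helper name delete_lakehouses and its parameters are hypothetical, and the lakehouse records are again assumed to carry a "displayName" key.

```python
def delete_lakehouses(names, existing, delete, dry_run=True):
    """Delete the named lakehouses that exist; return (deleted, missing).

    With dry_run=True (the default) nothing is deleted, and the first list
    holds the names that would be deleted.
    """
    existing_names = {lh["displayName"] for lh in existing}
    deleted, missing = [], []
    for name in names:
        if name not in existing_names:
            missing.append(name)
            continue
        if not dry_run:
            delete(name)
        deleted.append(name)
    return deleted, missing


# In a Fabric notebook (not executed here), with workspace_id already set:
#   existing = notebookutils.lakehouse.list(workspaceId=workspace_id)
#   delete_lakehouses(
#       ["sales_updated"],
#       existing,
#       lambda n: notebookutils.lakehouse.delete(name=n, workspaceId=workspace_id),
#       dry_run=False,
#   )
```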

Modularized code

We can create modularized code to efficiently manage lakehouses within a workspace. This approach structures the code into reusable methods for creating, updating, deleting, and listing lakehouses, making the management process more organized and scalable.

"""
Author: Vijay Kumar
Email: vijay.kumar.1997@outlook.com
"""

import sempy.fabric as fabric
import pandas as pd
# notebookutils is built into Fabric notebooks, so it needs no import here

#------------------------Lakehouse Utilities class-------------------#
class Lakehouse_Utilities:
    def __init__(self,workspace_name):
        """
        A utility class to facilitate working with lakehouses in a Microsoft Fabric workspace. This class 
        provides methods to retrieve and interact with lakehouse objects within a given workspace.
        Arguments:
            workspace_name (str): The name of the Microsoft Fabric workspace to interact with.
        """
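        if not workspace_name:
            raise ValueError("Workspace name is required.")
        self.workspace_name=workspace_name

        # Look up the workspace ID and fail clearly if the name is unknown
        workspaces=fabric.list_workspaces()
        matches=workspaces[workspaces["Name"]==workspace_name]
        if matches.empty:
            raise ValueError(f"Workspace '{workspace_name}' was not found.")
        self.workspaceId=matches["Id"].iloc[0]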

        # get list of lakehouse present in current workspace
        self.lakehouse_df=pd.DataFrame(notebookutils.lakehouse.list(workspaceId=self.workspaceId))

    def create_lakehouse(self,lakehouse_list):
        """
        Creates one or more new lakehouses in the specified Microsoft Fabric workspace. If any of the lakehouses 
        already exist, it will notify the user and avoid duplicate creation for those specific lakehouses.
        Arguments:
            lakehouse_list (str): A comma-separated string of lakehouse names to be created.
        Behavior:
            - If the workspace has no existing lakehouses, it directly creates each lakehouse in the provided list.
            - If a lakehouse already exists within the workspace, it notifies the user and skips its creation.
            - If a lakehouse does not exist, it creates the lakehouse and prints the lakehouse ID upon successful creation.
        Example:
            lakehouse_utils.create_lakehouse('lakehouse_1,lakehouse_2')
        """
        # An empty workspace may yield a DataFrame without a displayName column
        existing=set(self.lakehouse_df.get("displayName", []))
        for lakehouse in lakehouse_list.split(","):
            if lakehouse in existing:
                print(f"Lakehouse {lakehouse} already exists.")
            else:
                lh = fabric.create_lakehouse(display_name=lakehouse, workspace=self.workspace_name)
                print(f"Lakehouse {lakehouse} with ID {lh} successfully created")
        # Refresh the cached list so later calls see the new lakehouses
        self.lakehouse_df=pd.DataFrame(notebookutils.lakehouse.list(workspaceId=self.workspaceId))

    def update_lakehouse(self, lakehouse_name, new_name, description: str = None):
        """
        Update the properties of a specified lakehouse in the workspace.

        This function attempts to update the name and description of a lakehouse identified by 
        its current name. If the update is successful, the lakehouse will have its properties 
        modified as specified.

        Parameters:
            lakehouse_name (str): The current name of the lakehouse to be updated.
            new_name (str): The new name to assign to the lakehouse.
            description (str, optional): A new description for the lakehouse. If not provided, the description will remain unchanged.

        Raises:
            Exception: If the update operation fails, an exception will be raised to indicate the 
                    error encountered during the update process.
        """
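        # Any error from the update call propagates to the caller unchanged
        notebookutils.lakehouse.update(
            lakehouse_name,
            new_name,
            description,
            self.workspaceId
        )
        # Refresh the cached list so later calls see the new name
        self.lakehouse_df=pd.DataFrame(notebookutils.lakehouse.list(workspaceId=self.workspaceId))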

    def delete_lakehouse(self,lakehouse_list):
        """
        Deletes one or more specified lakehouses from the Microsoft Fabric workspace.
        Arguments:
            lakehouse_list (str): A comma-separated string of lakehouse names to be deleted.
        Behavior:
            - If no lakehouses exist in the workspace, it informs the user that no lakehouses are available for deletion.
            - For each lakehouse in the provided list:
                - If the lakehouse exists, it is deleted from the workspace, and a confirmation message is printed.
                - If the lakehouse does not exist, it informs the user that the specified lakehouse is not found.
        Example:
            lakehouse_utils.delete_lakehouse('Lakehouse1,Lakehouse2')
        """
        # An empty workspace may yield a DataFrame without a displayName column
        existing=set(self.lakehouse_df.get("displayName", []))
        if not existing:
            print(f"No lakehouses exist in workspace {self.workspace_name}.")
            return
        for lakehouse in lakehouse_list.split(","):
            if lakehouse in existing:
                notebookutils.lakehouse.delete(name=lakehouse,workspaceId=self.workspaceId)
                print(f"Lakehouse {lakehouse} deleted.")
            else:
                print(f"Lakehouse {lakehouse} does not exist.")
        # Refresh the cached list so later calls see the deletions
        self.lakehouse_df=pd.DataFrame(notebookutils.lakehouse.list(workspaceId=self.workspaceId))
    
    def list_lakehouse(self):
        """
        Return the available lakehouse in a workspace.
        """
        return self.lakehouse_df
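
Because the create and delete methods above split their input on commas only, a string such as 'lakehouse_1, lakehouse_2' would produce a second name with a leading space. One way around that is to normalize the string before passing it in. The helper below, parse_names, is a hypothetical addition and not part of the class; it trims whitespace, drops empty entries, and removes duplicates while keeping the original order.

```python
def parse_names(names):
    """Split a comma-separated string into clean, de-duplicated names."""
    seen, result = set(), []
    for raw in names.split(","):
        name = raw.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


cleaned = parse_names("lakehouse_1, lakehouse_2,,lakehouse_1")
print(cleaned)  # ['lakehouse_1', 'lakehouse_2']

# In a Fabric notebook (not executed here), the cleaned names can be joined
# back into the comma-separated form the class expects:
#   lakehouse_utils = Lakehouse_Utilities("My Workspace")
#   lakehouse_utils.create_lakehouse(",".join(cleaned))
```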

Please note that I have tested this code myself and it works correctly in my environment. However, as Fabric continues to evolve, you may need to update the code to accommodate changes in the platform.
