Replace PySpark Notebooks in Microsoft Fabric using Livy API - No More Clicking Around the UI


The Problem

Microsoft Fabric is a great platform that supports Spark-based workloads on top of the Lakehouse architecture. It provides integrated notebooks for running PySpark code, but these notebooks are tied to the UI and are not designed for repeatable, automated, or production-grade execution workflows.

In many real-world data engineering and analytics scenarios, teams need the ability to:

  • Trigger Spark jobs automatically as part of a pipeline
  • Submit jobs from external systems such as APIs, schedulers, or CI/CD pipelines
  • Version and manage Spark code with Git, like any other software component
  • Reuse, test, and modularize PySpark logic
  • Avoid manual interaction with UI-based notebooks

However, Fabric notebooks are not ideal for these cases yet: they require the UI and cannot be triggered programmatically with ease.

This makes automation, deployment, and maintenance of Spark workloads difficult. If you are working on this, I hope you will agree with me :D


The Solution

Microsoft Fabric provides a REST interface called the Livy API, which allows users to submit and execute Spark code directly against a Fabric Lakehouse without needing to create or manage notebook artifacts.

By using the Livy Batch API, you can submit PySpark code to Microsoft Fabric entirely through Python code. This means Spark jobs can be triggered programmatically, integrated into orchestration pipelines, and executed remotely without user interaction.

In this setup:

  • PySpark code is written in a .py file
  • The .py file is placed inside the Lakehouse, and its abfss path is sent via the "file" parameter in the payload (a quick payload sketch follows this list)
  • Microsoft Fabric executes the job, with Spark automatically attached to the specified Lakehouse
  • Azure CLI authentication is used to keep it simple and secret-free, but of course you can switch to service principal authentication for production.
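
To make this concrete, a rough sketch of the Livy batch payload is shown below. The field names follow the setup described above, while the workspace, Lakehouse, and file names are placeholders I made up for illustration:

# Hedged sketch of the Livy Batch API payload -- all names and paths are placeholders
payload = {
    # abfss path of the PySpark file stored in the Lakehouse's Files area
    "file": "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/<lakehouse_name>.Lakehouse/Files/sample_spark_code.py",
    # optional friendly name for the batch job
    "name": "livy-batch-demo",
    # tells Spark which Lakehouse to attach to
    "conf": {"spark.targetLakehouse": "<lakehouse_name>"},
}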


What Is Livy and Why It Matters in Fabric

Microsoft Fabric exposes a Livy-compatible REST API that lets you submit Spark jobs via HTTP. This means you can:

  • Run Spark code from a Python script, CLI, or Azure Function
  • Fully automate Spark workloads
  • Skip the GUI and directly integrate with deployment pipelines

For teams, it means that you could:

  • Use Git + VS Code for Spark code versioning
  • Run Spark jobs from a scheduler or CI agent
  • Monitor and retry jobs programmatically

All without opening a notebook.


Types of Livy Jobs in Microsoft Fabric

The Fabric Livy API supports two types of jobs:

Session Jobs

Session jobs involve creating a persistent Spark session that remains active across multiple commands. These are useful for interactive workloads where state or cached data must be preserved between executions. The session ends after 20 minutes of inactivity or when terminated explicitly.

Batch Jobs

Batch jobs are for one-off execution. Each job is isolated and starts its own Spark session. This is ideal for production-style pipelines where each job is independent, stateless, and designed to run to completion.

In this article, we are using the Batch job approach, which aligns best with automated, programmatic workloads.
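
As a rough illustration of the difference, the two job types map to two different endpoints on the same Lakehouse. The URL shape below is my approximation, so always copy the exact endpoint from your own Lakehouse rather than relying on it:

# Assumed endpoint shapes for the two Livy job types (approximation -- copy the real URL from your Lakehouse)
BASE = "https://api.fabric.microsoft.com/v1/workspaces/<workspace_id>/lakehouses/<lakehouse_id>/livyapi/versions/2023-12-01"

SESSION_ENDPOINT = f"{BASE}/sessions"  # interactive, stateful session jobs
BATCH_ENDPOINT = f"{BASE}/batches"     # one-off batch jobs (used in this article)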


Complete Python Code: Submit a Spark Job to Fabric Using Livy Batch API

You can get the Livy API endpoint URL from your Lakehouse, as shown below:


[Screenshot: copying the Livy API endpoint URL from the Lakehouse]

Python code to send the request to the API:

[Screenshots: Python script that submits the Livy batch job]
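
Since the original script was shared as screenshots, here is a minimal reconstruction of what the submission code can look like, based on the explanation further down. The endpoint URL, token scope, job name, and file path are assumptions on my side; copy the real Livy endpoint from your Lakehouse and adjust the names to your environment.

# Minimal sketch: submit a PySpark file to Fabric via the Livy Batch API.
# All URLs, IDs, and names below are placeholders -- replace them with your own.
import requests
from azure.identity import AzureCliCredential

# Livy batch endpoint copied from the Lakehouse (placeholder values here)
LIVY_BATCH_URL = (
    "https://api.fabric.microsoft.com/v1/workspaces/<workspace_id>"
    "/lakehouses/<lakehouse_id>/livyapi/versions/2023-12-01/batches"
)

# abfss path of the PySpark file stored in the Lakehouse's Files area
SPARK_FILE = (
    "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_name>.Lakehouse/Files/sample_spark_code.py"
)


def submit_batch_job():
    # Authenticate with the identity from `az login` -- no secrets in code
    credential = AzureCliCredential()
    token = credential.get_token("https://api.fabric.microsoft.com/.default")

    payload = {
        "file": SPARK_FILE,
        "name": "livy-batch-demo",
        "conf": {"spark.targetLakehouse": "<lakehouse_name>"},
    }

    session = requests.Session()
    try:
        response = session.post(
            LIVY_BATCH_URL,
            json=payload,
            headers={"Authorization": f"Bearer {token.token}"},
        )
        response.raise_for_status()
        batch = response.json()
        print("Submitted batch:", batch.get("id"), "state:", batch.get("state"))
        return batch.get("id")
    finally:
        # Close the HTTP session once the submission is done
        session.close()


if __name__ == "__main__":
    submit_batch_job()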

sample_spark_code.py -- this file should be placed inside the Lakehouse, and its abfss path is what gets sent to the Livy API

[Screenshot: contents of sample_spark_code.py]
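
The sample script itself was also shared as a screenshot; a minimal placeholder along these lines would do the job (the table name and data are made up):

# sample_spark_code.py -- minimal placeholder PySpark job (table name and data are made up)
from pyspark.sql import SparkSession

# In Fabric, the Spark session is provided by the runtime and is already
# attached to the Lakehouse passed via spark.targetLakehouse
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    ["id", "label"],
)

# Write the result to a Lakehouse table
df.write.mode("overwrite").saveAsTable("livy_demo_table")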

Code Explanation

Authentication

The client uses AzureCliCredential to authenticate. This means the user must be logged in to Azure via the CLI (az login), and no secrets are stored in the code. This is a safe and developer-friendly choice for interactive and automation environments.
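
If you later switch to service principal authentication for production, as mentioned earlier, the change is typically limited to the credential object; the rest of the submission code stays the same. The sketch below is an assumption on my side, using the standard azure-identity classes:

# Hedged sketch: swapping AzureCliCredential for a service principal in production.
# Tenant/client IDs and the secret come from your own app registration -- keep the secret in a vault, not in code.
from azure.identity import AzureCliCredential, ClientSecretCredential

# Local development: uses the identity from `az login`
dev_credential = AzureCliCredential()

# Production: service principal credential
prod_credential = ClientSecretCredential(
    tenant_id="<tenant_id>",
    client_id="<client_id>",
    client_secret="<client_secret>",
)

# Both credential types expose the same get_token interface
token = dev_credential.get_token("https://api.fabric.microsoft.com/.default")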

Spark Code Submission

Instead of submitting a JAR or referencing a notebook, we read the .py file path as a string and send it as part of the Livy batch configuration.

Lakehouse Configuration

We pass the spark.targetLakehouse parameter with the Lakehouse name so that Spark knows which OneLake location to attach to. This is required when writing data to Fabric Lakehouse tables or files.
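
In the request payload, that boils down to a single conf entry (the Lakehouse name and file path are placeholders):

# Hypothetical conf entry attaching Spark to a specific Lakehouse
conf = {"spark.targetLakehouse": "<lakehouse_name>"}
payload = {"file": "<abfss_path_to_py_file>", "conf": conf}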

Livy Batch Submission

A POST request is made to the Livy endpoint for the given Lakehouse. If successful, the response includes a batch_id, which can be used to query the status, logs, or output later if needed.
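
As an example, a simple polling loop against that batch_id could look like the sketch below. The status endpoint and state names follow the usual Livy conventions, so treat them as assumptions and verify against your own endpoint:

# Hedged sketch: poll the batch status using the id returned by the POST call
import time
import requests


def wait_for_batch(session: requests.Session, livy_batch_url: str, batch_id: str, headers: dict) -> str:
    # Livy convention: GET <batches endpoint>/<batch id> returns the job state
    while True:
        resp = session.get(f"{livy_batch_url}/{batch_id}", headers=headers)
        resp.raise_for_status()
        state = resp.json().get("state")
        print("Batch", batch_id, "state:", state)
        if state in ("success", "dead", "killed", "error"):
            return state
        time.sleep(15)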

Session Management

A requests.Session object is used to manage the HTTP connection and is closed properly at the end of the submission process.
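
An equivalent and slightly tidier pattern is to let a with block close the session automatically (the URL, payload, and headers here are the ones built in the submission sketch above):

# Same submission, with the session closed automatically by the with block
import requests


def submit_with_context_manager(livy_batch_url: str, payload: dict, headers: dict) -> str:
    with requests.Session() as session:
        response = session.post(livy_batch_url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json().get("id")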


And lastly, you can of course monitor these job submissions in the Monitor hub.

Let me know your thoughts, guys.

