Running LLMs with Docker Desktop

The upcoming release of Docker Desktop from Docker, Inc. will include the Docker Model Runner, a new experimental feature that enables running Large Language Models (LLMs) natively with Docker Desktop. This feature lets users run LLMs from the CLI following the familiar Docker workflow:

  • Pull models from registries (e.g., Docker Hub)
  • Run models locally with GPU acceleration
  • Integrate models into their development workflows

Of course, an LLM's performance depends on the model size and the resources available locally.

The feature is specifically optimized to use the GPU on Apple Silicon Macs for efficient model inference. Windows support with NVIDIA GPUs is expected soon (early April 2025).

How does it work?

The Docker Model Runner runs AI models using a host-installed inference server. On the backend, it leverages the llama.cpp framework to serve the models, which allows the server to "talk" directly to the GPU acceleration on Apple Silicon. This makes better use of local resources and yields faster inference than running an LLM inside a container.

Getting Started with Docker Model CLI

Currently, the Docker Model Runner (Beta) is available only for Macs with Apple silicon (i.e., M1/M2/M3/M4). Installation is straightforward: it simply requires installing (or updating, if already installed) Docker Desktop version 4.40 or above.

As this feature is currently in Beta, you will have to enable it first. In the Docker Desktop settings, go to the Features in development tab and select the Enable Docker Model Runner option.


To validate that the Model Runner is set up and working, go to the CLI and run the docker model status command.


The --help argument will return the available commands, which are straightforward and intuitive.
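For example (assuming Docker Desktop 4.40+ with the Beta flag enabled), the two checks look like this:

```shell
# Confirm that the Model Runner backend is active.
docker model status

# List the available subcommands (pull, list, inspect, run, etc.).
docker model --help
```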

Download a Model

Now that the feature is up and running, the next step is to download a model from Docker Hub. Currently, the following models are available for download:

  • ai/gemma3
  • ai/llama3.2
  • ai/qwq
  • ai/mistral-nemo
  • ai/mistral
  • ai/phi4
  • ai/qwen2.5
  • ai/deepseek-r1-distill-llama

This list is expected to expand rapidly in the near future, and you can find the up-to-date list of models in the ai namespace on Docker Hub.

To download a model, we will use the pull command. For example, let's pull the gemma3 model:

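The command takes the model's namespace and name, as listed above; the exact download output may vary:

```shell
# Pull the gemma3 model from the ai namespace on Docker Hub.
docker model pull ai/gemma3
```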

You can get the list of models available locally (i.e., downloaded) using the list command.

Likewise, the inspect command returns the model details in a JSON format:


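For example, after pulling gemma3, the two commands would look like this:

```shell
# Show all models downloaded locally.
docker model list

# Print the model's details as JSON.
docker model inspect ai/gemma3
```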

Launch a Model

The run command enables you to run the model in interactive mode. Let's launch the model and ask it to create a Python function that calculates the sum of two numbers:


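A sketch of launching the model; passing the prompt directly as an argument for a one-shot answer is an assumption — omit it to stay in interactive mode:

```shell
# Start an interactive chat session with the model:
docker model run ai/gemma3

# Or pass a one-shot prompt directly (assumed syntax):
docker model run ai/gemma3 "Write a Python function that calculates the sum of two numbers."
```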

Applications

The goal of the Model Runner is to enable Docker users and GenAI application developers to integrate LLM capabilities into their Docker workflows. The application can then be developed and tested locally inside a container, and the code can be shipped seamlessly with that container to the host server.

There are three methods to interact with the Model Runner:

  • From within a container, using the internal DNS name
  • From the host via the Docker socket
  • From the host via TCP
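As a sketch, here is what a chat-completion request might look like for the last two methods. The Model Runner exposes an OpenAI-compatible API, but the hostname, port, and endpoint path below are assumptions — check the Docker documentation for the exact values on your setup:

```shell
# Build an OpenAI-style chat-completion request body for the gemma3 model.
BODY=$(cat <<'EOF'
{
  "model": "ai/gemma3",
  "messages": [
    {"role": "user", "content": "Write a Python function that sums two numbers."}
  ]
}
EOF
)
echo "$BODY"

# From the host via TCP (port 12434 is an assumption -- check your settings):
# curl http://localhost:12434/engines/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$BODY"

# From within a container, the same request targets the internal DNS name
# (hostname is an assumption):
# curl http://model-runner.docker.internal/engines/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$BODY"
```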

One cool example of advanced usage of the Model Runner is running Goose inside a containerized environment.

The Model Runner will be available (Beta) in the coming release of Docker Desktop (version 4.40), which is scheduled to be released next week.



More articles by Rami Krispin
