Running LLMs with Docker Desktop
The coming release of Docker Desktop from Docker, Inc. will include the Docker Model Runner, a new experimental feature that enables running Large Language Models (LLMs) natively with Docker Desktop. The feature lets users run LLMs from the CLI, following the common Docker workflow:
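# The familiar docker pull / docker run pattern, applied to models
# (the model name is illustrative, borrowed from the example later in this article):
docker model pull ai/gemma3
docker model run ai/gemma3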
Of course, an LLM's performance depends on the model size and the resources available locally.
The feature is specifically optimized to utilize the GPU on Apple Silicon Macs for efficient model inference. Windows support with NVIDIA GPUs is coming soon (early April 2025).
How does it work?
The Docker Model Runner runs AI models using a host-installed inference server, leveraging the llama.cpp framework on the backend to serve the models. This lets the server talk directly to the GPU acceleration hardware on Apple Silicon, which allows better use of local resources and faster inference compared to running an LLM inside a container.
Getting Started with Docker Model CLI
Currently, the Docker Model Runner (Beta) is available only for Macs with Apple silicon (i.e., M1/M2/M3/M4). Installation is straightforward: it simply requires installing (or updating, if already installed) Docker Desktop version 4.40 or above.
As this feature is currently in Beta, you will have to enable it first. In the Docker Desktop settings menu, go to the Features in development tab and select the Enable Docker Model Runner option.
To validate that the Model Runner is set up and working, run the docker model status command in the CLI.
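For example (the output shown is illustrative of a healthy install; the exact wording may differ):

docker model status
Docker Model Runner is running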
The --help argument will return the available commands, which are straightforward and intuitive.
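For instance, a sketch of the kind of output to expect (the subcommands listed are the ones used throughout this article; the actual list and wording may differ):

docker model --help
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on a model
  list        List the models available locally
  pull        Download a model from Docker Hub
  run         Run a model interactively or with a single prompt
  status      Check whether the Docker Model Runner is running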
Download a Model
Now that the feature is up and running, the next step is to download a model from Docker Hub. At launch, only a handful of models are available for download. This list is expected to expand rapidly in the near future, and you can find the up-to-date models over here.
To download a model, we will use the pull command. For example, let's pull the gemma3 model:
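# Official models are published under Docker Hub's ai/ namespace;
# the exact tag below is an assumption.
docker model pull ai/gemma3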
You can get the list of models available locally (e.g., downloaded) using the list command.
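For example (the column layout and values shown are illustrative):

docker model list
MODEL        PARAMETERS   QUANTIZATION   SIZE
ai/gemma3    3.88B        Q4_K_M        2.31 GB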
Likewise, the inspect command returns the model details in a JSON format:
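# The fields below sketch the kind of details returned;
# they are illustrative, not the exact schema.
docker model inspect ai/gemma3
{
  "id": "sha256:...",
  "tags": ["ai/gemma3:latest"],
  "architecture": "gemma3",
  "parameters": "3.88B",
  "quantization": "Q4_K_M"
}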
Launch a Model
The run command enables you to run the model in interactive mode. Let's launch the model and ask it to create a Python function that calculates the sum of two numbers:
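# Without a prompt argument, run opens an interactive chat session;
# with one, it answers and exits. The response below is illustrative.
docker model run ai/gemma3 "Create a Python function that calculates the sum of two numbers"

def sum_numbers(a, b):
    return a + b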
Applications
The goal of the Model Runner is to enable Docker users and GenAI application developers to integrate LLM capabilities into their Docker workflow. Hence, an application can be developed and tested locally inside a container, and the code can be shipped seamlessly with the container to the host server.
There are three methods to interact with the Model Runner: the docker model CLI, and API calls from within containers or from processes on the host (see the sketch below).
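As a minimal sketch, here is what calling a model from another container over HTTP could look like, assuming the Model Runner exposes an OpenAI-compatible endpoint reachable at the model-runner.docker.internal internal DNS name (both the hostname and the path below are assumptions based on the Beta documentation, and may change):

# Assumed OpenAI-compatible chat completions endpoint (Beta; subject to change):
curl http://model-runner.docker.internal/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gemma3",
        "messages": [{"role": "user", "content": "Say hello from a container"}]
      }'

Because the endpoint follows the OpenAI API schema, existing GenAI client libraries can typically be pointed at it by changing only the base URL.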
Here is a cool example of an advanced use of the Model Runner: running Goose, an AI agent, inside a containerized environment.
The Model Runner will be available (in Beta) in the upcoming release of Docker Desktop (version 4.40), which is scheduled for next week.