A better way to set up your Machine Learning environment with or without GPU support using Docker (PyTorch and TensorFlow)

If you’ve started working with machine learning, particularly with neural network libraries like PyTorch and TensorFlow, you might have encountered difficulties in setting up your coding environment perfectly. There are often warnings, code that runs slower than expected, and an overall feeling that something isn’t quite right. This happens because the behavior of your code depends not only on the Python and library versions but also on your operating system setup and the other libraries installed alongside your virtual environment. The challenge grows when GPUs enter the picture, since drivers and supporting system libraries add yet another layer of complexity.

The solution to all these issues is to use Docker. Docker is a tool designed to simplify the creation, deployment, and running of applications using containers. Docker provides a reproducible development environment and a comprehensive ecosystem of tools. Docker and virtual environments, such as Python’s virtualenv, both offer isolated environments for running applications. However, the key difference lies in how they achieve this isolation.

A virtual environment, like Python’s virtualenv, only encapsulates Python dependencies. It allows you to switch between Python versions and dependencies easily, but you’re still within your own operating system. In other words, a virtual environment “containerizes” only the Python interpreter and Python libraries. On the other hand, a container refers to a lightweight, standalone, executable package of software that includes all the necessary libraries, configuration files, dependencies, and other components required to run the application. Docker provides a higher level of abstraction and isolation compared to a virtual environment. It can have its own process space, file system, network space, IPC space, and more. Therefore, Docker is closer to a virtual machine than a virtual environment.
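
To make the contrast concrete, here is a minimal shell sketch of the difference in isolation (the Debian image tag is just an example):

# A virtualenv isolates Python packages, but you stay on the host OS
python3 -m venv .venv
source .venv/bin/activate
cat /etc/os-release    # still prints your host distribution

# A container ships its own userspace: file system, libraries, and all
docker run --rm debian:bookworm cat /etc/os-release    # prints Debian's release info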

If you are already familiar with Docker and have a basic understanding of how it works, the commands will be fairly simple. If you are new to Docker, you will still be able to follow the steps, but studying the tool further will help you get more out of this article.


Download Docker

The process of downloading Docker will vary depending on your Linux distribution. A simple search, such as “install Docker on <linux_distro>,” will lead you to an official guide on docs.docker.com. For Windows, you will need to install Docker Desktop instead. Once installed, you can verify the setup as shown below.
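
Once installation finishes, a quick sanity check confirms that the Docker daemon is running and can execute containers:

docker --version              # prints the installed client version
docker run --rm hello-world   # pulls a tiny test image and runs it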

Enable your GPU to be used in Docker (Optional)

This step is optional, as you may not have a GPU or may not want to use it. The process for enabling GPU support depends on your GPU brand and operating system. I will provide two articles that cover NVIDIA GPUs on some Linux distributions and on Windows, but you might need to search for more specific instructions for your OS and GPU. If you are choosing a Linux distribution to work with machine learning, I recommend Debian. Once everything is set up, you can verify GPU access with the sketch after the links below.

Linux

Windows
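
As a quick verification that GPU passthrough works, you can run nvidia-smi inside a throwaway CUDA container (the exact image tag below is just an example; pick any CUDA tag compatible with your driver):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If the familiar GPU table prints, Docker can see your GPU.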

Pull and Run Images

Docker images are prebuilt snapshots of an environment, ready to run as containers. TensorFlow offers multiple image variants to choose from, including GPU-enabled images and images with Jupyter Notebook preinstalled. PyTorch has a single base image that can be used with or without GPU support, but it does not come with Jupyter Notebook preinstalled. I will demonstrate how to customize your images in later steps.

To pull the images, use the following commands:

docker pull tensorflow/tensorflow:latest-gpu-jupyter
docker pull nvcr.io/nvidia/pytorch:23.10-py3        
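
After the pulls complete, you can confirm that both images are available locally:

docker image ls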

The most up-to-date tags for the PyTorch GPU image can be found in the NVIDIA NGC catalog (catalog.ngc.nvidia.com).

To run the images, use the following command as an example:

docker run -it --rm --user $(id -u):$(id -g) --gpus all -v $PWD:/tf/data -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter        

Let’s break down the different parts of this command:

  • docker run: This command is used to run a Docker container.
  • -it: This flag specifies running the container in interactive mode with a pseudo-TTY allocated.
  • --rm: This flag specifies automatically removing the container when it exits.
  • --user $(id -u):$(id -g): This flag specifies the user and group ID to use inside the container. In this case, it uses the user and group ID of the current user on the host system. This allows you to create files within the container that can be accessed without root permissions outside the container.
  • --gpus all: This flag specifies that all available GPUs on the host system should be accessible by the container (only necessary for GPU integration).
  • -v $PWD:/tf/data: This flag specifies a volume mount, where the current working directory ($PWD) on the host system is mounted to the /tf/data directory inside the container. This ensures that the files in your current directory are accessible within the container.
  • -p 8888:8888: This flag specifies a port mapping, mapping port 8888 on the host system to port 8888 inside the container.
  • tensorflow/tensorflow:latest-gpu-jupyter: This is the name of the Docker image to use. In this case, it refers to the latest version of the TensorFlow GPU image with Jupyter Notebook installed.

You need to run this command in a terminal inside the project directory so that your files are accessible to the container. The image name should match the image you pulled in the previous step; an equivalent command for the PyTorch image is sketched below. When the container starts, it prints the link you can use to access Jupyter Notebook in your browser: http://127.0.0.1:8888/?token=...
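
For reference, here is a sketch of an equivalent run for the PyTorch image (the /workspace/data mount path is my choice, since /workspace is the working directory in NVIDIA's PyTorch images; note that this image drops you into a shell instead of starting Jupyter, which the customization below addresses):

docker run -it --rm --gpus all -v $PWD:/workspace/data -p 8888:8888 nvcr.io/nvidia/pytorch:23.10-py3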

How to customize your image

What if you need additional libraries that are not included in the standard image, or if you want to use PyTorch with Jupyter Notebook? In that case, you will need to customize your image based on the official images. Create a new directory and, within it, create a file named Dockerfile and a file named requirements.txt listing the extra libraries you want to add. The Dockerfile contains the instructions to build your image, including installing Jupyter Notebook in the PyTorch case.

Example requirements.txt:

notebook
# Other libraries        

Example Dockerfile:

# Start from the PyTorch image
FROM nvcr.io/nvidia/pytorch:23.10-py3

# Copy the requirements file into the image
COPY requirements.txt .

# Install dependencies from the requirements file
RUN pip install -r requirements.txt

# Start Jupyter Notebook
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root", "--NotebookApp.password='argon2:$argon2id$v=19$m=10240,t=10,p=8$BM5f6z8PxSwwx7RenVUb4Q$AWjm2iQ0hS7Rw0pzdj+vFdKyMPaeCZVVzaWdnCESArc'"]        

The --NotebookApp.password part of the command is optional; it sets the Jupyter Notebook password to admin (the long string is an Argon2 hash of that password). Without it, Jupyter generates a random access token and prints it in the container logs.
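
If you prefer a different password, you can generate your own hash and paste it into the CMD line. A minimal sketch, assuming the classic notebook package is installed (on newer setups the same helper lives in jupyter_server.auth):

python -c "from notebook.auth import passwd; print(passwd('your-password'))"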

To build this customized image, navigate to the directory containing these files and run the following command:

docker build -t pytorch-jupyter .        

This command creates a custom image named “pytorch-jupyter” (you can use any name you prefer). To run it, substitute this custom image name into the docker run command from the previous steps. If you want to customize the TensorFlow image instead, you only need to change the FROM directive to the appropriate base image. Once the container is running, your Jupyter server is available on localhost at port 8888. Keep in mind that if you didn’t pass the -d (detached) option to docker run, closing the terminal will terminate the server. If you did use -d, you can stop the container with:

docker stop {container_name}        
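
Putting it together, a minimal sketch of running the custom image detached (the container name and mount path here are my choices):

docker run -d --rm --gpus all --name pytorch-jupyter-server -v $PWD:/workspace/data -p 8888:8888 pytorch-jupyter
docker stop pytorch-jupyter-server   # shuts the server down when you are done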

Access Jupyter Notebook

Now, to access Jupyter Notebook, there are two main options:

Accessing it directly via the browser: once the container is running, Jupyter Notebook is available locally at http://127.0.0.1:8888. Simply type the URL into your browser. If you set the password in the Dockerfile, use “admin” to log in. Keep in mind that you will only see the files mounted through the -v option of the run command explained above.
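
If you skipped the password and ran the container detached, you can recover the tokenized login URL from the container logs (the container name matches the sketch above):

docker logs pytorch-jupyter-server 2>&1 | grep token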

Connecting VSCode to the kernel inside the container:

  • Open a notebook file in VSCode and click on “Select Kernel” in the top-right corner.
  • From the drop-down list, choose “Select Other Kernel.”
  • Click on the option “Jupyter Server.”
  • Insert the URL of the local server (http://127.0.0.1:8888).
  • Enter the configured admin password.
  • Select the only available kernel afterward.

After these steps, VSCode will establish a connection to the running Jupyter Server and execute your notebook within the container.


In conclusion, Docker provides an efficient solution for setting up your machine learning environment, whether you require GPU support or not. By utilizing Docker, you can easily create isolated and reproducible environments for working with popular libraries like PyTorch and TensorFlow. Additionally, Docker enables customization through Dockerfiles and allows you to pull and run preconfigured images, simplifying the setup process. With Docker, you can focus more on your machine learning tasks and less on the complexities of environment configuration.
