Machine Learning Setup – Part 1: Python Virtual Environments

Introduction

Before you start your machine learning journey, it is important to understand the underlying technology that you will be using. In many cases, this will mean using TensorFlow with Python.

ML workloads are quite compute intensive, which is why the core of TensorFlow is written in C++. The Python interface lets you use TensorFlow more effectively and easily without getting bogged down in C++ implementation details. Similar interfaces exist for JavaScript and a few other languages.

We will be using Python for our setup. TensorFlow also comes with a GPU-specific implementation meant to take advantage of Nvidia GPUs. To keep things simple, we will ignore that here.

SSDNodes TensorFlow Template!

If you are already comfortable with this, or want to jump straight into running Python snippets, try one of our TensorFlow templates at SSDNodes, which will get you started with an Ubuntu 18.04 environment preconfigured with JupyterLab and TensorFlow. There are many advantages to offloading your ML workload to our cloud. A few include:

  1. Blazing fast internet connection to download huge sets of training data.
  2. Tonnes of memory and compute at low, low prices! It will burn through the data before you are done with your coffee break.
  3. Freedom to break things. If you screw up, just click reinstall and you have a clean slate to start over again!

Python and its packaging problem

Python is one of the easiest languages to pick up. With its readable syntax and projects like TensorFlow that have real-world use cases, you would be hard pressed to find a better language to begin your journey into computing.

However, its success has also had some unintended consequences: a lot of similar projects, and confusion surrounding them. Below are a few:

  1. Incompatible versions like Python 3 and Python 2.
  2. Multiple revisions of Python 3 and the differences between them.
  3. Setups having multiple installations of Python. For example, from Anaconda, miniconda, base OS, etc.
  4. Differently named package managers, e.g., pip and pip3.
  5. Obsolete versions of pip being shipped by OS vendors.
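
To see which of these situations applies to a given machine, it can help to ask Python itself. The following is a minimal sketch using only the standard library. The helper name `describe_python_setup` is made up for illustration, and the command names it looks up (`python`, `python3`, `pip`, `pip3`) are just the usual ones, which may not all exist on your system.

```python
import shutil
import sys


def describe_python_setup(commands=("python", "python3", "pip", "pip3")):
    """Describe the running interpreter and where common commands resolve on PATH."""
    info = {
        "interpreter": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    for name in commands:
        # shutil.which returns None when the command is not found on PATH
        info[name] = shutil.which(name)
    return info


if __name__ == "__main__":
    for key, value in describe_python_setup().items():
        print(f"{key:12} {value}")
```

If `pip` and `pip3` resolve to different directories, or `python` points somewhere other than the interpreter you expected, you are probably looking at one of the problems listed above.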

In this post, we will explore a balance between stability and having the most recent version of Python and pip possible.

This was tested on Ubuntu 18.04 and Debian 10, but similar principles can be applied to other Linux distros.

Virtual Environments

To begin with, we will stick to the version of Python 3 that ships with your OS. In the case of Ubuntu 18.04, this is version 3.6 at the time of this writing (Debian 10 ships 3.7). Next, we need to minimize our footprint on system-wide installations. We use virtual environments for this.

Virtual environments let us install pip and other Python packages inside a single local directory and easily use them for our scripts, instead of cluttering the entire system with unknown or conflicting packages.
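
One way to see this isolation from inside Python is to compare two interpreter attributes. This is a small sketch, assuming an environment created with the standard `venv` module. The function name `in_virtual_environment` is only illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("virtual environment active:", in_virtual_environment())
```

Run with the system interpreter, the two prefixes should match. Run with an environment's interpreter, `sys.prefix` should point at the environment's directory instead.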

As with everything in the Python community, there are a myriad of tools built around this idea, like virtualenv, pyenv, and venv. We will just use venv, which is officially supported, ships as part of the Python standard library, and is very easy to use.

To achieve this, we need only one extra package. On Debian-based systems this is `python3-venv`. We won't need a system-wide pip, TensorFlow, or any other Python-specific package.

$ sudo apt install python3-venv

Using Virtual Environments

  1. To create a virtual environment, pick a name for it, like newenv, and run:
    $ python3 -m venv newenv
    This command creates a directory `newenv` inside your current directory. I usually create one inside my project's directory, but you can create it anywhere and use it from anywhere.
  2. To activate it, simply run:
    $ source ./newenv/bin/activate
    On Windows and other operating systems the command might be slightly different, but the effect is the same. The prompt changes to show that you are in a different environment, i.e., (newenv) $
  3. With the environment active, you will notice that you now have pip for managing Python packages, even if you don't have it installed system-wide:
    (newenv)$ pip3 --version
    pip 18.1 from /home/r/newenv/lib/python3.7/site-packages/pip (python 3.7)
  4. You can upgrade pip, then install TensorFlow and JupyterLab:
    (newenv)$ python3 -m pip install --upgrade pip
    (newenv)$ pip install tensorflow jupyterlab
    With the virtual environment activated, you can run Python scripts using the simple `python yourscript.py`, and they will make use of the packages inside the newenv folder.
  5. And once you are done with it, you can simply deactivate the environment:
    (newenv)$ deactivate
    Delete the newenv directory to get rid of all the libraries, packages, and other cruft that was installed.
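
The creation step can also be driven from Python, since `python3 -m venv` is a thin wrapper around the standard library's `venv` module. The following is a rough sketch rather than a recommended workflow. The helper name `create_environment` is made up here, and it skips installing pip by default so that it runs quickly and offline.

```python
import os
import tempfile
import venv


def create_environment(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast; pass True to bootstrap pip,
    # which is what `python3 -m venv` does by default.
    # Symlinks are used everywhere except Windows, mirroring the command-line default.
    builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


if __name__ == "__main__":
    # A temporary directory is used so the example cleans up after itself.
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_environment(os.path.join(tmp, "newenv"))
        print("interpreter:", python_path)
        print("exists:", os.path.exists(python_path))
```

Deleting the directory, here done automatically by `TemporaryDirectory`, is all it takes to remove the environment, just as in step 5.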

Tip

Before deleting a virtual environment, activate it and run `pip freeze > requirements.txt` to save the list of all installed packages (with their exact version numbers) to a requirements.txt file.

Later, you can install the same packages into a new environment by simply running:
(newenv)$ pip install -r requirements.txt
This would prevent your application from breaking because of any significant changes in the packages or their API.
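
Because requirements.txt is plain text, it is also easy to check what changed between two snapshots. Below is a minimal sketch that only understands the simple `name==version` lines that `pip freeze` typically writes; other requirement syntax is ignored. The function names and the version numbers in the example are illustrative only.

```python
def parse_pinned_requirements(text):
    """Map package name -> pinned version for lines of the form `name==version`."""
    pinned = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything not pinned exactly
        name, version = line.split("==", 1)
        pinned[name.strip().lower()] = version.strip()
    return pinned


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for packages whose pin differs."""
    old = parse_pinned_requirements(old_text)
    new = parse_pinned_requirements(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Illustrative contents of two requirements.txt snapshots.
    before = "tensorflow==2.0.0\njupyterlab==1.2.0\n"
    after = "tensorflow==2.1.0\njupyterlab==1.2.0\n"
    print(changed_pins(before, after))
```

A package that appears in only one of the two files shows up with `None` on the other side.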

Conclusion

The name of the game is to type less, get organized, and get more done. Everything from requirements.txt to virtual environments helps you get rid of unnecessary setup. Learn your setup once, then rinse and repeat!