Thirty-two Python Tools and Package Libraries to Increase your Machine Learning Productivity
These are the tools, packages, and libraries that my colleagues and I use to make Machine Learning pipeline development and production deployment more productive. What follows is a snapshot of our favorites as of December 24, 2020.
Python
We have used Python predominantly (95%) over the last seven years because:
- Almost all new Machine Learning models, cloud services, GPUs, and many other resources are available through a Python API;
- The assortment and number of free code and packages are the largest we have seen;
- Native Python is 20+ times slower than C, but almost all Python packages run at near C speed because they are thin APIs over compiled C code or use some other speedup technique.
We used C to speed up Python when Numba could not be used. We tried Go, but it did not work out.
My journey to speed up Python: Setting Up a GoLang Development Environment and Benchmarking
- The Python GIL (which prevents threads from running in parallel on multicore machines) is bypassed more and more each day by the cloud, Spark, and package implementations (e.g., XGBoost). Separately, the lack of static typing is addressed by type hinting, introduced in Python 3.5.
I discuss why type hints can future-proof your Python code.
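As a minimal sketch of what type hinting looks like, the function below annotates its parameters and return value. The function name and the sample numbers are hypothetical, made up for illustration. The interpreter does not enforce the hints at runtime; a separate checker such as mypy reads them.

```python
from typing import List


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Return the mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# The annotations are stored on the function object and can be inspected.
print(mean_absolute_error.__annotations__["return"])

# Hypothetical sample values, chosen only to exercise the function.
print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```

Passing a string where a list of floats is expected would still run until the arithmetic fails, which is why we pair hints with a type checker in the IDE or in continuous integration.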
Python’s runtime speed seems to gather the majority of criticism. A lot of criticism may disappear if some way is found to compile Python. Meanwhile, Python is the predominant choice for machine learning.
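To illustrate the speed point, here is a rough sketch that times a pure-Python loop against the built-in sum, which CPython implements in C. The function names and the input size are illustrative assumptions. Timings vary by machine and Python version, so no numbers are claimed here.

```python
import timeit


def python_loop_sum(values):
    """Add numbers one at a time in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Delegate the same loop to the C-implemented built-in."""
    return sum(values)


data = list(range(100_000))

# Both approaches must agree before their speed is compared.
assert python_loop_sum(data) == builtin_sum(data)

loop_seconds = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_seconds = timeit.timeit(lambda: builtin_sum(data), number=20)
print(f"pure-Python loop: {loop_seconds:.4f} s")
print(f"built-in sum:     {builtin_seconds:.4f} s")
```

The same pattern, pushing the inner loop into compiled code, is what most numerical packages do on a larger scale.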
Python IDEs
We used Emacs for 15 years. We were those people who learned computer science and accidentally absorbed some software engineering along the way while coding in Lisp.
We stopped porting Emacs, or using someone else’s port, to new hardware and OS platforms. We started using other IDEs as we worked with Java, Scala, R, Matlab, Go, and Python.
We discuss only Python-related tools, such as IDEs, for the rest of this blog. We think Python will eventually drop in popularity as the first choice for Machine Learning, just not in the next few years.
I think there are three good choices for a Python IDE.
Jupyter Notebook or JupyterLab
Jupyter Notebook enables you to embed text, embed code, and run code interactively. It is based on the lab notebook.
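A notebook file is a JSON document whose cells hold either text or code. As a hedged sketch of that structure, the snippet below builds a two-cell notebook with only the standard library. The cell contents are made up for illustration, and in practice Jupyter writes these files for you.

```python
import json

# A minimal notebook in the version 4 file format: one text cell, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Experiment notes\n", "Text cells hold Markdown."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Code cells hold runnable code.')"],
        },
    ],
}

# Serialize to the text that would be saved in an .ipynb file.
serialized = json.dumps(notebook, indent=1)

cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Saving the serialized text under a name ending in .ipynb should give a file that Jupyter can open, with the text cell rendered and the code cell ready to run.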
Project Jupyter exists to develop open-source software, open-standards, and interactive computing services across dozens of programming languages. — Jupyter Project
Adding Jupyter Notebook Extensions to a Docker Image
nbdev
fast.ai coded a complete set of Jupyter Notebook tools. Please look them over.
It is a Python programming environment called nbdev, which allows you to create complete python packages, including tests and a rich documentation system, all in Jupyter Notebooks. — Jeremy Howard
PyCharm or VSCode
PyCharm and VSCode are among the most popular IDEs (Integrated Development Environments) for Python.
We use PyCharm (or VSCode) to develop, document, test, and debug. Both integrate with inline documentation formatting, version control (git or GitHub), testing packages, coverage, linters, type hint checkers, and code formatters.
The Python IDE for Professional Developers — JetBrains
PyCharm: An Animated Guide to Creating Projects and Setting their Behavior
Python Development Tools
Black
Black formats your code into a strict subset of the PEP 8 standard. We use it to format all code files in a project, triggered by PyCharm, VSCode, or GitHub Actions.
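As a small illustration, the snippet below pairs an unformatted fragment with the layout we would expect Black's default settings to produce, then uses the standard library to confirm that both parse to the same syntax tree. The fragment is invented for this example, and the formatted text was written by hand rather than captured from a Black run, so treat it as an assumption.

```python
import ast

# Hypothetical input: valid Python, but cramped and inconsistently spaced.
before = "def f(a,b=1):return a+b\nx = {  'a':1, 'b':2 }\n"

# Hand-written approximation of Black's default output for the same code.
after = '''def f(a, b=1):
    return a + b


x = {"a": 1, "b": 2}
'''


def same_ast(src_a, src_b):
    """Return True when two source strings parse to identical syntax trees."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


# Formatting changes layout and quote style, not program structure.
print(same_ast(before, after))
```

The comparison shows why an automatic formatter is safe to run on every commit: only whitespace and quoting change, while the parsed program stays the same.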
Codacy
Codacy is currently our favorite “pain-in-the-***” (PITA) development tool. It catches more errors and suspect code than pylint, although we ignore some of its stylistic warnings. We think of it today as an automated code review tool. As Codacy states in its tagline: Automate code reviews on your commits and pull requests.