Background & Objective:
Despite the rapid growth of neural networks and related hardware advances in recent years, traditional ML models continue to dominate the industry, mainly because they are easier to interpret and cheaper to develop. However, when it comes to real-time scoring, traditional ML models are relatively slow and do not take full advantage of the computing power available today.
Traditional ML libraries are built to run in CPU environments and do not use a common abstraction (such as tensors) to represent their computation. As a result, traditional ML models cannot use the hardware acceleration that deep learning frameworks enjoy — a significant constraint on their performance and scalability.
A group of engineers from Microsoft recently released a library, HUMMINGBIRD, to address this very problem. This article intends to give you an introduction to this amazing library!
HUMMINGBIRD – An Introduction!
HUMMINGBIRD is a Python library released recently by a group of engineers from the Microsoft Research team. It compiles the steps of a traditional ML model into a small set of tensor operations, enabling efficient computation on both CPUs and hardware accelerators (GPUs, TPUs). In practice, HUMMINGBIRD converts your trained traditional ML models into PyTorch models.
This enables traditional ML models to exploit the power of modern hardware acceleration, all with very few lines of code. Even existing models can be converted with a few lines of code, saving huge re-engineering costs. With HUMMINGBIRD, ML models benefit from:
- All the current and future optimizations from neural network frameworks
- Support native hardware acceleration
- Support for both traditional and neural network models
- No re-engineering required
Before we move on to further sections, there is one important term used throughout this article: ML operator. An “ML operator” is simply a single operation in an ML pipeline, such as a trained model or a data-processing step.
As of this writing, HUMMINGBIRD supports the following ML operators:
Basic Intuition of the Architecture:
HUMMINGBIRD is built on the following simple yet key observations:
- A trained model can be represented as a function that transforms input features into a prediction score, and this function is agnostic of the training algorithm used.
- The same observation applies to featurizers once they have been fit to the data.
Hence, it suffices to convert the prediction function of each operator into a tensor computation and stitch them together appropriately. That is exactly what HUMMINGBIRD does: it takes a pre-trained ML pipeline as input and compiles it into a DAG (directed acyclic graph) of tensor computations.
High Level Architecture
This is achieved by three components in HUMMINGBIRD’s architecture:
The most basic operation of this package is converting representative algorithmic operators into tensor computations. HUMMINGBIRD uses different strategies for compiling different kinds of ML operators, chosen based on run-time statistics (e.g., batch size) and tree structure. In general, there is no fixed rule for when one strategy beats another. The following compilation strategies are available as of now:
- Tree-based Models:
- GEneral Matrix Multiplication (GEMM): evaluation of a decision tree is cast as a series of three GEMM operations interleaved by two logical operators.
- TreeTraversal: an improvement over GEMM that reduces redundancy by mimicking a typical tree traversal, implemented with tensor operators.
- PerfectTreeTraversal: similar in spirit to the previous one, but it requires all trees in the ensemble to be perfectly balanced.
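To make the GEMM strategy concrete, here is a minimal NumPy sketch — my own illustration, not Hummingbird's actual code. The toy tree, the matrices `A`, `B`, `C`, `D`, `E`, and the `predict` helper are all hypothetical, but they follow the published idea: three matrix multiplications interleaved with two element-wise comparisons evaluate every sample against every node of the tree at once.

```python
import numpy as np

# Toy decision tree (hypothetical, for illustration only):
#   node0: x[0] < 0.5 ? -> node1 : node2
#   node1: x[1] < 0.5 ? -> leaf0 : leaf1
#   node2: x[1] < 0.5 ? -> leaf2 : leaf3
# Leaves map to classes: leaf0 -> 0, leaf1 -> 1, leaf2 -> 1, leaf3 -> 0.

# A selects, per internal node, which feature it tests (features x nodes).
A = np.array([[1, 0, 0],
              [0, 1, 1]], dtype=float)
B = np.array([0.5, 0.5, 0.5])            # per-node thresholds

# C encodes the tree structure: +1 if the node is a "left" (test-true)
# ancestor of the leaf, -1 if a "right" ancestor, 0 otherwise (nodes x leaves).
C = np.array([[1,  1, -1, -1],
              [1, -1,  0,  0],
              [0,  0,  1, -1]], dtype=float)
D = np.array([2, 1, 1, 0])               # number of "left" ancestors per leaf

# E maps each leaf to its class (leaves x classes).
E = np.array([[1, 0],
              [0, 1],
              [0, 1],
              [1, 0]], dtype=float)

def predict(X):
    T = (X @ A < B).astype(float)        # GEMM 1 + comparison: all node tests
    L = ((T @ C) == D).astype(float)     # GEMM 2 + comparison: one-hot leaf
    return (L @ E).argmax(axis=1)        # GEMM 3: leaf -> class scores

X = np.array([[0.2, 0.8],
              [0.2, 0.3],
              [0.9, 0.1],
              [0.9, 0.9]])
print(predict(X))                        # -> [1 0 1 0]
```

A leaf is reached exactly when every "left" ancestor's test passes and no "right" ancestor's test does, which is why comparing `T @ C` against the left-ancestor count `D` isolates a single leaf per sample — everything stays a dense matrix operation, which is what GPUs are good at.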
Beyond these strategies for tree-based ML operators, there are a few more compilation strategies for other ML operators. The following articles give good insight into the different compilation strategies and the architecture, and I encourage you to explore them:
I hope you now have a basic intuition of the library's architecture! Let’s see a coded example of a HUMMINGBIRD implementation and its benefits.
Experimental Implementation:
In this experiment, let us build a RandomForest classifier and evaluate the performance gains achieved through Hummingbird.
Installation and Setup:
HUMMINGBIRD is available as a PyPI package and can be pip-installed from your notebook. For this exercise, I used Google Colab since we need access to a GPU. Installation can be done using the following command:
Once the installation is complete, we can import the Hummingbird package into our environment:
Let’s continue to build the model as usual:
Now that our model is trained, all it takes is one line of code to convert it to PyTorch. We simply pass the trained model and the target framework to the convert method. For a detailed list of arguments and options, follow the official documentation here.
That’s it! Our model is ready to run in a DNN environment. Let’s time the predictions and evaluate the scoring performance.
Above are the timing metrics for PyTorch on CPU vs. GPU, and you can clearly see the performance gain achieved by using the GPU. The following are the experimental evaluation results shared by the authors in their blog: an average 65x speedup is gained going from scikit-learn to PyTorch (GPU).
Hummingbird is a great package for enabling your traditional models to exploit the advanced computing power available today. The fact that you can convert your traditional models with a few lines of code makes this package unique, and its potential is great. There is also a strong roadmap for the library, and support for additional ML operators is on the cards. I encourage you to visit the official GitHub page to understand the future implementations planned. I hope this article was helpful! Keep learning!