Recommendation systems are used in a variety of industries, from retail to news and media. If you’ve ever used a streaming service or ecommerce site that has surfaced recommendations for you based on what you’ve previously watched or purchased, you’ve interacted with a recommendation system. With the availability of large amounts of data, many businesses are turning to recommendation systems as a critical revenue driver. However, finding the right recommender algorithms can be very time consuming for data scientists. This is why Microsoft has provided a GitHub repository with Python best practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning services.
What is a recommendation system?
There are two main types of recommendation systems: collaborative filtering and content-based filtering. Collaborative filtering (commonly used in e-commerce scenarios), identifies interactions between users and the items they rate in order to recommend new items they have not seen before. Content-based filtering (commonly used by streaming services) identifies features about users’ profiles or item descriptions to make recommendations for new content. These approaches can also be combined for a hybrid approach.
Recommender systems keep customers on a businesses’ site longer, they interact with more products/content, and it suggests products or content a customer is likely to purchase or engage with as a store sales associate might. Below, we’ll show you what this repository is, and how it eases pain points for data scientists building and implementing recommender systems.
Easing the process for data scientists
The recommender algorithm GitHub repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Data preparation – Preparing and loading data for each recommender algorithm
- Modeling – Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
- Evaluating – Evaluating algorithms with offline metrics
- Model selection and optimization – Tuning and optimizing hyperparameters for recommender models
- Operationalizing – Operationalizing models in a production environment on Azure
Several utilities are provided in reco utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are provided for self-study and customization in an organization or data scientists’ own applications.
In the image below, you’ll find a list of recommender algorithms available in the repository. We’re always adding more recommender algorithms, so go to the GitHub repository to see the most up-to-date list.
Let’s take a closer look at how the recommender repository addresses data scientists’ pain points.
-
It’s time consuming to evaluate different options for recommender algorithms
- One of the key benefits of the recommender GitHub repository is that it provides a set of options and shows which algorithms are best for solving certain types of problems. It also provides a rough framework for how to switch between different algorithms. If model performance accuracy isn’t enough, an algorithm better suited for real-time results is needed, or the originally chosen algorithm isn’t the best fit for the type of data being used, a data scientist may want to switch to a different algorithm.
-
Choosing, understanding, and implementing newer models for recommender systems can be costly
- Selecting the right recommender algorithm from scratch and implementing new models for recommender systems can be costly as they require ample time for training and testing as well as large amounts of compute power. The recommender GitHub repository streamlines the selection process, reducing costs by saving data scientists time in testing many algorithms that are not a good fit for their projects/scenarios. This, coupled with Azure’s various pricing options, reduces data scientists’ costs on testing and organization’s costs in deployment.
-
Implementing more state-of-the-art algorithms can appear daunting
- When asked to build a recommender system, data scientists will often turn to more commonly known algorithms to alleviate the time and costs needed to choose and test more state-of-the-art algorithms, even if these more advanced algorithms may be a better fit for the project/data set. The recommender GitHub repository provides a library of well-known and state-of-the-art recommender algorithms that best fit certain scenarios. It also provides best practices that, when followed, make implementing more state-of-the-art algorithms easier to approach.
-
Data scientists are unfamiliar with how to use Azure Machine Learning service to train, test, optimize, and deploy recommender algorithms
- Finally, the recommender GitHub repository provides best practices for how to train, test, optimize, and deploy recommender models on Azure and Azure Machine Learning (Azure ML) service. In fact, there are several notebooks available on how to run the recommender algorithms in the repository on Azure ML service. Data scientists can also take any notebook that has already been created and submit it to Azure with minimal or no changes.
Azure ML can be used intensively across various notebooks for tasks relating to AI model development, such as:
- Hyperparameter tuning
- Tracking and monitoring metrics to enhance the model creation process
- Scaling up and out on compute like DSVM and Azure ML Compute
- Deploying a web service to Azure Kubernetes Service
- Submitting pipelines
Learn more
Utilize the GitHub repository for your own recommender systems.
Learn more about the Azure Machine Learning service.
Get started with a free trial of Azure Machine Learning service.