Machine learning algorithms
An introduction to the math and logic behind machine learning.
What are machine learning algorithms?
Machine learning algorithms are pieces of code that help people explore, analyze, and find meaning in complex data sets. Each algorithm is a finite set of unambiguous step-by-step instructions that a machine can follow to achieve a certain goal. In a machine learning model, the goal is to establish or discover patterns that people can use to make predictions or categorize information.
Machine learning algorithms use parameters that are based on training data—a subset of data that represents the larger set. As the training data expands to represent the world more realistically, the algorithm calculates more accurate results.
Different algorithms analyze data in different ways. They’re often grouped by the machine learning techniques that they’re used for: supervised learning, unsupervised learning, and reinforcement learning. The most commonly used algorithms use regression and classification to predict target categories, find unusual data points, predict values, and discover similarities.
Machine learning techniques
As you learn more about machine learning algorithms, you’ll find that they typically fall within one of three machine learning techniques:
In supervised learning, algorithms make predictions based on a set of labeled examples that you provide. This technique is useful when you know what the outcome should look like.
For example, you provide a data set that includes city populations by year for the past 100 years, and you want to know what the population of a specific city will be four years from now. The outcome uses labels that already exist in the data set: population, city, and year.
In unsupervised learning, the data points aren’t labeled—the algorithm labels them for you by organizing the data or describing its structure. This technique is useful when you don’t know what the outcome should look like.
For example, you provide customer data, and you want to create segments of customers who like similar products. The data that you’re providing isn’t labeled, and the labels in the outcome are generated based on the similarities that were discovered between data points.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next. After each action, the algorithm receives feedback that helps it determine whether the choice it made was correct, neutral, or incorrect. It’s a good technique to use for automated systems that have to make a lot of small decisions without human guidance.
For example, you’re designing an autonomous car, and you want to ensure that it’s obeying the law and keeping people safe. As the car gains experience and a history of reinforcement, it learns how to stay in its lane, obey the speed limit, and brake for pedestrians.
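The feedback loop described above can be sketched with tabular Q-learning, one common reinforcement learning method. This is a minimal illustration, not a model of a real vehicle: the five-position corridor, the reward of 1 for reaching the last position, and the values of alpha, gamma, and epsilon are all assumptions chosen for the example.

```python
import random

N_STATES = 5          # positions 0..4 in a hypothetical corridor; 4 is the goal
ACTIONS = (-1, +1)    # action index 0 = step left, 1 = step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy choice of actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)                       # explore, or break a tie
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit the best estimate
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback: move the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Preferred action in every non-goal position after training.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

After training, the learned policy should prefer the action that leads toward the reward in every position, even though the algorithm was never told which direction is correct.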
What you can do with machine learning algorithms
Machine learning algorithms help you answer questions that are too complex to answer through manual analysis. Use cases typically fall into one of these categories.
Predict a target category
Two-class (binary) classification algorithms divide the data into two categories. They’re useful for questions that have only two possible answers that are mutually exclusive, including yes/no questions. For example:
- Will this tire fail in the next 1,000 miles: yes or no?
- Which brings in more referrals: a USD 10 credit or a 15% discount?
Multiclass (multinomial) classification algorithms divide the data into three or more categories. They’re useful for questions that have three or more possible answers that are mutually exclusive. For example:
- In which month do the majority of travelers purchase airline tickets?
- What emotion is the person in this photo displaying?
Find unusual data points
Anomaly detection algorithms identify data points that fall outside of the defined parameters for what’s “normal.” For example, you would use anomaly detection algorithms to answer questions like:
- Where are the defective parts in this batch?
- Which credit card purchases might be fraudulent?
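One simple way to define “normal” is the z-score: how many standard deviations a value lies from the mean. The sketch below flags values whose z-score exceeds a threshold. The measurements and the threshold of 2.0 are made up for illustration, and production systems often use more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical part widths in millimeters; one part is clearly out of tolerance.
part_widths_mm = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 14.5]
anomalies = find_anomalies(part_widths_mm)
print(anomalies)
```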
Predict values
Regression algorithms predict the value of a new data point based on historical data. They help you answer questions like:
- How much will the average two-bedroom home cost in my city next year?
- How many patients will come through the clinic on Tuesday?
Discover similarities
Clustering algorithms divide the data into multiple groups by determining the level of similarity between data points. Clustering algorithms work well for questions like:
- Which viewers like the same types of movies?
- Which printer models fail in the same way?
Popular machine learning algorithms
Linear regression algorithms show or predict the relationship between two variables or factors by fitting a continuous straight line to the data. The line is often calculated by minimizing the squared error cost function.
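For a single input variable, the line that minimizes the squared error has a closed-form solution. The sketch below computes it directly. The population figures are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); this minimizes the squared error.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical city population (in thousands) for five consecutive years.
years = [0, 1, 2, 3, 4]
population = [100.0, 102.0, 104.0, 106.0, 108.0]

slope, intercept = fit_line(years, population)
forecast = slope * 8 + intercept  # four years after the last observed year
print(slope, intercept, forecast)
```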
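Logistic regression algorithms fit a continuous S-shaped curve to the data. The curve maps any input to a value between 0 and 1, which can be read as the probability that a data point belongs to a class.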
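The S-shaped curve is the sigmoid function. A minimal sketch of fitting it with gradient descent on the log loss follows. The data, the learning rate, and the number of epochs are assumptions, and the single feature is assumed to be centered around zero already.

```python
import math


def sigmoid(z):
    """The S-shaped curve: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical, already-centered feature with a yes/no (1/0) outcome.
xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
probabilities = [sigmoid(w * x + b) for x in xs]
print([round(p, 3) for p in probabilities])
```

Values below 0.5 are typically read as the "no" class and values above 0.5 as the "yes" class.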
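Naïve Bayes algorithms use Bayes’ theorem to calculate the probability that an event will occur, based on the occurrence of related events. They make the “naïve” assumption that those related events are independent of one another.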
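The calculation can be sketched as a direct application of Bayes' theorem with independent pieces of evidence. All probabilities below are invented for a hypothetical spam filter, and `naive_bayes_posterior` is an illustrative helper name.

```python
import math


def naive_bayes_posterior(prior, likelihoods_if_true, likelihoods_if_false):
    """P(event | evidence), treating each piece of evidence as independent."""
    score_true = prior * math.prod(likelihoods_if_true)
    score_false = (1.0 - prior) * math.prod(likelihoods_if_false)
    return score_true / (score_true + score_false)


# Hypothetical numbers: 20% of messages are spam. Two words appear in a message:
# word 1 occurs in 60% of spam and 5% of other mail,
# word 2 occurs in 40% of spam and 10% of other mail.
posterior = naive_bayes_posterior(
    prior=0.2,
    likelihoods_if_true=[0.6, 0.4],
    likelihoods_if_false=[0.05, 0.1],
)
print(round(posterior, 3))
```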
Support vector machines draw a hyperplane that separates two classes so that the distance between the hyperplane and the closest data points of each class, called the margin, is as large as possible. Maximizing the margin differentiates the classes more clearly.
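The sketch below does not train a support vector machine. It only computes the margin of a given hyperplane, which is the quantity that training tries to maximize, and compares two candidate hyperplanes. The points, labels, and hyperplanes are made up for illustration.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in zip(points, labels)
    )


# Hypothetical two-dimensional data with two classes.
points = [(0, 0), (1, 1), (3, 3), (4, 4)]
labels = [-1, -1, +1, +1]

margin_a = margin((1, 1), -4, points, labels)  # the line x + y = 4
margin_b = margin((1, 1), -5, points, labels)  # the line x + y = 5
print(margin_a, margin_b)
```

Both lines separate the classes, but the first leaves more room on each side, so a maximum-margin method would prefer it.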
Decision tree algorithms split the data into two or more homogeneous sets. They use if–then rules to separate the data based on the most significant differentiator between data points.
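One common way to find the most significant differentiator is to pick the split that leaves the resulting sets as pure as possible, measured here with Gini impurity. This is a sketch of a single split on one numeric feature. The tire data is hypothetical, and real decision trees repeat this step recursively over many features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the set is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' rule."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # candidate halfway between neighboring values
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical data: tire age in years and whether the tire failed.
tire_age_years = [1, 2, 3, 10, 11, 12]
failed = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(tire_age_years, failed)
print(f"if age <= {threshold} then 'no' else 'yes' (impurity {impurity})")
```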
K-nearest neighbor algorithms store all available data points and classify each new data point based on the data points that are closest to it, as measured by a distance function.
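A minimal sketch of this idea uses Euclidean distance and a majority vote among the k closest stored points. The points, labels, and the choice of k = 3 are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-dimensional data with two classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```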
Random forest algorithms are based on decision trees, but instead of creating one tree, they create a forest of trees, each trained on a random sample of the data and, typically, a random subset of the features. Then, they aggregate the votes of the individual trees to determine the final class of the test object.
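The sketch below shows the two core ideas, random resampling and vote aggregation, in a deliberately reduced form. Each "tree" is a one-split stump on a single feature, so the random feature selection that full random forests usually add is omitted. The data, the number of trees, and all function names are assumptions.

```python
import random
from collections import Counter


def train_stump(values, labels):
    """Fit a one-split tree: choose the threshold with the fewest training errors."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(labels) + 1, None, majority, majority)  # (errors, threshold, left, right)
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump, value):
    threshold, left_label, right_label = stump
    if threshold is None or value <= threshold:
        return left_label
    return right_label


def train_forest(values, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([values[i] for i in idx], [labels[i] for i in idx]))
    return forest


def forest_predict(forest, value):
    """Aggregate the votes of all stumps and return the most common class."""
    votes = Counter(stump_predict(stump, value) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data with two classes (0 and 1).
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]

forest = train_forest(values, labels)
print(forest_predict(forest, 2), forest_predict(forest, 11))
```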
Gradient boosting algorithms produce a prediction model that bundles weak prediction models, typically decision trees, through an ensembling process in which each new model is fit to the errors of the models before it. This improves the overall performance of the model.
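For regression with a squared error loss, boosting amounts to repeatedly fitting a weak model to the current residuals and adding a scaled copy of it to the ensemble. The sketch below uses one-split regression stumps as the weak models. The data, the learning rate of 0.5, and the number of rounds are assumptions for illustration.

```python
def fit_regression_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    best = None  # (squared_error, threshold, left_mean, right_mean)
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Return the training predictions after `rounds` boosting steps."""
    predictions = [sum(ys) / len(ys)] * len(ys)  # start from the mean
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_regression_stump(xs, residuals)
        # Add a scaled copy of the weak model that was fit to the current errors.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return predictions


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical data with a roughly linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

baseline_mse = mse(ys, boost(xs, ys, rounds=0))
one_round_mse = mse(ys, boost(xs, ys, rounds=1))
final_mse = mse(ys, boost(xs, ys, rounds=20))
print(baseline_mse, one_round_mse, final_mse)
```

The training error shrinks as more weak models are added, which is the improvement the ensembling process is meant to deliver. Error on unseen data needs to be checked separately.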
K-means algorithms group data into clusters, where K equals the number of clusters. The data points inside each cluster are similar to one another and dissimilar to the data points in other clusters.
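A minimal sketch of the standard iteration follows: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. Using the first K points as the starting centroids is a simplifying assumption, and the customer data is hypothetical.

```python
import math


def k_means(points, k, max_iterations=100):
    """Return (labels, centroids) after iterating until assignments are stable."""
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic start (an assumption)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # nothing changed, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers described by two purchase scores.
customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```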
Learn more about machine learning
Start experimenting with Azure Machine Learning
See how different algorithms analyze data by building and deploying your own machine learning models using Azure Machine Learning.