# Support vector machines

The present tutorials covers the implementation of Support Vector Machines (SVM) and the notions of kernels.

# Reference slides

Download the slides

- Support Vector Machines
- Properties of kernels

# Tutorial

In this tutorial, we will cover a more advanced classification algorithm through the use of *Support Vector Machines* (SVMs). The goal is to gain an intuition of how SVMs works and how to use *Gaussian kernel* with SVMs to find out the decision boundary. The implementation proposed here follows a simplified version of the *Sequential Minimal Optimization* (SMO) algorithm for training support vector machines, which is detailed in the following paper. The complete SMO algorithm can then be quite easily implemented following the full paper.

Once again, to simplify your work, we provide the following set of functions that you should find in the `03_Support_Vector_Machines`

folder

File |
Explanation |
---|---|

`example1.mat` |
Example data for a quasi-linear problem |

`example2.mat` |
Example data for a non-linear but well-defined problem |

`example3.mat` |
Example data for a slightly non-linear problem with outliers |

`plotData.m` |
Allows to plot the set of data points |

`visualizeBoundary.m` |
Plots a non-linear decision boundary from a SVM model |

`visualizeBoundaryLinear.m` |
Plots a linear decision boundary |

## 3.1 - Linear classification

We briefly recall here that the goal of SVMs is that given a set of input vectors separated in two classes ( and , we would like to find the *optimal boundary* (separating line) between the two classes. To do so, we want to find the separation that gives the *maximal margin* between the two sets. Hence, let us first try to find a solution to the following problem

By setting the variable , we can rewrite this problem as

The idea is that for some of the input vectors (called *support vectors*), we will have an equality in equation (1), where and . By substracting those two equation, we see that we are trying to find a solution to

Therefore, solving the set of equations amounts to find the maximal margin between the support vectors (as they *support* classification). As seen in the slides, we know that this problem can also be expressed as

Where we can introduce a kernel in the equation in order to solve non-linear optimization problems. We can use this formulation to perform the *predictions* of the algorithm and in the first part of this tutorial, we will only consider the *linear* kernel . Computing the predictions should be performed in the `svmPredict`

function

The complete model and all of its corresponding parameters are summarized in the `model`

structure defined as

**Exercise**

- Update the
`svmPredict`

code to compute the SVM prediction (only with a linear kernel). - Run the predictions on random initializations of the models

**Learning phase**

In order to perform learning, we will try to maximize the *dual formulation* problem expressed as

with and . is a chosen *regularization hyper-parameter*. To ease your work, we will implement only a simplified version of the SMO algorithm which is mainly focused on the optimization aspects. The SMO algorithm selects two parameters, and and optimizes the objective value jointly for both these . Finally it adjusts the parameter based on the new . This process is repeated until the converge.

**Selecting .** The full SMO algorithm use heuristics to choose which and to optimize (which greatly increase the convergence speed. However, for this simplified version, we will simply iterate on all and select a random , to perform the joint optimization of and

**Optimizing and .** To jointly optimize the value, we first find bounds and such that holds to satisfy the constraint of the original problem

Now we want to find so as to maximize the objective function. It can be shown (as in the original paper) that the optimal is given by

where

defines the error between the prediction of the SVM on the example and its desired label. When computing , we can also replace the inner product by a kernel function . Finally, we clip the value of to lie within the selected bounds.

Now that we have solved , we can update the value for through

**Computing threshold .** Finally, we can compute the threshold to satisfy the original constraints by selecting

where

For the first part of this tutorial, we will compute the main iterations of the algorithm (minimization of the objective function), while relying on a *linear* kernel. This implies that we will only be able to perform linear discrimination. However, remember that the formulation of the SVMs provide an *optimal* and (gloriously) *convex* answer to this problem.

**Exercise**

- Update the
`svmTrain`

code to perform the training of a SVM with linear kernel. - Run your
`svmTrain`

function on the`example1.mat`

(linear) dataset to obtain the result displayed below. - Update the framework to display the decision boundary at each iteration of the algorithm.

**Expected output** [Reveal]

## 3.2 - Gaussian kernel

The Gaussian kernel (also called *Radial Basis Function (RBF)* kernel) allows to perform the optimization of non-linear classification problems. It is defined as

As seen in the slides, the underlying idea is that instead of trying to solve a non-linear separation problem in a given space, we might first transform the space into a target space in which the problem is linearly separable. Therefore, we transform the space through a kernel (which is directly expressed in the minimization problem) and solve the linear separation problem in the target space. Given the starter code and the previous exercise, you should be able to directly plug this kernel inside the `svmTrain`

and `svmPredict`

functions.

**Exercise**

- Update the
`svmPredict`

code to be able to rely on a Gaussian kernel. - Update the
`svmTrain`

code to be able to rely on a Gaussian kernel. - Run the
`svmTrain`

function on the`example2.mat`

(non-linear) dataset to obtain the result displayed below. - Once again, display the decision boundary at each iteration of the algorithm.

**Expected output** [Reveal]

## 3.3 - Supplementary kernels

Now that you have mastered the concept of kernels and updated your code to include the Gaussian kernel, you can augment your classifier and optimizaton problem to include several different kernels as those proposed afterwards.

**Polynomial kernel.** Intuitively, the polynomial kernel looks not only at the given features of input samples to determine their similarity, but also combinations of these.

**Sigmoid (tanh) kernel.** This kernel takes two parameters: and . For , we can view as a scaling parameter of the input data, and as a shifting parameter that controls the threshold of mapping. For , the dot-product of the input data is not only scaled but reversed.

**Exercise**

- Update the
`svmPredict`

code to include supplementary kernels. - Update the
`svmTrain`

code to include supplementary kernels.

## 3.4 - Audio application

As seen in the previous tutorial, we have considered only a *binary classification* problem (where elements can only belong to one of two classes). This simplifies the work as we simply need to separate the space between two types of points. However, in real-world problems, we usually want to solve *multi-class* problems with any given number of classes. For this tutorial, we will rely on a simple trick which is to consider that a -class problem can simply be restated as different binary classification problems. The underlying idea is that each class defines a binary classifier which tells us if a given element is part of this class, or belongs to *any of the other* classes.

**Exercise**

- Change the training loop to define separate binary problems
- For each problem, train an SVM to perform classification.
- Evaluate the global classification accuracy of your system.