# Nearest neighbors

This tutorial corresponds to the same slides as the previous tutorial and follows the introductory developments performed there. Based on the features computed, we will implement a simple *querying* and *classification* system based on Nearest Neighbors.

# Reference slides

Download the slides

- Introduction to artificial intelligence
- Properties of machine learning
- Nearest-neighbors

# Tutorial

In this tutorial, we will cover the simplest querying and classification algorithm, namely the *\(k\)-Nearest Neighbor* method. The idea is, for a given (usually *unknown*) element, to find its closest neighbor(s) based on the distances between this element and the known dataset for a given set of features. Formally, given a set of elements \(e_{i}\), \(i\in\left[1,N\right]\), and their corresponding features \(\mathbf{f_{i,m}}\in\mathbb{R}^{d}\) (the \(m^{th}\) feature of the \(i^{th}\) element, which can be \(d\)-dimensional), we need to compute a distance measure \(\mathcal{D}\left(\mathbf{f_{i,m}},\mathbf{f_{j,m}}\right)\) between the features of elements of the dataset. This distance expresses the dissimilarity between two features. For the first two questions of the tutorial, we will simply consider that the dissimilarity between features is expressed by their Euclidean \((l_{2})\) distance.

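Given distances for each feature, we need to *merge* these various dissimilarities and then select the nearest neighbor of a particular element. Merging by the mean over the \(M\) features, this amounts to computing

\[kNN\left(e_{i}\right)=\underset{j\neq i}{\operatorname{argmin}}\left(\frac{1}{M}\sum_{m=1}^{M}\mathcal{D}\left(\mathbf{f_{i,m}},\mathbf{f_{j,m}}\right)\right)\]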

Of course, based on the types of features, different distance measures can be used, such as any of the \(l_{p}\) distances (a generalization of the Euclidean distance)

\[l_{p}\left(\mathbf{f_{i,m}},\mathbf{f_{j,m}}\right)=\sqrt[p]{\sum_{n=1}^{d}\left|f_{i,m}^{n}-f_{j,m}^{n}\right|^{p}}\]

the *Cosine* distance, or the *correlation* distance.
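As an illustration, the three distance families can be sketched in a few lines. This is a minimal Python/NumPy sketch rather than part of the starter code: the function names are hypothetical, and the cosine and correlation distances follow their usual definitions (one minus the cosine similarity, and one minus the Pearson correlation). Zero or constant vectors, for which these two distances are undefined, are not handled.

```python
import numpy as np

def lp_distance(x, y, p=2.0):
    """l_p distance between two feature vectors (p=2 is the Euclidean distance)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def correlation_distance(x, y):
    """One minus the Pearson correlation (cosine distance of the centered vectors)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return cosine_distance(x - x.mean(), y - y.mean())

a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(lp_distance(a, b, p=1), lp_distance(a, b, p=2))
print(cosine_distance(a, b), correlation_distance(a, b))
```

Note that the cosine and correlation distances only make sense for multi-dimensional features (\(d>1\)), since they compare the direction or shape of two vectors rather than their magnitude.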

The same observation holds for the way we decided to “merge” the different distances. By looking at these given definitions, start by thinking about the following questions.

**Questions**

- What problems can arise from \(n\)-dimensional audio features?
- Based on the selection equation, what constraints are implicitly made on the distances?
- Does the Euclidean distance seem like a sound way to measure the similarity between temporal features?

## 1.1 - Querying

First, we can use the nearest-neighbor idea to devise a very simple *querying* system. This type of method is typically used in many systems such as *Query By Humming (QBH)* software (similar to Shazam). As previously, we provide baseline code in the main script. First, we create an \(N \times M\) data matrix `dataMatrix` corresponding to the \(M\) features of the \(N\) elements of the dataset. We selected here only the *SpectralCentroidMean*, *SpectralFlatnessMean* and *SpectralSkewnessMean* features. Then, once your code is filled in, the `dist` matrix should contain the mean distances (possibly for various types of distances), which are then sorted to obtain the `nnIDs` vector used to select the nearest neighbors.
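The querying steps described above can be sketched as follows. This is a Python/NumPy illustration under stated assumptions, not the baseline script itself: the data is random stand-in data, each of the \(M\) features is assumed to be a scalar (so that the per-feature Euclidean distance reduces to an absolute difference), and the names `query_nearest`, `data_matrix` and `nn_ids` are hypothetical counterparts of the variables mentioned in the text.

```python
import numpy as np

def query_nearest(data_matrix, query_id, n_neighbors=10):
    """Return the indices of the elements closest to element query_id."""
    data_matrix = np.asarray(data_matrix, dtype=float)
    # Per-feature distance to the query: absolute difference for scalar features.
    per_feature = np.abs(data_matrix - data_matrix[query_id])  # shape (N, M)
    # Merge the M per-feature distances by taking their mean.
    dist = per_feature.mean(axis=1)                            # shape (N,)
    # The query is trivially at distance 0 from itself, so exclude it.
    dist[query_id] = np.inf
    nn_ids = np.argsort(dist)
    return nn_ids[:n_neighbors], dist

# Hypothetical stand-in for the real features: N elements, M scalar features.
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(50, 3))
query_id = int(rng.integers(50))
nn_ids, dist = query_nearest(data_matrix, query_id)
print("query:", query_id, "nearest neighbors:", nn_ids)
```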

**Exercise**

- Compute the set of distances between a random element and the rest of the dataset.
- Complete the plotting code in order to plot the element and its 10 nearest neighbors.
- Check that you obtain plots similar to those displayed below.
- Implement the \(l_{p}\), *Cosine* and *correlation* distances.
- Try the same piece of code by varying the distances and the `usedFeatures`.
- What can you tell about the discriminative power of features?
- What other steps should we perform on the features?
- (Optional) Extend your code to include temporal features.
- (Optional) Extend your code to a fully functional *Query-By-Example* (QBE) system.


## 1.2 - Classification

For the second part of this tutorial, we will rely on the same technique (computing the distance of a selected point to the rest of the dataset) in a *classification* framework. The overarching idea behind \(k\)-NN classification is that elements from the same class should have similar properties in the *feature space*. Hence, the features closest to those of an element should belong to elements of its own class. These types of approaches are usually termed *distance-based* classification methods. The skeleton for this algorithm is provided in the `01_Nearest_Neighbors/knnClassify.m` function.

The algorithm will look quite similar overall to the previous one. However, this time we will compute the \(k\) Nearest Neighbors *for each of the classes separately*, which will allow us to interpret the resulting distance vectors as probabilities. Hence, for each class \(\mathcal{C}_{t}\), we will compute the vector of distances and select the \(k\) closest elements of that class.

Then, in order to consider the distances as probabilities, we compute for each class the mean distance of its \(k\) nearest neighbors and normalize these distances across classes

\[p_{\mathcal{C}_{t}}\left(e_{i}\right)=\frac{1}{k}\sum_{j=1}^{k}kNN_{\mathcal{C}_{t}}^{j}\left(e_{i}\right)\]

where \(kNN_{\mathcal{C}_{t}}^{j}\left(e_{i}\right)\) denotes the distance between \(e_{i}\) and its \(j^{th}\) nearest neighbor in class \(\mathcal{C}_{t}\). In the `knnClassify` function, we store in `testFeatures` the vector of features from the element we are trying to classify, and store the features of each class in the `classFeats` cell array.
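The per-class procedure can be sketched as follows. This is a Python/NumPy illustration and not the `knnClassify.m` skeleton: the function `knn_classify` and its arguments are hypothetical counterparts of `testFeatures` and `classFeats`, the distance is a plain Euclidean distance over the whole feature vector, and the way mean distances are turned into normalized scores (through their inverse, so that a closer class gets a higher score) is one possible choice among others.

```python
import numpy as np

def knn_classify(test_features, class_feats, k=1):
    """Classify one element from its k nearest neighbors in each class.

    test_features: array of shape (M,)
    class_feats: list with one array of shape (n_t, M) per class
    Returns the predicted class index and the per-class scores (summing to 1).
    """
    test_features = np.asarray(test_features, dtype=float)
    mean_dists = np.empty(len(class_feats))
    for t, feats in enumerate(class_feats):
        feats = np.asarray(feats, dtype=float)
        # Euclidean distance from the test element to every element of class t.
        d = np.sqrt(np.sum((feats - test_features) ** 2, axis=1))
        # Mean distance of the k closest elements of this class.
        mean_dists[t] = np.sort(d)[:k].mean()
    # Assumption: normalize inverse distances so that a smaller mean distance
    # yields a larger score; the small constant avoids a division by zero.
    inv = 1.0 / (mean_dists + 1e-12)
    scores = inv / inv.sum()
    return int(np.argmax(scores)), scores

# Hypothetical toy data: two well-separated classes in a 2-D feature space.
class_feats = [
    np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]),
    np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]),
]
label, scores = knn_classify([0.5, 0.5], class_feats, k=2)
print(label, scores)
```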

**Exercise**

- Update the `knnClassify` code to perform the basic k-NN classification function.
- Run the algorithms for both 1-NN and 5-NN evaluation.
- Plot the different confusion matrices to visually check the accuracies (you should obtain the values displayed in the following figure).
- Extend the code to take various distances into account (argument `distType`).
- What is the use of “class weighting” (argument `normalize`)?
- Implement the class weighting system and evaluate its effect.
- Perform the same classification with various K and features to evaluate the properties and qualities of different parametrizations.
- (Optional) Automate the evaluation of various configurations.


## 1.3 - Evaluation

When proposing machine learning algorithms, a fundamental aspect lies in correctly evaluating their performance. Depending on the application, methods, dataset and even the nature of the corresponding data, a plethora of evaluation measures can be used. We highly recommend the following articles for those interested in machine learning, so that you develop a critical mind, do not limit yourself to narrow evaluations (for instance by relying on statistical tests) and avoid **cherry picking**:

- Demšar, J. (2006). *Statistical comparisons of classifiers over multiple data sets.* The Journal of Machine Learning Research, **7**, 1-30.
- Sturm, B. L. (2013). *Classification accuracy is not enough.* Journal of Intelligent Information Systems, **41**(3), 371-406.
- Keogh, E., & Kasetty, S. (2003). *On the need for time series data mining benchmarks: a survey and empirical demonstration.* Data Mining and Knowledge Discovery, **7**(4), 349-371.

However, for the scope of this tutorial, we will limit ourselves to the typical measures minimally required to evaluate your classifier. Overall, the most important aspect of evaluation lies in the different ways of comparing the *real labels* (ground truth) to the *assigned labels* (picked by the classifier).

- The **confusion matrix** is computed simply by counting the occurrences in which a particular instance of a real label (row) is classified to an assigned label (column). This code is already provided in the starter code, and all the following measures can be derived directly from it.
- The **overall accuracy** is computed as the number of correctly classified examples divided by the total number of examples.
- The (per-class) **precision** is the number of examples correctly assigned to a class divided by the number of instances assigned to that class by the classifier.
- The (per-class) **recall** is the number of examples correctly assigned to a class divided by the number of instances really belonging to that class.
- The **F1 measure** is defined as the harmonic mean of the precision and recall measures.

You can implement these measures by simply completing the starter code. If you have doubts about the implementation of these measures, you can check the corresponding Wikipedia article.
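To make the definitions concrete, here is a minimal Python/NumPy sketch that derives the four measures from a confusion matrix laid out as above (real labels in rows, assigned labels in columns). It is an illustration rather than the starter code: the function name `evaluation_measures` is hypothetical, and setting a measure to 0 when its denominator is empty is a convention chosen here.

```python
import numpy as np

def evaluation_measures(conf_mat):
    """Accuracy and per-class precision, recall and F1 from a confusion matrix.

    conf_mat[r, c] counts the elements of real class r assigned to class c.
    """
    conf_mat = np.asarray(conf_mat, dtype=float)
    true_pos = np.diag(conf_mat)
    assigned = conf_mat.sum(axis=0)  # column sums: elements assigned to each class
    actual = conf_mat.sum(axis=1)    # row sums: elements really in each class
    accuracy = true_pos.sum() / conf_mat.sum()
    # Convention chosen here: a measure is 0 when its denominator is 0.
    precision = np.divide(true_pos, assigned,
                          out=np.zeros_like(true_pos), where=assigned > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    return accuracy, precision, recall, f1

# Hypothetical two-class confusion matrix.
accuracy, precision, recall, f1 = evaluation_measures([[8, 2], [1, 9]])
print(accuracy, precision, recall, f1)
```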

**Exercise**

- Implement the *accuracy*, *recall*, *precision* and *F1 measure*.
- Evaluate the previous algorithms with your new measures.
- Automate the previous evaluations.
- Run the automatic evaluation for different K, distances and features.
- Plot the corresponding results for comparison.