The present tutorials are designed to implement the core notions seen in the machine learning lessons. Most techniques can be applied to any type of data from which sets of features can be computed. The exercises first introduce the basic mechanisms behind these techniques, then target specific applications to musical or audio data.
Get the baseline code for your language of choice:
Get the audio datasets from this link
Unzip the files and place the 00_Datasets folder along with the other folders
Download the slides
- Introduction to artificial intelligence
- Properties of machine learning
In this introduction, we will cover basic Music Information Retrieval (MIR) interactions, in which we process a dataset of sound files and observe the properties of their various temporal and spectral features. Along the way, we will quickly review the basic calculus required for the later machine learning tasks. This tutorial is also intended as a review of basic MATLAB coding and plotting operations.
0.0 - Reference code
Throughout the tutorials, we provide reference code for each section. This code contains helper functions that relieve you of the burden of data import and other side implementations. You will find designated spaces in each file to develop your solutions. The code is in MATLAB and relies heavily on the concept of code sections, which allow you to evaluate only part of the code (to avoid running long import tasks multiple times and to concentrate on the question at hand).
Get the baseline MATLAB code for all tutorials from this zip file
Get the baseline Python code for all tutorials from this zip file
In order to get the baseline script to work, you need to have a working distribution of Python, along with the following libraries
For those of you who have never coded in Python, here are a few interesting resources to get started.
0.1 - Datasets
In order to test our algorithms on audio and music data, we will work with several datasets, which should first be downloaded to your local computer from this link.
|Music-speech||MIREX Recognition set|
|Source separation||SMC Mirum dataset|
|Speech recognition||CMU Arctic dataset|
Unzip the file and place the 00_Datasets folder along with the other code folders.
For the first parts of the tutorial, we will rely mostly on the classification dataset. To facilitate the interactions, we provide the function importDataset, which imports all the audio datasets used throughout the tutorials.
- Launch the import procedure and check the corresponding structure
- Code a count function that prints the name and number of examples for each class
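As a starting point for the counting exercise, here is a minimal Python sketch. It assumes a simplified, hypothetical layout in which the imported dataset exposes a list of class names and one integer class index per example. The actual structure returned by importDataset may differ, so the argument names below are illustrative only.

```python
from collections import Counter


def count_examples(class_names, labels):
    """Print and return the number of examples per class.

    `class_names` and `labels` are hypothetical stand-ins for the fields of
    the imported dataset: a list of names and one class index per example.
    """
    counts = Counter(labels)
    result = {}
    for index, name in enumerate(class_names):
        # Classes without any example are reported with a count of zero.
        result[name] = counts.get(index, 0)
        print(f"{name}: {result[name]} examples")
    return result


# Toy data for illustration, not taken from the real datasets.
toy_names = ["music", "speech"]
toy_labels = [0, 1, 1, 0, 1]
toy_counts = count_examples(toy_names, toy_labels)
```

Returning the counts in addition to printing them makes the function easy to reuse later, for instance to check class balance before training a classifier.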
0.2 - Preprocessing
We will rely on a set of spectral transforms that provide a more descriptive view of the audio information. As most of these are outside the scope of the machine learning course, we refer you to the signal processing course by Julius O. Smith.
The following functions to compute various types of transforms are given as part of the basic package, in the
||Short-term Fourier transform|
||Bark scale transform|
||Mel scale transform|
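To give an intuition of what the Bark and Mel scales do, the following sketch maps frequencies in Hz onto both perceptual scales. It uses two common textbook approximations (the 2595 log10(1 + f/700) mel formula and the Zwicker and Terhardt Bark formula), which are assumptions on our part: the functions provided in the package may rely on different variants. All function names here are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_bark(freq_hz):
    """Convert Hz to Bark with the Zwicker and Terhardt approximation."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def mel_band_centers(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands bands equally spaced in mels."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_bands)]


# Ten bands between 0 Hz and 8 kHz: narrow at low frequencies, wide at high ones.
centers = mel_band_centers(10, 0.0, 8000.0)
```

Bands that are equally spaced on the mel axis become progressively wider in Hz, which is why these transforms devote more resolution to low frequencies than a plain Fourier transform does.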
In order to perform the various computations, we provide the following function, which performs the different transforms on a complete dataset.
- Launch the transform computation procedure and check the corresponding structure
- For each class, select a random element and plot its various transforms on a single plot. You should obtain plots similar to those shown afterwards.
- For each transform, try to spot the major pros and cons of its representation.
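To make the short-term Fourier transform more concrete before inspecting the plots, here is a deliberately naive sketch that frames a signal, applies a Hann window and computes a slow O(n^2) DFT of each frame. It relies only on the Python standard library and is not the implementation shipped with the package; the function names, window size and hop size are hypothetical choices for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def stft_magnitude(signal, win_size=64, hop=32):
    """Magnitude spectrogram as a list of frames, each a list of bins."""
    window = hann(win_size)
    frames = []
    for start in range(0, len(signal) - win_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(win_size)]
        frames.append(dft_magnitude(frame))
    return frames


# Toy signal: a 1000 Hz sinusoid sampled at 8 kHz (falls exactly on bin 8).
sample_rate = 8000
toy_signal = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate)
              for t in range(256)]
spectrogram = stft_magnitude(toy_signal)
first_frame = spectrogram[0]
peak_bin = max(range(len(first_frame)), key=lambda k: first_frame[k])
peak_hz = peak_bin * sample_rate / 64
```

With a 64-sample window at 8 kHz, each bin spans 125 Hz, so the peak of the toy sinusoid lands on bin 8. A longer window would sharpen the frequency resolution at the cost of temporal resolution, which is one of the trade-offs to look for when comparing the transforms.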
0.3 - Features
As you might have noted from the previous exercise, most spectral transforms have a very high dimensionality and might not be suited to exhibiting the relevant structure of the different classes. To that end, we provide a set of functions for computing several spectral features in the 00_Features folder. We refer interested readers to this exhaustive article on spectral feature computation.
||Mel-Frequency Cepstral Coefficients (MFCC)|
Once again, we provide a function to perform the computation of different features on a complete set. Note that for each feature, we compute the temporal evolution in a vector along with the mean and standard deviation of each feature. We only detail the resulting data structure for a single feature (
- Launch the feature computation procedure and check the corresponding structure
- This time, for each class, superimpose the plots of the various features on a single plot, along with a boxplot of their means and standard deviations. You should obtain plots similar to those shown afterwards.
- What conclusions can you draw about the discriminative power of each feature?
- Make scatter plots of the mean features over the whole dataset, coloring each class differently.
- What conclusions can you draw about the discriminative power of the mean features?
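To illustrate how a high-dimensional spectrum is reduced to a single feature per frame, and then to a mean and standard deviation per sound, here is a minimal sketch based on the spectral centroid. The spectral centroid is chosen only as a simple, standard example; it is an assumption that the package computes it in this way, and the function names and dictionary keys are hypothetical.

```python
import statistics


def spectral_centroid(magnitudes, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one spectral frame.

    `magnitudes` holds the non-negative frequency bins, from 0 Hz to Nyquist.
    """
    n_bins = len(magnitudes)
    total = sum(magnitudes)
    if total == 0 or n_bins < 2:
        return 0.0  # convention chosen here for silent or degenerate frames
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total


def summarize(track):
    """Keep the temporal evolution along with its mean and standard deviation."""
    return {
        "values": list(track),
        "mean": statistics.fmean(track),
        "std": statistics.pstdev(track),
    }


# Toy spectrogram: 3 frames of 5 bins at 8 kHz, so bins sit at 0, 1, 2, 3, 4 kHz.
toy_frames = [
    [0.0, 1.0, 0.0, 0.0, 0.0],  # all energy at 1000 Hz
    [0.0, 0.0, 0.0, 1.0, 0.0],  # all energy at 3000 Hz
    [0.0, 1.0, 0.0, 1.0, 0.0],  # energy balanced around 2000 Hz
]
toy_rate = 8000
centroid_track = [spectral_centroid(frame, toy_rate) for frame in toy_frames]
centroid_summary = summarize(centroid_track)
```

The mean and standard deviation give a fixed-size description of a sound whatever its duration, which is what makes the scatter plots across the whole dataset possible. The price is that the temporal ordering of the frames is lost.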