I am currently an associate professor and researcher in data mining and artificial intelligence at IRCAM, where I head the ACIDS research group within the Musical Representations team, and I teach computer science and mathematics at Sorbonne Université (formerly UPMC - Paris 6). I also take part in ecological monitoring and metagenomics research with the University of Geneva (UNIGE).
You can find detailed lists of:
- Projects and main research axes.
- Scientific publications and other papers.
- Supervision of PhD and students.
Sound synthesizers are pervasive in music, and they now even define entire music genres. However, their complexity and sets of parameters make them difficult to master. We created an innovative generative probabilistic model that learns an invertible mapping between a continuous auditory latent space of a synthesizer's audio capabilities and the space of its parameters. We approach this task using variational auto-encoders and normalizing flows. With this model, we can learn the principal macro-controls of a synthesizer, which allows users to travel across its organized manifold of sounds, perform parameter inference from audio in order to control the synthesizer with the voice, and even address semantic dimension learning, where we find how the controls map to given semantic concepts, all within a single model. These ideas have been implemented in a real-time plugin as a Max4Live device.
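To give a feel for what an invertible mapping between a latent space and a parameter space means, here is a minimal sketch of a single affine coupling layer, one common building block of normalizing flows. It is only an illustration and not the model from the paper: the class name `AffineCoupling`, the fixed random weights (standing in for trained networks), and the reading of the input as a latent code and the output as synthesizer parameters are all assumptions made for this example.

```python
import math
import random


class AffineCoupling:
    """One affine coupling layer: invertible, with a cheap log-determinant."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.half = dim // 2
        rest = dim - self.half
        # Hypothetical fixed random weights standing in for trained networks.
        self.w_scale = [[0.1 * rng.gauss(0, 1) for _ in range(self.half)]
                        for _ in range(rest)]
        self.w_shift = [[0.1 * rng.gauss(0, 1) for _ in range(self.half)]
                        for _ in range(rest)]

    def _scale_and_shift(self, first_half):
        # Log-scale and shift depend only on the untouched first half.
        log_s = [math.tanh(sum(w * a for w, a in zip(row, first_half)))
                 for row in self.w_scale]
        shift = [sum(w * a for w, a in zip(row, first_half))
                 for row in self.w_shift]
        return log_s, shift

    def forward(self, z):
        """Map a latent vector z to a parameter vector v (assumed roles)."""
        z_a, z_b = z[:self.half], z[self.half:]
        log_s, shift = self._scale_and_shift(z_a)
        v_b = [b * math.exp(s) + t for b, s, t in zip(z_b, log_s, shift)]
        # The Jacobian is triangular, so its log-determinant is sum(log_s).
        return z_a + v_b, sum(log_s)

    def inverse(self, v):
        """Recover z exactly from v by undoing the affine transform."""
        v_a, v_b = v[:self.half], v[self.half:]
        log_s, shift = self._scale_and_shift(v_a)
        z_b = [(b - t) * math.exp(-s) for b, s, t in zip(v_b, log_s, shift)]
        return v_a + z_b


layer = AffineCoupling(dim=6)
z = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1]
v, log_det = layer.forward(z)
z_back = layer.inverse(v)
```

Because the first half of the vector passes through unchanged, the inverse can recompute the same scale and shift and undo the transform exactly, which is what makes it possible to move in both directions between the two spaces. Practical flows stack several such layers and permute the dimensions between them.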
Generative timbre spaces
Best presentation award at ISMIR 2018
Timbre spaces have been used to study the relationships between different instrumental timbres, based on perceptual ratings. However, they provide limited interpretability, no generative capability, and no generalization. Here, we show that variational auto-encoders (VAE) can alleviate these limitations by regularizing their latent space during training, so that the latent space of audio follows the same topology as the perceptual timbre space. Hence, we bridge audio analysis, perception and synthesis in a single system.
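One simple way to picture this kind of regularization is a penalty that grows when distances between latent codes disagree with perceptual dissimilarity ratings. The sketch below uses a plain squared-difference (MDS-style) term, which is an assumption chosen for illustration and not necessarily the loss used in the paper; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def latent_distance(z_i, z_j):
    """Euclidean distance between two latent codes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))


def perceptual_regularizer(latents, dissimilarity):
    """Mean squared gap between latent distances and perceptual ratings.

    latents: one latent vector per instrument.
    dissimilarity: square matrix of perceptual dissimilarities, on the
    same scale as the latent distances (an assumption of this sketch).
    """
    pairs = list(combinations(range(len(latents)), 2))
    if not pairs:
        raise ValueError("need at least two latent codes")
    total = sum(
        (latent_distance(latents[i], latents[j]) - dissimilarity[i][j]) ** 2
        for i, j in pairs
    )
    return total / len(pairs)


# Toy data: three instruments placed on a line in a 2-D latent space.
latents = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
matching = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
mismatched = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]

penalty_matching = perceptual_regularizer(latents, matching)
penalty_mismatched = perceptual_regularizer(latents, mismatched)
```

In a training loop, a term like this would typically be added to the usual VAE objective with a weighting factor, so that the encoder is pushed to arrange instruments in the latent space the way listeners arrange them perceptually.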
The Orchids software is the first complete system for abstract and temporal computer-assisted orchestration and timbral mixture optimization. It provides a set of algorithms and features to reconstruct any time-evolving target sound with a combination of acoustic instruments, given a set of psychoacoustic criteria. It can help composers achieve previously unthinkable timbral colors by providing efficient sets of solutions that best match a sound target. Find more information on the dedicated webpage.
Live Orchestral Piano
We recently developed the first live orchestral piano (LOP) system. The system provides a way to compose music with a full classical orchestra in real time by simply playing on a MIDI keyboard. Our approach is to perform statistical inference on a corpus of MIDI files. This corpus contains piano scores and their orchestrations by famous composers. This objective might seem too ambitious: learning orchestration through the mere observation of scores? We believe that by observing the correlation between piano scores and the corresponding orchestrations made by famous composers, we might be able to infer the spectral knowledge of composers. The probabilistic models we investigate are neural networks with conditional and temporal structures. Find more information on this webpage.
Esling P., Masuda N., Bardet A., Despres R. & Chemla–Romeu-Santos A. Universal audio synthesizer control with normalizing flows, 22nd International Digital Audio Effects (DAFx2019) Conference. [Blog]
Esling P., Chemla–Romeu-Santos A. & Bitton A. Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics, 19th International Society for Music Information Retrieval (ISMIR2018) Conference. Best presentation award. [Blog]
Esling P., Lejzerowicz F. & Pawlowski J. High-throughput accuracy for multiplex amplicon sequencing, Nucleic Acids Research, doi:10.1093/nar/gkv107. (February 17, 2015). IF: 8.055
Esling P. & Agon C. Multiobjective time series matching and classification, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2057–2072. (2013). IF: 1.675
Esling P. & Agon C. Time series data mining, ACM Computing Surveys, vol. 45, no. 1. (2012). IF: 4.543