Workshop on Recent Developments in Theoretical Machine Learning

Event
Workshop

Date: Monday, 13 January 2025

Time: 9.00 - 17.00 GMT
Location: LRT608A (Lecture Theatre on Level 6), Translation and Innovation Hub (I-HUB)
Campus: White City Campus

Audience: Open to all
Cost: £10 GBP
Tickets: Tickets to be purchased in advance

Registration is now closed.

For further details:

Contact: Cristopher Salvi

This workshop aims to bring together researchers in stochastic analysis, statistics and theoretical machine learning for an exchange of ideas at the forefront of the field. The workshop will coincide with the visit of Professor Gerard Ben Arous, a leading expert in stochastic analysis and high-dimensional statistics, whose insights into deep learning theory offer an exceptional opportunity for meaningful collaborations. The event will feature a series of presentations and discussions around the mathematical underpinnings of modern machine learning techniques, including topics such as:

Theoretical analysis of deep learning architectures
High-dimensional statistics and learning theory
Diffusion models
Connections between stochastic differential equations and neural networks

Confirmed speakers

(Courant Institute, NYU)
(DeepMind)
(51�Թ��)
(51�Թ��)
(51�Թ��)
(51�Թ��)
(Bath)
(51�Թ��)
(Oxford)

Schedule

9.15 – 9.30 Registration and Welcome

9.30 – 10.00 Andrew Duncan

10.00 – 10.30 Nicola M. Cirone

10.30 – 11.00 Deniz Akyildiz

11.00 – 11.30 Coffee break

11.30 – 12.15 Gerard Ben Arous

12.15 – 12.45 Will Turner

12.45 – 14.00 Lunch @ The Works, Sir Michael Uren Building, London W12 0BZ (by invitation only)

14.00 – 14.45 Arnaud Doucet

14.45 – 15.15 James Foster

15.15 – 15.45 Coffee break

15.45 – 16.30 Harald Oberhauser

16.30 – 17.00 Yingzhen Li

18.30 – Conference dinner @ The Broadcaster, 89 Wood Ln, London W12 7FX (by invitation only)

Titles and abstracts

Title: Effective dynamics for summary statistics in high dimensional optimization: can the spectral point of view be sharp?

Abstract: I will present a survey of the recent progress on the notion of summary statistics and effective dynamics for the natural optimization tasks needed for high dimensional data science and machine learning. The main idea is that in many problems in very high dimension, most of the action happens locally in a low dimensional space. The projection on these spaces (the summary statistics) follows (possibly complex) autonomous dynamics (the effective dynamics) which carry the whole information about the success of the optimization task. The hard part is often to find these summary statistics, and then a dynamical spectral approach is useful, as the Hessian (and Fisher information matrix) develop a spectral BBP transition along the training process when the signal to noise ratio is strong enough. I will illustrate this with a brand new result about a sharp dynamical spectral transition in a central example of ML, i.e. multilayered Neural Nets for classification tasks. These ideas were introduced in a series of works begun with Reza Gheissari and Aukosh Jagannath, and were then developed jointly with Jiaoyang Huang for the spectral approach. If time permits, I will also show how these effective dynamics work for the case of multi-spike Tensor PCA (which is taken from a series of recent joint works with Cedric Gerbelot and Vanessa Piccolo). There the spectral transition is yet to be studied.

Title: Accelerated Diffusion Models via Speculative Sampling

Abstract: Speculative sampling is a popular technique for accelerating inference in Large Language Models (LLMs) by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model’s distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models while generating exact samples from the target model.

Title: Non-linear ICA

Abstract: Given measurements of mixed source signals, independent component analysis (ICA) recovers the original sources under linear mixing. While nonlinear mixtures have posed a challenge, recent progress has been significant. This talk develops and applies theoretical results in stochastic analysis to this statistical problem, leading to effective algorithms for nonlinear ICA with theoretical guarantees. Joint work with A. Schell.

Title: Diffusion-based Learning of Latent Variable Models

Abstract: In this talk, I will summarize recent progress and challenges in maximum marginal likelihood estimation (MMLE) for learning latent variable models (LVMs) – focusing on the methods based on Langevin diffusions. I will first introduce the problem and the necessary background on Langevin diffusions, together with recent results on Langevin-based MMLE estimators, detailing the interacting particle Langevin algorithm (IPLA) which is a recent Langevin-based MMLE method with explicit theoretical guarantees akin to Langevin Monte Carlo methods. I will then move on to outline recent progress, specifically accelerated variants, and methods for MMLE in nondifferentiable statistical models with convergence and complexity results. Finally, if time permits, I will talk about the application of IPLA to inverse problems.

Title: Computable Statistical Divergences for Functional Data

Abstract: Kernel-based discrepancies have found considerable success in constructing statistical tests which are now widely used in statistical machine learning. Examples include Kernel Stein Discrepancy which enables goodness-of-fit tests of data samples against an (unnormalized) probability density based on Stein’s method. The effectiveness of the associated tests will crucially depend on the dimension of the data. I will present some recent results on the behaviour of such tests in high dimensions, exploring properties of the statistical divergence under different scaling of data dimension and data size. Building on this, I will discuss how such discrepancies can be extended to probability distributions on infinite-dimensional spaces. I will discuss applications to goodness-of-fit testing for measures on function spaces and its relevance to various problems in ML.

Title: On the Identifiability of Switching Dynamical Systems

Abstract: One of my research dreams is to build a high-resolution video generation model that enables granular control over, e.g., the scene appearance and the interactions between objects. I tried, and then realised that my need to invent deep learning tricks for this goal stems from the non-identifiability of my sequential deep generative models. In this talk I will discuss our research towards developing identifiable deep generative models in sequence modelling, and share some recent and ongoing work on switching dynamical models. In particular, we first show conditions of identifiability for Markov Switching Models (or auto-regressive HMMs) with non-linear transitions, with a new proof technique different from the algebraic approach of the seminal HMM identifiability work by Allman et al. 2009. Then we lift the Markov Switching Model to latent space and leverage existing results to show identifiability. If time permits, I will also show recent developments that build in more flexible structures in the latent switching dynamical prior.

Title: Efficient, Accurate and Stable Gradients for Neural Differential Equations

Abstract: Neural differential equations (NDEs) sit at the intersection of two dominant modelling paradigms – neural networks and differential equations. One of their features is that they can be trained with a small memory footprint through adjoint equations. This can be helpful in high-dimensional applications since the memory usage of standard backpropagation scales linearly with depth (or, in the NDE case, the number of steps taken by the solver). However, adjoint equations have seen little use in practice as the resulting gradients are often inaccurate. Fortunately, there has emerged a class of numerical methods which allow NDEs to be trained using gradients that are both accurate and memory efficient. These solvers are known as “algebraically reversible” and produce numerical solutions which can be reconstructed backwards in time. Whilst algebraically reversible solvers have seen some success in large-scale applications, they are known to have stability issues. In this talk, we propose a methodology for constructing reversible NDE solvers from non-reversible ones. We show that the resulting reversible solvers converge in the ODE setting, can achieve high order convergence, and even have stability regions. We conclude with a few examples demonstrating the memory efficiency of our approach. Joint work with Samuel McCallum.

Title: Graph Expansions of Deep Neural Networks and their Universal Scaling Limits

Abstract: We present a unified approach to obtain scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with a novel expansion of neural networks which is reminiscent of Butcher series for ODEs, and is obtained through a generalisation of Faà di Bruno’s formula to an arbitrary number of compositions. In this expansion, the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices, which we call operator graphs. This expansion linearises the effect of the activation functions, allowing for the direct application of Wick’s principle to compute the expectation of each of its terms. We then determine the leading contribution to each term by embedding the corresponding graphs onto surfaces, and computing their Euler characteristic. Furthermore, by developing a correspondence between analytic and graphical operations, we obtain similar graph expansions for the neural tangent kernel as well as the input-output Jacobian of the original neural network, and derive their infinite-width limits with relative ease. Notably, we find explicit formulae for the moments of the limiting singular value distribution of the Jacobian. We then show that all of these results hold for networks with more general weights, such as general matrices with i.i.d. entries satisfying moment assumptions, complex matrices and sparse matrices.

Title: Randomised path developments, and signature kernels as universal scaling limits

Abstract: Scaling limits of random developments of a path into a matrix Lie group have recently been used to construct signature-based time series kernels. General linear group developments have been shown to be connected to the ordinary signature kernel (Muça Cirone et al.), while unitary developments have been used to construct the path characteristic function distance (Lou et al.) which has proven a successful discriminator for generative modelling tasks. By leveraging the tools of random matrix theory and free probability theory, we are able to provide a unified treatment of both limits under general assumptions on the randomisation. For unitary developments, we show that the limiting kernel is given by the contraction of a signature against the monomials of freely independent semicircular random variables. Using the Schwinger-Dyson equations, we show that this kernel can be obtained by solving a novel quadratic functional equation. We will also discuss extensions to a class of Hermitian matrix models, whose limiting Schwinger-Dyson equations lead to path-dependent functional equations. Joint work with Thomas Cass, Samuel Crew and Cristopher Salvi.
