Do deep neural networks have an inbuilt Occam’s razor?
Published in arXiv, 2023
Occam's razor in neural networks
Recommended citation: Test
Published in arXiv, 2023
Automatic gradient descent trains both fully-connected and convolutional networks out of the box, at ImageNet scale, without hyperparameters; a rough illustrative sketch follows the citation below.
Recommended citation: Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue. "Automatic Gradient Descent: Deep Learning without Hyperparameters." arXiv preprint arXiv:2304.05187 (2023).
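As a rough illustration only (not the update rule derived in the paper), the sketch below shows a LARS-style layerwise-relative update in PyTorch, where each parameter's step is scaled by the ratio of its weight norm to its gradient norm so that no learning rate has to be tuned per layer. The model, batch, and constant `eta` are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Hypothetical model and loss; the real paper works at ImageNet scale.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss_fn = nn.CrossEntropyLoss()

def layerwise_relative_step(model, loss, eta=0.1):
    """One LARS-style step: each parameter moves by a fraction eta of its own norm,
    in the direction of its gradient. An illustration of layerwise-normalised,
    tuning-free updates, not the AGD update from the paper."""
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g_norm = p.grad.norm()
            if g_norm > 0 and p.norm() > 0:
                # scale the update to a fixed fraction of this parameter's norm
                p -= eta * (p.norm() / g_norm) * p.grad
            p.grad = None

# Hypothetical usage on one random batch.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
layerwise_relative_step(model, loss_fn(model(x), y))
```

Tying each layer's step to its own weight norm is one simple way to remove a global learning rate; the paper itself derives its step sizes from the network architecture rather than fixing a constant by hand.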
Published in ICML, 2022
Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more closely with the data than others (one standard alignment measure is sketched after the citation below).
Recommended citation: Lou, Yizhang, Chris E. Mingard, and Soufiane Hayou. "Feature Learning and Signal Propagation in Deep Neural Networks." International Conference on Machine Learning. PMLR, 2022.
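One common way to quantify how strongly a layer "aligns with the data" is centred kernel alignment (CKA) between the layer's feature Gram matrix and a Gram matrix built from the labels. The numpy sketch below is my own illustration, not the paper's code; the random features and one-hot labels are stand-ins for a real layer's activations and targets.

```python
import numpy as np

def centre(K):
    # double-centre a Gram matrix: H K H with H = I - 11^T / n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    # centred kernel alignment: <Kc, Lc>_F / (||Kc||_F ||Lc||_F)
    Kc, Lc = centre(K), centre(L)
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

rng = np.random.default_rng(0)
features = rng.normal(size=(128, 64))          # stand-in for one layer's activations on a batch
labels = np.eye(10)[rng.integers(0, 10, 128)]  # stand-in for one-hot targets

K = features @ features.T   # feature kernel for this layer
L = labels @ labels.T       # target kernel
print("layer-target alignment (CKA):", cka(K, L))
```

Tracking this score per layer over training is one way to see which layers become most aligned with the data.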
Published in JMLR, 2021
This paper investigates how closely SGD-trained neural networks match the Bayesian posteriors of their corresponding Gaussian processes; a toy version of this comparison is sketched after the citation below.
Recommended citation: Mingard, Chris, et al. "Is SGD a Bayesian sampler? Well, almost." The Journal of Machine Learning Research 22.1 (2021): 3579-3642.
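The toy numpy experiment below (my own construction, not the paper's setup) illustrates the comparison in miniature: on a 3-bit Boolean domain it records which function a small ReLU network expresses after SGD from many random initialisations, and compares those frequencies with how often the same functions appear among random parameter draws that already fit the training data. The target function, architecture, loss, and sample counts are all arbitrary choices.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
n, width = 3, 16
X = np.array([[int(b) for b in f"{i:0{n}b}"] for i in range(2 ** n)], dtype=float)
train_idx = np.arange(5)                       # train on 5 of the 8 inputs
y = (X[:, 0] != X[:, 1]).astype(int)           # hypothetical target: XOR of the first two bits

def init():
    return [rng.normal(0, n ** -0.5, (n, width)), rng.normal(0, width ** -0.5, (width, 1))]

def outputs(params, X):
    W1, W2 = params
    return (np.maximum(X @ W1, 0.0) @ W2).ravel()

def as_function(params):
    # the Boolean function represented on the full domain, as a tuple of 0/1 labels
    return tuple((outputs(params, X) > 0).astype(int))

def sgd_run(steps=2000, lr=0.1):
    params, Xt, yt = init(), X[train_idx], 2.0 * y[train_idx] - 1.0
    for _ in range(steps):
        W1, W2 = params
        h = np.maximum(Xt @ W1, 0.0)
        margin = yt * (h @ W2).ravel()
        if np.all(margin >= 1):                # training data fitted with margin
            break
        g = np.where(margin < 1, -yt, 0.0)     # subgradient of the hinge loss wrt outputs
        params = [W1 - lr * Xt.T @ (g[:, None] @ W2.T * (h > 0)) / len(yt),
                  W2 - lr * h.T @ g[:, None] / len(yt)]
    return params

# frequency of functions found by SGD from random initialisations
sgd_counts = Counter(as_function(sgd_run()) for _ in range(200))

# frequency of functions among random parameter draws that already fit the training data
prior_counts = Counter(f for f in (as_function(init()) for _ in range(20000))
                       if all(f[i] == y[i] for i in train_idx))

print("functions SGD finds most often:       ", sgd_counts.most_common(3))
print("functions most likely under the prior:", prior_counts.most_common(3))
```

If the two rankings look similar, that is the qualitative sense in which SGD behaves "almost" like a Bayesian sampler from the parameter prior.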
Published in arXiv, 2019
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks – a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term – we prove that upon random initialisation of the weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$ (a quick numerical check is sketched after the citation below).
Recommended citation: Mingard, Chris, et al. "Neural networks are a priori biased towards boolean functions with low entropy." arXiv preprint arXiv:1909.11522 (2019).
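The flat form of $P(t)$ is easy to probe numerically. The short Monte Carlo sketch below (illustrative, not from the paper) draws random Gaussian weights for a bias-free perceptron on $\{0,1\}^n$ with $n = 4$ and tallies how many inputs are classified as $1$; the histogram should be approximately uniform with height $2^{-n}$ over $0 \leq t < 2^n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = np.array([[int(b) for b in f"{i:0{n}b}"] for i in range(2 ** n)])  # all of {0,1}^n

samples = 200_000
W = rng.normal(size=(samples, n))        # random weight vectors, no bias term
# for each weight sample, count how many of the 2^n inputs get classified as 1
t_counts = np.bincount((X @ W.T > 0).sum(axis=0), minlength=2 ** n + 1)

print(f"target P(t) = 2^-{n} = {2.0 ** -n}")
for t, c in enumerate(t_counts):
    print(f"t = {t:2d}: empirical P(t) = {c / samples:.4f}")
```

Note that with no bias term the all-zeros input can never be classified as $1$, which is why $t = 2^n$ has probability zero and the uniform range stops at $t < 2^n$.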