Do deep neural networks have an inbuilt Occam’s razor?
Published in arXiv, 2023
Occam's razor in neural networks
Recommended citation: Test
Published in arXiv, 2023
Automatic gradient descent trains both fully-connected and convolutional networks out of the box, at ImageNet scale, without hyperparameters; a rough illustrative sketch follows the citation below.
Recommended citation: Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue. "Automatic Gradient Descent: Deep Learning without Hyperparameters." arXiv preprint arXiv:2304.05187 (2023).
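As a rough illustration only (not the update rule derived in the paper), the sketch below shows a LARS-style layerwise-relative update in PyTorch, where each parameter's step is scaled by the ratio of its weight norm to its gradient norm so that no learning rate has to be tuned per layer. The model, batch, and constant `eta` are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Hypothetical model and loss; the real paper works at ImageNet scale.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss_fn = nn.CrossEntropyLoss()

def layerwise_relative_step(model, loss, eta=0.1):
    """One LARS-style step: each parameter moves by a fraction eta of its own norm,
    in the direction of its gradient. An illustration of layerwise-normalised,
    tuning-free updates, not the AGD update from the paper."""
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g_norm = p.grad.norm()
            if g_norm > 0 and p.norm() > 0:
                # scale the update to a fixed fraction of this parameter's norm
                p -= eta * (p.norm() / g_norm) * p.grad
            p.grad = None

# Hypothetical usage on one random batch.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
layerwise_relative_step(model, loss_fn(model(x), y))
```

Tying each layer's step to its own weight norm is one simple way to remove a global learning rate; the paper itself derives its step sizes from the network architecture rather than fixing a constant by hand.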
Published in ICML, 2022
Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more closely with the data than others (one standard alignment measure is sketched after the citation below).
Recommended citation: Lou, Yizhang, Chris E. Mingard, and Soufiane Hayou. "Feature Learning and Signal Propagation in Deep Neural Networks." International Conference on Machine Learning. PMLR, 2022.
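One common way to quantify how strongly a layer "aligns with the data" is centred kernel alignment (CKA) between the layer's feature Gram matrix and a Gram matrix built from the labels. The numpy sketch below is my own illustration, not the paper's code; the random features and one-hot labels are stand-ins for a real layer's activations and targets.

```python
import numpy as np

def centre(K):
    # double-centre a Gram matrix: H K H with H = I - 11^T / n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    # centred kernel alignment: <Kc, Lc>_F / (||Kc||_F ||Lc||_F)
    Kc, Lc = centre(K), centre(L)
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

rng = np.random.default_rng(0)
features = rng.normal(size=(128, 64))          # stand-in for one layer's activations on a batch
labels = np.eye(10)[rng.integers(0, 10, 128)]  # stand-in for one-hot targets

K = features @ features.T   # feature kernel for this layer
L = labels @ labels.T       # target kernel
print("layer-target alignment (CKA):", cka(K, L))
```

Tracking this score per layer over training is one way to see which layers become most aligned with the data.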
Published in JMLR, 2021
This paper investigates how closely SGD-trained neural networks match the Bayesian posteriors of their corresponding Gaussian processes; a toy version of this comparison is sketched after the citation below.
Recommended citation: Mingard, Chris, et al. "Is SGD a Bayesian sampler? Well, almost." The Journal of Machine Learning Research 22.1 (2021): 3579-3642.
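The toy numpy experiment below (my own construction, not the paper's setup) illustrates the comparison in miniature: on a 3-bit Boolean domain it records which function a small ReLU network expresses after SGD from many random initialisations, and compares those frequencies with how often the same functions appear among random parameter draws that already fit the training data. The target function, architecture, loss, and sample counts are all arbitrary choices.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
n, width = 3, 16
X = np.array([[int(b) for b in f"{i:0{n}b}"] for i in range(2 ** n)], dtype=float)
train_idx = np.arange(5)                       # train on 5 of the 8 inputs
y = (X[:, 0] != X[:, 1]).astype(int)           # hypothetical target: XOR of the first two bits

def init():
    return [rng.normal(0, n ** -0.5, (n, width)), rng.normal(0, width ** -0.5, (width, 1))]

def outputs(params, X):
    W1, W2 = params
    return (np.maximum(X @ W1, 0.0) @ W2).ravel()

def as_function(params):
    # the Boolean function represented on the full domain, as a tuple of 0/1 labels
    return tuple((outputs(params, X) > 0).astype(int))

def sgd_run(steps=2000, lr=0.1):
    params, Xt, yt = init(), X[train_idx], 2.0 * y[train_idx] - 1.0
    for _ in range(steps):
        W1, W2 = params
        h = np.maximum(Xt @ W1, 0.0)
        margin = yt * (h @ W2).ravel()
        if np.all(margin >= 1):                # training data fitted with margin
            break
        g = np.where(margin < 1, -yt, 0.0)     # subgradient of the hinge loss wrt outputs
        params = [W1 - lr * Xt.T @ (g[:, None] @ W2.T * (h > 0)) / len(yt),
                  W2 - lr * h.T @ g[:, None] / len(yt)]
    return params

# frequency of functions found by SGD from random initialisations
sgd_counts = Counter(as_function(sgd_run()) for _ in range(200))

# frequency of functions among random parameter draws that already fit the training data
prior_counts = Counter(f for f in (as_function(init()) for _ in range(20000))
                       if all(f[i] == y[i] for i in train_idx))

print("functions SGD finds most often:       ", sgd_counts.most_common(3))
print("functions most likely under the prior:", prior_counts.most_common(3))
```

If the two rankings look similar, that is the qualitative sense in which SGD behaves "almost" like a Bayesian sampler from the parameter prior.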
Published in arXiv, 2019
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks – a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term – we prove that upon random initialisation of the weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$ (a quick numerical check is sketched after the citation below).
Recommended citation: Mingard, Chris, et al. "Neural networks are a priori biased towards boolean functions with low entropy." arXiv preprint arXiv:1909.11522 (2019).
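The flat form of $P(t)$ is easy to probe numerically. The short Monte Carlo sketch below (illustrative, not from the paper) draws random Gaussian weights for a bias-free perceptron on $\{0,1\}^n$ with $n = 4$ and tallies how many inputs are classified as $1$; the histogram should be approximately uniform with height $2^{-n}$ over $0 \leq t < 2^n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = np.array([[int(b) for b in f"{i:0{n}b}"] for i in range(2 ** n)])  # all of {0,1}^n

samples = 200_000
W = rng.normal(size=(samples, n))        # random weight vectors, no bias term
# for each weight sample, count how many of the 2^n inputs get classified as 1
t_counts = np.bincount((X @ W.T > 0).sum(axis=0), minlength=2 ** n + 1)

print(f"target P(t) = 2^-{n} = {2.0 ** -n}")
for t, c in enumerate(t_counts):
    print(f"t = {t:2d}: empirical P(t) = {c / samples:.4f}")
```

Note that with no bias term the all-zeros input can never be classified as $1$, which is why $t = 2^n$ has probability zero and the uniform range stops at $t < 2^n$.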