PhD course, "Entropy, information, statistical inference"

Scuola Alti Studi IMT Lucca, academic year 2023-2024

National Ph.D. in Artificial Intelligence for Society



OUTLINE: Introduction to statistical inference: the direct and inverse problems. Elements of Bayesian statistics and Bayesian model selection. Examples of hierarchical inference. Elements of information theory and some applications in systems neuroscience. Notions of probabilistic approaches to cognition.

CONTACT: miguel % ibanezberganza $ imtlucca £ com


PROVISIONAL PROGRAM PROPOSAL:

* Phenomenological introduction. The direct and inverse problems. Unsupervised inference. The concepts of spurious correlations, under- and over-fitting, and bias and variance errors. Occam's razor.

* The direct problem in examples. Sampling from a multivariate probability distribution: Markov chains and the MCMC method. Different timescales in MCMC. Inefficiency of uniform MCMC (and the equivalence of ensembles in statistical physics). Bootstrap and jackknife error estimation in MCMC. The Central Limit Theorem. Relative entropy, variational free energy, and the mean-field approximation. Linear response theory: correlations in the Gaussian and Ising models. Spurious correlations and the high-temperature expansion in statistical physics. The Belief Propagation algorithm. The backpropagation algorithm. Variational sampling with neural networks.
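As a minimal illustration of the MCMC and bootstrap topics in this bullet (not course material; the target distribution, step size, and burn-in length are arbitrary choices for the sketch), one can sample a one-dimensional standard normal with a Metropolis chain and estimate the error of the sample mean by bootstrap:

```python
# Illustrative sketch: Metropolis MCMC sampling from a 1D standard
# normal, with a bootstrap estimate of the error of the sample mean.
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    # Log-density of the target (standard normal, up to a constant).
    return -0.5 * x * x

def metropolis(n_steps, step=1.0, x0=0.0):
    # Metropolis chain with a uniform proposal, targeting exp(log_p).
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new  # accept the move
        chain.append(x)
    return np.array(chain)

samples = metropolis(20_000)[2_000:]  # discard burn-in

# Naive bootstrap standard error of the mean (a block bootstrap would
# be more careful, since MCMC samples are autocorrelated).
boot_means = [rng.choice(samples, samples.size).mean()
              for _ in range(200)]
mean, err = samples.mean(), np.std(boot_means)
```

The different MCMC timescales mentioned above show up here as the burn-in length and the autocorrelation time of the chain, both of which affect the honest error bar.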

* The inverse problem in examples. Bayesian estimators. Conjugate probability distributions. Inference with Gaussian mixture models. The Expectation-Maximisation algorithm. Correlations and cumulants. Wick's theorem for the Gaussian distribution. The inverse problem in statistical physics, with two examples (in biophysics and systems neuroscience). Notions on unsupervised neural network learning. Deterministic and stochastic gradient ascent algorithms. The pseudo-likelihood approximation.
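A minimal sketch of Expectation-Maximisation for a Gaussian mixture, as named in this bullet (not course material; the two-component 1D model with known, equal unit variances and the synthetic data are assumptions made for brevity):

```python
# Illustrative sketch: EM for a two-component 1D Gaussian mixture
# with known, equal variances (fixed to 1).
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data from two well-separated unit-variance Gaussians.
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

mu = np.array([-1.0, 1.0])   # initial component means
pi = np.array([0.5, 0.5])    # initial mixing weights

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    log_r = np.log(pi) - 0.5 * (x[:, None] - mu) ** 2
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights and means from the responsibilities.
    nk = r.sum(axis=0)
    pi = nk / x.size
    mu = (r * x[:, None]).sum(axis=0) / nk
```

Each iteration increases the likelihood, and with well-separated data the means converge to the generating values; the E-step is exactly a Bayesian posterior over the latent component labels.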

* Elements of Bayesian model selection. Occam's factors and the BIC. Two worked examples of model selection. Dimensionality reduction with Principal Component Analysis. Model selection and clustering. The Evidence Lower Bound (ELBO) approximation in ANN learning.
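As a toy illustration of BIC-based model selection (not one of the course's worked examples; the polynomial family, noise level, and Gaussian-noise BIC formula are assumptions of this sketch), one can let the BIC choose the degree of a polynomial fit:

```python
# Illustrative sketch: model selection among polynomial fits of
# increasing degree via the BIC (Gaussian noise assumed).
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = np.linspace(-1, 1, n)
# Data generated by a degree-2 polynomial plus small Gaussian noise.
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, n)

def bic(degree):
    # Least-squares fit; BIC = n log(RSS/n) + k log(n), k = degree + 1.
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    return n * np.log(rss / n) + (degree + 1) * np.log(n)

# Higher degrees keep lowering the RSS, but the log(n) penalty per
# parameter (the Occam factor) stops the BIC near the true degree.
best = min(range(1, 8), key=bic)
```

The `(degree + 1) * log(n)` term plays the role of the Occam factor in the bullet above: it penalises model complexity and counteracts overfitting.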

* Hierarchical inference. Hierarchical normal models and the Gamma distribution. Hierarchical Gaussian Filter. The Predictive Coding algorithm.
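A small numerical sketch of the hierarchical normal model in this bullet (not course material; the known within- and between-group variances, and the zero population mean, are simplifying assumptions): the posterior mean of each group shrinks the group sample mean toward the population mean, with a precision-determined weight.

```python
# Illustrative sketch: shrinkage in a hierarchical normal model
# with known variances and population mean fixed to 0.
import numpy as np

rng = np.random.default_rng(3)
tau, sigma = 1.0, 2.0          # between-group and within-group std
K, n = 8, 5                    # number of groups, observations per group
theta = rng.normal(0, tau, K)  # latent group means
y = theta[:, None] + rng.normal(0, sigma, (K, n))

ybar = y.mean(axis=1)
# Posterior mean of theta_k: precision-weighted average of the prior
# mean (0) and the group sample mean; w < 1 gives the shrinkage.
w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
theta_post = w * ybar
```

With few, noisy observations per group, w is well below 1 and the group estimates borrow strength from the population level; this partial pooling is the basic mechanism behind the hierarchical models above.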

* Notions of probabilistic approaches to cognition.

* Notions of information theory. Shannon entropy and code lengths. Relative entropy and mutual information. Examples of information theory in systems neuroscience.
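The entropy and mutual-information notions in the last bullet can be sketched numerically as follows (not course material; the 2x2 joint distribution is an arbitrary example):

```python
# Illustrative sketch: Shannon entropy and mutual information of a
# discrete joint distribution, in bits.
import numpy as np

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log2 p_i (0 log 0 := 0).
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution of two binary variables (rows: X, columns: Y).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y).
mi = entropy(px) + entropy(py) - entropy(pxy.flatten())
```

Here both marginals are fair coins (1 bit each), and the positive mutual information quantifies how much observing Y shortens the optimal code for X; in the systems-neuroscience examples above, X and Y would play the role of stimulus and neural response.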