Objectives
The goal of this lecture is to introduce the basic theoretical and computational notions needed to understand the recent excitement about Big Data, its analysis, and its exploitation. The lecture builds on the material covered in Statistical mechanics: methods and applications (lecture by A. Maggs in 1st year), since part of the mathematical structure of high-dimensional statistics and data analysis is closely related to statistical physics.
Syllabus
More specific topics that we will cover:
- How to discover structure in data? Introduction to statistical inference. Dimensionality reduction. The notion of high-dimensional statistics.
- Notions of information theory (Shannon theory). Notions of algorithmic complexity theory. Reminder of useful results from probability.
- Introduction to prototypical data analysis problems: regression, clustering, classification, denoising.
- Basic methods: Least squares. What is regularization? Spectral algorithms, principal component analysis.
- Bayesian estimation, the role of models and priors. Marginalization and maximum likelihood. Formulation in terms of statistical physics and the physics-probability dictionary.
- Some computational methods for Bayesian estimation: Gibbs sampling, known in physics as Monte Carlo; variational Bayes estimation, known in physics as mean-field theory.
- Phase transitions in optimization and estimation problems, relation to phase transitions in physics. Relation to information theoretic and algorithmic limitations.
- Introduction to artificial neural networks.
- Basic notions of modern machine learning and deep learning.
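As an illustration of the "Least squares" and "regularization" items above, here is a minimal sketch of ridge regression (least squares with an L2 penalty), one common form of regularization. It is not taken from the course material: the function name `ridge_fit`, the synthetic data, and the penalty strength `lam` are all assumptions chosen for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: minimize ||X w - y||^2 + lam * ||w||^2.

    Solved through the regularized normal equations
    (X^T X + lam * I) w = X^T y. With lam = 0 this reduces to
    ordinary least squares (assuming X^T X is invertible).
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (an assumption for illustration): a linear model plus noise.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 5
w_true = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

w_ols = ridge_fit(X, y, lam=0.0)     # plain least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # penalized: weights shrink toward zero

print("least-squares weights:", w_ols)
print("ridge weights:        ", w_ridge)
```

The penalty trades a small bias for reduced variance. Increasing `lam` shrinks the norm of the estimated weights, which matters most when the number of features is comparable to the number of samples.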
There will be several homework assignments involving a little coding or the use of provided code and data, to give practical experience with the material covered in the lecture. Note: It is useful, but not required, to have knowledge of the third year course Statistics and modeling by I. Rivals.
Last Modification : Thursday 29 June 2017