- Machine learning
  - Input data
    - Feature selection
    - Dimensionality reduction
  - Output labels

- Supervised learning
  - Basics
    - Classification
    - Regression
  - The regression function
  - A straightforward algorithm: $K$-nearest neighbors for regression
  - Linear regression
    - The least squares method
    - Robustness to outliers and the least absolute deviations method
    - Ridge regression
    - The LASSO
    - Support Vector Regression
      - $\nu$-Support vector regression
      - Linear programming support vector regression
  - Nonlinear regression
    - Back to the least squares method
    - Kernel ridge regression
    - Support Vector Regression
    - Multi-layer perceptron
    - Kernelized LASSO
  - Piecewise smooth regression
    - Switching regression
      - K-LinReg: the least squares method again
      - An EM algorithm

- Learning theory
  - Basics
    - Underfitting and overfitting
    - Estimation error versus approximation error
    - The structural risk minimization principle
    - Regularization
  - Consistency
    - Consistency of the ERM principle
    - Nontrivial consistency
  - No-free-lunch theorems
  - Generalization error bounds for classification
    - My first error bound, and why it is useless
    - A simple (but useful) bound for finite function classes
    - Towards error bounds for infinite function classes
      - Projection of a function class onto a data sample and the growth function
      - VC-dimension
      - VC bounds
    - Margin-based bounds
    - Bounds based on Rademacher averages
    - Bound involving the best in class

- Unsupervised learning
  - Clustering
    - What is a good clustering?
    - Center-based clustering
      - $K$-means
      - Expectation-Maximization (EM)
    - Spectral clustering
    - Subspace clustering
  - Density estimation

- Appendix
  - Probability and statistics
  - Linear algebra
    - Vectors and vector spaces
      - Dot/inner products
      - Metrics
      - Norms
    - Matrices
      - Eigenvalues and eigenvectors
      - Positive definite matrices
    - Solving linear systems
  - Optimization
    - Unconstrained optimization
    - Lagrangian duality
    - Convex optimization
    - Nonconvex optimization
  - Functional analysis