Contents

In words...

  1. Machine learning
    1. Input data
      1. Feature selection
      2. Dimensionality reduction
    2. Output labels
  2. Supervised learning
    1. Basics
      1. Predictive models
        1. Restricted model spaces
        2. Parametrized model spaces (hyperparameters)
      2. Learning algorithms
      3. The risk and the loss function
        1. Estimating the risk with a test sample
        2. Estimating the risk by cross-validation
      4. The empirical risk minimization principle
      5. The maximum likelihood inference principle
      6. Underfitting and overfitting
      7. Model selection
        1. Fast tuning of $K$-nearest neighbors
        2. Fast tuning of the Gaussian kernel
      8. Overall learning procedure
    2. Classification
      1. The Bayes classifier
      2. A straightforward algorithm: $K$-nearest neighbors
      3. Linear classification
        1. The perceptron
        2. Logistic regression
        3. The optimal separating hyperplane and the margin
        4. Support vector machines
      4. Nonlinear classification
        1. Nonlinear mapping in feature space
        2. The kernel trick
        3. Nonlinear support vector machine
        4. Multi-layer perceptron
        5. Decision trees
      5. Generative approach
        1. Linear and quadratic discriminant analysis
        2. The naive Bayes classifier
      6. Multi-class problems
        1. One-versus-all decomposition
        2. One-versus-one decomposition
        3. DAG-based decomposition
    3. Regression
      1. The regression function
      2. A straightforward algorithm: $K$-nearest neighbors for regression
      3. Linear regression
        1. The least squares method
        2. Robustness to outliers and the least absolute deviations method
        3. Ridge regression
        4. The LASSO
        5. Support vector regression
          • $\nu$-Support vector regression
          • Linear programming support vector regression
      4. Nonlinear regression
        1. Back to the least squares method
        2. Kernel ridge regression
        3. Support vector regression
        4. Multi-layer perceptron
        5. Kernelized LASSO
        6. Piecewise smooth regression
      5. Switching regression
        1. K-LinReg: the least squares method again
        2. An EM algorithm
    4. Learning theory
      1. Basics
        1. Underfitting and overfitting
        2. Estimation error versus approximation error
        3. The structural risk minimization principle
        4. Regularization
      2. Consistency
        1. Consistency of the ERM principle
        2. Nontrivial consistency
      3. No-free-lunch theorems
      4. Generalization error bounds for classification
        1. My first error bound, and why it is useless
        2. A simple (but useful) bound for finite function classes
        3. Towards error bounds for infinite function classes
        4. Projection of a function class onto a data sample and the growth function
        5. VC-dimension
        6. VC bounds
        7. Margin-based bounds
        8. Bounds based on Rademacher averages
      5. Bound involving the best in class
  3. Unsupervised learning
    1. Clustering
      1. What is a good clustering?
      2. Center-based clustering
        1. $K$-means
        2. Expectation-Maximization (EM)
      3. Spectral clustering
      4. Subspace clustering
    2. Density estimation
  4. Appendix
    1. Probability and statistics
      1. Probability space/measure
        1. Basic properties
      2. Random variables
        1. Probability distribution
        2. Probability density function
        3. Expectation
        4. Variance
      3. Random pairs and joint distributions
        1. Conditional probabilities
        2. Marginal distributions
        3. Independence
        4. Bayes' theorem
        5. Conditional expectation
        6. The law of total probability and total expectation
      4. Indicator function
      5. Inequalities
        1. The union bound
        2. Jensen's inequality
        3. Concentration inequalities
    2. Linear algebra
      1. Vectors and vector spaces
      2. Dot/inner products
      3. Metrics
      4. Norms
      5. Matrices
        1. Eigenvalues and eigenvectors
        2. Positive definite matrices
      6. Solving linear systems
    3. Optimization
      1. Unconstrained optimization
      2. Lagrangian duality
      3. Convex optimization
        1. Linear programming
        2. Quadratic programming
      4. Nonconvex optimization
    4. Functional analysis
      1. Function spaces
      2. Hilbert spaces
      3. Evaluation functionals (cf. function spaces)
      4. Reproducing kernel Hilbert spaces
        1. Common kernel functions
        2. Constructing kernel functions

Notes

Despite the classical linear reading order suggested by the table of contents above, the book can be opened at any page, from which it provides links both to more advanced topics and to all the prerequisites for that page. Hopefully, you will enjoy the book by exploring in both directions, starting from any intriguing topic in the list above.