2016-12-28

5 Steps to Grasping Modern ML

Recently, I've been teaching an advanced Algorithms course, which concluded with a short introduction to Machine Learning. Obviously, ML is its own track in the Computer Science curriculum, but, nevertheless, there's a substantial overlap between the two disciplines: algorithms and ML. However, ML adds another dimension that is not usually considered in the world of algorithmic thinking.

Anyhow, this experience helped me formulate the minimal selection of concepts that need to be grasped in order to start practical ML work. An ML crash course, so to speak.

As I've never seen such a compilation, I'd like to share it in this post. Here are the 5 essential steps to understanding modern ML.

Step 1. Understanding the ML problem formulation. kNN algorithm

The first thing one needs to realize is the difference between an ML problem and common programming problems. Here, training/test data and an objective function should be explained, alongside the 3 common "learning" approaches: supervised, unsupervised, and reinforcement. A widely used and good initial example is the Iris data set with the kNN algorithm.
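To make the formulation concrete, here's a minimal kNN sketch on the Iris data (the classifier is written from scratch for clarity; scikit-learn is assumed to be available, but only for loading and splitting the data):

    from collections import Counter
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    def knn_predict(train_x, train_y, x, k=5):
        # rank the training points by Euclidean distance to the query
        dists = np.linalg.norm(train_x - x, axis=1)
        nearest = np.argsort(dists)[:k]
        # majority vote among the k nearest neighbors
        return Counter(train_y[nearest]).most_common(1)[0][0]

    iris = load_iris()
    train_x, test_x, train_y, test_y = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=42)
    preds = [knn_predict(train_x, train_y, x) for x in test_x]
    print("accuracy:", np.mean(np.array(preds) == test_y))

The train/test split illustrates the data setup, and the held-out accuracy plays the role of the objective we evaluate against.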

Step 2. Adding features and iterative training into the picture. Perceptron algorithm

The second step is the introduction of the concept of feature extraction, which allows approaching a problem from different angles. The Iris data set already has features, but initially they may be perceived as given. Iterative training is another common ML approach (although some popular algorithms, like kNN or Decision Trees, don't rely upon it). The Perceptron is the simplest such algorithm to explain (and it still remains in practical use), and it leads nicely to the next step.

A good example task and data set for this part is the Brown Corpus and the problem of POS tagging. And there's a great post by Matthew Honnibal outlining its solution.
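For illustration, here's a minimal sketch of the mistake-driven update loop at the heart of the perceptron (binary +1/-1 labels for simplicity; Honnibal's tagger uses the multi-class averaged variant):

    import numpy as np

    def train_perceptron(xs, ys, epochs=10):
        # ys are +1/-1 labels; a bias term is folded into the weights
        xs = np.hstack([xs, np.ones((len(xs), 1))])
        w = np.zeros(xs.shape[1])
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                if y * np.dot(w, x) <= 0:  # a mistake: nudge w towards y
                    w += y * x
        return w

The point to stress is that the weights change only when the current model makes a mistake, one example at a time: that is iterative training in a nutshell.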

Step 3. Continuous vs discrete learning, gradient descent. Softmax algorithm

The obvious next step is transitioning from discrete perceptron learning to the continuous gradient descent used in Logistic regression. Andrew Ng provides a lucid connection in Parts II & III of his tutorial on Linear Models. It also helps that Logistic regression and Softmax are the basic building blocks of the Neural Networks to be discussed next. The example task may remain the same POS tagging, although others, like the ones used by Andrew, may also be utilized.
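Here's a minimal sketch of Softmax regression trained by plain batch gradient descent (the learning rate and epoch count are arbitrary illustrative choices):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
        return e / e.sum(axis=1, keepdims=True)

    def train_softmax(xs, ys, classes, lr=0.1, epochs=100):
        # ys are integer class labels; w holds one weight column per class
        w = np.zeros((xs.shape[1], classes))
        onehot = np.eye(classes)[ys]
        for _ in range(epochs):
            probs = softmax(xs @ w)
            grad = xs.T @ (probs - onehot) / len(xs)  # cross-entropy gradient
            w -= lr * grad
        return w

Unlike the perceptron's all-or-nothing update, every example contributes to every step, in proportion to how wrong the current probabilities are.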

Step 4. Learning graphs (aka neural nets), backprop. Feed-forward Neural Network algorithm

As soon as we understand gradient descent and Logistic regression, it's rather easy to make the next step: forming layers of such blocks to allow the combined model to "learn" higher-level feature representations. This is where the Backprop algorithm for efficient training comes into play (which is, by the way, another example of a dynamic programming algorithm). In this part, it's also possible to talk about vector representations of words and other highly contextualized objects (landmark positions in images, etc.). A great explanation of Backprop is presented in this post by Christopher Olah. Also, a good example data set here is MNIST.
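To show how little extra machinery is needed, here's a sketch of a one-hidden-layer feed-forward network trained by backprop (the layer size, learning rate, and initialization are illustrative choices; for MNIST, xs would be the flattened images and onehot_ys the one-hot digit labels):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def train_mlp(xs, onehot_ys, hidden=32, lr=0.1, epochs=200):
        rng = np.random.default_rng(0)
        w1 = rng.normal(0, 0.1, (xs.shape[1], hidden))
        w2 = rng.normal(0, 0.1, (hidden, onehot_ys.shape[1]))
        for _ in range(epochs):
            # forward pass: two layers of the familiar building blocks
            h = np.tanh(xs @ w1)
            probs = softmax(h @ w2)
            # backward pass: the chain rule applied layer by layer
            d2 = (probs - onehot_ys) / len(xs)
            d1 = (d2 @ w2.T) * (1 - h ** 2)  # 1 - tanh^2 is the tanh derivative
            w2 -= lr * (h.T @ d2)
            w1 -= lr * (xs.T @ d1)
        return w1, w2

The dynamic programming flavor is visible in the backward pass: the output-layer error d2 is computed once and reused to obtain the hidden-layer error d1 instead of re-deriving it from scratch.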

Step 5. Bias-variance tradeoff, regularization & ensembles. Random Forest algorithm

Finally, we should return to the beginning and revisit the learning problem, but with some practical experience already under our belt. This is where the essential bias-variance tradeoff and the common ways to tackle it should be discussed: regularization and ensembles. It's also a good place to introduce Decision Tree algorithms and the ensemble methods based upon them (Random Forest and, maybe, others) as some of the most widely-used current practical approaches.

"Winning the Bias-Variance Tradeoff" by Juila Evans may be a good introductory text on this.

Overall, due to the highly condensed nature of such a presentation, a lot of important topics will remain almost uncovered: for example, unsupervised learning, computer vision with its convolutional networks, and sequence models. However, I believe that with the obtained knowledge and a conceptual understanding of the mentioned basics, those parts may be grasped quite easily.

If this plan turns out helpful to you, or if some essential improvements are necessary, please leave your thoughts in the comments...
