How to use machine learning in web pages:
a tutorial for ML.js

ML.js is a JavaScript library developed as part of the MLweb project to provide machine learning capabilities for web applications.

This short tutorial will show how to build a small and simple example application. Further information can be found in the main documentation.

The goal is to create a web page that shows small pieces of text one after the other, at the most natural speed for the reader. Machine learning is needed here because that speed depends on several factors, such as the number of words, the length of the words, the punctuation marks and, of course, the reader. Thus, it cannot easily be guessed.

The idea is to let the user click to see the next piece of text while recording the speed at which these events occur. From these data, we can then learn a model that predicts the reading time of any piece of text. Once this model is obtained, the user no longer needs to click and the text is displayed automatically.

See the live demo.

HTML part

First, we need to load the machine learning library, typically in the head:
<html>
<head>
	<meta charset="UTF-8">
	<script src="http://mlweb.loria.fr/ml.js"> </script>
</head>
Then, the document body merely contains a clickable div which displays the next piece of text at every click:
<body onload="init();">

	<h1>Learning how to uncover text at the right,<br> reader-specific, pace</h1>

	<div id="textdiv" onclick="next();" style="height:300px;width: 60%;background:rgba(0,0,0,0.1);"></div>

</body>
</html>

JavaScript code

In the JavaScript part, we start by defining a few global variables:
var t;		// time of the click
var tprev;	// time of the previous click 

var sentencedata = new Array();	// this will store the data on the pieces of text for training 
var readingtime = new Array();	// and this will store the corresponding reading times

var text = "Once upon a time... "; // a very long text with many sentences of various lengths

var n = -1;				// index of the sentence being displayed
var sentences = text.split(".");	// Array of pieces of text (here, sentences terminated by a '.')
var sentence;				// Current piece of text

var auto = false; 		// Are we in automatic mode yet?
var model;			// the predictive model, created once enough data have been collected
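Since the pieces of text are obtained with text.split("."), it helps to see exactly what this call returns. The short standalone check below (plain JavaScript, runnable in Node.js or a browser console, with a made-up sample text) shows that every piece but the first starts with a space, and that a text ending with '.' yields a final empty piece. Abbreviations such as "Mr." would also be cut. The cleanup step is a hypothetical addition, not part of the original demo.

```javascript
// What text.split(".") returns on a small, made-up sample text.
const sampleText = "Once upon a time. There was a king, a queen and a castle.";
const rawPieces = sampleText.split(".");
// rawPieces[0] is "Once upon a time"
// rawPieces[1] is " There was a king, a queen and a castle" (note the leading space)
// rawPieces[2] is "" (the empty piece after the final '.')

// Hypothetical cleanup, not part of the original demo:
// trim each piece and drop the empty ones.
const cleanedPieces = rawPieces
  .map(s => s.trim())
  .filter(s => s.length > 0);
```

The leading spaces matter later, when words are counted by splitting on spaces.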
The initialization procedure called by body.onload simply displays a few instructions for the user:
function init() {
	t = 0;
	// (a double-quoted string cannot span several lines, hence the concatenation)
	textdiv.innerHTML = "I will display a text sentence by sentence upon your mouse clicks in the grey box. "
		+ "After a while, the grey box will disappear, meaning that I have learned the "
		+ "right reading pace for you and I can decide when to skip to the next sentence, "
		+ "without waiting for your clicks.";
}
Most of the work is done in the function next(), called on every mouse click. This function changes the text appearing in the main div and starts a timer to measure the reading time, with:
	t = (new Date()).getTime();	// store current time in ms

	n++;	
	sentence = sentences[n]; 
	textdiv.innerHTML = sentences[n] + "."; // show next piece of text
In addition, if this is not the first click, we need to record some data about the previously displayed text. Since the aim is to predict the reading time, we record the elapsed time since the text was displayed and some variables that might influence this time.
For instance, it might be reasonable to assume that the reading speed depends on the number of words, the average word length, the length of the longest word and the number of commas, colons and semicolons. These quantities are computed by the following function:
function sentenceStats( str ) {
	// Compute statistics for the text String str
	var words = str.trim().split(/\s+/);	// Array of words (trim() removes the space left after the previous '.')
	var nWords = words.length;  // Number of words
	var avgWordLength = 0;      // Average word length
	var maxWordLength = 0;      // Max word length
	for ( var w = 0; w < nWords; w++) {
		avgWordLength += words[w].length;
		if( words[w].length > maxWordLength ) 
			maxWordLength = words[w].length; 
	}
	avgWordLength /= nWords;
	
	var reg = new RegExp("[,;:]+", "g");
	// split() returns one more piece than there are separators, hence the - 1:
	var nSigns = str.split(reg).length - 1;  // number of (groups of) punctuation marks likely to slow down reading
	
	return [ nWords, avgWordLength, maxWordLength, nSigns ];
}
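Because the pieces produced by text.split(".") usually begin with a space, and because split() returns one more piece than there are separators, it is worth checking the counts on a concrete sentence. The standalone sketch below (plain JavaScript, e.g. Node.js) uses a made-up sentence, shows both pitfalls, and computes the four features with word counting done on the trimmed string.

```javascript
// Standalone check of the sentence features on a made-up sentence that,
// like most pieces produced by text.split("."), starts with a space.
const sampleSentence = " Once upon a time, there was a princess";

// Pitfall 1: splitting on " " keeps an empty first "word".
const naiveWords = sampleSentence.split(" ");        // 9 entries, the first one is ""

// Pitfall 2: split() returns one more piece than there are separators.
const commaPieces = sampleSentence.split(/[,;:]+/);  // 2 pieces for 1 comma

function sentenceStats(str) {
  const words = str.trim().split(/\s+/);     // words, without empty entries
  const nWords = words.length;
  let avgWordLength = 0;
  let maxWordLength = 0;
  for (const w of words) {
    avgWordLength += w.length;               // punctuation attached to a word counts too
    if (w.length > maxWordLength) maxWordLength = w.length;
  }
  avgWordLength /= nWords;
  const nSigns = str.split(/[,;:]+/).length - 1; // groups of , ; : marks
  return [nWords, avgWordLength, maxWordLength, nSigns];
}

const stats = sentenceStats(sampleSentence); // [8, 3.875, 8, 1]
```

Here the 8 words have a total length of 31 characters ("time," counts as 5), hence an average of 3.875.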
So, the function next() that collects data for training might look like this:
function next() {
	if ( n >= 0) {
		// Record data for the current sentence
		// and append it to the training set:
		sentencedata.push( sentenceStats(sentence) ); 

		// also record the reading time (in ms):
		tprev = t;
		t = (new Date()).getTime();
	
		readingtime.push( t-tprev );			
	}
	else {
		// Start timing
		t = (new Date()).getTime();		
	}
	
	// Write next sentence
	n++;
	sentence = sentences[n]; 	
	textdiv.innerHTML = sentences[n] + "."; 
}
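The timing logic of this function can be checked without a browser. The sketch below is a hypothetical, DOM-free rewrite of the same bookkeeping: collectTrainingData() and its clickTimes and features parameters are made up for illustration, and the click timestamps replace the calls to (new Date()).getTime(). It shows that the first click only starts the timer, so that after 11 clicks (when n reaches 10) exactly 10 examples have been recorded.

```javascript
// Hypothetical, DOM-free version of the bookkeeping done by next()
// during training. clickTimes: one timestamp (in ms) per click.
// features: a function such as sentenceStats() mapping a sentence to numbers.
function collectTrainingData(sentences, clickTimes, features) {
  const sentencedata = [];   // feature vectors, one per fully read sentence
  const readingtime = [];    // matching reading times in ms
  let n = -1;                // index of the sentence being displayed
  let t = 0;                 // time of the previous click
  for (const now of clickTimes) {
    if (n >= 0) {
      // this click ends the reading of sentence n
      sentencedata.push(features(sentences[n]));
      readingtime.push(now - t);
    }
    t = now;                 // start timing the next sentence
    n++;                     // "display" the next sentence
  }
  return { sentencedata, readingtime, n };
}
```

For example, three clicks at 0, 1200 and 3000 ms produce two examples, with reading times of 1200 ms and 1800 ms.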
Now we need to modify this function to actually train a predictive model. We will do that after collecting enough data, say, data about 10 sentences. Since the value to be predicted is quantitative (not just a category but a number), the model will be a Regression model and not a Classifier. More precisely, we will use the simplest regression method, known as the least squares method.
The training functions in ML.js require a data Matrix and the corresponding Vector of labels containing the value to be predicted for each row in the matrix.
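ML.js performs the fitting itself, but it may help to see what ordinary least squares computes from X and Y. The following plain JavaScript sketch, independent of ML.js, solves the normal equations (X^T X) w = X^T Y with Gaussian elimination. The function names are hypothetical, and the intercept column it adds is an assumption made for the sketch, not a statement about how the LeastSquares method of ML.js works. It also assumes that X^T X is invertible, which fails if, for instance, one feature is constant over the whole training set.

```javascript
// Solve A w = b by Gaussian elimination with partial pivoting.
function solve(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix [A | b]
  for (let col = 0; col < n; col++) {
    // use the row with the largest pivot to limit rounding errors
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];
    // eliminate the entries below the pivot
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  // back substitution
  const w = new Array(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let s = M[i][n];
    for (let j = i + 1; j < n; j++) s -= M[i][j] * w[j];
    w[i] = s / M[i][i];
  }
  return w;
}

// Fit y ≈ w0 + w1*x1 + ... + wd*xd by solving (X^T X) w = X^T y.
function fitLeastSquares(X, y) {
  const Xb = X.map(row => [1, ...row]); // prepend an intercept column (assumption)
  const d = Xb[0].length;
  const XtX = Array.from({ length: d }, () => new Array(d).fill(0));
  const Xty = new Array(d).fill(0);
  for (let i = 0; i < Xb.length; i++) {
    for (let j = 0; j < d; j++) {
      Xty[j] += Xb[i][j] * y[i];
      for (let k = 0; k < d; k++) XtX[j][k] += Xb[i][j] * Xb[i][k];
    }
  }
  return solve(XtX, Xty);
}

// Predicted value for a feature vector x, given weights w from fitLeastSquares().
function predictLeastSquares(w, x) {
  return w[0] + x.reduce((s, xi, j) => s + w[j + 1] * xi, 0);
}
```

With X an Array of feature vectors such as those returned by sentenceStats() and Y the Array of reading times, fitLeastSquares(X, Y) returns the weights and predictLeastSquares(w, x) the predicted time for a new vector x. In the application itself, ML.js does this work, and the next version of next() trains its model once 10 examples are available: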
function next() {
	if ( n == 10 ) {
		// Create a predictive model 
		model = new Regression(LeastSquares);
		
		// Convert our Array of data to a Matrix:
		var X = array2mat( sentencedata );
		
		// An Array of labels can be used directly as a vector of labels:
		var Y = readingtime; 
		
		// Train the model on the data matrix X and target vector Y
		model.train( X, Y );
		
		// Turn on the automatic mode for the application:
		auto = true;		
		
		// Remove the grey box to inform the user that everything is automatic,
		// and ignore further clicks
		textdiv.style.background = "white";
		textdiv.onclick = null;
		
		// Call for the next piece of text (now in automatic mode)
		// and skip the manual-mode code below
		setTimeout(next,0);
		return;
	}
	else if ( n >= 0) {
		// Record data for the current sentence: 		
		...
	}
	else {
		// Start timing
		...
	}		
	
	// Write next sentence
	...
}
The final step is to implement the automatic display of the pieces of text, based on the model predictions.
function next() {
	if ( auto ) {
		// Stop at the end of the text
		if ( n + 1 >= sentences.length ) return;
		
		// Show the next piece of text:
		n++;
		sentence = sentences[n]; 
		textdiv.innerHTML = sentence + ".";
		
		// Prepare the data for this sentence:
		var x = sentenceStats(sentence) ;	// x = vector of numbers describing the sentence
		
		// Predict the reading time with the model for this sentence:
		var predictedTime = model.predict( x );	
		
		// Call next() again after the predicted amount of time
		setTimeout(next, predictedTime ) ;
		return;		// skip the manual-mode code below
	}
	else if ( n == 10 ) {
		// Create a predictive model 
		...
	}
	else if ( n >= 0) {
		// Record data for the current sentence: 		
		...
	}
	else {
		// Start timing
		...
	}		
	
	// Write next sentence
	...
}
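A linear model can extrapolate: for a very short sentence it may predict a tiny or even negative reading time, and setTimeout() treats negative delays as 0, so the sentence would flash by. One possible safeguard, not part of the original demo, is to clamp the prediction before passing it to setTimeout(). The helper name safeDelay() and its bounds of 0.5 s and 20 s are arbitrary choices made for this sketch.

```javascript
// Hypothetical helper: keep a predicted reading time (in ms) within
// arbitrary bounds before passing it to setTimeout().
function safeDelay(predictedTime, minMs = 500, maxMs = 20000) {
  // a non-numeric prediction (e.g. NaN) falls back to the longest delay
  if (!Number.isFinite(predictedTime)) return maxMs;
  return Math.min(maxMs, Math.max(minMs, Math.round(predictedTime)));
}
```

In next(), the call would then become setTimeout(next, safeDelay(predictedTime)).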
Here is the complete source code.

Final notes

This is essentially the source code of the live demo, augmented only with a bit of CSS.

Here, the machine learning part is very simple because the application is simple: the least squares regression method already works satisfactorily. More complex problems might call for more complex methods with parameters to tune (see the list of available methods). However, the default AutoReg method can be used to automatically find the best method and parameters. To use it, simply replace

	model = new Regression(LeastSquares);
with
	model = new Regression();  // use default AutoReg method
Note, however, that the AutoReg method can be much slower, because it actually tests all methods to find the best one.
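The text only says that AutoReg tests all methods; it does not describe how. The reason for the extra cost can still be illustrated. The hypothetical sketch below compares two toy candidates (a constant predictor and a one-feature linear fit) by leave-one-out cross-validation: each candidate is trained once per example, so the cost grows with both the number of candidates and the size of the data set. It is not the AutoReg algorithm of ML.js, and all names in it are made up.

```javascript
// Candidate 1: always predict the mean of the training targets.
function fitMean(xs, ys) {
  const m = ys.reduce((a, b) => a + b, 0) / ys.length;
  return () => m;
}

// Candidate 2: simple linear regression on one feature, y ≈ a + b*x.
function fitLine(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
  }
  const b = sxx === 0 ? 0 : sxy / sxx;
  const a = my - b * mx;
  return x => a + b * x;
}

// Leave-one-out mean squared error: the candidate is trained n times,
// each time without one example, and tested on that example.
function looError(fit, xs, ys) {
  let err = 0;
  for (let i = 0; i < xs.length; i++) {
    const model = fit(xs.filter((_, j) => j !== i), ys.filter((_, j) => j !== i));
    err += (model(xs[i]) - ys[i]) ** 2;
  }
  return err / xs.length;
}

// Return the name and error of the candidate with the lowest error.
function selectModel(candidates, xs, ys) {
  let best = null;
  for (const [name, fit] of Object.entries(candidates)) {
    const e = looError(fit, xs, ys);
    if (best === null || e < best.error) best = { name, error: e };
  }
  return best;
}
```

For instance, selectModel({ mean: fitMean, line: fitLine }, xs, ys) trains 2 × xs.length models before returning the best candidate.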

Likewise, the computations are not very demanding here: the data set is small (10 examples in dimension 4) and the least squares method is one of the simplest ones. Thus, we could run the learning part directly in the page's script without blocking the browser for too long. However, in most situations, the computations should be run in a background lab, by replacing

	model = new Regression(LeastSquares);
	var X = array2mat( sentencedata );
	var Y = readingtime; 
	model.train( X, Y );	
	auto = true;
	...
	setTimeout(next,0);
...
	// Predict the reading time with the model for this sentence:
	var predictedTime = model.predict( x );		
	setTimeout(next, predictedTime ) ;	
with
var lab = new MLlab(); // this should be a global variable
...
	lab.exec("model = new Regression(LeastSquares);");  // create a model in the lab
	lab.load(sentencedata, "X");            // load sentencedata in the matrix X
	lab.load(readingtime, "Y");             // load readingtime in the vector Y

	// Train the model in the lab with a callback called when the model is trained:
	// only then, set the automatic mode and call for the next piece of text
	// (a plain setTimeout(next,0) could run before the training is finished)
	lab.exec("model.train( X, Y );", function () { auto=true; next(); });   
...
	// Predict the reading time with the model in the lab
	// (here the data in x are simply passed directly in the string)
	// and use that predicted time in the callback: 
	lab.exec("model.predict( [" + x + "] );", function( result ) {
				setTimeout(next, result );
			});