In a machine learning problem characterized by a given input space and a given label space, a predictive model is a function that assigns a label to every possible input; the outputs of such a function are called predictions.
Learning can be defined as the search for the best predictive model among all the functions that fulfill this definition. However, learning algorithms rarely implement this search directly, for two major reasons.
First, this approach might be algorithmically intractable, since an algorithm can hardly represent an arbitrary function from the input space to the label space. Second, even if all these functions could be represented, we would not be able to determine which one is best: the algorithm would be prone to overfitting and could easily fall into the trap of memorizing the training examples (learning by heart).
This is why most learning algorithms learn a function from a restricted model space.
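To make the risk of learning by heart concrete, consider the following sketch. The notation is introduced only for this illustration and is not taken from the text above: $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ is a hypothetical training sample with pairwise distinct inputs, and $y_0$ is an arbitrary default label. The memorizing function $h_S$ can then be defined by
\[
h_S(x_i) = y_i \quad (i = 1, \dots, n), \qquad h_S(x) = y_0 \quad \mbox{for every other input } x.
\]
This function is a valid predictive model and makes no error on $S$, yet it predicts the same label $y_0$ for every unseen input, so a perfect fit to the training sample says nothing about its quality elsewhere.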
Formally, the unrestricted search ranges over the set of all functions from the input space to the label space, denoted $\Y^\X$, whereas most learning algorithms search within a restricted model space, that is, a subset of $\Y^\X$.
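To give a sense of the size of $\Y^\X$, suppose, purely for illustration, that both spaces are finite (an assumption the text does not make). Each input can independently receive any label, so
\[
|\Y^\X| = |\Y|^{|\X|}.
\]
For instance, with two labels and inputs described by $10$ binary features (values chosen arbitrarily for this example), $|\X| = 2^{10} = 1024$ and there are $2^{1024} \approx 1.8 \times 10^{308}$ candidate functions, far too many to enumerate.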