Introduction to Gaussian Processes
David J C MacKay
Feedforward neural networks such as multilayer perceptrons are
popular tools for nonlinear regression and classification problems.
From a Bayesian perspective, the choice of a neural network model can
be viewed as defining a prior probability distribution over
non-linear functions, and the neural network's learning process can
be interpreted in terms of the posterior probability distribution
over the unknown function. (Some learning algorithms search for the
function with maximum posterior probability; other, Monte Carlo,
methods draw samples from this posterior probability distribution.)
In the limit of large but otherwise standard networks, \citeasnoun{Radford_book}
has shown that the prior distribution over non-linear functions
implied by the Bayesian neural network falls in a class of
probability distributions known as Gaussian processes. The
hyperparameters of the neural network model determine the
characteristic lengthscales of the Gaussian process. Neal's
observation motivates the idea of discarding parameterized networks
and working directly with Gaussian processes. Computations in which
the parameters of the network are optimized are then replaced by
simple matrix operations using the covariance matrix of the Gaussian
process.
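The matrix operations referred to above can be sketched in a few lines. The following is a minimal Gaussian process regression example, not the paper's own code: it assumes a squared-exponential covariance function and illustrative hyperparameter values (lengthscale, signal variance, noise variance), and computes the predictive mean and variance by solving linear systems in the training covariance matrix C = K + sigma_nu^2 I.

```python
import numpy as np

def rbf_cov(xa, xb, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential covariance; hyperparameters here are illustrative,
    # playing the role of the characteristic lengthscales mentioned above.
    d = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=0.1):
    # Training covariance matrix C = K + sigma_nu^2 I
    # (the noise variance sits on the diagonal).
    C = rbf_cov(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf_cov(x_train, x_test)            # cross-covariances
    alpha = np.linalg.solve(C, y_train)          # C^{-1} y
    mean = k_star.T @ alpha                      # predictive mean
    v = np.linalg.solve(C, k_star)
    cov = rbf_cov(x_test, x_test) - k_star.T @ v # predictive covariance
    return mean, np.diag(cov)

# Noisy samples of sin(x); predict at x = pi/2 where the true value is 1.
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=x.size)
mu, var = gp_predict(x, y, np.array([np.pi / 2]))
```

No network weights are optimized here: the predictions come entirely from solving linear systems in the covariance matrix, which is the replacement of parameter optimization by matrix operations that the text describes.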
In this chapter
I will review work on this idea by \citeasnoun{williams_rasmussen:96},
\citeasnoun{Neal_gp}, \citeasnoun{williams:96} and
\citeasnoun{Gibbs_MacKay97b},
and will assess whether, for
supervised regression and classification tasks, the feedforward
network has been superseded.
Known typos in this paper: equation 25 should read
C_{nn'} = \cdots + \sigma_{\nu}^2 \delta_{nn'}
instead of
C_{nn'} = \cdots + \delta_{nn'}.