# Leave-one-out

Consider \(n\) observations \(y_i\) from a Kriging model corresponding to
the "`Kriging`" case with no nugget or noise. For \(i = 1, \dots, n\),
let \(\widehat{y}_{i|-i}\) be the prediction of \(y_i\) based on the
vector \(\mathbf{y}_{-i}\) obtained by omitting the observation \(i\) in
\(\mathbf{y}\). The vector of *leave-one-out* (LOO) predictions is
defined by

\[
\widehat{\mathbf{y}}_{\text{LOO}} := \left[\widehat{y}_{1|-1},\, \dots,\, \widehat{y}_{n|-n}\right]^\top,
\]
and the leave-one-out Sum of Squared Errors criterion is defined by

\[
\texttt{SSE}_{\texttt{LOO}} := \sum_{i=1}^n \left\{ y_i - \widehat{y}_{i|-i} \right\}^2
= \left\| \mathbf{y} - \widehat{\mathbf{y}}_{\text{LOO}} \right\|^2.
\]
It can be shown that

\[
\mathbf{y} - \widehat{\mathbf{y}}_{\text{LOO}} = \mathbf{D}_{\mathbf{B}}^{-1} \mathbf{B} \mathbf{y},
\]
where \(\mathbf{B}\) is the Bending Energy Matrix (BEM) and \(\mathbf{D}_{\mathbf{B}}\) is the diagonal matrix with the same diagonal as \(\mathbf{B}\).
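In the simple no-trend case the bending energy matrix reduces to \(\mathbf{K}^{-1}\), the inverse of the covariance matrix, so all \(n\) LOO residuals \(y_i - \widehat{y}_{i|-i}\) come from a single factorization as \(\mathbf{D}_{\mathbf{B}}^{-1} \mathbf{B} \mathbf{y}\). The following NumPy sketch checks this identity against naive refitting; the Gaussian kernel, range value, and jitter are illustrative assumptions, not library defaults:

```python
import numpy as np

# Assumed setup: simple Kriging with no trend, where the bending energy
# matrix B reduces to K^{-1}. Kernel and range theta are arbitrary choices.
rng = np.random.default_rng(0)
n = 12
x = np.sort(rng.uniform(0.0, 1.0, n))

theta = 0.2
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta**2))
K += 1e-8 * np.eye(n)  # small jitter for numerical conditioning
y = rng.multivariate_normal(np.zeros(n), K)

# Closed-form LOO residuals: y - yhat_LOO = D_B^{-1} B y with B = K^{-1}
B = np.linalg.inv(K)
res_fast = (B @ y) / np.diag(B)

# Naive LOO: drop each observation and re-solve the Kriging system
res_naive = np.empty(n)
for i in range(n):
    m = np.arange(n) != i
    res_naive[i] = y[i] - K[i, m] @ np.linalg.solve(K[np.ix_(m, m)], y[m])

assert np.allclose(res_fast, res_naive)
sse_loo = np.sum(res_fast**2)
```

A single factorization thus replaces \(n\) refits, which is what makes the criterion cheap to evaluate repeatedly during optimization.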

By minimizing \(\texttt{SSE}_{\texttt{LOO}}\) with respect to the covariance parameters \(\theta_\ell\), we obtain estimates of these parameters. Note that, similarly to the profile likelihood, the LOO criterion does not depend on the vector \(\boldsymbol{\beta}\) of trend parameters.
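As a sketch of that estimation step, one can minimize \(\texttt{SSE}_{\texttt{LOO}}\) over a single range parameter \(\theta\) with an off-the-shelf scalar optimizer. This is a hypothetical no-trend setup: the Gaussian kernel, the bounds, and the use of `scipy.optimize.minimize_scalar` are illustration choices, not the library's actual optimizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical 1-D example: estimate the Gaussian-kernel range theta by
# minimizing SSE_LOO, using B = K^{-1} (no-trend case).
rng = np.random.default_rng(1)
n = 30
x = np.sort(rng.uniform(0.0, 1.0, n))
theta_true = 0.15  # illustrative "true" range used to simulate data
K_true = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta_true**2))
y = rng.multivariate_normal(np.zeros(n), K_true + 1e-8 * np.eye(n))

def sse_loo(theta):
    """SSE_LOO(theta) via the closed-form residuals D_B^{-1} B y."""
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta**2))
    B = np.linalg.inv(K + 1e-8 * np.eye(n))
    res = (B @ y) / np.diag(B)
    return np.sum(res**2)

opt = minimize_scalar(sse_loo, bounds=(0.01, 1.0), method="bounded")
theta_hat = opt.x
```

In practice each \(\theta_\ell\) gets its own coordinate of a multivariate optimizer, and the derivatives of \(\texttt{SSE}_{\texttt{LOO}}\) mentioned below can be supplied to a gradient-based routine.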

An estimate of the GP variance \(\sigma^2\) is given by

\[
\widehat{\sigma}^2_{\texttt{LOO}} = \frac{1}{n}\, \mathbf{y}^\top \mathring{\mathbf{B}}\, \mathbf{D}_{\mathring{\mathbf{B}}}^{-1}\, \mathring{\mathbf{B}}\, \mathbf{y},
\]
where \(\mathring{\mathbf{B}}:= \sigma^2 \mathbf{B}\) does not depend on \(\sigma^2\) and \(\mathbf{D}_{\mathring{\mathbf{B}}}\) is the diagonal matrix having the same diagonal as \(\mathring{\mathbf{B}}\).
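For instance, in the no-trend case \(\mathring{\mathbf{B}}\) reduces to the inverse correlation matrix \(\mathbf{R}^{-1}\), and the estimate averages the squared LOO errors weighted by the inverse of the (unit-variance) LOO predictive variances. A hedged NumPy sketch, where the kernel, range, and simulated variance are illustrative assumptions:

```python
import numpy as np

# Assumed no-trend setup: Bo plays the role of the sigma^2-free matrix
# (B-ring), which here is R^{-1} with R the correlation matrix.
rng = np.random.default_rng(2)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
theta = 0.2
R = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta**2))
R += 1e-8 * np.eye(n)  # jitter for conditioning

sigma2_true = 4.0  # illustrative variance used to simulate the data
y = rng.multivariate_normal(np.zeros(n), sigma2_true * R)

Bo = np.linalg.inv(R)  # does not involve sigma^2
# sigma2_hat = y' Bo D_Bo^{-1} Bo y / n
sigma2_hat = (y @ Bo @ ((Bo @ y) / np.diag(Bo))) / n
```

The key point is that the whole expression involves only the correlation structure, so \(\widehat{\sigma}^2\) is available in closed form once the range parameters are estimated.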

The LOO estimation can be preferable to maximum-likelihood estimation when the covariance kernel is misspecified; see Bachoc [Bac12], who provides many details on the criterion \(\texttt{SSE}_{\texttt{LOO}}\), including its derivatives.