Leave-one-out
Consider \(n\) observations \(y_i\) from a Kriging model corresponding to
the “Kriging” case with no nugget or noise. For \(i=1\), \(\dots\), \(n\)
let \(\widehat{y}_{i|-i}\) be the prediction of \(y_i\) based on the
vector \(\m{y}_{-i}\) obtained by omitting the observation \(i\) in
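\(\m{y}\). The vector of leave-one-out (LOO) predictions is
defined by
\[
  \widehat{\m{y}}_{\texttt{LOO}} := [\widehat{y}_{1|-1},\, \dots,\, \widehat{y}_{n|-n}]^\top
\]
and the leave-one-out Sum of Squared Errors criterion is defined by
\[
  \texttt{SSE}_{\texttt{LOO}} := \sum_{i=1}^n \{y_i - \widehat{y}_{i|-i}\}^2 = \|\m{y} - \widehat{\m{y}}_{\texttt{LOO}}\|^2.
\]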
It can be shown that
\[
  \widehat{\m{y}}_{\texttt{LOO}} = \left[\m{I}_n - \m{D}_{\m{B}}^{-1} \m{B}\right] \m{y}
\]
where \(\m{B}\) is the Bending Energy Matrix (BEM) and \(\m{D}_{\m{B}}\) is the diagonal matrix with the same diagonal as \(\m{B}\).
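The shortcut \(\widehat{\m{y}}_{\texttt{LOO}} = [\m{I}_n - \m{D}_{\m{B}}^{-1}\m{B}]\,\m{y}\) can be checked numerically against \(n\) explicit refits. The following is a minimal NumPy sketch, not the library's implementation. It assumes the usual expression of the BEM, \(\m{B} = \m{C}^{-1} - \m{C}^{-1}\m{F}(\m{F}^\top\m{C}^{-1}\m{F})^{-1}\m{F}^\top\m{C}^{-1}\), with \(\m{C}\) the covariance matrix and \(\m{F}\) the trend matrix. The Matérn 3/2 kernel, the linear trend, the data and all function names are illustrative choices.

```python
import numpy as np

def matern32(x, theta):
    """Matérn 3/2 correlation matrix for 1-D inputs (illustrative kernel choice)."""
    h = np.abs(x[:, None] - x[None, :]) / theta
    return (1.0 + np.sqrt(3.0) * h) * np.exp(-np.sqrt(3.0) * h)

def bending_energy_matrix(C, F):
    """B = C^-1 - C^-1 F (F' C^-1 F)^-1 F' C^-1 (assumed definition of the BEM)."""
    Ci = np.linalg.inv(C)
    CiF = Ci @ F
    return Ci - CiF @ np.linalg.solve(F.T @ CiF, CiF.T)

def loo_shortcut(C, F, y):
    """LOO predictions [I - D_B^-1 B] y, computed without any refit."""
    B = bending_energy_matrix(C, F)
    return y - (B @ y) / np.diag(B)

def loo_brute_force(C, F, y):
    """LOO predictions obtained by refitting the Kriging predictor n times."""
    n = len(y)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Ck, Fk, yk = C[np.ix_(keep, keep)], F[keep], y[keep]
        # Generalized least squares estimate of the trend from the n - 1 points.
        beta = np.linalg.solve(Fk.T @ np.linalg.solve(Ck, Fk),
                               Fk.T @ np.linalg.solve(Ck, yk))
        # Trend at x_i plus the Kriging correction of the residuals.
        pred[i] = F[i] @ beta + C[i, keep] @ np.linalg.solve(Ck, yk - Fk @ beta)
    return pred

# Hypothetical toy data: 8 design points, linear trend, no nugget or noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
F = np.column_stack([np.ones_like(x), x])
C = 2.0 * matern32(x, theta=0.3)   # sigma^2 = 2 is arbitrary
y = rng.normal(size=x.size)

y_loo = loo_shortcut(C, F, y)
sse_loo = float(np.sum((y - y_loo) ** 2))
```

Comparing `y_loo` with `loo_brute_force(C, F, y)` using `np.allclose` shows the two routes agreeing up to rounding error, while the shortcut needs a single matrix inversion instead of \(n\) refits.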
Minimizing \(\texttt{SSE}_{\texttt{LOO}}\) with respect to the covariance parameters \(\theta_\ell\) yields estimates of these parameters. Note that, similarly to the profile likelihood, \(\texttt{SSE}_{\texttt{LOO}}\) does not depend on the vector \(\bs{\beta}\) of trend parameters.
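As an illustration of this estimation step, the sketch below minimizes \(\texttt{SSE}_{\texttt{LOO}}\) over a single range parameter by a plain grid search. It is only a sketch under stated assumptions: the Matérn 3/2 kernel, the grid, the toy data and the function names are hypothetical, the BEM is taken as \(\m{B} = \m{R}^{-1} - \m{R}^{-1}\m{F}(\m{F}^\top\m{R}^{-1}\m{F})^{-1}\m{F}^\top\m{R}^{-1}\) with \(\m{R}\) the correlation matrix, and a real implementation would rather use a gradient-based optimizer.

```python
import numpy as np

def matern32(x, theta):
    """Matérn 3/2 correlation matrix for 1-D inputs (illustrative kernel choice)."""
    h = np.abs(x[:, None] - x[None, :]) / theta
    return (1.0 + np.sqrt(3.0) * h) * np.exp(-np.sqrt(3.0) * h)

def sse_loo(theta, x, F, y):
    """LOO sum of squared errors for a given range parameter theta."""
    R = matern32(x, theta)
    Ri = np.linalg.inv(R)
    RiF = Ri @ F
    # BEM built from the correlation matrix; the scale sigma^2 cancels in D_B^-1 B.
    B = Ri - RiF @ np.linalg.solve(F.T @ RiF, RiF.T)
    e = (B @ y) / np.diag(B)          # LOO errors y_i - yhat_{i|-i}
    return float(e @ e)

# Hypothetical toy data: smooth response observed at 15 points, linear trend.
x = np.linspace(0.0, 1.0, 15)
F = np.column_stack([np.ones_like(x), x])
y = np.sin(6.0 * x) + 0.5 * x

# Grid search over theta (assumed search range).
thetas = np.geomspace(0.05, 2.0, 40)
sse = np.array([sse_loo(t, x, F, y) for t in thetas])
theta_hat = thetas[np.argmin(sse)]

# Adding any trend F @ b to y leaves the criterion unchanged, since B F = 0.
y_shifted = y + F @ np.array([3.0, -2.0])
sse_shifted = sse_loo(theta_hat, x, F, y_shifted)
```

The last two lines illustrate why \(\bs{\beta}\) does not appear in the criterion: shifting the observations by an arbitrary trend gives the same value of `sse_loo` up to rounding error.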
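An estimate of the GP variance \(\sigma^2\) is given by
\[
  \widehat{\sigma}^2_{\texttt{LOO}} = \frac{1}{n} \, \m{y}^\top \mathring{\m{B}} \, \m{D}_{\mathring{\m{B}}}^{-1} \mathring{\m{B}} \, \m{y}
\]
where \(\mathring{\m{B}}:= \sigma^2 \m{B}\) does not depend on \(\sigma^2\) and \(\m{D}_{\mathring{\m{B}}}\) is the diagonal matrix having the same diagonal as \(\mathring{\m{B}}\).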
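The variance estimate can be sketched in a few lines of NumPy, assuming it takes the quadratic form \(\frac{1}{n}\,\m{y}^\top \mathring{\m{B}}\, \m{D}_{\mathring{\m{B}}}^{-1} \mathring{\m{B}}\, \m{y}\) and that \(\mathring{\m{B}}\) is the BEM built from the correlation matrix \(\m{R}\) instead of the covariance matrix. The kernel, data and names are illustrative assumptions. With this estimate, the LOO errors divided by their predicted standard deviations \(\sqrt{\widehat{\sigma}^2 / \mathring{B}_{ii}}\) have a mean square equal to one, which the sketch also computes.

```python
import numpy as np

def matern32(x, theta):
    """Matérn 3/2 correlation matrix for 1-D inputs (illustrative kernel choice)."""
    h = np.abs(x[:, None] - x[None, :]) / theta
    return (1.0 + np.sqrt(3.0) * h) * np.exp(-np.sqrt(3.0) * h)

def bem_from_correlation(R, F):
    """B-ring = sigma^2 B, i.e. the BEM computed with R in place of C (assumed)."""
    Ri = np.linalg.inv(R)
    RiF = Ri @ F
    return Ri - RiF @ np.linalg.solve(F.T @ RiF, RiF.T)

def sigma2_loo(R, F, y):
    """(1/n) y' B-ring D^-1 B-ring y, written with element-wise operations."""
    Br = bem_from_correlation(R, F)
    By = Br @ y
    return float(np.mean(By ** 2 / np.diag(Br)))

# Hypothetical toy data: 10 design points, linear trend, fixed range parameter.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
F = np.column_stack([np.ones_like(x), x])
R = matern32(x, theta=0.3)
y = rng.normal(size=x.size)

s2 = sigma2_loo(R, F, y)

# Standardized LOO errors: error_i / sqrt(s2 / B-ring_ii).
Br = bem_from_correlation(R, F)
loo_err = (Br @ y) / np.diag(Br)
std_err = loo_err / np.sqrt(s2 / np.diag(Br))
mean_square = float(np.mean(std_err ** 2))
```

Because the estimate is a quadratic form in \(\m{y}\), multiplying the observations by a constant \(c\) multiplies `s2` by \(c^2\), as expected for a variance.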
LOO estimation can be preferable to maximum-likelihood estimation when the covariance kernel is misspecified; see Bachoc [Bac12], who provides many details on the criterion \(\texttt{SSE}_{\texttt{LOO}}\), including its derivatives.