(SecLOO)=
# Leave-one-out

Consider $n$ observations $y_i$ from a Kriging model corresponding to
the "`Kriging`" case with no nugget or noise.  For $i = 1, \dots, n$,
let $\widehat{y}_{i|-i}$ be the prediction of $y_i$ based on the
vector $\m{y}_{-i}$ obtained by omitting the observation $i$ in
$\m{y}$. The vector of *leave-one-out* (LOO) predictions is
defined by

$$
  \widehat{\m{y}}_{\mathtt{LOO}} :=
  [ \widehat{y}_{1|-1}, \dots, \,  \widehat{y}_{n|-n} ]^\top,
$$

and the leave-one-out Sum of Squared Errors (SSE) criterion is defined by

$$
  \texttt{SSE}_{\texttt{LOO}} :=
  \sum_{i=1}^n \{ y_i - \widehat{y}_{i|-i} \}^2 =
  \| \m{y} - \widehat{\m{y}}_{\texttt{LOO}} \|^2.
$$

It can be shown that

$$ 
\m{y} - \widehat{\m{y}}_{\texttt{LOO}} =
\m{D}_{\m{B}}^{-1}\m{B}\,\m{y} 
$$ 

where $\m{B}$ is the [Bending Energy Matrix](SecBending) (BEM)
and $\m{D}_{\m{B}}$ is the diagonal matrix with the same
diagonal as $\m{B}$.
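
The identity above can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation: it assumes that the BEM has the form $\m{C}^{-1} - \m{C}^{-1}\m{F}(\m{F}^\top\m{C}^{-1}\m{F})^{-1}\m{F}^\top\m{C}^{-1}$ for a covariance matrix $\m{C}$ and a trend matrix $\m{F}$, and it uses an exponential kernel, a linear trend and toy data chosen purely for illustration. All function names are hypothetical.

```python
import numpy as np

def exp_kernel(x, theta, sigma2=1.0):
    """Exponential covariance matrix for 1-D inputs (illustrative choice)."""
    return sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def bending_energy_matrix(C, F):
    """Assumed form: B = C^{-1} - C^{-1} F (F' C^{-1} F)^{-1} F' C^{-1}."""
    Ci = np.linalg.inv(C)
    CiF = Ci @ F
    return Ci - CiF @ np.linalg.solve(F.T @ CiF, CiF.T)

def loo_errors_fast(C, F, y):
    """LOO errors y - y_LOO computed as D_B^{-1} B y, without any refit."""
    B = bending_energy_matrix(C, F)
    return (B @ y) / np.diag(B)

def loo_errors_brute(C, F, y):
    """LOO errors obtained by refitting the model n times."""
    n = len(y)
    err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Ck, Fk, yk = C[np.ix_(keep, keep)], F[keep], y[keep]
        # Generalized least squares estimate of the trend without point i
        CiF = np.linalg.solve(Ck, Fk)
        Ciy = np.linalg.solve(Ck, yk)
        beta = np.linalg.solve(Fk.T @ CiF, Fk.T @ Ciy)
        # Kriging prediction at the omitted point
        pred = F[i] @ beta + C[i, keep] @ np.linalg.solve(Ck, yk - Fk @ beta)
        err[i] = y[i] - pred
    return err

x = np.linspace(0.0, 1.0, 8)
F = np.column_stack([np.ones_like(x), x])   # linear trend (assumption)
y = np.sin(2.0 * np.pi * x) + 0.5 * x       # toy observations
C = exp_kernel(x, theta=0.3, sigma2=2.0)

fast = loo_errors_fast(C, F, y)
brute = loo_errors_brute(C, F, y)
print("max abs difference:", np.max(np.abs(fast - brute)))
```

The shortcut needs a single matrix inversion, whereas the brute-force loop solves $n$ systems of size $n - 1$.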

Minimizing $\texttt{SSE}_{\texttt{LOO}}$ with respect to the
covariance parameters $\theta_\ell$ yields estimates of these
parameters. Note that, as with the profile likelihood,
$\texttt{SSE}_{\texttt{LOO}}$ does not depend on the vector
$\bs{\beta}$ of trend parameters.
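
As an illustration, the sketch below minimizes $\texttt{SSE}_{\texttt{LOO}}$ over a single range parameter by a plain grid search. The grid search, the exponential correlation, the constant trend and the toy data are assumptions made for brevity, and the BEM is assumed to have the form $\m{C}^{-1} - \m{C}^{-1}\m{F}(\m{F}^\top\m{C}^{-1}\m{F})^{-1}\m{F}^\top\m{C}^{-1}$; a real implementation would more likely use a gradient-based optimizer. Since $\m{D}_{\m{B}}^{-1}\m{B}$ is unchanged when $\m{B}$ is rescaled, a correlation matrix can be used in place of the covariance matrix.

```python
import numpy as np

def exp_corr(x, theta):
    """Exponential correlation matrix for 1-D inputs (illustrative choice)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def sse_loo(R, F, y):
    """SSE_LOO = || D_B^{-1} B y ||^2, with B built from R and F."""
    Ri = np.linalg.inv(R)
    RiF = Ri @ F
    B = Ri - RiF @ np.linalg.solve(F.T @ RiF, RiF.T)
    e = (B @ y) / np.diag(B)
    return float(e @ e)

x = np.linspace(0.0, 1.0, 10)
F = np.ones((10, 1))                 # constant trend (assumption)
y = np.sin(2.0 * np.pi * x)          # toy observations

# Hypothetical search grid for the range parameter theta
thetas = np.geomspace(0.05, 5.0, 40)
sse = np.array([sse_loo(exp_corr(x, t), F, y) for t in thetas])
theta_hat = thetas[np.argmin(sse)]
print("theta_hat:", theta_hat)
```

Adding any vector of the form $\m{F}\bs{\beta}$ to `y`, or multiplying `R` by a positive constant, leaves `sse_loo` unchanged up to rounding, which is why neither $\bs{\beta}$ nor $\sigma^2$ needs to be known at this stage.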

An estimate of the GP variance $\sigma^2$ is given by

$$
   \widehat{\sigma}^2_{\texttt{LOO}} = 
   \frac{1}{n} \, \m{y}^\top \mathring{\m{B}} 
   \m{D}_{\mathring{\m{B}}}^{-1} 
   \mathring{\m{B}} \m{y}
$$

where $\mathring{\m{B}}:= \sigma^2 \m{B}$ does not depend on
$\sigma^2$ and $\m{D}_{\mathring{\m{B}}}$ is the diagonal
matrix having the same diagonal as $\mathring{\m{B}}$.
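
The variance estimate can be computed directly from the correlation matrix. The sketch below is a hedged illustration with hypothetical function names: it assumes that $\mathring{\m{B}}$ is obtained by applying the BEM formula $\m{C}^{-1} - \m{C}^{-1}\m{F}(\m{F}^\top\m{C}^{-1}\m{F})^{-1}\m{F}^\top\m{C}^{-1}$ to the correlation matrix instead of the covariance matrix, and it uses an exponential correlation with toy data.

```python
import numpy as np

def exp_corr(x, theta):
    """Exponential correlation matrix for 1-D inputs (illustrative choice)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def bending_matrix(R, F):
    """Assumed form: R^{-1} - R^{-1} F (F' R^{-1} F)^{-1} F' R^{-1}."""
    Ri = np.linalg.inv(R)
    RiF = Ri @ F
    return Ri - RiF @ np.linalg.solve(F.T @ RiF, RiF.T)

def sigma2_loo(R, F, y):
    """(1/n) y' B0 D_{B0}^{-1} B0 y, with B0 built from the correlation R."""
    B0 = bending_matrix(R, F)
    v = B0 @ y
    # B0 is symmetric, so y' B0 D^{-1} B0 y = sum_i v_i^2 / B0_ii
    return float(v @ (v / np.diag(B0))) / len(y)

x = np.linspace(0.0, 1.0, 10)
F = np.ones((10, 1))                 # constant trend (assumption)
y = np.sin(2.0 * np.pi * x)          # toy observations
R = exp_corr(x, theta=0.3)

s2 = sigma2_loo(R, F, y)
print("sigma2_loo:", s2)
```

One way to read this estimate: with $\sigma^2$ set to this value, the LOO errors divided by their LOO predictive standard deviations $B_{ii}^{-1/2}$ have a mean square equal to one.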

LOO estimation can be preferable to maximum-likelihood estimation
when the covariance kernel is misspecified; see
{cite:t}`Bachoc_ParametricCov`, who provides many details on the
criterion $\texttt{SSE}_{\texttt{LOO}}$, including its derivatives.