(SecKrigingModels)= # Kriging models ## Components of Kriging models **libKriging** makes available several kinds of Kriging models as commonly used in the field of computer experiments. All models involve a stochastic process $y(\m{x})$ indexed by a vector $\m{x} \in \mathbb{R}^d$ of $d$ real inputs $x_k$, sometimes called the design vector. The response variable or output $y$ is assumed to be observed for $n$ values $\m{x}_i$ of the input vector with corresponding response values $y_i$ for $i=1$, $\dots$, $n$. The response values are considered as realizations of random variables. The models involve the following elements or components. * **Trend** A known vector-valued function $\mathbb{R}^d \to \mathbb{R}^p$ with value denoted by $\m{f}(\m{x})$. It is used in relation with an unknown vector $\bs{\beta}$ of trend parameters to provide the trend term $\mu(\m{x}) = \m{f}(\m{x})^\top \bs{\beta}$. * **Smooth Gaussian Process (GP)** An unobserved GP $\zeta(\m{x})$, at least continuous, with mean zero and known covariance kernel $C_\zeta(\m{x}, \, \m{x}')$. * **Nugget** A White noise GP $\varepsilon(\m{x})$ with variance $\tau^2$ hence with covariance kernel $\tau^2 \delta(\m{x},\,\m{x}')$ where $\delta$ is the Dirac function $\delta(\m{x}, \,\m{x}') := 1_{\{\m{x} = \m{x}'\}}$. * **Noise** A collection of independent random variables $\varepsilon_i$ with mean zero and variances $\tau^2_i$. Note that the words *nugget* and *noise* are sometimes considered as equivalent. Yet in **libKriging** *nugget* will be used only when a single path is considered for the stochastic process, in which case no duplicated value can exist for the vector of inputs. When a nugget term is used, the process $y(\m{x})$ is discontinuous, so the prediction at a new value $\m{x}^\star$ will be identical to $y(\m{x}_i)$ if it happens that $\m{x}^\star = \m{x}_i$ for some $i$. We may say that the prediction is an interpolation, in relation with this feature. However, in the usual acceptation of this term, interpolation involves the use of a *smooth* function, say at least continuous. **Note** The so-called *Gaussian-Process Regression* framework corresponds to the noisy case. However duplicated designs are generally allowed and the noise r.vs are assumed to have either a common unknown variance $\tau^2$ or a variance $\tau^2(\m{x})$ depending on the design according to some specification. **libKriging** implements the three classes `"Kriging"`, `"NoiseKriging"` and `"NuggetKriging"` of objects corresponding to Kriging models. In each class we find the linear trend, the smooth GP. The difference relates to the presence of a nugget or noise term. ## Classes of Kriging model objects To describe the three classes of Kriging models, we assume that $n$ observations are given corresponding to $n$ input vectors $\m{x}_i$. - **The `Kriging` class** correspond to observations of the form $$ \m{y}(\m{x}_i) = \underset{\text{trend}}{ \underbrace{\m{f}(\m{x}_i)^\top \bs{\beta}}} + \underset{\text{smooth GP}}{\underbrace{\zeta(\m{x}_i)}}, \qquad i= 1,\, \dots,\, n. $$ - **The `"NuggetKriging"` class** corresponds to observations of the form $$ \m{y}(\m{x}_i) = \underset{\text{trend}}{ \underbrace{\m{f}(\m{x}_i)^\top \bs{\beta}}} + \underset{\text{smooth GP}}{\underbrace{\zeta(\m{x}_i)}} + \underset{\text{nugget}}{\underbrace{\varepsilon(\m{x}_i)}}, \qquad i= 1,\, \dots,\, n. $$ The sum $\eta(\m{x}) := \zeta(\m{x}) + \varepsilon(\m{x})$ defines a GP with discontinuous paths and covariance kernel $C(\m{x}, \m{x}') + \tau^2\delta(\m{x},\,\m{x}')$. - **The `"NoiseKriging"` class** corresponds to observations of the form $$ y_i = \underset{\text{trend}}{ \underbrace{\m{f}(\m{x}_i)^\top \bs{\beta}}} + \underset{\text{smooth GP}}{\underbrace{\zeta(\m{x}_i)}} + \underset{\text{noise}}{\underbrace{\varepsilon_i}}, \qquad i= 1,\, \dots,\, n $$ where the noise r.vs $\varepsilon_i$ are Gaussian with mean zero and known variances $\tau_i^2$. Although the response $y_i$ corresponds to the input $\m{x}_i$ as for the classes `"Kriging"` and `"NugggetKriging"`, there can be several observations made at the same input $\m{x}_i$. We may then speak of *duplicated* inputs. ## Matrix formalism and assumptions The $n$ input vectors $\m{x}_i$ are conveniently considered as the (transposed) rows of a matrix. * The $n \times d$ design or input matrix $\m{X}$ having $\m{x}_i^\top$ as its row $i$. * The $n \times p$ trend matrix $\m{F}(\m{X})$ or simply $\m{F}$ having $\m{f}(\m{x}_i)^\top$ as its row $i$. * The $n \times n$ covariance matrix $\m{C}(\m{X},\, \m{X}) =[C(\m{x}_i,\,\m{x}_j)]_{i,j}$ is sometimes called the Gram matrix and is often simply denoted as $\m{C}$. The observations for a `Kriging` model write in matrix notations $\m{y} = \m{F} \bs{\beta} + \bs{\zeta}$, while those for `NuggetKriging` and `NoiseKriging` models write as $\m{y} = \m{F} \bs{\beta} + \bs{\zeta} + \bs{\varepsilon}$. Similar notations are used if a sequence of $n^\star$ "new" designs $\m{x}_i^\star$ are considered, resulting in matrices with $n^\star$ rows $\m{X}^\star$ and $\m{F}^\star$. It must be kept in mind that unless explicitly stated otherwise, the covariance matrix $\m{C}$ is *that of the non-trend component* $\bs{\eta}$ including the smooth GP plus the nugget or noise. It will be assumed that the matrix $\m{F}$ has rank $p$ (hence that $n \geqslant p$) and that the matrix $\m{C}$ is positive definite. Inasmuch a positive kernel $C_\zeta(\m{x},\, \m{x}')$ is used the matrix $\m{C}_\zeta(\m{X}, \, \m{X})$ is positive definite for every design $\m{X}$ corresponding to distinct inputs $\m{x}_i$. **Note** {cite:t}`BerlinetThomasagnant_RKHS` define Kriging models as the sum of a deterministic trend and a stochastic process with stationary increments, as is the case for splines. So the name *Kriging model* is understood here in a more restrictive way. See the [Prediction and simulation](SecPredAndSim) page.