# `WarpKriging` ## Description Create a `WarpKriging` Object representing a Trend $+$ GP Model with **per-variable input warping**. Each input dimension is independently mapped through a learnable transform $w_j$ before the base kernel is evaluated: $$ k(\mathbf{x}, \mathbf{x}') \;=\; \sigma^2 \cdot k_{\text{base}}\!\bigl(\Phi(\mathbf{x}), \Phi(\mathbf{x}')\,;\, \theta\bigr), \qquad \Phi(\mathbf{x}) = \bigl[\,w_1(x_1),\, w_2(x_2),\, \dots,\, w_d(x_d)\,\bigr]. $$ All warping parameters are optimised jointly with the GP hyper-parameters $(\sigma^2, \theta, \beta)$ by maximising the marginal log-likelihood. ## Usage Just build the model: * Python ```python wk = WarpKriging(warping = ["kumaraswamy", "kumaraswamy"], kernel = "matern5_2", noise = None) # later, call wk.fit(y, X, ..., noise = None) ``` * R ```r wk <- WarpKriging(warping = c("kumaraswamy", "kumaraswamy"), kernel = "matern5_2", noise = NULL) # later, call wk$fit(y, X, ..., noise = NULL) ``` * Matlab/Octave ```octave wk = WarpKriging(warping = {"kumaraswamy", "kumaraswamy"}, kernel = "matern5_2", noise = []) % later, call wk.fit(y, X, ..., noise = []) ``` or, build and fit at the same time: * Python ```python wk = WarpKriging( y, X, warping = ["kumaraswamy", "kumaraswamy"], kernel = "matern5_2", regmodel = "constant", normalize = False, optim = "BFGS+Adam", objective = "LL", parameters = None, noise = None, ) ``` * R ```r wk <- WarpKriging( y, X, warping = c("kumaraswamy", "kumaraswamy"), kernel = "matern5_2", regmodel = "constant", normalize = FALSE, optim = "BFGS+Adam", objective = "LL", parameters = NULL, noise = NULL ) ``` * Matlab/Octave ```octave wk = WarpKriging( ... y, X, ... warping = {"kumaraswamy", "kumaraswamy"}, ... kernel = "matern5_2", ... regmodel = "constant", ... normalize = false, ... optim = "BFGS+Adam", ... objective = "LL", ... parameters = [], ... noise = [] ... ) ``` * Julia ```julia using jlibkriging # build only wk = WarpKriging(warping=["kumaraswamy", "kumaraswamy"], kernel="matern5_2", noise=nothing) # later, call fit(wk, y, X, ..., noise=nothing) # or build and fit at the same time wk = WarpKriging(y, X, warping=["kumaraswamy", "kumaraswamy"], kernel="matern5_2", regmodel="constant", normalize=false, optim="BFGS+Adam", objective="LL", parameters=nothing, noise=nothing) ``` ## Arguments Argument |Description ------------- |---------------- `y` | Numeric vector of response values. `X` | Numeric matrix of input design. `warping` | Vector of warping specifications, **one per input dimension**. See *Warping specifications* below. `kernel` | Base covariance model applied in the warped feature space: `"gauss"`, `"exp"`, `"matern3_2"`, `"matern5_2"`. `regmodel` | Universal Kriging linear trend: `"constant"`, `"linear"`, `"interactive"`, `"quadratic"`. `normalize` | Logical. If `TRUE` both the input matrix `X` and the response `y` are normalized to take values in the interval $[0, 1]$. `optim` | Optimisation method used to fit warp and GP hyper-parameters jointly. Possible values are: `"BFGS+Adam"` (bi-level: L-BFGS on $\log\theta$ inside Adam on the warp parameters — the default), `"BFGS"` (joint L-BFGS-B on all parameters) and `"none"` (keep `parameters` as-is). `objective` | Character giving the objective function to optimise. Currently `"LL"` (Log-Likelihood). `parameters` | Initial values and options for the optimiser. Named list / dict with optional entries `"sigma2"`, `"theta"`, `"max_iter_bfgs"`, `"max_iter_adam"`, `"adam_lr"`. `noise` | Either a numeric vector of per-observation noise variances, `"nugget"` to estimate a homogeneous nugget, or `NULL` (default) for noise-free interpolation. ## Warping specifications Each entry of `warping` is a string describing the transform for one input variable. Supported forms (see also the [**Warping Gallery**](../warping/index.md) for full examples and plots): Spec | Description | Params | Details -----|-------------|--------|------- `"none"` | identity (no warping) | 0 | [→](../warping/none.md) `"affine"` | $w(x) = a\,x + b$ | 2 | [→](../warping/affine.md) `"boxcox"` | Box–Cox $w(x) = (x^\lambda - 1)/\lambda$ | 1 | [→](../warping/boxcox.md) `"kumaraswamy"` | Kumaraswamy CDF on $[0,1]$: $w(x) = 1 - (1 - x^a)^b$ | 2 | [→](../warping/kumaraswamy.md) `"knots(K)"` | piecewise-linear monotone, `K` knots | $K+1$ | [→](../warping/knots.md) `"neural_mono(H)"` | small monotone neural network, `H` hidden units | $3H+1$ | [→](../warping/neural_mono.md) `"mlp(h1:h2,q,act)"` | unconstrained MLP (multi-dim output `q`, activation `act`) | varies | [→](../warping/mlp.md) `"categorical(L,q)"` | categorical embedding: `L` levels in $\mathbb{R}^q$ | $L\cdot q$ | [→](../warping/categorical.md) `"ordinal(L)"` | ordered positions for `L` discrete levels | $L-1$ | [→](../warping/ordinal.md) `"mlp_joint(h1:h2,q,act)"` | single MLP taking **all** inputs jointly | varies | (see [`MLPKriging`](MLPKriging.md)) Defaults for omitted sub-arguments: `"neural_mono"` → `"neural_mono(8)"`, `"mlp"` → `"mlp(16:8,2,selu)"`, `"categorical(5)"` → `"categorical(5,2)"`. ## Details The warped representation $\Phi(\mathbf{x})$ is the concatenation of all per-variable outputs. The GP kernel then operates in this warped space. The default optimiser is a **bi-level strategy**: an outer Adam loop updates the warp parameters, while an inner L-BFGS loop optimises $\log\theta$ with analytical gradients. The variance $\sigma^2$ and trend coefficients $\beta$ are concentrated out (computed analytically) at each step. Reference: * Garrido-Merchán & Hernández-Lobato (2020), *Dealing with categorical and integer-valued variables in Bayesian Optimization with GP*. * Saves et al. (2023), *SMT 2.0: surrogate modelling toolbox with mixed variables support*. ## Value An object `"WarpKriging"`. Should be used with its `predict`, `simulate`, `update` methods. ## Examples ```r f <- function(x) 1 - 1 / 2 * (sin(12 * x) / (1 + x) + 2 * cos(7 * x) * x^5 + 0.7) X <- as.matrix(seq(0.05, 0.95, length.out = 10)) y <- f(X) wk <- WarpKriging( y, X, warping = "kumaraswamy", kernel = "gauss", parameters = list(max_iter_adam = "20", max_iter_bfgs = "10") ) print(wk) x <- as.matrix(seq(0, 1, length.out = 101)) p <- wk$predict(x, return_stdev = TRUE) plot(f) points(X, y) lines(x, p$mean, col = "blue") polygon(c(x, rev(x)), c(p$mean - 2 * p$stdev, rev(p$mean + 2 * p$stdev)), border = NA, col = rgb(0, 0, 1, 0.2)) ``` ### Results ```{literalinclude} examples/WarpKriging.md.Rout :language: bash ``` ![](examples/WarpKriging.md.png)