WarpKriging

Description

Create a WarpKriging Object representing a Trend \(+\) GP Model with per-variable input warping. Each input dimension is independently mapped through a learnable transform \(w_j\) before the base kernel is evaluated:

\[ k(\mathbf{x}, \mathbf{x}') \;=\; \sigma^2 \cdot k_{\text{base}}\!\bigl(\Phi(\mathbf{x}), \Phi(\mathbf{x}')\,;\, \theta\bigr), \qquad \Phi(\mathbf{x}) = \bigl[\,w_1(x_1),\, w_2(x_2),\, \dots,\, w_d(x_d)\,\bigr]. \]

All warping parameters are optimised jointly with the GP hyper-parameters \((\sigma^2, \theta, \beta)\) by maximising the marginal log-likelihood.

Usage

Just build the model:

  • Python

    wk = WarpKriging(warping = ["kumaraswamy", "kumaraswamy"], kernel = "matern5_2", noise = None)
    # later, call wk.fit(y, X, ..., noise = None)
    
  • R

    wk <- WarpKriging(warping = c("kumaraswamy", "kumaraswamy"), kernel = "matern5_2", noise = NULL)
    # later, call wk$fit(y, X, ..., noise = NULL)
    
  • Matlab/Octave

    wk = WarpKriging(warping = {"kumaraswamy", "kumaraswamy"}, kernel = "matern5_2", noise = [])
    % later, call wk.fit(y, X, ..., noise = [])
    

or, build and fit at the same time:

  • Python

    wk = WarpKriging(
        y, X,
        warping    = ["kumaraswamy", "kumaraswamy"],
        kernel     = "matern5_2",
        regmodel   = "constant",
        normalize  = False,
        optim      = "BFGS+Adam",
        objective  = "LL",
        parameters = None,
        noise      = None,
    )
    
  • R

    wk <- WarpKriging(
      y, X,
      warping    = c("kumaraswamy", "kumaraswamy"),
      kernel     = "matern5_2",
      regmodel   = "constant",
      normalize  = FALSE,
      optim      = "BFGS+Adam",
      objective  = "LL",
      parameters = NULL,
      noise      = NULL
    )
    
  • Matlab/Octave

    wk = WarpKriging( ...
      y, X, ...
      warping    = {"kumaraswamy", "kumaraswamy"}, ...
      kernel     = "matern5_2", ...
      regmodel   = "constant", ...
      normalize  = false, ...
      optim      = "BFGS+Adam", ...
      objective  = "LL", ...
      parameters = [], ...
      noise      = [] ...
    )
    
  • Julia

    using jlibkriging
    # build only
    wk = WarpKriging(warping=["kumaraswamy", "kumaraswamy"], kernel="matern5_2", noise=nothing)
    # later, call fit(wk, y, X, ..., noise=nothing)
    
    # or build and fit at the same time
    wk = WarpKriging(y, X,
                     warping=["kumaraswamy", "kumaraswamy"],
                     kernel="matern5_2",
                     regmodel="constant",
                     normalize=false,
                     optim="BFGS+Adam",
                     objective="LL",
                     parameters=nothing,
                     noise=nothing)
    

Arguments

Argument

Description

y

Numeric vector of response values.

X

Numeric matrix of input design.

warping

Vector of warping specifications, one per input dimension. See Warping specifications below.

kernel

Base covariance model applied in the warped feature space: "gauss", "exp", "matern3_2", "matern5_2".

regmodel

Universal Kriging linear trend: "constant", "linear", "interactive", "quadratic".

normalize

Logical. If TRUE both the input matrix X and the response y are normalized to take values in the interval \([0, 1]\).

optim

Optimisation method used to fit warp and GP hyper-parameters jointly. Possible values are: "BFGS+Adam" (bi-level: L-BFGS on \(\log\theta\) inside Adam on the warp parameters — the default), "BFGS" (joint L-BFGS-B on all parameters) and "none" (keep parameters as-is).

objective

Character giving the objective function to optimise. Currently "LL" (Log-Likelihood).

parameters

Initial values and options for the optimiser. Named list / dict with optional entries "sigma2", "theta", "max_iter_bfgs", "max_iter_adam", "adam_lr".

noise

Either a numeric vector of per-observation noise variances, "nugget" to estimate a homogeneous nugget, or NULL (default) for noise-free interpolation.

Warping specifications

Each entry of warping is a string describing the transform for one input variable. Supported forms (see also the Warping Gallery for full examples and plots):

Spec

Description

Params

Details

"none"

identity (no warping)

0

"affine"

\(w(x) = a\,x + b\)

2

"boxcox"

Box–Cox \(w(x) = (x^\lambda - 1)/\lambda\)

1

"kumaraswamy"

Kumaraswamy CDF on \([0,1]\): \(w(x) = 1 - (1 - x^a)^b\)

2

"knots(K)"

piecewise-linear monotone, K knots

\(K+1\)

"neural_mono(H)"

small monotone neural network, H hidden units

\(3H+1\)

"mlp(h1:h2,q,act)"

unconstrained MLP (multi-dim output q, activation act)

varies

"categorical(L,q)"

categorical embedding: L levels in \(\mathbb{R}^q\)

\(L\cdot q\)

"ordinal(L)"

ordered positions for L discrete levels

\(L-1\)

"mlp_joint(h1:h2,q,act)"

single MLP taking all inputs jointly

varies

(see MLPKriging)

Defaults for omitted sub-arguments: "neural_mono""neural_mono(8)", "mlp""mlp(16:8,2,selu)", "categorical(5)""categorical(5,2)".

Details

The warped representation \(\Phi(\mathbf{x})\) is the concatenation of all per-variable outputs. The GP kernel then operates in this warped space.

The default optimiser is a bi-level strategy: an outer Adam loop updates the warp parameters, while an inner L-BFGS loop optimises \(\log\theta\) with analytical gradients. The variance \(\sigma^2\) and trend coefficients \(\beta\) are concentrated out (computed analytically) at each step.

Reference:

  • Garrido-Merchán & Hernández-Lobato (2020), Dealing with categorical and integer-valued variables in Bayesian Optimization with GP.

  • Saves et al. (2023), SMT 2.0: surrogate modelling toolbox with mixed variables support.

Value

An object "WarpKriging". Should be used with its predict, simulate, update methods.

Examples

f <- function(x) 1 - 1 / 2 * (sin(12 * x) / (1 + x) + 2 * cos(7 * x) * x^5 + 0.7)
X <- as.matrix(seq(0.05, 0.95, length.out = 10))
y <- f(X)

wk <- WarpKriging(
  y, X,
  warping = "kumaraswamy",
  kernel = "gauss",
  parameters = list(max_iter_adam = "20", max_iter_bfgs = "10")
)
print(wk)

x <- as.matrix(seq(0, 1, length.out = 101))
p <- wk$predict(x, return_stdev = TRUE)
plot(f)
points(X, y)
lines(x, p$mean, col = "blue")
polygon(c(x, rev(x)), c(p$mean - 2 * p$stdev, rev(p$mean + 2 * p$stdev)),
        border = NA, col = rgb(0, 0, 1, 0.2))

Results

* WarpKriging
* data: 10x[0.05,0.95] -> 10x[0.163421,0.976851]
* trend constant (est.): 126.685
* variance (est.): 2.63805e+08
* covariance:
  * kernel: gauss
  * range (est.): 9
  * warpings:
      x0: "kumaraswamy"    Kumaraswamy(a=1.01912, b=0.981236)
  * total warp params: 2
  * fit:
    * objective: LL
    * optim: BFGS+Adam