GP#

class fvgp.GP(x_data, y_data, init_hyperparameters=None, noise_variances=None, compute_device='cpu', kernel_function=None, kernel_function_grad=None, noise_function=None, noise_function_grad=None, prior_mean_function=None, prior_mean_function_grad=None, gp2Scale=False, dask_client=None, gp2Scale_batch_size=10000, linalg_mode=None, ram_economy=False, args=None)[source]#

This class provides all the tools for a single-task Gaussian Process (GP). Use fvGP for multi-task GPs. However, the fvGP class inherits all methods from this class. This class allows full HPC distributed training via the hgdl package, large-scale sparse GPs via gp2Scale and offers GPU support.

V … number of input points

D … input space dimensionality

N … arbitrary integers (N1, N2,…)

Parameters:
  • x_data (np.ndarray or list) – The input point positions. Shape (V x D), where D is the fvgp.GP.index_set_dim. For single-task GPs, the index set dimension = input space dimension. For multi-task GPs, the index set dimension = input space dimension + 1. If dealing with non-Euclidean inputs x_data should be a list, not a numpy array. In this case, both the index set and the input space dim are set to 1.

  • y_data (np.ndarray) – The values of the data points. Shape (V) or (V, N). If shape (V,N) the algorithm will run N independent GPs. This is not to be confused with multi-task learning. In this case, all GPs have to have the same prior mean function.

  • init_hyperparameters (np.ndarray, optional) – Vector of hyperparameters used to initiate the GP. The default is an array of ones with the right length for the anisotropic Matern kernel with automatic relevance determination (ARD). If gp2Scale is enabled, the default kernel changes to the anisotropic Wendland kernel. The full hyperparameter vector is passed to the kernel, mean, and noise callables, but the index ranges used by each callable are disjoint and user-defined. Each callable must only read the indices reserved for it. The gradient computation relies on this: when a hyperparameter index belongs to the mean function its kernel derivative is assumed zero, and vice versa.

  • noise_variances (np.ndarray, optional) – A numpy array defining the uncertainties/noise in the y_data in form of a point-wise variance. Shape (V). Note: if no noise_variances are provided here, the noise_function callable will be used; if the callable is not provided, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required (correlated noise), make use of the noise_function. Only provide a noise function OR noise_variances, not both. If the shape of y_data is (V,N) the noise is still of shape (V), e.g., the outputs must have the same noise in this scenario.

  • compute_device (str, optional) – One of cpu or gpu, determines how linear algebra computations are executed. The default is cpu. For gpu, pytorch or cupy has to be installed manually. For advanced options see args. If gp2Scale is enabled but no kernel is provided, the choice of the compute_device will be particularly important. In that case, the default Wendland kernel will be computed on the cpu or the gpu which will significantly change the compute time depending on the compute architecture.

  • kernel_function (Callable, optional) – A symmetric positive definite covariance function (a kernel) that calculates the covariance between data points. It is a function of the form k(x1,x2,hyperparameters, [args]). args is optional and is used to make fvgp.GP.args available. The input x1 is a N1 x D array of positions, x2 is a N2 x D array of positions, the hyperparameters argument is a 1d array of length D+1 for the default kernel and of a different length for user-defined kernels. The default is a stationary anisotropic kernel (fvgp.GP.default_kernel()) which performs automatic relevance determination (ARD). The output is a matrix, an N1 x N2 numpy array. This callable receives the full hyperparameter vector but must only use the indices reserved for the kernel (disjoint from mean and noise indices).

  • kernel_function_grad (Callable, optional) – A function that calculates the derivative of the kernel_function with respect to the hyperparameters. If provided, it will be used for local training (optimization) and can speed up the calculations. It accepts as input x1 (a N1 x D array of positions), x2 (a N2 x D array of positions) and hyperparameters (a 1d array of length D+1 for the default kernel). The default is an analytical gradient for the default kernel or a finite difference calculation otherwise. If ram_economy is True, the function’s input is x1, x2, hyperparameters (numpy array), and a direction (int). The output is a numpy array of shape (len(hps) x N). If ram_economy is False, the function’s input is x1, x2, and hyperparameters. The output is a numpy array of shape (len(hyperparameters) x N1 x N2). See ram_economy.

  • prior_mean_function (Callable, optional) – A function f(x, hyperparameters, [args]) that evaluates the prior mean at a set of input position. It accepts as input an array of positions (of shape N1 x D) and hyperparameters (a 1d array of length D+1 for the default kernel). Optionally, the third argument args can be defined. The return value is a 1d array of length N1. If prior_mean_function is not provided, fvgp.GP._default_mean_function() is used, which is the average of the y_data. This callable receives the full hyperparameter vector but must only use the indices reserved for the mean function (disjoint from kernel and noise indices).

  • prior_mean_function_grad (Callable, optional) – A function that evaluates the gradient of the prior_mean_function at a set of input positions with respect to the hyperparameters. It accepts as input an array of positions (of size N1 x D) and hyperparameters (a 1d array of length D+1 for the default kernel). The return value is a 2d array of shape (len(hyperparameters) x N1). If None is provided, either zeros are returned since the default mean function does not depend on hyperparameters, or a finite-difference approximation is used if prior_mean_function is provided.

  • noise_function (Callable, optional) – The noise function is a callable f(x,hyperparameters, [args]) that returns a vector (1d np.ndarray) of len(x), a matrix of shape (length(x),length(x)) or a sparse matrix of the same shape. The third argument args is optional. The input x is a numpy array of shape (N x D). The hyperparameter array is the same that is communicated to mean and kernel functions. Only provide a noise function OR a noise variance vector, not both. This callable receives the full hyperparameter vector but must only use the indices reserved for the noise function (disjoint from kernel and mean indices).

  • noise_function_grad (Callable, optional) – A function that evaluates the gradient of the noise_function at an input position with respect to the hyperparameters. It accepts as input an array of positions (of size N x D) and hyperparameters (a 1d array of length D+1 for the default kernel). The return value is a 2d np.ndarray of shape (len(hyperparameters) x N) or a 3d np.ndarray of shape (len(hyperparameters) x N x N). If None is provided, either zeros are returned since the default noise function does not depend on hyperparameters, or, if noise_function is provided but no noise function gradient, a finite-difference approximation will be used. The same rules regarding ram_economy as for the kernel definition apply here. That means the function will have an additional direction parameter.

  • gp2Scale (bool, optional) – Turns on gp2Scale. This will distribute the covariance computations across multiple workers. This is an advanced feature for HPC GPs up to 10 million data points. If gp2Scale is used, the default kernel is an anisotropic Wendland kernel which is compactly supported. There are a few things to consider (read on); this is an advanced option. If no kernel is provided, the compute_device option should be revisited. The default kernel will use the specified device to compute covariances. The default is False.

  • gp2Scale_batch_size (int, optional) – Matrix batch size for distributed computing in gp2Scale. The default is 10000.

  • dask_client (dask.distributed.Client, optional) – A dask client for gp2Scale, asynchronous training,a nd certain linear algebra operations. On HPC architecture, this client is provided by the job script. Please have a look at the examples. A local client is used as the default.

  • linalg_mode (str, optional) –

    Controls the linear-algebra backend used to solve (K+V)x=b and compute log|K+V|. The default is None, which selects "Chol" for standard GPs and automatically picks the best sparse mode for gp2Scale GPs.

    Recommended for standard (non-gp2Scale) GPs:

    • "Chol" (default) — Cholesky factorization; numerically stable and memory-efficient.

    • "CholInv" — Cholesky factorization, then explicitly stores the inverse; speeds up posterior covariance evaluation 3–10×. Avoid for datasets larger than ~5 000 points due to memory and numerical cost. Training always uses the Cholesky factor for stability.

    • "Inv" — computes and stores the explicit inverse directly (no Cholesky) even during training. Only suitable for very small datasets where posterior covariance is computed many times.

    Specialized for gp2Scale (sparse covariance matrices):

    • "sparseLU" — sparse LU factorization; good default for sparse systems up to ~50 000 points.

    • "sparseCG" — sparse conjugate-gradient iterative solver.

    • "sparseMINRES" — sparse MINRES iterative solver.

    • "sparseSolve" — direct sparse solve via scipy.

    • "sparseCGpre" — preconditioned conjugate-gradient. The preconditioner type is selected by args["sparse_preconditioner_type"] (default "ilu"; also "ic"/"incomplete_cholesky", "block_jacobi", "schwarz"/"additive_schwarz", or "amg" (requires pyamg)).

    • "sparseMINRESpre" — preconditioned MINRES; same preconditioner choices.

    • "sparseCGpre_<type>" / "sparseMINRESpre_<type>" — shortcut that sets args["sparse_preconditioner_type"] to <type> (e.g. "sparseCGpre_amg").

    Custom solver (any GP):

    Pass an iterable of three callables [f_factor, f_solve, f_logdet]:

    • f_factor(K) — receives the covariance matrix and returns a factorization object (or the matrix itself if no factorization is needed).

    • f_solve(obj, b) — solves the linear system and returns the solution vector.

    • f_logdet(obj) — returns the log-determinant as a scalar.

  • ram_economy (bool, optional) – Only of interest if the gradient and/or Hessian of the log marginal likelihood is/are used for the training. If True, components of the derivative of the log marginal likelihood are calculated sequentially, leading to a slow-down but much less RAM usage. If the derivative of the kernel (and noise function) with respect to the hyperparameters (kernel_function_grad) is going to be provided, it has to be tailored: for ram_economy=True it should be of the form f(x, hyperparameters, direction) and return a 2d numpy array of shape len(x1) x len(x2). If ram_economy=False, the function should be of the form f(x, hyperparameters) and return a numpy array of shape H x len(x1) x len(x2), where H is the number of hyperparameters. CAUTION: This array will be stored and is very large.

  • args (dict, optional) –

    Advanced options. Recognized keys are:

    Stochastic-Lanczos logdet (sparse modes):

    • ”random_logdet_lanczos_degree” : int; default = 20

    • ”random_logdet_error_rtol” : float; default = 0.01

    • ”random_logdet_verbose” : True/False; default = False

    • ”random_logdet_print_info” : True/False; default = False

    • ”random_logdet_lanczos_compute_device” : str; default = “cpu”/”gpu”

    Sparse iterative solver tolerances and iteration limits:

    • ”sparse_cg_tol” : float; default = 1e-5

    • ”sparse_minres_tol” : float; default = 1e-5

    • ”sparse_cg_maxiter” : int; default = None (use scipy default)

    • ”sparse_minres_maxiter” : int; default = None (use scipy default)

    • ”sparse_krylov_maxiter” : int; default = None (applies to both if the solver-specific key is not set)

    • ”sparse_block_krylov” : True/False; default = False — use a block CG variant when there are multiple RHS columns

    • ”sparse_krylov_mode” : “single”/”block”; equivalent toggle

    • ”sparse_krylov_block_size” : int — RHS block size for block CG

    Iterative-solver acceleration (sparseCG/sparseMINRES and the *pre variants):

    • ”sparse_krylov_warm_start” : True/False; default = False — feed the previous training iteration’s KVinvY as x0 to the next solve

    • ”sparse_preconditioner_type” : str; default = “ilu”. One of “ilu”, “ic”/”ichol”/”incomplete_cholesky”, “block_jacobi”, “schwarz”/ “additive_schwarz”, “amg” (requires pyamg)

    • ”sparse_preconditioner_refresh_interval” : int; default = 1 — reuse the cached preconditioner for up to N consecutive solves before rebuilding. set_KV always force-refreshes.

    • ”sparse_preconditioner_block_size” : int — block size for block_jacobi and additive_schwarz partitions

    • ”sparse_preconditioner_schwarz_overlap” : int — overlap layers for additive Schwarz

    • ”sparse_preconditioner_drop_tol” / “sparse_preconditioner_fill_factor” — forwarded to scipy spilu for “ilu”

    • ”sparse_preconditioner_amg_*” — forwarded to pyamg (max_levels, max_coarse, strength, cycle, etc.)

    • ”sparse_preconditioner_shift” / “_growth” / “_attempts” — diagonal shift retry knobs for “ic” / “block_jacobi” / “additive_schwarz” when a local Cholesky encounters a non-PD block

    Cholesky compute-device routing:

    • ”Chol_factor_compute_device” : str; default = “cpu”/”gpu”

    • ”update_Chol_factor_compute_device”: str; default = “cpu”/”gpu”

    • ”Chol_solve_compute_device” : str; default = “cpu”/”gpu”

    • ”Chol_logdet_compute_device” : str; default = “cpu”/”gpu”

    GPU backend:

    • ”GPU_engine” : “torch”/”cupy”; default = first available

    • ”GPU_device” : str; e.g. “cuda:1” or “mps”

    • ”GPU_device_index” : int — explicit CUDA device index

    All other keys will be stored and are available as part of the object instance and in kernel, mean, and noise functions.

x_data#

Datapoint positions.

Type:

np.ndarray or list

y_data#

Datapoint values.

Type:

np.ndarray

noise_variances#

Datapoint observation variances.

Type:

np.ndarray

hyperparameters#

Current hyperparameters in use.

Type:

np.ndarray

K#

Current prior covariance matrix of the GP.

Type:

np.ndarray

m#

Current prior mean vector.

Type:

np.ndarray

V#

the noise covariance matrix or a vector.

Type:

np.ndarray

set_args(new_args)[source]#

Use this function to change the arguments for the GP.

Note

New args do not invalidate cached state (K, m, V, factorizations, KVinvY). If your kernel, prior_mean_function, or noise_function consumes args, the new values will only be picked up the next time those callables are invoked: a call to set_hyperparameters(), update_gp_data() with append=False, a fresh train(), or a posterior call with an explicit hyperparameters argument. For an explicit flush, call set_hyperparameters(self.hyperparameters).

Parameters:

new_args (dict) – The new advanced settings.

set_hyperparameters(hps)[source]#

Function to set hyperparameters.

Parameters:

hps (np.ndarray) – A 1-d numpy array of hyperparameters.

update_gp_data(x_new, y_new, noise_variances_new=None, append=True, rank_n_update=None)[source]#

This function updates the data in the gp object instance. The data will only be overwritten if append=False, otherwise the data will be appended. This is a change from earlier versions. Now, the default is not to overwrite the existing data.

Parameters:
  • x_new (np.ndarray or list) – The point positions. Shape (V x D), where D is the fvgp.GP.index_set_dim. If dealing with non-Euclidean inputs x_new should be a list, not a numpy array.

  • y_new (np.ndarray) – The values of the data points. Shape (V).

  • noise_variances_new (np.ndarray, optional) – A numpy array defining the uncertainties in the data y_data in form of a point-wise variance. Shape(y_new)==Shape(noise_variances_new). Note: if no variances are provided here, the noise_covariance callable will be used; if the callable is not provided the noise variances will be set to abs(np.mean(y_data)) / 100.0. If you provided a noise function at initialization, the noise_variances_new will be ignored.

  • append (bool, optional) – Indication whether to append to or overwrite the existing dataset. Default=True. In the default case, data will be appended.

  • rank_n_update (bool, optional) – Indicates whether the GP marginal likelihood should be rank-n updated or recomputed. The default is rank_n_update=append, meaning if data is only appended, the rank_n_update will be performed.

train(hyperparameter_bounds=None, objective_function=None, objective_function_gradient=None, objective_function_hessian=None, init_hyperparameters=None, method='mcmc', pop_size=20, tolerance=0.0001, max_iter=10000, mcmc_prior=None, mcmc_prop_distrs='normal', mcmc_args={}, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None, info=False, asynchronous=False)[source]#

This function finds the maximum of the log marginal likelihood and therefore trains the GP (synchronously). This can be done on a remote cluster/computer by specifying the method to be hgdl and providing a dask client. Methods hgdl, mcmc, and adam can also be run asynchronously. The GP prior will automatically be updated with the new hyperparameters after the training or when the update_hyperparameters() method is called.

Parameters:
  • hyperparameter_bounds (np.ndarray, optional) – A 2d numpy array of shape (N x 2), where N is the number of hyperparameters. The default means inferring the bounds from the communicated dataset. This only works for the default kernel.

  • objective_function (callable, optional) – The function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a scalar. This function can be used to train via non-standard user-defined objectives. The default is the negative log marginal likelihood.

  • objective_function_gradient (callable, optional) – The gradient of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a vector of len(hps). This function can be used to train via non-standard user-defined objectives. The default is the gradient of the negative log marginal likelihood.

  • objective_function_hessian (callable, optional) – The Hessian of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a matrix of shape(len(hps),len(hps)). This function can be used to train via non-standard user-defined objectives. The default is the Hessian of the negative log marginal likelihood.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers. The default is a random draw from a uniform distribution within the hyperparameter_bounds.

  • method (str or Callable, optional) – The method used to train the hyperparameters. The options are global, local, hgdl, mcmc, adam, and a callable. The callable gets a fvgp.GP instance and has to return a 1d np.ndarray of hyperparameters. The default is mcmc. If method = mcmc or default, the attribute fvgp.GP.mcmc_info is updated and contains convergence and distribution information. For hgdl, please provide a distributed.Client.

  • pop_size (int, optional) – A number of individuals used for any optimizer with a global component. Default = 20.

  • tolerance (float, optional) – Used as termination criterion for local optimizers. Default = 0.0001.

  • max_iter (int, optional) – Maximum number of iterations for global and local optimizers. Default = 10000.

  • mcmc_prior (callable, optional) – A function that defines the prior probability distribution for the MCMC sampler. The form of the function is f(x, bounds, args) and returns a scalar. The default is a uniform distribution within the hyperparameter_bounds. The args are the same as the args of the GP instance.

  • mcmc_prop_distrs (list of callables, optional) – A list of functions that define the proposal distributions for the MCMC sampler. Each function should have the form f(x, para, obj) and return a vector of the same shape as x. See ProposalDistribution in the documentation for more information.

  • mcmc_args (dict, optional) – A dictionary of additional arguments for the MCMC sampler. The default is an empty dictionary.

  • local_optimizer (str, optional) – Defining the local optimizer. Default = L-BFGS-B, most scipy.optimize.minimize() functions are permissible.

  • global_optimizer (str, optional) – Defining the global optimizer. Only applicable to hgdl. Default = genetic

  • constraints (tuple of object instances, optional) – Equality and inequality constraints for the optimization. If the optimizer is hgdl, see the hgdl documentation. If the optimizer is a scipy optimizer, see the scipy documentation.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for asynchronous training. This can also be provided at initialization, but this will be used if not provided.

  • info (bool, optional) – Provides a way how to access information reports during training of the GP. The default is False. If other information is needed please utilize logger as described in the online documentation (separately for HGDL and fvgp if needed).

  • asynchronous (bool, optional) – When True, submit the training job and return immediately with an optimizer proxy object. Supported for method='hgdl', 'mcmc', and 'adam'. Call get_latest() on the returned object to poll intermediate results, or call update_hyperparameters() directly to apply them.

Returns:

optimized hyperparameters (only fyi, gp is already updated)

Return type:

np.ndarray

stop_training(opt_obj)[source]#

Function to stop an asynchronous hgdl training. This leaves the distributed.client.Client alive.

Parameters:

opt_obj (object instance) – Object returned by train(asynchronous=True).

kill_client(opt_obj)[source]#

Function to kill an asynchronous training client. This shuts down the associated distributed.client.Client.

Parameters:

opt_obj (object instance) – Object returned by train(asynchronous=True).

update_hyperparameters(opt_obj)[source]#

Function to update the Gaussian Process hyperparameters if an asynchronous training is running.

Parameters:

opt_obj (object instance) – Object created by train(asynchronous=True)().

Returns:

hyperparameters – The latest hyperparameter vector pulled from the running optimizer.

Return type:

np.ndarray

get_hyperparameters()[source]#

Get the current hyperparameters.

Deprecated since version Use: the hyperparameters property instead.

Returns:

hyperparameters

Return type:

np.ndarray

get_prior_pdf()[source]#

Return the current GP prior covariance matrix and mean vector.

Returns:

prior – Keys: "prior covariance (K)" (ndarray) and "prior mean" (ndarray).

Return type:

dict

log_likelihood(hyperparameters=None)[source]#

Function that computes the marginal log-likelihood

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

Returns:

log_likelihood – Log marginal likelihood of the data.

Return type:

float

neg_log_likelihood_gradient(hyperparameters=None, component=0)[source]#

Function that computes the gradient of the marginal log-likelihood.

Parameters:
  • hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

  • component (int, optional) – In case many GPs are computed in parallel, this specifies which one is considered.

Returns:

gradient – Gradient of the negative log marginal likelihood, shape (N,).

Return type:

np.ndarray

test_log_likelihood_gradient(hyperparameters, epsilon=1e-06)[source]#

Function to test your gradient of the log-likelihood and therefore of the kernel function.

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N).

Returns:

  • fd_gradient (np.ndarray) – Finite-difference gradient of the log-likelihood, shape (N,).

  • analytical_gradient (np.ndarray) – Analytical gradient of the log-likelihood, shape (N,).

posterior_mean(x_pred, hyperparameters=None, x_out=None)[source]#

This function calculates the posterior mean for a set of input points.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Solution points and function values

Return type:

dict

posterior_mean_grad(x_pred, hyperparameters=None, x_out=None, direction=None, component=0)[source]#

This function calculates the gradient of the posterior mean for a set of input points.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • direction (int, optional) – Direction of derivative, If None (default) the whole gradient will be computed.

  • component (int, optional) – In case y_data is multi-modal and no fvgp.GPOptimizer is used — this means y_data.shape[1] independent GPs are being executed — this indicates which GP’s gradient is evaluated. The default is 0.

Returns:

Solution

Return type:

dict

posterior_covariance(x_pred, x_out=None, variance_only=False, add_noise=False)[source]#

Function to compute the posterior covariance.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • variance_only (bool, optional) – If True the computation of the posterior covariance matrix is avoided which can save compute time. In that case the return will only provide the variance at the input points. Default = False. This is only relevant if the inverse of the covariance matrix is stored (linalg_mode == ‘CholInv’ or linalg_mode == ‘Inv’).

  • add_noise (bool, optional) – If True the noise variances will be added to the posterior variances. Default = False.

Returns:

Solution

Return type:

dict

posterior_covariance_grad(x_pred, x_out=None, direction=None)[source]#

Function to compute the gradient of the posterior covariance.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • direction (int, optional) – Direction of derivative, If None (default) the whole gradient will be computed.

Returns:

Solution

Return type:

dict

joint_gp_prior(x_pred, x_out=None)[source]#

Function to compute the joint prior over f (at measured locations) and f_pred at x_pred.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Solution

Return type:

dict

joint_gp_prior_grad(x_pred, direction, x_out=None)[source]#

Function to compute the gradient of the data-informed prior.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • direction (int) – Direction of derivative.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Solution

Return type:

dict

gp_entropy(x_pred, x_out=None)[source]#

Function to compute the entropy of the gp prior probability distribution.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces. Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Entropy

Return type:

float

gp_entropy_grad(x_pred, direction, x_out=None)[source]#

Function to compute the gradient of entropy of the prior in a given direction.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • direction (int) – Direction of the derivative.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Entropy gradient in given direction

Return type:

float

gp_kl_div(x_pred, comp_mean, comp_cov, x_out=None)[source]#

Function to compute the kl divergence of a posterior at given points and a given normal distribution.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • comp_mean (np.ndarray) – Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred)

  • comp_cov (np.ndarray) – Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred),len(x_pred))

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Solution

Return type:

dict

gp_mutual_information(x_pred, x_out=None, add_noise=False)[source]#

Function to calculate the mutual information between the random variables f(x_data) and f(x_pred). The mutual information is always positive, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • add_noise (bool, optional) – If True the noise variances will be added to the prior over the prediction points. Default = False.

Returns:

Solution

Return type:

dict

gp_total_correlation(x_pred, x_out=None, add_noise=False)[source]#

Function to calculate the interaction information between the random variables f(x_data) and f(x_pred). This is the mutual information of each f(x_pred) with f(x_data). It is also called the Multi-information. It is best used when several prediction points are supposed to be mutually aware. The total correlation is always positive, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • add_noise (bool, optional) – If True the noise variances will be added to the prior over the prediction points. Default = False.

Returns:

Solution – Total correlation between prediction points, as a collective.

Return type:

dict

gp_relative_information_entropy(x_pred, x_out=None, add_noise=False)[source]#

Function to compute the KL divergence and therefore the relative information entropy of the prior distribution defined over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing a set of data points at x_pred.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • add_noise (bool, optional) – If True the noise variances will be added to the posterior covariance. Default = False.

Returns:

Solution – Relative information entropy of prediction points, as a collective.

Return type:

dict

gp_relative_information_entropy_set(x_pred, x_out=None, add_noise=False)[source]#

Function to compute the KL divergence and therefore the relative information entropy of the prior distribution over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing each data point in x_pred separately, not all at once as in gp_relative_information_entropy().

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

  • add_noise (bool, optional) – If True the noise variances will be added to the posterior covariance. Default = False.

Returns:

Solution – Relative information entropy of prediction points, but not as a collective.

Return type:

dict

posterior_probability(x_pred, comp_mean, comp_cov, x_out=None)[source]#

Function to compute probability of a probabilistic quantity of interest, given the GP posterior at given points.

Parameters:
  • x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.

  • comp_mean (np.ndarray) – A vector of mean values, same length as x_pred.

  • comp_cov (np.ndarray) – Covariance matrix, in R^{len(x_pred) x len(x_pred)}

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number evaluation points in the output direction. Usually this is np.ndarray([0,1,2,…]).

Returns:

Solution – The probability of a probabilistic quantity of interest, given the GP posterior at a given point.

Return type:

dict

crps(x_test, y_test)[source]#

This function calculates the continuous rank probability score.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape (V x No) in the multi-output case. These are the y data to compare against.

Returns:

CRPS, standard dev. of CRPS

Return type:

(float, float)

rmse(x_test, y_test)[source]#

This function calculates the root mean squared error. Note that in the multi-task setting the user should perform their input point transformation beforehand.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

RMSE

Return type:

float

nrmse(x_test, y_test)[source]#

This function calculates the normalized root mean squared error. Note that in the multi-task setting the user should perform their input point transformation beforehand.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

NRMSE

Return type:

float

nlpd(x_test, y_test)[source]#

This function calculates the Negative log predictive density.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

NLPD

Return type:

float

r2(x_test, y_test)[source]#

This function calculates the R2 prediction score.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

R2

Return type:

float

picp(x_test, y_true, interval=0.95)[source]#

Computes the Prediction Interval Coverage Probability (PICP) for a Gaussian Process posterior.

Parameters:
  • x_test (array-like, shape (N,dim))

  • y_true (array-like, shape (N,)) – True values of the target variable.

  • interval (float, optional) – Confidence interval (default 0.95 for 95% intervals).

Returns:

  • picp (float) – Prediction Interval Coverage Probability

  • lower_bounds (ndarray) – Lower bounds of prediction intervals

  • upper_bounds (ndarray) – Upper bounds of prediction intervals

coverage_curve(x_test, y_test, intervals=None)[source]#

This function computes the coverage curve (calibration curve) of the GP posterior by evaluating picp() across a range of target coverage levels. Plotting target_coverage against measured_coverage reveals whether the posterior is well-calibrated (diagonal), overconfident (below diagonal), or underconfident (above diagonal).

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

  • intervals (np.ndarray, optional) – A 1d array of target coverage levels in (0, 1). Default is np.linspace(0.05, 0.95, 19).

Return type:

dict with keys target_coverage and measured_coverage, each a list of floats.

mpiw(x_test, interval=0.95)[source]#

This function calculates the Mean Prediction Interval Width (MPIW). It measures the average width of the posterior credible intervals and is best interpreted alongside picp(): a narrow interval with high coverage indicates a well-calibrated, sharp model.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • interval (float, optional) – Credible interval level. Default = 0.95.

Returns:

MPIW

Return type:

float

interval_score(x_test, y_test, interval=0.95)[source]#

This function calculates the Interval Score (also known as the Winkler Score). It penalizes both missed coverage and unnecessarily wide prediction intervals, combining the concerns of picp() and mpiw() into a single scalar. Lower is better.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

  • interval (float, optional) – Credible interval level. Default = 0.95.

Returns:

Interval Score

Return type:

float

mae(x_test, y_test)[source]#

This function calculates the Mean Absolute Error (MAE). Note that in the multi-task setting the user should perform their input point transformation beforehand.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

MAE

Return type:

float

mape(x_test, y_test)[source]#

This function calculates the Mean Absolute Percentage Error (MAPE). Note that in the multi-task setting the user should perform their input point transformation beforehand. Avoid using this metric when y_test contains values close to zero, as the percentage error becomes unstable.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

MAPE

Return type:

float

msll(x_test, y_test)[source]#

This function calculates the Mean Standardized Log Loss (MSLL). It is the nlpd() of the GP posterior minus the NLPD of a trivial baseline model (a Gaussian with the empirical mean and variance of the training targets). Negative values indicate that the GP predicts better than the baseline; zero means it matches it.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.

Returns:

MSLL

Return type:

float

plot_observed_vs_predicted(x_test, y_test, title=None, ax=None)[source]#

Scatter plot of observed vs. predicted values with a reference diagonal and 1-sigma predictive error bars (noise-inflated posterior variance). Useful for a quick visual check of model fit on a held-out test set.

Parameters:
  • x_test (np.ndarray) – Test input positions, shape (V, D).

  • y_test (np.ndarray) – Observed test values, shape (V,) or (V, No) for multi-output.

  • title (str, optional) – Plot title.

  • ax (matplotlib.axes.Axes, optional) – Existing axes to draw on; if None, a fresh figure + axes is created.

Returns:

If matplotlib is not installed a UserWarning is emitted; otherwise the plot is drawn on the supplied or freshly-created axes.

Return type:

None

static gaussian_1d(x, mu, sigma)[source]#

Evaluates a 1D Gaussian (Normal) distribution at a point x.

Parameters:
  • x (np.ndarray) – The points where you want to evaluate the Gaussian.

  • mu (np.ndarray) – The mean of the Gaussian (default 0.0).

  • sigma (np.ndarray) – The standard deviation of the Gaussians.

Returns:

Evaluations of the Gaussian

Return type:

np.ndarray

static make_2d_x_pred(bx, by, resx=100, resy=100)[source]#

This is a purely convenience-driven function calculating prediction points on a grid. :param bx: A numpy array or list of shape (2) defining lower and upper bounds in x direction. :type bx: iterable :param by: A numpy array of shape (2) defining lower and upper bounds in y direction. :type by: iterable :param resx: Resolution in x direction. Default = 100. :type resx: int, optional :param resy: Resolution in y direction. Default = 100. :type resy: int, optional

Returns:

prediction points

Return type:

np.ndarray

static make_1d_x_pred(b, res=100)[source]#

This is a purely convenience-driven function calculating prediction points on a 1d grid.

Parameters:
  • b (iterable) – A numpy array or list of shape (2) defining lower and upper bounds

  • res (int, optional) – Resolution. Default = 100

Returns:

prediction points

Return type:

np.ndarray

get_gp2Scale_exec_time(time_per_worker_execution, number_of_workers)[source]#

This function calculates the estimated time gp2Scale takes to calculate the covariance matrix as a function of the number of workers and their speed calculating a block.

Parameters:
  • time_per_worker_execution (float) – The time one worker takes to compute a block of the covariance matrix.

  • number_of_workers (int) – The number of dask workers the covariance matrix calculation is distributed over.

Returns:

estimated execution time

Return type:

float

initialize_gp2Scale_dask_client(gp2Scale, dask_client)[source]#

Ensure a Dask client exists when gp2Scale=True, creating a local one if needed.

Parameters:
  • gp2Scale (bool) – Whether the sparse gp2Scale mode is active.

  • dask_client (distributed.Client or None) – An existing Dask client, or None to auto-create a local one.

Returns:

client – A valid Dask client, or None when gp2Scale=False.

Return type:

distributed.Client or None