GP

class fvgp.GP(input_space_dim, x_data, y_data, init_hyperparameters=None, hyperparameter_bounds=None, noise_variances=None, compute_device='cpu', gp_kernel_function=None, gp_kernel_function_grad=None, gp_noise_function=None, gp_noise_function_grad=None, gp_mean_function=None, gp_mean_function_grad=None, sparse_mode=False, gp2Scale=False, gp2Scale_dask_client=None, gp2Scale_batch_size=10000, normalize_y=False, store_inv=True, ram_economy=False, args=None, info=False)

This class provides all the tools for a single-task Gaussian Process (GP). Use fvGP for multi-task GPs; note that the fvGP class inherits all methods from this class. This class allows for full HPC support for training via the HGDL package.

V … number of input points

D … input space dimensionality

N … arbitrary integers (N1, N2,…)

Parameters:
  • input_space_dim (int) – Dimensionality of the input space (D). If the input is non-Euclidean, the input dimensionality will be ignored.

  • x_data (np.ndarray or list of tuples) – The input point positions. Shape (V x D), where D is the input_space_dim. If dealing with non-Euclidean inputs x_data should be an iterable, not a numpy array.

  • y_data (np.ndarray) – The values of the data points. Shape (V,1) or (V).

  • init_hyperparameters (np.ndarray, optional) – Vector of hyperparameters used by the GP initially. This class provides methods to train hyperparameters. The default is a random draw from a uniform distribution within hyperparameter_bounds, with a shape appropriate for the default kernel (D + 1), which is an anisotropic Matern kernel with automatic relevance determination (ARD). If sparse_mode or gp2Scale is enabled, the default kernel changes to the anisotropic Wendland kernel.

  • hyperparameter_bounds (np.ndarray, optional) – A 2d numpy array of shape (N x 2), where N is the number of needed hyperparameters. The default is None, in which case the hyperparameter_bounds are estimated from the domain size and the initial y_data. If normalize_y is True or the data changes significantly, the hyperparameters and the bounds should be changed/retrained. Initial hyperparameters and bounds can also be set in the train calls. The default only works for the default kernels.

  • noise_variances (np.ndarray, optional) – A numpy array defining the uncertainties/noise in the data y_data in form of a point-wise variance. Shape (len(y_data), 1) or (len(y_data)). Note: if no noise_variances are provided here, the gp_noise_function callable will be used; if that callable is not provided either, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required, also make use of the gp_noise_function.

  • compute_device (str, optional) – One of “cpu” or “gpu”, determines how linear system solves are run. The default is “cpu”. For “gpu”, pytorch has to be installed manually. If gp2Scale is enabled but no kernel is provided, the choice of the compute_device becomes much more important. In that case, the default kernel will be computed on the cpu or the gpu which will significantly change the compute time depending on the compute architecture.

  • gp_kernel_function (Callable, optional) – A symmetric positive semi-definite covariance function (a kernel) that calculates the covariance between data points. It is a function of the form k(x1, x2, hyperparameters, obj). The input x1 is a N1 x D array of positions, x2 is a N2 x D array of positions, the hyperparameters argument is a 1d array of length D+1 for the default kernel and of a different user-defined length for other kernels, and obj is an fvgp.GP instance. The default is a stationary anisotropic kernel (fvgp.GP.default_kernel) which performs automatic relevance determination (ARD). The output is a covariance matrix, an N1 x N2 numpy array.

  • gp_kernel_function_grad (Callable, optional) – A function that calculates the derivative of the ‘gp_kernel_function’ with respect to the hyperparameters. If provided, it will be used for local training (optimization) and can speed up the calculations. It accepts as input x1 (a N1 x D array of positions), x2 (a N2 x D array of positions), hyperparameters (a 1d array of length D+1 for the default kernel), and a fvgp.GP instance. The default is a finite-difference calculation. If ‘ram_economy’ is True, the function’s input is x1, x2, direction (int), hyperparameters (numpy array), and a fvgp.GP instance, and the output is a 2d numpy array of shape (N1 x N2) containing the derivative in the given direction. If ‘ram_economy’ is False, the function’s input is x1, x2, hyperparameters, and a fvgp.GP instance, and the output is a numpy array of shape (len(hyperparameters) x N1 x N2). See ‘ram_economy’.

  • gp_mean_function (Callable, optional) – A function that evaluates the prior mean at a set of input positions. It accepts as input an array of positions (of shape N1 x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 1d array of length N1. If None is provided, fvgp.GP._default_mean_function is used.

  • gp_mean_function_grad (Callable, optional) – A function that evaluates the gradient of the ‘gp_mean_function’ at a set of input positions with respect to the hyperparameters. It accepts as input an array of positions (of size N1 x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 2d array of shape (len(hyperparameters) x N1). If None is provided, either zeros are returned since the default mean function does not depend on hyperparameters, or a finite-difference approximation is used if ‘gp_mean_function’ is provided.

  • gp_noise_function (Callable, optional) – The noise function is a callable f(x, hyperparameters, obj) that returns a symmetric positive definite matrix of shape (len(x), len(x)). The input x is a numpy array of shape (N x D). The hyperparameter array is the same that is communicated to the mean and kernel functions. The obj is a fvgp.GP instance.

  • gp_noise_function_grad (Callable, optional) – A function that evaluates the gradient of the ‘gp_noise_function’ at an input position with respect to the hyperparameters. It accepts as input an array of positions (of size N x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 3d array of shape (len(hyperparameters) x N x N). If None is provided and the default noise function is in use, zeros are returned since the default noise function does not depend on hyperparameters; if ‘gp_noise_function’ is provided but no gradient function, a finite-difference approximation will be used. The same rules regarding ram economy as for the kernel definition apply here.

  • normalize_y (bool, optional) – If True, the data values ‘y_data’ will be normalized to max(y_data) = 1, min(y_data) = 0. The default is False. Variances will be updated accordingly.

  • sparse_mode (bool, optional) – When sparse_mode is enabled, the algorithm will use a user-defined kernel function or, if that’s not provided, an anisotropic Wendland kernel and check for sparsity in the prior covariance. If sparsity is present, sparse operations will be used to speed up computations. Caution: the covariance is still stored at first in a dense format. For more extreme scaling, check out the gp2Scale option.

  • gp2Scale (bool, optional) – Turns on gp2Scale. This will distribute the covariance computations across multiple workers. This is an advanced feature for HPC GPs up to 10 million data points. If gp2Scale is used, the default kernel is an anisotropic Wendland kernel which is compactly supported. The noise function will have to return a scipy.sparse matrix instead of a numpy array. There are a few more things to consider (read on); this is an advanced option. If no kernel is provided, the compute_device option should be revisited. The kernel will use the specified device to compute covariances. The default is False.

  • gp2Scale_dask_client (dask.distributed.Client, optional) – A dask client for gp2Scale to distribute covariance computations over. Has to contain at least 3 workers. On HPC architecture, this client is provided by the job script. Please have a look at the examples. A local client is used as default.

  • gp2Scale_batch_size (int, optional) – Matrix batch size for distributed computing in gp2Scale. The default is 10000.

  • store_inv (bool, optional) – If True, the algorithm calculates and stores the inverse of the covariance matrix after each training or update of the dataset or hyperparameters, which makes computing the posterior covariance faster. For larger problems (>2000 data points), the use of inversion should be avoided due to computational instability and costs. The default is True. Note, the training will always use Cholesky or LU decomposition instead of the inverse for stability reasons. Storing the inverse is a good option when the dataset is not too large and the posterior covariance is heavily used.

  • ram_economy (bool, optional) – Only of interest if the gradient and/or Hessian of the marginal log-likelihood is/are used for the training. If True, components of the derivative of the marginal log-likelihood are calculated sequentially, one hyperparameter at a time, leading to a slow-down but much less RAM usage. If the derivative of the kernel (or noise function) with respect to the hyperparameters (gp_kernel_function_grad) is going to be provided, it has to be tailored: for ram_economy=True it should be of the form f(x1[, x2], direction, hyperparameters, obj) and return a 2d numpy array of shape len(x1) x len(x2). If ram_economy=False, the function should be of the form f(x1[, x2,] hyperparameters, obj) and return a numpy array of shape H x len(x1) x len(x2), where H is the number of hyperparameters. CAUTION: This array will be stored and is very large.

  • args (any, optional) – args will be a class attribute and therefore available to kernel, noise and prior mean functions.

  • info (bool, optional) – Provides a way to see the progress of gp2Scale. Default is False.
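
A minimal construction sketch under the parameter conventions above; the dataset, bounds, and all numeric values are hypothetical, and the import follows the qualified name fvgp.GP:

    import numpy as np
    from fvgp import GP

    # hypothetical 1d dataset: 20 noisy observations of a sine curve
    x_data = np.random.uniform(0.0, 10.0, (20, 1))
    y_data = np.sin(x_data[:, 0]) + np.random.normal(0.0, 0.01, 20)

    # the default Matern (ARD) kernel needs D + 1 = 2 hyperparameters
    my_gp = GP(1, x_data, y_data,
               init_hyperparameters=np.array([1.0, 1.0]),
               hyperparameter_bounds=np.array([[0.001, 100.0], [0.001, 100.0]]),
               noise_variances=np.full(20, 0.01 ** 2))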

x_data

Datapoint positions

Type:

np.ndarray

y_data

Datapoint values

Type:

np.ndarray

noise_variances

Datapoint observation (co)variances

Type:

np.ndarray

hyperparameters

Current hyperparameters in use.

Type:

np.ndarray

K

Current prior covariance matrix of the GP

Type:

np.ndarray

KVinv

If enabled, the inverse of the prior covariance plus noise matrix, inv(K + V)

Type:

np.ndarray

KVlogdet

logdet(K+V)

Type:

float

V

The noise covariance matrix

Type:

np.ndarray

update_gp_data(x_data, y_data, noise_variances=None)

This function updates the data in the gp object instance. The data will NOT be appended but overwritten! Please provide the full updated data set.

Parameters:
  • x_data (np.ndarray) – The point positions. Shape (V x D), where D is the input_space_dim.

  • y_data (np.ndarray) – The values of the data points. Shape (V,1) or (V).

  • noise_variances (np.ndarray, optional) – A numpy array defining the uncertainties in the data y_data in form of a point-wise variance. Shape (len(y_data), 1) or (len(y_data)). Note: if no variances are provided here, the gp_noise_function callable will be used; if that callable was not provided either, the noise variances will be set to ‘abs(np.mean(y_data)) / 100.0’. If you provided a noise function, the noise_variances given here will be ignored.
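
A short sketch reusing the hypothetical my_gp instance from the construction example above; note that the full, already-extended data set is passed:

    # one new (hypothetical) observation; the FULL updated set is resent
    x_new = np.vstack([x_data, [[4.2]]])
    y_new = np.append(y_data, np.sin(4.2))
    my_gp.update_gp_data(x_new, y_new,
                         noise_variances=np.full(len(y_new), 0.01 ** 2))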

train(hyperparameter_bounds=None, init_hyperparameters=None, method='global', pop_size=20, tolerance=0.0001, max_iter=120, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None)

This function finds the maximum of the log marginal likelihood and therefore trains the GP (synchronously). This can be done on a remote cluster/computer by specifying the method to be ‘hgdl’ and providing a dask client. However, in that case fvgp.GP.train_async() is preferred. The GP prior will automatically be updated with the new hyperparameters after the training.

Parameters:
  • hyperparameter_bounds (np.ndarray, optional) – A numpy array of shape (H x 2), where H is the number of hyperparameters, defining the bounds for the optimization. The default is an array of bounds of the length of the initial hyperparameters with all bounds defined practically as [0.00001, inf]. The initial hyperparameters are either defined by the user at initialization, or in this function call, or are defined as np.ones((input_space_dim + 1)). This choice is only recommended in very basic scenarios and can lead to suboptimal results. It is better to provide hyperparameter bounds.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is a random draw from a uniform distribution within the bounds.

  • method (str or Callable, optional) – The method used to train the hyperparameters. The options are ‘global’, ‘local’, ‘hgdl’, ‘mcmc’, and a callable. The callable gets an fvgp.GP instance and has to return a 1d np.ndarray of hyperparameters. The default is ‘global’ (scipy’s differential evolution). If method = “mcmc”, the attribute fvgp.GP.mcmc_info is updated and contains convergence and distribution information.

  • pop_size (int, optional) – The number of individuals used for any optimizer with a global component. Default = 20.

  • tolerance (float, optional) – Used as termination criterion for local optimizers. Default = 0.0001.

  • max_iter (int, optional) – Maximum number of iterations for global and local optimizers. Default = 120.

  • local_optimizer (str, optional) – Defines the local optimizer. Default = “L-BFGS-B”; most scipy.optimize.minimize methods are permissible.

  • global_optimizer (str, optional) – Defines the global optimizer. Only applicable to method = ‘hgdl’. Default = ‘genetic’.

  • constraints (tuple of object instances, optional) – Equality and inequality constraints for the optimization. If the optimizer is ‘hgdl’ see ‘hgdl.readthedocs.io’. If the optimizer is a scipy optimizer, see the scipy documentation.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new dask.distributed.Client instance is constructed.
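
A synchronous training sketch, continuing the running hypothetical example (bounds and optimizer settings are illustrative only):

    # global training (scipy differential evolution) within explicit bounds
    my_gp.train(hyperparameter_bounds=np.array([[0.001, 100.0], [0.001, 100.0]]),
                method='global', pop_size=20, max_iter=120)
    print(my_gp.get_hyperparameters())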

train_async(hyperparameter_bounds=None, init_hyperparameters=None, max_iter=10000, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None)

This function asynchronously finds the maximum of the log marginal likelihood and therefore trains the GP. This can be done on a remote cluster/computer by providing a dask client. This function submits the training and returns an object which can be given to ‘fvgp.GP.update_hyperparameters()’, which will automatically update the GP prior with the new hyperparameters.

Parameters:
  • hyperparameter_bounds (np.ndarray, optional) – A numpy array of shape (D x 2), defining the bounds for the optimization. The default is an array of bounds for the default kernel D = input_space_dim + 1 with all bounds defined practically as [0.00001, inf]. This choice is only recommended in very basic scenarios.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is a random draw from a uniform distribution within the bounds.

  • max_iter (int, optional) – Maximum number of epochs for HGDL. Default = 10000.

  • local_optimizer (str, optional) – Defines the local optimizer. Default = “L-BFGS-B”; most scipy.optimize.minimize methods are permissible.

  • global_optimizer (str, optional) – Defines the global optimizer. Only applicable to method = ‘hgdl’. Default = ‘genetic’.

  • constraints (tuple of hgdl.NonLinearConstraint instances, optional) – Equality and inequality constraints for the optimization. See ‘hgdl.readthedocs.io’

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new dask.distributed.Client instance is constructed.

Returns:

Optimization object that can be given to fvgp.GP.update_hyperparameters() to update the prior GP

Return type:

object instance

stop_training(opt_obj)

This function stops the training if HGDL is used. It leaves the dask client alive.

Parameters:

opt_obj (object) – An object returned from the fvgp.GP.train_async() function.

kill_training(opt_obj)

This function stops the training if HGDL is used, and kills the dask client.

Parameters:

opt_obj (object) – An object returned from the fvgp.GP.train_async() function.

update_hyperparameters(opt_obj)

This function obtains the latest hyperparameters from an asynchronous training started with fvgp.GP.train_async(). It accepts the optimization object returned by that function and automatically updates the GP prior with the new hyperparameters.

Parameters:

opt_obj (HGDL class instance) – HGDL class instance returned by fvgp.GP.train_async().

Returns:

The current hyperparameters

Return type:

np.ndarray
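
A sketch of the asynchronous workflow assembled from train_async(), update_hyperparameters(), and stop_training() as described above; the polling loop and its timing are hypothetical:

    import time

    opt_obj = my_gp.train_async(
        hyperparameter_bounds=np.array([[0.001, 100.0], [0.001, 100.0]]))
    for _ in range(10):
        time.sleep(5)                                # let the optimizer progress
        hps = my_gp.update_hyperparameters(opt_obj)  # pulls results, updates the prior
        print("current hyperparameters:", hps)
    my_gp.stop_training(opt_obj)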

set_hyperparameters(hps)

Function to set hyperparameters.

Parameters:

hps : np.ndarray

A 1-d numpy array of hyperparameters.

get_hyperparameters()

Function to get the current hyperparameters.

Parameters: None

Return:

hyperparameters : np.ndarray

get_prior_pdf()

Function to get information about the current GP prior distribution.

Parameters:

None

Return:

A dictionary containing information about the GP prior distribution : dict

log_likelihood(hyperparameters=None)

Function that computes the marginal log-likelihood.

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

Returns:

log marginal likelihood of the data

Return type:

float

neg_log_likelihood(hyperparameters=None)

Function that computes the negative marginal log-likelihood.

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

Returns:

negative log marginal likelihood of the data

Return type:

float

neg_log_likelihood_gradient(hyperparameters=None)

Function that computes the gradient of the negative marginal log-likelihood.

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

Returns:

Gradient of the negative log marginal likelihood

Return type:

np.ndarray

neg_log_likelihood_hessian(hyperparameters=None)

Function that computes the Hessian of the negative marginal log-likelihood. It does so by a first-order finite-difference approximation of the exact gradient.

Parameters:

hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.

Returns:

Hessian of the negative log marginal likelihood

Return type:

np.ndarray

posterior_mean(x_pred, hyperparameters=None, x_out=None)

This function calculates the posterior mean for a set of input points.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution points and function values

Return type:

dict

posterior_mean_grad(x_pred, hyperparameters=None, x_out=None, direction=None)

This function calculates the gradient of the posterior mean for a set of input points.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

  • direction (int, optional) – Direction of derivative. If None (default), the whole gradient will be computed.

Returns:

Solution

Return type:

dict

posterior_covariance(x_pred, x_out=None, variance_only=False, add_noise=False)

Function to compute the posterior covariance.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

  • variance_only (bool, optional) – If True the computation of the posterior covariance matrix is avoided which can save compute time. In that case the return will only provide the variance at the input points. Default = False.

  • add_noise (bool, optional) – If True the noise variances will be added to the posterior variances. Default = False.

Returns:

Solution

Return type:

dict
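
A prediction sketch continuing the running example. The exact keys of the returned dictionaries (“f(x)” for the mean, “v(x)” for the variance) are assumptions here; inspect the returned dict if they differ:

    x_pred = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
    mean = my_gp.posterior_mean(x_pred)["f(x)"]                            # key assumed
    var = my_gp.posterior_covariance(x_pred, variance_only=True)["v(x)"]   # key assumed
    upper, lower = mean + 2.0 * np.sqrt(var), mean - 2.0 * np.sqrt(var)    # 2-sigma band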

posterior_covariance_grad(x_pred, x_out=None, direction=None)

Function to compute the gradient of the posterior covariance.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

  • direction (int, optional) – Direction of derivative. If None (default), the whole gradient will be computed.

Returns:

Solution

Return type:

dict

joint_gp_prior(x_pred, x_out=None)

Function to compute the joint prior over f (at measured locations) and f_pred at x_pred.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution

Return type:

dict

joint_gp_prior_grad(x_pred, direction, x_out=None)

Function to compute the gradient of the data-informed prior.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • direction (int) – Direction of derivative.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution

Return type:

dict

entropy(S)

Function computing the entropy of a normal distribution: res = entropy(S), where S is a 2d np.ndarray, a non-singular covariance matrix.

Parameters:

S : np.ndarray

A covariance matrix.

Return:

Entropy : float
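
For reference, the differential entropy of a normal distribution with covariance S has the closed form 0.5 * ln det(2 * pi * e * S). A stand-alone sketch of that formula, not necessarily fvgp’s internal implementation:

    import numpy as np

    def normal_entropy(S):
        # H = (k/2) ln(2 pi e) + 0.5 ln det(S), with a numerically stable slogdet
        k = len(S)
        sign, logdet = np.linalg.slogdet(S)
        return 0.5 * k * np.log(2.0 * np.pi * np.e) + 0.5 * logdet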

gp_entropy(x_pred, x_out=None)

Function to compute the entropy of the GP prior probability distribution.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Entropy

Return type:

float

gp_entropy_grad(x_pred, direction, x_out=None)

Function to compute the gradient of entropy of the prior in a given direction.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • direction (int) – Direction of derivative.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Entropy gradient in given direction

Return type:

float

kl_div(mu1, mu2, S1, S2)

Function to compute the KL divergence between two Gaussian distributions.

Parameters:
  • mu1 (np.ndarray) – Mean vector of distribution 1.

  • mu2 (np.ndarray) – Mean vector of distribution 2.

  • S1 (np.ndarray) – Covariance matrix of distribution 1.

  • S2 (np.ndarray) – Covariance matrix of distribution 2.

Returns:

KL divergence

Return type:

float
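
For reference, the KL divergence between N(mu1, S1) and N(mu2, S2) has a closed form; a stand-alone sketch of it, not necessarily fvgp’s internal implementation:

    import numpy as np

    def gauss_kl(mu1, mu2, S1, S2):
        # KL(N1 || N2) = 0.5 * (tr(inv(S2) S1) + (mu2 - mu1)^T inv(S2) (mu2 - mu1)
        #                       - k + ln det S2 - ln det S1)
        k = len(mu1)
        diff = mu2 - mu1
        trace_term = np.trace(np.linalg.solve(S2, S1))  # avoids an explicit inverse
        maha = diff @ np.linalg.solve(S2, diff)
        _, logdet1 = np.linalg.slogdet(S1)
        _, logdet2 = np.linalg.slogdet(S2)
        return 0.5 * (trace_term + maha - k + logdet2 - logdet1)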

gp_kl_div(x_pred, comp_mean, comp_cov, x_out=None)

Function to compute the KL divergence between the posterior at given points and a given normal distribution.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • comp_mean (np.ndarray) – Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred)

  • comp_cov (np.ndarray) – Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred), len(x_pred))

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution

Return type:

dict

gp_kl_div_grad(x_pred, comp_mean, comp_cov, direction, x_out=None)

Function to compute the gradient of the KL divergence between the posterior at given points and a given normal distribution.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • comp_mean (np.ndarray) – Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred)

  • comp_cov (np.ndarray) – Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred),len(x_pred))

  • direction (int) – The direction in which the gradient will be computed.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution

Return type:

dict

mutual_information(joint, m1, m2)

Function to calculate the mutual information between two normal distributions, which is equivalent to the KL divergence KL(joint || marginal1 * marginal2).

Parameters:
  • joint (np.ndarray) – The joint covariance matrix.

  • m1 (np.ndarray) – The first marginal distribution.

  • m2 (np.ndarray) – The second marginal distribution.

Returns:

Mutual information

Return type:

float

gp_mutual_information(x_pred, x_out=None)

Function to calculate the mutual information between the random variables f(x_data) and f(x_pred). The mutual information is always positive, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution

Return type:

dict

gp_total_correlation(x_pred, x_out=None)

Function to calculate the interaction information between the random variables f(x_data) and f(x_pred). This is the mutual information of each f(x_pred) with f(x_data); it is also called the multi-information. It is best used when several prediction points are supposed to be mutually aware. The total correlation is always positive, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution – Total correlation between prediction points, as a collective.

Return type:

dict

gp_relative_information_entropy(x_pred, x_out=None)

Function to compute the KL divergence and therefore the relative information entropy of the prior distribution over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing a set of data points at x_pred.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution – Relative information entropy of prediction points, as a collective.

Return type:

dict
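
One natural use of this quantity is as an acquisition criterion in experiment design: among candidate measurement positions, pick the one predicted to gain the most information. A sketch under that reading; the dictionary key “RIE” is an assumption, so inspect the returned dict:

    candidates = np.random.uniform(0.0, 10.0, (100, 1))   # hypothetical candidates
    scores = [my_gp.gp_relative_information_entropy(c.reshape(1, -1))["RIE"]
              for c in candidates]                        # key name assumed
    best = candidates[int(np.argmax(scores))]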

gp_relative_information_entropy_set(x_pred, x_out=None)

Function to compute the KL divergence and therefore the relative information entropy of the prior distribution over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing each data point in x_pred separately, not all at once as in gp_relative_information_entropy.

Parameters:
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution – Relative information entropy of prediction points, but not as a collective.

Return type:

dict

posterior_probability(x_pred, comp_mean, comp_cov, x_out=None)

Function to compute the probability of a probabilistic quantity of interest, given the GP posterior at a given point.

Parameters:
  • x_pred (np.ndarray) – A 1d or 2d numpy array of points. Note, these are elements of the index set which results from a Cartesian product of input and output space.

  • comp_mean (np.ndarray) – A vector of mean values, same length as x_pred.

  • comp_cov (np.ndarray) – Covariance matrix, in R^(len(x_pred) x len(x_pred)).

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution – The probability of a probabilistic quantity of interest, given the GP posterior at a given point.

Return type:

dict

posterior_probability_grad(x_pred, comp_mean, comp_cov, direction, x_out=None)

Function to compute the gradient of the probability of a probabilistic quantity of interest, given the GP posterior at a given point.

Parameters:
  • x_pred (np.ndarray) – A 1d or 2d numpy array of points. Note, these are elements of the index set which results from a Cartesian product of input and output space.

  • comp_mean (np.ndarray) – A vector of mean values, same length as x_pred.

  • comp_cov (np.ndarray) – Covariance matrix, in R^(len(x_pred) x len(x_pred)).

  • direction (int) – The direction to compute the gradient in.

  • x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N x L), where N is the number of output points, and L is the dimensionality of the output space.

Returns:

Solution – The gradient of the probability of a probabilistic quantity of interest, given the GP posterior at a given point.

Return type:

dict

squared_exponential_kernel(distance, length)

Function for the squared exponential kernel. kernel = np.exp(-(distance ** 2) / (2.0 * (length ** 2)))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.
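
The distance-based kernels in this and the following entries are building blocks for user-defined gp_kernel_function callables. A minimal sketch of an isotropic kernel assembled this way (two hyperparameters, signal variance and length scale; the function name and hyperparameter layout are hypothetical):

    from scipy.spatial.distance import cdist

    def my_kernel(x1, x2, hyperparameters, obj):
        signal_variance, length = hyperparameters
        d = cdist(x1, x2)  # pairwise Euclidean distances, shape N1 x N2
        return signal_variance * obj.squared_exponential_kernel(d, length)

Such a callable can then be passed as gp_kernel_function at initialization.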

squared_exponential_kernel_robust(distance, phi)

Function for the squared exponential kernel (robust version). kernel = np.exp(-(distance ** 2) * (phi ** 2))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

exponential_kernel(distance, length)

Function for the exponential kernel. kernel = np.exp(-(distance) / (length))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

exponential_kernel_robust(distance, phi)

Function for the exponential kernel (robust version). kernel = np.exp(-(distance) * (phi**2))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

matern_kernel_diff1(distance, length)

Function for the Matern kernel, order of differentiability = 1. kernel = (1.0 + ((np.sqrt(3.0) * distance) / length)) * np.exp(-(np.sqrt(3.0) * distance) / length)

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

matern_kernel_diff1_robust(distance, phi)

Function for the Matern kernel, order of differentiability = 1, robust version. kernel = (1.0 + ((np.sqrt(3.0) * distance) * (phi**2))) * np.exp(-(np.sqrt(3.0) * distance) * (phi**2))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

matern_kernel_diff2(distance, length)

Function for the Matern kernel, order of differentiability = 2. kernel = (1.0 + ((np.sqrt(5.0) * distance) / length) + ((5.0 * distance**2) / (3.0 * length**2))) * np.exp(-(np.sqrt(5.0) * distance) / length)

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

matern_kernel_diff2_robust(distance, phi)

Function for the Matern kernel, order of differentiability = 2, robust version. kernel = (1.0 + ((np.sqrt(5.0) * distance) * (phi**2)) + ((5.0 * distance**2) * (3.0 * phi**4))) * np.exp(-(np.sqrt(5.0) * distance) * (phi**2))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameter.

Returns:

Kernel output

Return type:

same as distance parameter.

sparse_kernel(distance, radius)

Function for a compactly supported kernel.

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • radius (scalar) – Radius of support.

Returns:

Kernel output

Return type:

same as distance parameter.

periodic_kernel(distance, length, p)

Function for a periodic kernel. kernel = np.exp(-(2.0/length**2)*(np.sin(np.pi*distance/p)**2))

Parameters:
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – Length scale.

  • p (scalar) – Period.

Returns:

Kernel output

Return type:

same as distance parameter.

linear_kernel(x1, x2, hp1, hp2, hp3)

Function for a linear kernel. kernel = hp1 + (hp2*(x1-hp3)*(x2-hp3))

Parameters:
  • x1 (float) – Point 1.

  • x2 (float) – Point 2.

  • hp1 (float) – Hyperparameter.

  • hp2 (float) – Hyperparameter.

  • hp3 (float) – Hyperparameter.

Returns:

Kernel output

Return type:

float

dot_product_kernel(x1, x2, hp, matrix)

Function for a dot-product kernel. kernel = hp + x1.T @ matrix @ x2

Parameters:
  • x1 (np.ndarray) – Point 1.

  • x2 (np.ndarray) – Point 2.

  • hp (float) – Offset hyperparameter.

  • matrix (np.ndarray) – PSD matrix defining the inner product.

Returns:

Kernel output

Return type:

float

polynomial_kernel(x1, x2, p)

Function for a polynomial kernel. kernel = (1.0+x1.T @ x2)**p

Parameters:
  • x1 (np.ndarray) – Point 1.

  • x2 (np.ndarray) – Point 2.

  • p (float) – Power hyperparameter.

Returns:

Kernel output

Return type:

float

default_kernel(x1, x2, hyperparameters, obj)

Function for the default kernel, a Matern kernel of first-order differentiability.

Parameters:
  • x1 (np.ndarray) – Numpy array of shape (U x D).

  • x2 (np.ndarray) – Numpy array of shape (V x D).

  • hyperparameters (np.ndarray) – Array of hyperparameters. For this kernel we need D + 1 hyperparameters.

  • obj (object instance) – GP object instance.

Returns:

Covariance matrix

Return type:

np.ndarray

wendland_anisotropic(x1, x2, hyperparameters, obj)

Function for the Wendland kernel, the default kernel if ‘sparse_mode’ or ‘gp2Scale’ is enabled. The Wendland kernel is compactly supported, leading to sparse covariance matrices.

Parameters:
  • x1 (np.ndarray) – Numpy array of shape (U x D).

  • x2 (np.ndarray) – Numpy array of shape (V x D).

  • hyperparameters (np.ndarray) – Array of hyperparameters. For this kernel we need D + 1 hyperparameters.

  • obj (object instance) – GP object instance.

Returns:

Covariance matrix

Return type:

np.ndarray

non_stat_kernel(x1, x2, x0, w, l)

Non-stationary kernel. kernel = g(x1) g(x2)

Parameters:
  • x1 (np.ndarray) – Numpy array of shape (U x D).

  • x2 (np.ndarray) – Numpy array of shape (V x D).

  • x0 (np.ndarray) – Numpy array of the basis function locations.

  • w (np.ndarray) – 1d np.ndarray of weights. len(w) = len(x0).

  • l (float) – Width measure of the basis functions.

Returns:

Covariance matrix

Return type:

np.ndarray

non_stat_kernel_gradient(x1, x2, x0, w, l)

Non-stationary kernel gradient. kernel = g(x1) g(x2)

Parameters:
  • x1 (np.ndarray) – Numpy array of shape (U x D).

  • x2 (np.ndarray) – Numpy array of shape (V x D).

  • x0 (np.ndarray) – Numpy array of the basis function locations.

  • w (np.ndarray) – 1d np.ndarray of weights. len(w) = len(x0).

  • l (float) – Width measure of the basis functions.

Returns:

Gradient of the covariance matrix

Return type:

np.ndarray

crps(x_test, y_test)

This function calculates the continuous ranked probability score. Note that in the multi-task setting the user should perform their input point transformation beforehand.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape (V x 1). These are the y data to compare against.

Returns:

CRPS

Return type:

float

rmse(x_test, y_test)

This function calculates the root mean squared error. Note that in the multi-task setting the user should perform their input point transformation beforehand.

Parameters:
  • x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • y_test (np.ndarray) – A numpy array of shape (V x 1). These are the y data to compare against.

Returns:

RMSE

Return type:

float
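
A validation sketch on a hypothetical held-out set, continuing the running example:

    x_test = np.random.uniform(0.0, 10.0, (50, 1))
    y_test = np.sin(x_test[:, 0])
    print("RMSE:", my_gp.rmse(x_test, y_test))
    print("CRPS:", my_gp.crps(x_test, y_test))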

make_2d_x_pred(bx, by, resx=100, resy=100)

This is a purely convenience-driven function calculating prediction points on a grid.

Parameters:
  • bx (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds in x direction.

  • by (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds in y direction.

  • resx (int, optional) – Resolution in x direction. Default = 100.

  • resy (int, optional) – Resolution in y direction. Default = 100.

Returns:

prediction points

Return type:

np.ndarray

make_1d_x_pred(b, res=100)

This is a purely convenience-driven function calculating prediction points on a 1d grid.

Parameters:
  • b (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds.

  • res (int, optional) – Resolution. Default = 100.

Returns:

prediction points

Return type:

np.ndarray
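
A closing sketch combining the grid helper with prediction, continuing the running example. Whether the grid comes back as shape (res,) or (res, 1) is not specified above, so the reshape covers both; the dictionary key is assumed as before:

    grid = my_gp.make_1d_x_pred(np.array([0.0, 10.0]), res=200)
    mean = my_gp.posterior_mean(np.asarray(grid).reshape(-1, 1))["f(x)"]  # key assumed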