GP

class fvgp.gp.GP(input_space_dim, x_data, y_data, init_hyperparameters, variances=None, compute_device='cpu', gp_kernel_function=None, gp_kernel_function_grad=None, gp_mean_function=None, gp_mean_function_grad=None, normalize_y=False, use_inv=False, ram_economy=True, args=None)

This class provides all the tools for a single-task Gaussian Process (GP). Use fvGP for multi task GPs. However, the fvGP class inherits all methods from this class. This class allows for full HPC support for training.

Parameters
  • input_space_dim (int) – Dimensionality of the input space.

  • x_data (np.ndarray) – The point positions. Shape (V x D), where D is the input_space_dim.

  • y_data (np.ndarray) – The values of the data points. Shape (V,1) or (V).

  • init_hyperparameters (np.ndarray) – Vector of hyperparameters used by the GP initially. The class provides methods to train hyperparameters.

  • variances (np.ndarray, optional) – An numpy array defining the uncertainties in the data y_data. Shape (V x 1) or (V). Note: if no variances are provided they will be set to abs(np.mean(y_data) / 100.0.

  • compute_device (str, optional) – One of “cpu” or “gpu”, determines how linear system solves are run. The default is “cpu”.

  • gp_kernel_function (Callable, optional) – A function that calculates the covariance between data points. It accepts as input x1 (a V x D array of positions), x2 (a U x D array of positions), hyperparameters (a 1-D array of length D+1 for the default kernel), and a gpcam.gp_optimizer.GPOptimizer instance. The default is a stationary anisotropic kernel (fvgp.gp.GP.default_kernel).

  • gp_kernel_function_grad (Callable, optional) – A function that calculates the derivative of the covariance between datapoints with respect to the hyperparameters. If provided, it will be used for local training and can speed up the calculations. It accepts as input x1 (a V x D array of positions), x2 (a U x D array of positions), hyperparameters (a 1-D array of length D+1 for the default kernel), and a gpcam.gp_optimizer.GPOptimizer instance. The default is a finite difference calculation. If ‘ram_economy’ is True, the function’s input is x1, x2, direction (int), hyperparameters (numpy array), and a gpcam.gp_optimizer.GPOptimizer instance, and the output is a numpy array of shape (V x U). If ‘ram economy’ is False,the function’s input is x1, x2, hyperparameters, and a gpcam.gp_optimizer.GPOptimizer instance, and the output is a numpy array of shape (len(hyperparameters) x U x V). See ‘ram_economy’.

  • gp_mean_function (Callable, optional) – A function that evaluates the prior mean at an input position. It accepts as input an array of positions (of size V x D), hyperparameters (a 1-D array of length D+1 for the default kernel) and a gpcam.gp_optimizer.GPOptimizer instance. The return value is a 1-D array of length V. If None is provided, fvgp.gp.GP.default_mean_function is used.

  • gp_mean_function_grad (Callable, optional) – A function that evaluates the gradient of the prior mean at an input position with respect to the hyperparameters. It accepts as input an array of positions (of size V x D), hyperparameters (a 1-D array of length D+1 for the default kernel) and a gpcam.gp_optimizer.GPOptimizer instance. The return value is a 2-D array of shape (len(hyperparameters) x V). If None is provided, a finite difference scheme is used.

  • normalize_y (bool, optional) – If True, the data point values will be normalized to max(initial values) = 1. The default is False.

  • use_inv (bool, optional) – If True, the algorithm calculates and stores the inverse of the covariance matrix after each training or update of the dataset, which makes computing the posterior covariance faster. For larger problems (>2000 data points), the use of inversion should be avoided due to computational instability. The default is False. Note, the training will always use a linear solve instead of the inverse for stability reasons.

  • ram_economy (bool, optional) – Only of interest if the gradient and/or Hessian of the marginal log_likelihood is/are used for the training. If True, components of the derivative of the marginal log-likelihood are calculated subsequently, leading to a slow-down but much less RAM usage. If the derivative of the kernel with respect to the hyperparameters (gp_kernel_function_grad) is going to be provided, it has to be tailored: for ram_economy=True it should be of the form f(points1, points2, direction, hyperparameters) and return a 2-D numpy array of shape V x V. If ram_economy=False, the function should be of the form f(points1, points2, hyperparameters) and return a numpy array of shape H x V x V, where H is the number of hyperparameters. V is the number of points. CAUTION: This array will be stored and is very large.

  • args (any, optional) – args will be a class attribute and therefore available to kernel and and prior mean functions.

x_data

Datapoint positions

Type

np.ndarray

y_data

Datapoint values

Type

np.ndarray

variances

Datapoint observation variances.

Type

np.ndarray

hyperparameters

Current hyperparameters in use.

Type

np.ndarray

update_gp_data(x_data, y_data, variances=None)

This function updates the data in the gp object instance. The data will NOT be appended but overwritten! Please provide the full updated data set.

Parameters
  • x_data (np.ndarray) – The point positions. Shape (V x D), where D is the input_space_dim.

  • y_data (np.ndarray) – The values of the data points. Shape (V,1) or (V).

  • variances (np.ndarray, optional) – An numpy array defining the uncertainties in the data y_data. Shape (V x 1) or (V). Note: if no variances are provided they will be set to abs(np.mean(y_data) / 100.0.

train(hyperparameter_bounds, init_hyperparameters=None, method='global', pop_size=20, tolerance=0.0001, max_iter=120, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), deflation_radius=None, dask_client=None)

This function finds the maximum of the marginal log_likelihood and therefore trains the GP (synchronously). This can be done on a remote cluster/computer by specifying the method to be be ‘hgdl’ and providing a dask client. The GP prior will automatically be updated with the new hyperparameters.

Parameters
  • hyperparameter_bounds (np.ndarray) – A numpy array of shape (D x 2), defining the bounds for the optimization.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is reusing the initial hyperparameters given at initialization

  • method (str or Callable, optional) – The method used to train the hyperparameters. The optiona are ‘global’, ‘local’, ‘hgdl’, and ‘mcmc’. The default is ‘global’ (meaning scipy’s differential evolution). If a callable is provided, it should accept as input a fvgp.gp object instance and return a np.ndarray of hyperparameters, shape = (V).

  • pop_size (int, optional) – A number of individuals used for any optimizer with a global component. Default = 20.

  • tolerance (float, optional) – Used as termination criterion for local optimizers. Default = 0.0001.

  • max_iter (int, optional) – Maximum number of iterations for global and local optimizers. Default = 120.

  • local_optimizer (str, optional) – Defining the local optimizer. Default = “L-BFGS-B”, most scipy.opimize.minimize functions are permissible.

  • global_optimizer (str, optional) – Defining the global optimizer. Only applicable to method = hgdl. Default = genetic

  • constraints (tuple of object instances, optional) – Equality and inequality constraints for the optimization. If the optimizer is ``hgdl’’ see ``hgdl.readthedocs.io’’. If the optimizer is a scipy optimizer, see scipy documentation.

  • deflation_radius (float, optional) – Deflation radius for the HGDL optimization. Please refer to the HGDL package documentation for more info. Default = None, meaning HGDL will pick the radius.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new dask.distributed.Client instance is constructed.

train_async(hyperparameter_bounds, init_hyperparameters=None, max_iter=10000, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), deflation_radius=None, dask_client=None)

This function asynchronously finds the maximum of the marginal log_likelihood and therefore trains the GP. This can be done on a remote cluster/computer by providing a dask client. This function just submits the training and returns an object which can be given to fvgp.gp.update_hyperparameters, which will automatically update the GP prior with the new hyperparameters.

Parameters
  • hyperparameter_bounds (np.ndarray) – A numpy array of shape (D x 2), defining the bounds for the optimization.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is reusing the initial hyperparameters given at initialization

  • max_iter (int, optional) – Maximum number of epochs for HGDL. Default = 10000.

  • local_optimizer (str, optional) – Defining the local optimizer. Default = “L-BFGS-B”, most scipy.opimize.minimize functions are permissible.

  • global_optimizer (str, optional) – Defining the global optimizer. Only applicable to method = hgdl. Default = genetic

  • constraints (tuple of hgdl.NonLinearConstraint instances, optional) – Equality and inequality constraints for the optimization. See ``hgdl.readthedocs.io’’

  • deflation_radius (float, optional) – Deflation radius for the HGDL optimization. Please refer to the HGDL package documentation for more info. Default = None, meaning HGDL will pick the radius.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new dask.distributed.Client instance is constructed.

Returns

Optimization object that can be given to `fvgp.gp.update_hyperparameters` to update the prior GP

Return type

object instance

stop_training(opt_obj)

This function stops the training if HGDL is used. It leaves the dask client alive.

Parameters

opt_obj (object) – An object returned form the fvgp.gp.GP.train_async function.

kill_training(opt_obj)

This function stops the training if HGDL is used, and kills the dask client.

Parameters

opt_obj (object) – An object returned form the fvgp.gp.GP.train_async function.

update_hyperparameters(opt_obj)

This function asynchronously finds the maximum of the marginal log_likelihood and therefore trains the GP. This can be done on a remote cluster/computer by providing a dask client. This function just submits the training and returns an object which can be given to fvgp.gp.update_hyperparameters, which will automatically update the GP prior with the new hyperparameters.

Parameters

object (HGDL class instance) – HGDL class instance returned by fvgp.gp.train_async

Returns

The current hyperparameters

Return type

np.ndarray

log_likelihood(hyperparameters)

Function that computes the marginal log-likelihood

Parameters

hyperparameters (np.ndarray) – Vector of hyperparameters of shape (V)

Returns

negative marginal log-likelihood

Return type

float

neg_log_likelihood(hyperparameters)

Function that computes the marginal log-likelihood

Parameters

hyperparameters (np.ndarray) – Vector of hyperparameters of shape (V)

Returns

negative marginal log-likelihood

Return type

float

neg_log_likelihood_gradient(hyperparameters)

Function that computes the gradient of the marginal log-likelihood.

Parameters

hyperparameters (np.ndarray) – Vector of hyperparameters of shape (V)

Returns

Gradient of the negative marginal log-likelihood

Return type

np.ndarray

neg_log_likelihood_hessian(hyperparameters)

Function that computes the Hessian of the marginal log-likelihood.

Parameters

hyperparameters (np.ndarray) – Vector of hyperparameters of shape (V)

Returns

Hessian of the negative marginal log-likelihood

Return type

np.ndarray

slogdet(A)

fvGPs slogdet method based on torch

solve(A, b)

fvGPs slogdet method based on torch

default_mean_function(x, hyperparameters, gp_obj)

evaluates the gp mean function at the data points

posterior_mean(x_pred)

This function calculates the posterior mean for a set of input points.

Parameters

x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

Returns

solution dictionary

Return type

dict

posterior_mean_constraint(x_pred, hyperparameters)

This function recalculates the posterior mean with given hyperparameters so that constraints can be enforced.

Parameters
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • hyperparameters (np.ndarray) – A numpy array of new hyperparameters

Returns

solution dictionary

Return type

dict

posterior_mean_grad(x_pred, direction=None)

This function calculates the gradient of the posterior mean for a set of input points.

Parameters

x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

Returns

solution dictionary

Return type

dict

posterior_covariance(x_pred, variance_only=False)

Function to compute the posterior covariance. :param x_pred: A numpy array of shape (V x D), interpreted as an array of input point positions. :type x_pred: np.ndarray :param variance_only: If True the compuation of the posterior covariance matrix is avoided which can save compute time.

In that case the return will only provide the variance at the input points. Default = False.

Returns

solution dictionary

Return type

dict

posterior_covariance_grad(x_pred, direction=None)

Function to compute the gradient of the posterior covariance.

Parameters

x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

Returns

solution dictionary

Return type

dict

gp_prior(x_pred)

function to compute the data-informed prior :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the

Returns

solution dictionary

Return type

dict

gp_prior_grad(x_pred, direction)

function to compute the gradient of the data-informed prior :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param direction: :type direction: direction in which to compute the gradient

Returns

solution dictionary

Return type

dict

entropy(S)

function comuting the entropy of a normal distribution res = entropy(S); S is a 2d numpy array, matrix has to be non-singular

gp_entropy(x_pred)

Function to compute the entropy of the prior.

Parameters

x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

Returns

entropy

Return type

float

gp_entropy_grad(x_pred, direction)

Function to compute the gradient of entropy of the prior in a given direction.

Parameters
  • x_pred (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.

  • direction (int) – 0 <= direction <= D - 1

Returns

entropy

Return type

float

kl_div(mu1, mu2, S1, S2)

Function to compute the KL divergence between two Gaussian distributions.

Parameters
  • mu1 (np.ndarray) – Mean vector of distribution 1.

  • mu1 – Mean vector of distribution 2.

  • S1 (np.ndarray) – Covariance matrix of distribution 1.

  • S2 (np.ndarray) – Covariance matrix of distribution 2.

Returns

KL div

Return type

float

gp_kl_div(x_pred, comp_mean, comp_cov)

function to compute the kl divergence of a posterior at given points :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param comp_mean: Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred) :type comp_mean: np.array :param comp_cov: Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred),len(x_pred)) :type comp_cov: np.array

Returns

solution dictionary

Return type

dict

gp_kl_div_grad(x_pred, comp_mean, comp_cov, direction)

function to compute the gradient of the kl divergence of a posterior at given points :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param comp_mean: Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred) :type comp_mean: np.array :param comp_cov: Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred),len(x_pred)) :type comp_cov: np.array :param direction: :type direction: direction in which the gradient will be computed

Returns

solution dictionary

Return type

dict

shannon_information_gain(x_pred)

function to compute the shannon-information gain of data :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the

Returns

solution dictionary

Return type

dict

shannon_information_gain_vec(x_pred)

function to compute the shannon-information gain of data :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the

Returns

solution dictionary

Return type

dict

shannon_information_gain_grad(x_pred, direction)

function to compute the gradient if the shannon-information gain of data :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param direction: :type direction: direction in which to compute the gradient

Returns

solution dictionary

Return type

dict

posterior_probability(x_pred, comp_mean, comp_cov)

function to compute the probability of an uncertain feature given the gp posterior :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param comp_mean: :type comp_mean: a vector of mean values, same length as x_pred :param comp_cov: :type comp_cov: covarianve matrix, in R^{len(x_pred)xlen(x_pred)}

Returns

solution dictionary

Return type

dict

posterior_probability_grad(x_pred, comp_mean, comp_cov, direction)

function to compute the gradient of the probability of an uncertain feature given the gp posterior :param x_pred: index set which results from a cartesian product of input and output space :type x_pred: 1d or 2d numpy array of points, note, these are elements of the :param comp_mean: :type comp_mean: a vector of mean values, same length as x_pred :param comp_cov: :type comp_cov: covarianve matrix, in R^{len(x_pred)xlen(x_pred)} :param direction: :type direction: direction in which to compute the gradient

Returns

solution dictionary

Return type

dict

squared_exponential_kernel(distance, length)

Function for the squared exponential kernel. kernel = np.exp(-(distance ** 2) / (2.0 * (length ** 2)))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

squared_exponential_kernel_robust(distance, phi)

Function for the squared exponential kernel (robust version) kernel = np.exp(-(distance ** 2) * (phi ** 2))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

exponential_kernel(distance, length)

Function for the exponential kernel kernel = np.exp(-(distance) / (length))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

exponential_kernel_robust(distance, phi)

Function for the exponential kernel (robust version) kernel = np.exp(-(distance) * (phi**2))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

matern_kernel_diff1(distance, length)

Function for the matern kernel, order of differentiablity = 1. kernel = (1.0 + ((np.sqrt(3.0) * distance) / (length))) * np.exp(

-(np.sqrt(3.0) * distance) / length

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

matern_kernel_diff1_robust(distance, phi)

Function for the matern kernel, order of differentiablity = 1, robust version. kernel = (1.0 + ((np.sqrt(3.0) * distance) * (phi**2))) * np.exp(

-(np.sqrt(3.0) * distance) * (phi**2))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • phi (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

matern_kernel_diff2(distance, length)

Function for the matern kernel, order of differentiablity = 2. kernel = (

1.0 + ((np.sqrt(5.0) * distance) / (length)) + ((5.0 * distance ** 2) / (3.0 * length ** 2))

) * np.exp(-(np.sqrt(5.0) * distance) / length)

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

matern_kernel_diff2_robust(distance, phi)

Function for the matern kernel, order of differentiablity = 2, robust version. kernel = (

1.0 + ((np.sqrt(5.0) * distance) * (phi**2)) + ((5.0 * distance ** 2) * (3.0 * phi ** 4))

) * np.exp(-(np.sqrt(5.0) * distance) * (phi**2))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – The length scale hyperparameters

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

sparse_kernel(distance, radius)

Function for a compactly supported kernel.

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • radius (scalar) – Radius of support.

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

periodic_kernel(distance, length, p)

Function for a periodic kernel. kernel = np.exp(-(2.0/length**2)*(np.sin(np.pi*distance/p)**2))

Parameters
  • distance (scalar or np.ndarray) – Distance between a set of points.

  • length (scalar) – Length scale.

  • p (scalar) – Period.

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

linear_kernel(x1, x2, hp1, hp2, hp3)

Function for a linear kernel. kernel = hp1 + (hp2*(x1-hp3)*(x2-hp3))

Parameters
  • x1 (float) – Point 1.

  • x2 (float) – Point 2.

  • hp1 (float) – Hyperparameter.

  • hp2 (float) – Hyperparameter.

  • hp3 (float) – Hyperparameter.

Returns

A structure of the shape of the distance input parameter

Return type

float

dot_product_kernel(x1, x2, hp, matrix)

Function for a dot-product kernel. kernel = hp + x1.T @ matrix @ x2

Parameters
  • x1 (np.ndarray) – Point 1.

  • x2 (np.ndarray) – Point 2.

  • hp (float) – Offset hyperparameter.

  • matrix (np.ndarray) – PSD matrix defining the inner product.

Returns

A structure of the shape of the distance input parameter

Return type

float

polynomial_kernel(x1, x2, p)

Function for a polynomial kernel. kernel = (1.0+x1.T @ x2)**p

Parameters
  • x1 (np.ndarray) – Point 1.

  • x2 (np.ndarray) – Point 2.

  • p (float) – Power hyperparameter.

Returns

A structure of the shape of the distance input parameter

Return type

float

default_kernel(x1, x2, hyperparameters, obj)

Function for the default kernel, a Matern kernel of first-order differentiability.

Parameters
  • x1 (np.ndarray) – Numpy array of shape (U x D)

  • x2 (np.ndarray) – Numpy array of shape (V x D)

  • hyperparameters (np.ndarray) – Array of hyperparameters. For this kernel we need D + 1 hyperparameters

  • obj (object instance) – GP object instance.

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray

non_stat_kernel(x1, x2, x0, w, l)

Non-stationary kernel. kernel = g(x1) g(x2)

Parameters
  • x1 (np.ndarray) – Numpy array of shape (U x D)

  • x2 (np.ndarray) – Numpy array of shape (V x D)

  • x0 (np.array) – Numpy array of the basis function locations

  • w (np.ndarray) – 1d np.array of weights. len(w) = len(x0)

  • l (float) – Width measure of the basis functions.

Returns

A structure of the shape of the distance input parameter

Return type

float or np.ndarray