gp2Scale

gp2Scale is a special setting in fvgp that combines non-stationary, compactly supported kernels, HPC distributed computing, and sparse linear algebra to scale exact GPs up to millions of data points. gp2Scale holds the world record in this category! Here we run a moderately sized GP (10,000 data points) because we assume you might run this locally.

It is worth pausing to appreciate what is happening here. If your dask client points to a remote cluster with 500 GPUs, the covariance matrix computation is distributed across all of them. The full matrix is sparse and therefore fast to work with in downstream operations. The algorithm only exploits naturally occurring sparsity, so the result is exact, in contrast to approximations such as Vecchia or inducing-point methods.
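To make the "naturally occurring sparsity" concrete, here is a minimal NumPy sketch. It uses one well-known Wendland function, (1 - d)^4 (4d + 1), purely as an illustration; the kernel, the function name `wendland`, and the chosen values are assumptions for this sketch and are not taken from fvgp's internals. The point is that a compactly supported kernel returns exactly zero beyond its support radius, so the zeros in the covariance matrix are not an approximation.

```python
import numpy as np

def wendland(x1, x2, signal_var, length_scale):
    """Illustrative compactly supported Wendland kernel for 1-D inputs."""
    d = np.abs(x1[:, None] - x2[None, :]) / length_scale
    d = np.minimum(d, 1.0)  # clip: at d = 1 the kernel below is exactly 0
    return signal_var * (1.0 - d) ** 4 * (4.0 * d + 1.0)

rng = np.random.default_rng(0)
x = rng.random(2000)        # 2000 random points in [0, 1]
support_radius = 0.01       # hypothetical length scale for this sketch

K = wendland(x, x, signal_var=2.0, length_scale=support_radius)

# Fraction of entries that are non-zero
density = np.count_nonzero(K) / K.size
print(f"fraction of non-zero entries: {density:.4f}")

# Every pair of points farther apart than the support radius gives an exact zero
far = np.abs(x[:, None] - x[None, :]) >= support_radius
print("all far-apart entries exactly zero:", bool(np.all(K[far] == 0.0)))
```

Only a few percent of the entries are non-zero here, which is why a sparse format and sparse solvers pay off without changing the result.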

##first install the newest version of fvgp
#!pip install fvgp==4.1.1

Setup

import numpy as np
import matplotlib.pyplot as plt
from fvgp import GP
from dask.distributed import Client
%load_ext autoreload
%autoreload 2

client = Client() ##this creates a local client; on an HPC system,
#your HPC team can provide a script to obtain one. We included an example to get gp2Scale going
#on Perlmutter


#It's good practice to wait until all workers are ready.
#Note: this call blocks until 4 workers are available, so adjust the number to your machine.
client.wait_for_workers(4)

Preparing the data and some other inputs

def f1(x):
    return ((np.sin(5. * x) + np.cos(10. * x) + (2.* (x-0.4)**2) * np.cos(100. * x)))

input_dim = 1
N = 10000
x_data = np.random.rand(N,input_dim)
y_data = f1(x_data)
hps_n = 2

hps_bounds = np.array([[0.1,10.],      ##signal var of Wendland kernel
                       [0.001,0.02]])  ##length scale for Wendland kernel
        
init_hps = np.random.uniform(size = len(hps_bounds), low = hps_bounds[:,0], high = hps_bounds[:,1])
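The length-scale bounds largely decide how sparse the covariance matrix can become. As a rough sketch, assume the kernel's support radius equals the length scale r (an assumption about the kernel, not something stated above) and that the inputs are uniform on [0, 1], as they are here. Then the expected fraction of non-zero entries is P(|x - x'| < r) = 2r - r^2. The helper name `expected_density` is made up for this illustration.

```python
import numpy as np

def expected_density(r):
    """P(|x - x'| < r) for x, x' independent and uniform on [0, 1], with 0 <= r <= 1."""
    return 2.0 * r - r ** 2

# Length-scale bounds taken from the second row of hps_bounds
length_scale_bounds = (0.001, 0.02)
for r in length_scale_bounds:
    print(f"length scale {r}: expected non-zero fraction {expected_density(r):.5f}")

# Monte Carlo check of the formula for one intermediate length scale
rng = np.random.default_rng(1)
a = rng.random(1_000_000)
b = rng.random(1_000_000)
r_check = 0.01
empirical = np.mean(np.abs(a - b) < r_check)
print(f"r = {r_check}: formula {expected_density(r_check):.5f}, sampled {empirical:.5f}")
```

Under these assumptions the non-zero fraction stays roughly between 0.002 and 0.04 over the whole search range. If the `sparsity` values printed during training are read as fractions of non-zero entries, they should fall inside that range.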

my_gp2S = GP(1, x_data,y_data,init_hps,  #compute_device = 'gpu', #you can use gpus here
            gp2Scale = True, gp2Scale_batch_size= 1000, gp2Scale_dask_client = client, info = True
            )
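With N = 10000 and gp2Scale_batch_size = 1000, the covariance matrix splits into a 10 x 10 grid of 1000 x 1000 blocks. The sketch below shows the general idea of computing such a matrix block by block and using symmetry to skip half of the work. It is an illustration only and not fvgp's implementation: the kernel, `batch_ranges`, and `blockwise_covariance` are hypothetical names, the loop runs serially where gp2Scale would hand blocks to dask workers, and the result is stored densely just so it can be checked.

```python
import numpy as np

def kernel(x1, x2, length_scale=0.01):
    """Illustrative compactly supported kernel for 1-D inputs."""
    d = np.minimum(np.abs(x1[:, None] - x2[None, :]) / length_scale, 1.0)
    return (1.0 - d) ** 4 * (4.0 * d + 1.0)

def batch_ranges(n, batch_size):
    """Index ranges (start, stop) that cover 0..n in steps of batch_size."""
    return [(start, min(start + batch_size, n)) for start in range(0, n, batch_size)]

def blockwise_covariance(x, batch_size):
    """Assemble the covariance matrix from blocks; returns the matrix and the block count."""
    n = len(x)
    K = np.zeros((n, n))  # dense only for checking; a real run would keep sparse blocks
    ranges = batch_ranges(n, batch_size)
    n_tasks = 0
    for i, (a0, a1) in enumerate(ranges):
        for (b0, b1) in ranges[i:]:          # upper triangle of blocks only
            block = kernel(x[a0:a1], x[b0:b1])
            K[a0:a1, b0:b1] = block
            K[b0:b1, a0:a1] = block.T        # mirror block follows from symmetry
            n_tasks += 1
    return K, n_tasks

rng = np.random.default_rng(2)
x = rng.random(1000)
K_blocks, n_tasks = blockwise_covariance(x, batch_size=100)

print("number of blocks computed:", n_tasks)
print("matches the full computation:", bool(np.allclose(K_blocks, kernel(x, x))))
```

A smaller batch size means more, lighter tasks and less memory per worker; a larger one means fewer tasks with more work each. Which is faster depends on the cluster.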

my_gp2S.train(hyperparameter_bounds = hps_bounds, max_iter = 2)
Transferring the covariance matrix to host done after  1.310960054397583  seconds. sparsity =  0.02041036
MINRES solve in progress ... 1.353991985321045 seconds.
MINRES solve done after  1.6024701595306396 seconds.
logdet() in progress ...  1.6025190353393555 seconds.
                                    results                                   
==============================================================================
     inquiries                            error            samples            
--------------------              ---------------------   ---------           
i         parameters       trace    absolute   relative   num   out  converged
==============================================================================
1               none  -3.491e+04   8.835e+01     0.253%   100     0      False

                                    config                                    
==============================================================================
                matrix                            stochastic estimator        
-------------------------------------    -------------------------------------
gram:                           False    method:                           slq
exponent:                           1    lanczos degree:                    20
num matrix parameters:              0    lanczos tol:                2.220e-16
data type:                     64-bit    orthogonalization:               none

             convergence                                 error                
-------------------------------------    -------------------------------------
min num samples:                   10    abs error tol:              0.000e+00
max num samples:                  100    rel error tol:                  10.00%
outlier significance level:     0.00%    confidence level:              95.00%

                                   process                                    
==============================================================================
                 time                                   device                
-------------------------------------    -------------------------------------
tot wall time (sec):        4.612e+00    num cpu threads:                    1
alg wall time (sec):        4.610e+00    num gpu devices, multiproc:     0,  0
cpu proc time (sec):        4.634e+00    num gpu threads per multiproc:      0

logdet/LU done after  6.216046333312988 seconds.
Transferring the covariance matrix to host done after  0.8912520408630371  seconds. sparsity =  0.01154832
MINRES solve in progress ... 0.9157159328460693 seconds.
MINRES solve done after  1.0464270114898682 seconds.
logdet() in progress ...  1.0464534759521484 seconds.
                                    results                                   
==============================================================================
     inquiries                            error            samples            
--------------------              ---------------------   ---------           
i         parameters       trace    absolute   relative   num   out  converged
==============================================================================
1               none  -3.847e+04   5.408e+01     0.141%   100     0      False

                                    config                                    
==============================================================================
                matrix                            stochastic estimator        
-------------------------------------    -------------------------------------
gram:                           False    method:                           slq
exponent:                           1    lanczos degree:                    20
num matrix parameters:              0    lanczos tol:                2.220e-16
data type:                     64-bit    orthogonalization:               none

             convergence                                 error                
-------------------------------------    -------------------------------------
min num samples:                   10    abs error tol:              0.000e+00
max num samples:                  100    rel error tol:                  10.00%
outlier significance level:     0.00%    confidence level:              95.00%

                                   process                                    
==============================================================================
                 time                                   device                
-------------------------------------    -------------------------------------
tot wall time (sec):        2.438e+00    num cpu threads:                    1
alg wall time (sec):        2.437e+00    num gpu devices, multiproc:     0,  0
cpu proc time (sec):        2.450e+00    num gpu threads per multiproc:      0

logdet/LU done after  3.4843878746032715 seconds.
Transferring the covariance matrix to host done after  0.9209725856781006  seconds. sparsity =  0.00565176
MINRES solve in progress ... 0.9336745738983154 seconds.
MINRES solve done after  0.9836099147796631 seconds.
logdet() in progress ...  0.9836294651031494 seconds.
                                    results                                   
==============================================================================
     inquiries                            error            samples            
--------------------              ---------------------   ---------           
i         parameters       trace    absolute   relative   num   out  converged
==============================================================================
1               none  -3.514e+04   6.747e+01     0.192%   100     0      False

                                    config                                    
==============================================================================
                matrix                            stochastic estimator        
-------------------------------------    -------------------------------------
gram:                           False    method:                           slq
exponent:                           1    lanczos degree:                    20
num matrix parameters:              0    lanczos tol:                2.220e-16
data type:                     64-bit    orthogonalization:               none

             convergence                                 error                
-------------------------------------    -------------------------------------
min num samples:                   10    abs error tol:              0.000e+00
max num samples:                  100    rel error tol:                  10.00%
outlier significance level:     0.00%    confidence level:              95.00%

                                   process                                    
==============================================================================
                 time                                   device                
-------------------------------------    -------------------------------------
tot wall time (sec):        1.114e+00    num cpu threads:                    1
alg wall time (sec):        1.113e+00    num gpu devices, multiproc:     0,  0
cpu proc time (sec):        1.119e+00    num gpu threads per multiproc:      0

logdet/LU done after  2.0978715419769287 seconds.
Transferring the covariance matrix to host done after  0.8941638469696045  seconds. sparsity =  0.01154832
MINRES solve in progress ... 0.914618968963623 seconds.
MINRES solve done after  1.022510290145874 seconds.
logdet() in progress ...  1.0225327014923096 seconds.
                                    results                                   
==============================================================================
     inquiries                            error            samples            
--------------------              ---------------------   ---------           
i         parameters       trace    absolute   relative   num   out  converged
==============================================================================
1               none  -3.854e+04   6.689e+01     0.174%   100     0      False

                                    config                                    
==============================================================================
                matrix                            stochastic estimator        
-------------------------------------    -------------------------------------
gram:                           False    method:                           slq
exponent:                           1    lanczos degree:                    20
num matrix parameters:              0    lanczos tol:                2.220e-16
data type:                     64-bit    orthogonalization:               none

             convergence                                 error                
-------------------------------------    -------------------------------------
min num samples:                   10    abs error tol:              0.000e+00
max num samples:                  100    rel error tol:                  10.00%
outlier significance level:     0.00%    confidence level:              95.00%

                                   process                                    
==============================================================================
                 time                                   device                
-------------------------------------    -------------------------------------
tot wall time (sec):        2.563e+00    num cpu threads:                    1
alg wall time (sec):        2.562e+00    num gpu devices, multiproc:     0,  0
cpu proc time (sec):        2.576e+00    num gpu threads per multiproc:      0

logdet/LU done after  3.5861313343048096 seconds.
x_pred = np.linspace(0,1,100) ##for big GPs, predicting on a dense grid is usually not a good idea, but in 1d we can still do it.
                              ##It's better to do predictions only for a handful of points.

mean1 = my_gp2S.posterior_mean(x_pred.reshape(-1,1))["f(x)"]
var1 =  my_gp2S.posterior_covariance(x_pred.reshape(-1,1), variance_only=False)["v(x)"]

print(my_gp2S.hyperparameters)

plt.figure(figsize = (16,10))
plt.plot(x_pred,mean1, label = "posterior mean", linewidth = 4)
plt.plot(x_pred,f1(x_pred), label = "latent function", linewidth = 4)
plt.fill_between(x_pred, mean1 - 3. * np.sqrt(var1), mean1 + 3. * np.sqrt(var1), alpha = 0.5, color = "grey", label = "+/- 3 std")
plt.scatter(x_data,y_data, color = 'black')
[2.24028262 0.0057387 ]
[Figure: posterior mean, latent function, and uncertainty band plotted over the data points]
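The comment in the prediction cell recommends predicting only at a handful of points when the GP is large. If many predictions are needed anyway, one option is to request them in small chunks so that each call stays cheap. The helper below is a generic sketch: `predict_in_chunks` and `dummy_predict` are hypothetical names, and the dummy function only stands in for a real posterior call so that the example runs on its own.

```python
import numpy as np

def predict_in_chunks(predict, x_pred, chunk_size):
    """Evaluate `predict` on consecutive chunks of x_pred and concatenate the results."""
    pieces = [predict(x_pred[i:i + chunk_size])
              for i in range(0, len(x_pred), chunk_size)]
    return np.concatenate(pieces)

# Stand-in for a posterior-mean call, used only to make this sketch runnable
def dummy_predict(x):
    return np.sin(5.0 * x[:, 0])

x_grid = np.linspace(0, 1, 100).reshape(-1, 1)
mean_chunked = predict_in_chunks(dummy_predict, x_grid, chunk_size=16)

print("chunked result shape:", mean_chunked.shape)
print("same as one big call:", bool(np.allclose(mean_chunked, dummy_predict(x_grid))))
```

With the GP trained above, `predict` could be something like `lambda x: my_gp2S.posterior_mean(x)["f(x)"]`, reusing the call from the prediction cell.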