GPs on Non-Euclidean Input Spaces

GPs on non-Euclidean input spaces have become increasingly relevant in recent years. fvgp can be used for that purpose as long as a valid kernel (symmetric and positive semidefinite) is provided. If mean functions and noise functions are also provided, they too have to operate on these non-Euclidean spaces.
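
As a rough illustration of a mean function that operates on non-Euclidean inputs, the sketch below defines a prior mean on words. The name word_length_mean is hypothetical, the (x, hps, obj) signature is assumed to mirror the kernel's calling convention used in this example, and hps[2] would be an extra hyperparameter that is not part of the example itself. The fvgp documentation has the exact interface.

```python
import numpy as np

#hypothetical prior mean on words: linear in the word length
#the (x, hps, obj) signature is an assumption that mirrors the kernel's
#calling convention; hps[2] would be an additional, third hyperparameter
def word_length_mean(x, hps, obj):
    lengths = np.array([len(word) for word in x], dtype=float)
    return hps[2] * lengths

example_mean = word_length_mean(['hello', 'is'], np.array([1., 1., 0.5]), None)
print(example_mean)
```

The function never treats its inputs as vectors; it only needs to know how to map each word to a number.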

In this example, we run a small GP on words. It is a proof of concept; the numerical results themselves are of little practical relevance.

#install the newest version of fvgp
#!pip install fvgp==4.1.1
import numpy as np
import matplotlib.pyplot as plt
from fvgp import GP
from dask.distributed import Client
%load_ext autoreload
%autoreload 2
#making x_data a list allows us to put any objects or structures into it.
x_data = ['hello','world','this','is','fvgp']
y_data = np.array([2.,1.9,1.8,3.0,5.])


def string_distance(string1, string2):
    #Hamming-like distance: the difference in length plus the number of
    #mismatched characters over the common length
    difference = float(abs(len(string1) - len(string2)))
    #zip stops at the end of the shorter string, i.e., at the common length
    for char1, char2 in zip(string1, string2):
        if char1 != char2:
            difference += 1.

    return difference
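
To get a feel for this distance, the following sketch restates it in compact form (string_distance_compact is a hypothetical helper, introduced only so the block runs on its own) and evaluates it on the training words and on the test word 'full' used further down.

```python
import numpy as np

#compact restatement of string_distance so that this block runs on its own
#(string_distance_compact is a hypothetical name, not part of fvgp)
def string_distance_compact(string1, string2):
    mismatches = sum(c1 != c2 for c1, c2 in zip(string1, string2))
    return float(abs(len(string1) - len(string2)) + mismatches)

words = ['hello', 'world', 'this', 'is', 'fvgp']

#pairwise distance matrix of the training words
D = np.array([[string_distance_compact(a, b) for b in words] for a in words])
print(D)

#distances from the test word 'full' to each training word
d_full = [string_distance_compact('full', w) for w in words]
print(d_full)   # → [3.0, 4.0, 4.0, 4.0, 3.0]
```

All training words are at least 4 apart from each other, and 'full' is at least 3 away from every training word.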


def kernel(x1,x2,hps,obj):
    #pairwise string distances between all entries of x1 and x2
    d = np.zeros((len(x1),len(x2)))
    for i, string1 in enumerate(x1):
        for j, string2 in enumerate(x2):
            d[i,j] = string_distance(string1,string2)
    #hps[0] scales the kernel (signal variance), hps[1] is the length scale
    return hps[0] * obj.matern_kernel_diff1(d,hps[1])
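
The kernel only sees the inputs through the distance matrix d, so its validity can be probed numerically. The sketch below is a stand-alone check, not fvgp code: matern32 is a hypothetical helper that assumes obj.matern_kernel_diff1 is the Matern kernel with nu = 3/2, the distance matrix is the one string_distance yields for the five training words, and the hyperparameter values are illustrative.

```python
import numpy as np

#Matern kernel with nu = 3/2 evaluated on a distance matrix d;
#assumed to match what obj.matern_kernel_diff1 computes
def matern32(d, length_scale):
    r = np.sqrt(3.0) * d / length_scale
    return (1.0 + r) * np.exp(-r)

#pairwise string distances of the five training words
#('hello', 'world', 'this', 'is', 'fvgp')
d = np.array([[0., 4., 5., 5., 5.],
              [4., 0., 5., 5., 5.],
              [5., 5., 0., 4., 4.],
              [5., 5., 4., 0., 4.],
              [5., 5., 4., 4., 0.]])

signal_variance, length_scale = 1.0, 1.0   #illustrative values, not trained ones
K = signal_variance * matern32(d, length_scale)

#a valid covariance matrix is symmetric with non-negative eigenvalues
eigenvalues = np.linalg.eigvalsh(K)
print("symmetric:", np.allclose(K, K.T))
print("smallest eigenvalue:", eigenvalues.min())
```

A positive smallest eigenvalue for one data set and one set of hyperparameters is a sanity check only; it does not prove that the kernel is positive semidefinite for arbitrary words.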
    



my_gp = GP(1, x_data,y_data,init_hyperparameters=np.ones((2)), gp_kernel_function=kernel, info = True)

#bounds for hps[0] (kernel scaling) and hps[1] (length scale)
bounds = np.array([[0.001,100.],[0.001,100.]])
my_gp.train(hyperparameter_bounds=bounds)

print("hyperparameters: ", my_gp.hyperparameters)
print("prediction : ",my_gp.posterior_mean(['full'])["f(x)"])
print("uncertainty: ",np.sqrt(my_gp.posterior_covariance(['full'])["v(x)"]))
/tmp/ipykernel_1174962/2027899493.py:33: UserWarning: No noise function or measurement noise provided. Noise variances will be set to 1% of mean(y_data).
  my_gp = GP(1, x_data,y_data,init_hyperparameters=np.ones((2)), gp_kernel_function=kernel, info = True)
differential_evolution step 1: f(x)= 8.43858
differential_evolution step 2: f(x)= 8.43858
differential_evolution step 3: f(x)= 8.43858
differential_evolution step 4: f(x)= 8.43858
differential_evolution step 5: f(x)= 8.31636
differential_evolution step 6: f(x)= 8.31636
differential_evolution step 7: f(x)= 8.31636
differential_evolution step 8: f(x)= 8.31636
differential_evolution step 9: f(x)= 8.31636
differential_evolution step 10: f(x)= 8.24918
differential_evolution step 11: f(x)= 8.06702
differential_evolution step 12: f(x)= 8.06702
differential_evolution step 13: f(x)= 8.05368
differential_evolution step 14: f(x)= 8.04585
differential_evolution step 15: f(x)= 8.04566
differential_evolution step 16: f(x)= 8.0449
differential_evolution step 17: f(x)= 8.0449
differential_evolution step 18: f(x)= 8.04489
differential_evolution step 19: f(x)= 8.04489
differential_evolution step 20: f(x)= 8.04489
differential_evolution step 21: f(x)= 8.04489
differential_evolution step 22: f(x)= 8.04489
hyperparameters:  [1.4347513  0.15607455]
prediction :  [2.74]
uncertainty:  [1.19781104]
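
The printed numbers can be read with a short calculation. The test word 'full' is at a string distance of 3 or more from every training word, while the trained length scale is about 0.16, so the kernel correlation to the data is practically zero. The sketch below assumes that obj.matern_kernel_diff1 is the Matern kernel with nu = 3/2 and that the default prior mean is the average of y_data; the printed output is consistent with both assumptions, but neither is confirmed here.

```python
import numpy as np

y_data = np.array([2., 1.9, 1.8, 3.0, 5.])
hps = np.array([1.4347513, 0.15607455])   #trained hyperparameters printed above

#Matern nu = 3/2 correlation at the smallest distance (3) between 'full'
#and any training word; assumed form of obj.matern_kernel_diff1
r = np.sqrt(3.0) * 3.0 / hps[1]
correlation = (1.0 + r) * np.exp(-r)

print("correlation at distance 3:", correlation)
print("mean of y_data           :", np.mean(y_data))
print("sqrt of hps[0]           :", np.sqrt(hps[0]))
```

With a vanishing correlation, the posterior at 'full' falls back to the prior: the prediction equals the mean of y_data (2.74) and the uncertainty equals the square root of hps[0] (about 1.198), which matches the output above.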