Tutorial

If you have not read it yet, please read the Overview first.

PosDefManifoldML features two basic machine learning modes of operation:

  • train-test: a machine learning (ML) model is first fitted (trained), then it can be used to predict (test) the labels of testing data or the probability of the data to belong to each class. The raw prediction function of the models is available as well.

  • cross-validation: a k-fold cross-validation procedure allows estimating the accuracy of ML models and comparing them.

The train-test mode is useful in cross-subject and cross-session settings, while cross-validation is the standard for within-session settings.

PosDefManifoldML provides a homogeneous syntax for operating in these two modes with all implemented ML models, regardless of whether they act directly on the manifold of positive definite matrices or on the tangent space. It also features

  • Pre-conditioning pipelines, which can drastically reduce the execution time
  • Adaptation techniques, which, besides being very useful in cross-session and cross-subject settings, are instrumental for implementing on-line modes of operation.

Note that models acting on the tangent space can take as input Euclidean feature vectors instead of positive definite matrices, thus they can be used in many more situations.
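For instance, a tangent-space model (such as the ENLR model presented below) can be fitted directly on a matrix of Euclidean feature vectors. A minimal sketch, assuming the fit method accepts such a matrix with one observation per row:

using PosDefManifoldML

X = randn(100, 15); # 100 observations of 15 features each
y = rand(1:2, 100); # random labels for a 2-class problem
m = fit(ENLR(), X, y)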

Get data

Let us create simulated data for a 2-class example. First, let us create symmetric positive definite matrices (real positive definite matrices):

using PosDefManifoldML, PosDefManifold

PTr, PTe, yTr, yTe=gen2ClassData(10, 30, 40, 60, 80, 0.1);
  • PTr is the simulated training set, holding 30 matrices for class 1 and 40 matrices for class 2.
  • PTe is the testing set, holding 60 matrices for class 1 and 80 matrices for class 2.
  • yTr is a vector of 70 labels for the training set.
  • yTe is a vector of 140 labels for the testing set.

All matrices are of size 10x10.

Examples using the MDM model

The minimum distance to mean (MDM) classifier is an example of classifier acting directly on the manifold. It is deterministic and no hyperparameter tuning is needed.

MDM train-test

Create and fit an MDM model

An MDM model is created and fitted with training data as follows:

m = fit(MDM(Fisher), PTr, yTr)

where Fisher (affine-invariant) is the usual choice of a Metric as declared in the parent package PosDefManifold.

Since the Fisher metric is the default (for all ML models), the above is equivalent to:

m = fit(MDM(), PTr, yTr)

In order to adopt another metric:

m1 = fit(MDM(logEuclidean), PTr, yTr)

Predict (classify data)

In order to predict the labels of unlabeled data (which we have stored in PTe), we invoke

yPred=predict(m, PTe, :l)

The prediction error in percent can be retrieved with

predictErr(yTe, yPred)

the prediction accuracy as

predictAcc(yTe, yPred)

and the confusion matrix as

confusionMat(yTe, yPred)

where in yTe we have stored the true labels for the matrices in PTe.

If instead we wish to estimate the probabilities for the matrices in PTe of belonging to each class:

predict(m, PTe, :p)

Finally, the output functions of the MDM are obtained by (see predict)

predict(m, PTe, :f)

MDM cross-validation

The balanced accuracy estimated by a k-fold cross-validation is obtained as follows (10-fold by default):

cv = crval(MDM(), PTr, yTr)

As with all functions in the Julia language, the first time you run a function it is compiled, hence it is slow. To appreciate the actual speed, run it again:

cv = crval(MDM(), PTr, yTr)
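You can also time the call explicitly with Julia's @time macro, which reports the elapsed time and memory allocations:

@time cv = crval(MDM(), PTr, yTr);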

The cv structure has been created; therein you have access to the average accuracy and confusion matrix, as well as the accuracies and confusion matrices for all folds. For example, print the average confusion matrix (expressed in proportions):

cv.avgCnf
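The other fields can be accessed in the same way; for example (the field names avgAcc and accs are assumed here; see CVres for the authoritative list):

cv.avgAcc # average balanced accuracy across folds (assumed field name)
cv.accs   # balanced accuracy of each fold (assumed field name)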

See CVres for details on the fields of cross-validation objects.

MDM adaptation

Let's see how to adapt a pre-conditioning pipeline. Suppose you have data from two sessions or two subjects, s1 and s2. We want to use s1 to train a machine learning model and s2 to test it. A pipeline is fitted on s1 and we want this pipeline to adapt to s2 for testing. If the pipeline includes a recentering pre-conditioner, we need to make sure that the dimensionality reduction determined on s2 is the same as in s1.

Get data

Let us get some simulated data. We generate random data and labels for sessions (or subjects) 1 and 2.

Ps1, Ps2, ys1, ys2 = gen2ClassData(10, 30, 40, 60, 80);

Define the pre-conditioning pipeline for s1

p = @→ Recenter(; eVar=0.999) → Compress → Shrink(Fisher; radius=0.02)

Fit an MDM model on s1 using the pipeline

m = fit(MDM(), Ps1, ys1; pipeline = p)

The fitted pipeline with all learnt parameters is stored in model m. Instead of transforming the data in s2 using this pipeline, which is the default behavior of the predict function, let us define the same pipeline with the dimensionality reduction parameter fixed to the value learnt on s1. This way this parameter cannot change and the transformed matrices in s1 and s2 will have equal size. That is, we allow adaptation of all parameters, but force the same dimension.

p = @→ Recenter(; eVar=dim(m.pipeline)) → Compress → Shrink(Fisher; radius=0.02)

Fit the pipeline to s2 and predict

predict(m, Ps2, :l; pipeline=p)

Examples using the ENLR model

The elastic net logistic regression (ENLR) classifier is an example of a classifier acting on the tangent space. Besides the metric (see above) used to compute a base point for projecting the data onto the tangent space, it has a parameter alpha and a hyperparameter lambda. The alpha parameter trades off between a pure ridge LR model ($α=0$) and a pure lasso LR model ($α=1$), which is the default. Given an alpha value, the model is fitted with a number of values for the $λ$ (regularization) hyperparameter. Thus, differently from the previous example, tuning the $λ$ hyperparameter is necessary.

Also, keep in mind that the fit and predict methods for ENLR models accept optional keyword arguments that are specific to this model.

Get data

Let us get some simulated data (see the previous example for explanations).

using PosDefManifoldML, PosDefManifold

PTr, PTe, yTr, yTe=gen2ClassData(10, 30, 40, 60, 80, 0.1);

ENLR train-test

Create and fit ENLR models

By default, the Fisher metric is adopted and a lasso model is fitted. The best value for the lambda hyperparameter is found by cross-validation:

m1 = fit(ENLR(), PTr, yTr; w=:balanced)

Notice that since the classes are unbalanced, with the w=:balanced argument (we may as well just use w=:b) we have requested to compute a balanced mean for projecting the matrices in PTr onto the tangent space.

The optimal value of lambda for this training data is

m1.best.lambda

As in GLMNet.jl, the intercept and beta terms are retrieved by

m1.best.a0
m1.best.betas

The number of non-zero beta coefficients can be found, for example, by

sum(!iszero, m1.best.betas)

In order to fit a ridge LR model:

m2 = fit(ENLR(), PTr, yTr; w=:b, alpha=0)

Values of alpha in the range $(0, 1)$ fit instead an elastic net LR model. In the following we also request not to normalize predictors (by default their norm is fixed):

m3 = fit(ENLR(Fisher), PTr, yTr; w=:b, alpha=0.9, normalize=nothing)

Instead we could standardize predictors:

m4 = fit(ENLR(Fisher), PTr, yTr; w=:b, alpha=0.9, normalize=standardize!)

or rescale them within custom limits:

m5 = fit(ENLR(Fisher), PTr, yTr; w=:b, alpha=0.9, normalize=(-1.0, 1.0))

In order to find the regularization path we use the fitType keyword argument:

m1 = fit(ENLR(Fisher), PTr, yTr; w=:b, fitType=:path)

The values of lambda along the path are given by

m1.path.lambda

We can also find the best value of the lambda hyperparameter and the regularization path at once, calling:

m1 = fit(ENLR(Fisher), PTr, yTr; w=:b, fitType=:all)

For changing the metric see MDM train-test.

See the documentation of the fit ENLR method for details on all available optional arguments.

Classify data (predict)

For prediction, we can request to use the best model (optimal lambda), a specific model of the regularization path, or all models in the regularization path. Note that with the last call here above, both the .best and .path fields of the m1 structure have been created.

By default, prediction is obtained from the best model and we request to predict the labels:

yPred=predict(m1, PTe)

# prediction accuracy (in proportion)
predictAcc(yPred, yTe)

# confusion matrix
confusionMat(yPred, yTe)

# predict probabilities of matrices in `PTe` to belong to each class
predict(m1, PTe, :p)

# output function of the model for each class
predict(m1, PTe, :f)

In order to request the prediction of labels for all models in the regularization path:

yPred=predict(m1, PTe, :l, :path, 0)

while for a specific model in the path (e.g., model #10):

yPred=predict(m1, PTe, :l, :path, 10)

ENLR cross-validation

The balanced accuracy estimated by a k-fold cross-validation is obtained with the exact same basic syntax for all models, with some specific optional keyword arguments for models acting in the tangent space, for example:

cv = crval(ENLR(), PTr, yTr; w=:b)

In order to perform another cross-validation arranging the training data differently in the folds:

cv = crval(ENLR(), PTr, yTr; w=:b, shuffle=true)

This last command can be invoked repeatedly.
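For instance, you may repeat the shuffled cross-validation and summarize the resulting estimates. A minimal sketch, assuming the average accuracy is stored in the avgAcc field (see CVres):

using Statistics

accs = [crval(ENLR(), PTr, yTr; w=:b, shuffle=true).avgAcc for i=1:5];
mean(accs), std(accs)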

ENLR adaptation

First, let's see how to adapt the base point for projecting the data onto the tangent space. Suppose you have data from two sessions or two subjects, s1 and s2. We want to use s1 to train a machine learning model on the tangent space and s2 to test it; however, the barycenter of s1 cannot be assumed equal to the barycenter of s2. The barycenter determines the base point, therefore we adapt it.

Get data

Let us get some simulated data. We generate random data and labels for sessions (or subjects) 1 and 2.

Ps1, Ps2, ys1, ys2 = gen2ClassData(10, 30, 40, 60, 80);

Create and fit an ENLR model on s1

m = fit(ENLR(Fisher), Ps1, ys1)

Classify (predict) data of s2 adapting the base point

predict(m, Ps2, :l; meanISR=invsqrt(mean(Fisher, Ps2)))

Second, let's see how to adapt a pre-conditioning pipeline as we have done here above for the base point. Since the pipeline we will employ recenters the data around the identity, we can skip altogether the computation of the barycenter of s2, using the identity matrix as the base point.

The pipeline we will define comprises a recentering pre-conditioner with dimensionality reduction. While adapting the pipeline to s2, we need to make sure that the matrices in s2 are reduced to the same dimension as the matrices in s1, otherwise the machine learning model we fit on s1 cannot operate on s2. For this, we need to set the eVar argument of the Recenter pre-conditioner to an integer matching the reduced dimension of s1. Note that the adaptation may not work well if the class proportions differ in s1 and s2.

Define the pre-conditioning pipeline for s1

p = @→ Recenter(; eVar=0.999) → Compress → Shrink(Fisher; radius=0.02)

Fit the model on s1 using the pipeline

m = fit(ENLR(), Ps1, ys1; pipeline = p)

Define the same pipeline with fixed dimensionality reduction parameter

p = @→ Recenter(; eVar=dim(m.pipeline)) → Compress → Shrink(Fisher; radius=0.02)

Fit the pipeline to s2 (adapt) and use the identity matrix as base point

predict(m, Ps2, :l; pipeline=p, meanISR=I) 

Examples using SVM models

The SVM ML model actually encapsulates several support-vector classification and support-vector regression models. Here we are concerned with the former, which include the C-Support Vector Classification (SVC), the Nu-Support Vector Classification (NuSVC), similar to SVC but using a parameter to control the number of support vectors, and the One-Class SVM (OneClassSVM), which is generally used for unsupervised outlier detection. They all act on the tangent space like ENLR models. Besides the metric (see MDM train-test) used to compute a base point for projecting the data onto the tangent space and the type of SVM model (the svmType: SVC (default), NuSVC or OneClassSVM), the main parameter is the kernel. Available kernels are:

  • Linear (default)
  • RadialBasis
  • Polynomial
  • Sigmoid

Several parameters are available for building all these kernels, except the linear one, which has no parameters. As for ENLR, for SVM models a hyperparameter is also to be found by cross-validation.

Get data

Let us get some simulated data as in the previous examples.

using PosDefManifoldML, PosDefManifold

PTr, PTe, yTr, yTe=gen2ClassData(10, 30, 40, 60, 80, 0.1);

SVM train-test

Create and fit SVM models

By default, a C-Support Vector Classification model is fitted:

m1 = fit(SVM(), PTr, yTr; w=:b)

Notice that, as in the ENLR example above, we have requested to compute a balanced mean for projecting the matrices in PTr onto the tangent space.

In order to fit a Nu-Support Vector Classification model:

m2 = fit(SVM(), PTr, yTr; w=:b, svmType=NuSVC)

For using other kernels, e.g.:

m3 = fit(SVM(), PTr, yTr; w=:b, svmType=NuSVC, kernel=Polynomial)
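Kernel parameters can be passed as keyword arguments. The following is only a sketch, assuming the degree and gamma keyword arguments are forwarded to the underlying LIBSVM library:

fit(SVM(), PTr, yTr; w=:b, kernel=Polynomial, degree=3, gamma=0.5) # `degree` and `gamma` assumed here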

In the following we request not to normalize predictors (by default their norm is fixed):

m4 = fit(SVM(), PTr, yTr; w=:b, normalize=nothing)

Instead we could standardize predictors:

m5 = fit(SVM(), PTr, yTr; w=:b, normalize=standardize!)

or rescale them within custom limits:

m6 = fit(SVM(), PTr, yTr; w=:b, normalize=(-1.0, 1.0))

By default the Fisher metric is used. For changing it, see MDM train-test.

See the documentation of the fit SVM method for details on all available optional arguments.

Classify data (predict)

Just the same as for the other models:

yPred=predict(m1, PTe)

# prediction accuracy (in proportion)
predictAcc(yPred, yTe)

# confusion matrix
confusionMat(yPred, yTe)

# predict probabilities of matrices in `PTe` to belong to each class
predict(m1, PTe, :p)

# output function of the model for each class
predict(m1, PTe, :f)

SVM cross-validation

Again, the balanced accuracy estimated by a k-fold cross-validation is obtained with the exact same basic syntax for all models, with some specific optional keyword arguments for models acting in the tangent space, for example:

cv = crval(SVM(), PTr, yTr; w=:b)

SVM adaptation

See the tutorial on ENLR adaptation; the code needed is exactly the same, just change the machine learning model from ENLR to SVM.
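For convenience, here is the same sequence of commands with the SVM model:

Ps1, Ps2, ys1, ys2 = gen2ClassData(10, 30, 40, 60, 80);
p = @→ Recenter(; eVar=0.999) → Compress → Shrink(Fisher; radius=0.02)
m = fit(SVM(), Ps1, ys1; pipeline = p)
p = @→ Recenter(; eVar=dim(m.pipeline)) → Compress → Shrink(Fisher; radius=0.02)
predict(m, Ps2, :l; pipeline=p, meanISR=I)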