cv.jl
This unit implements cross-validation procedures for estimating the accuracy and balanced accuracy of machine learning models. It also reports the documentation of the fit and predict functions, as they are common to all models.
Content
struct | description |
---|---|
CVres | Encapsulates the results of cross-validation procedures for estimating accuracy |

function | description |
---|---|
fit | Fit a machine learning model with training data |
predict | Given a fitted model, predict labels, probabilities or scoring functions on test data |
crval | Perform a cross-validation and store accuracies, error losses, confusion matrices, the results of a statistical test and other information |
cvSetup | Generate indices for performing cross-validations |
PosDefManifoldML.CVres — Type

struct CVres <: CVresult
cvType :: String
scoring :: Union{String, Nothing}
modelType :: Union{String, Nothing}
predLabels :: Union{Vector{Vector{Vector{I}}}, Nothing} where I<:Int
losses :: Union{Vector{BitVector}, Nothing}
cnfs :: Union{Vector{Matrix{I}}, Nothing} where I<:Int
avgCnf :: Union{Matrix{T}, Nothing} where T<:Real
accs :: Union{Vector{T}, Nothing} where T<:Real
avgAcc :: Union{Real, Nothing}
stdAcc :: Union{Real, Nothing}
z :: Union{Real, Nothing}
p :: Union{Real, Nothing}
end
A call to crval results in an instance of this structure.
Fields:
.cvType
is the type of cross-validation technique, given as a string (e.g., "10-fold").
.scoring
is the type of accuracy that is computed, given as a string. This is controlled when calling crval. Currently accuracy and balanced accuracy are supported.
.modelType
is the type of the machine learning model used for performing the cross-validation, given as a string.
.predLabels
is an f-vector of z integer vectors holding the vectors of predicted labels. There is one vector for each fold (f) and each contains as many vectors as classes (z), each in turn holding the predicted labels for the trials.
.losses
is an f-vector holding BitVector types (vectors of booleans), each holding the binary loss for a fold.
.cnfs
is an f-vector of matrices holding the confusion matrices obtained at each fold of the cross-validation. These matrices hold frequencies (counts), that is, the sum of all elements equals the number of trials used for each fold.
.avgCnf
is the average confusion matrix of proportions across the folds of the cross-validation. This matrix holds proportions, that is, the sum of all elements equals 1.0.
.accs
is an f-vector of real numbers holding the accuracies obtained at each fold of the cross-validation.
.avgAcc
is the average accuracy across the folds of the cross-validation.
.stdAcc
is the standard deviation of the accuracy across the folds of the cross-validation.
.z
is the test statistic for the hypothesis that the observed average error loss is inferior to the specified expected value.
.p
is the p-value of the above hypothesis test.
See crval for more information.
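For instance, a minimal sketch of accessing the fields (assuming cv is the output of a call to crval, e.g., cv = crval(MDM(Fisher), P, y)):
cv.avgAcc               # average accuracy (or balanced accuracy, depending on scoring) across folds
cv.avgCnf               # average confusion matrix (proportions)
cv.accs[1]              # accuracy obtained at the first fold
cv.predLabels[1][2]     # labels predicted at fold 1 for the trials of class 2
cv.p                    # p-value of the statistical test, if performed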
StatsAPI.fit — Function

function fit(model :: MDMmodel,
𝐏Tr :: ℍVector,
yTr :: IntVector;
pipeline :: Union{Pipeline, Nothing} = nothing,
w :: Vector = [],
✓w :: Bool = true,
meanInit :: Union{ℍVector, Nothing} = nothing,
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true)
Fit an MDM machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. Return the fitted model.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
Fitting an MDM model involves only computing a mean (barycenter) of all the matrices in each class. Those class means are computed according to the metric specified by the MDM constructor.
Optional keyword arguments:
If a pipeline, of type Pipeline, is provided, all necessary parameters of the sequence of conditioners are fitted and all matrices are transformed according to the specified pipeline before fitting the ML model. The parameters are stored in the output ML model. Note that the fitted pipeline is automatically applied by any successive call to function predict to which the output ML model is passed as argument.
w is a vector of non-negative weights associated with the matrices in 𝐏Tr. These weights are used to compute the mean for each class. See method (3) of the mean function for the meaning of the arguments w, ✓w and ⏩, to which they are passed. Keep in mind that here the weights should sum up to 1 separately for each class, which is what is ensured by this function if ✓w is true.
tol is the tolerance required for those algorithms that compute the mean iteratively (they are those adopting the Fisher, logdet0 or Wasserstein metric). It defaults to 1e-5. For details on this argument see the functions that are called for computing the means (from package PosDefManifold.jl).
For those algorithms an initialization can be provided with optional keyword argument meanInit. If provided, this must be a vector of Hermitian matrices of the ℍVector type and must contain as many initializations as classes, in the natural order corresponding to the class labels (see above).
If verbose is true (default), information is printed in the REPL.
See: notation & nomenclature, the ℍVector type
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.25)
# Create and fit a model:
m = fit(MDM(Fisher), PTr, yTr)
# Create and fit a model using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(MDM(Fisher), PTr, yTr; pipeline=p)
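# A further sketch (hypothetical values): pass per-trial weights and one
# mean initialization per class. The log-Euclidean means serve here only
# as plausible starting points; this is not prescribed by the package.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.25)
w = ones(length(yTr))                              # uniform weights, for illustration
init = [mean(logEuclidean, PTr[yTr .== 1]),
        mean(logEuclidean, PTr[yTr .== 2])]        # one initialization per class (ℍVector)
m = fit(MDM(Fisher), PTr, yTr; w=w, meanInit=init, tol=1e-6)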
function fit(model :: ENLRmodel,
𝐏Tr :: Union{HermitianVector, Matrix{Float64}},
yTr :: IntVector;
# pipeline (data transformations)
pipeline :: Union{Pipeline, Nothing} = nothing,
# parameters for projection onto the tangent space
w :: Union{Symbol, Tuple, Vector} = Float64[],
meanISR :: Union{Hermitian, Nothing, UniformScaling} = nothing,
meanInit :: Union{Hermitian, Nothing} = nothing,
vecRange :: UnitRange = 𝐏Tr isa ℍVector ? (1:size(𝐏Tr[1], 2)) : (1:size(𝐏Tr, 2)),
normalize :: Union{Function, Tuple, Nothing} = normalize!,
# arguments for `GLMNet.glmnet` function
alpha :: Real = model.alpha,
weights :: Vector{Float64} = ones(Float64, length(yTr)),
intercept :: Bool = true,
fitType :: Symbol = :best,
penalty_factor :: Vector{Float64} = ones(Float64, _getDim(𝐏Tr, vecRange)),
constraints :: Matrix{Float64} = [x for x in (-Inf, Inf), y in 1:_getDim(𝐏Tr, vecRange)],
offsets :: Union{Vector{Float64}, Nothing} = nothing,
dfmax :: Int = _getDim(𝐏Tr, vecRange),
pmax :: Int = min(dfmax*2+20, _getDim(𝐏Tr, vecRange)),
nlambda :: Int = 100,
lambda_min_ratio:: Real = (length(yTr)*2 < _getDim(𝐏Tr, vecRange) ? 1e-2 : 1e-4),
lambda :: Vector{Float64} = Float64[],
maxit :: Int = 1000000,
algorithm :: Symbol = :newtonraphson,
checkArgs :: Bool = true,
# selection method
λSelMeth :: Symbol = :sd1,
# arguments for `GLMNet.glmnetcv` function
nfolds :: Int = min(10, div(size(yTr, 1), 3)),
folds :: Vector{Int} =
begin
n, r = divrem(size(yTr, 1), nfolds)
shuffle!([repeat(1:nfolds, outer=n); 1:r])
end,
parallel :: Bool=true,
# Generic and common parameters
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true,
)
Create and fit a 2-class elastic net logistic regression (ENLR) machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. Return the fitted model(s) as an instance of the ENLR structure.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
As for all ML models acting in the tangent space, fitting an ENLR model involves computing a mean (barycenter) of all the matrices in 𝐏Tr, projecting all matrices onto the tangent space after parallel transporting them at the identity matrix and vectorizing them using the vecP operation. Once this is done, the ENLR model is fitted.
The mean is computed according to the .metric field of the model, with optional weights w. The .metric field of the model is passed internally to the tsMap function. By default the metric is the Fisher metric. See the examples below for how to change the metric. See mdm.jl or check out directly the documentation of PosDefManifold.jl for the available metrics.
Optional keyword arguments
If a pipeline, of type Pipeline, is provided, all necessary parameters of the sequence of conditioners are fitted and all matrices are transformed according to the specified pipeline before fitting the ML model. The parameters are stored in the output ML model. Note that the fitted pipeline is automatically applied by any successive call to function predict to which the output ML model is passed as argument.
By default, uniform weights will be given to all observations for computing the mean to project the data in the tangent space. This is equivalent to passing as argument w=:uniform (or w=:u). You can also pass as argument:
- w=:balanced (or simply w=:b). If the two classes are unbalanced, the weights should better be inversely proportional to the number of examples for each class, in such a way that each class contributes equally to the computation of the mean. This is equivalent to passing w=tsWeights(yTr). See the tsWeights function for details.
- w=v, where v is a user-defined vector of non-negative weights for the observations, thus, v must contain the same number of elements as yTr. For example, w=[1.0, 1.0, 2.0, 2.0, ..., 1.0].
- w=t, where t is a 2-tuple of real weights, one weight for each class, for example w=(0.5, 1.5). This is equivalent to passing w=tsWeights(yTr; classWeights=[0.5, 1.5]), see the tsWeights function for details.
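For instance (a minimal sketch, assuming training data PTr and labels yTr as in the Examples below), the documented options can be passed as:
m = fit(ENLR(), PTr, yTr; w=:b)                  # balanced class weights
m = fit(ENLR(), PTr, yTr; w=ones(length(yTr)))   # one weight per observation
m = fit(ENLR(), PTr, yTr; w=(0.5, 1.5))          # one weight per class (2-tuple)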
By default meanISR=nothing and the inverse square root (ISR) of the mean used for projecting the matrices onto the tangent space (see tsMap) is computed. An Hermitian matrix or I (the identity matrix) can also be passed as argument meanISR and in this case this matrix will be used as the ISR of the mean. Passed or computed, it will be written in the .meanISR field of the model structure created by this function. Notice that, passing I, the matrices will be projected onto the tangent space at the identity without recentering them. This is possible if the matrices have been recentered by a pre-conditioning pipeline (see Pipeline).
If meanISR is not provided and the .metric field of the model is Fisher, logdet0 or Wasserstein, the tolerance of the iterative algorithm used to compute the mean is set to argument tol (default 1e-5). Also, in this case a particular initialization for those iterative algorithms can be provided as an Hermitian matrix with argument meanInit.
ML models acting on the tangent space allow fitting a model passing as training data 𝐏Tr directly a matrix of feature vectors, where each feature vector is a row of the matrix. In this case none of the above keyword arguments are used.
The following optional keyword arguments act on any kind of input, that is, tangent vectors and generic feature vectors:
If a UnitRange is passed with optional keyword argument vecRange, then if 𝐏Tr is a vector of Hermitian matrices, the vectorization of those matrices once they are projected onto the tangent space concerns only the rows (or columns) given in the specified range, else if 𝐏Tr is a matrix with feature vectors arranged in its rows, then only the columns of 𝐏Tr given in the specified range will be used. Argument vecRange will be ignored if a pre-conditioning pipeline is used and the pipeline changes the dimension of the input matrices. In this case it will be set to its default value using the new dimension; this behavior cannot be overridden.
With normalize the tangent (or feature) vectors can be normalized individually. Three functions can be passed, namely:
- demean! to remove the mean,
- normalize! to fix the norm (default),
- standardize! to fix the mean to zero and the standard deviation to 1.
As argument normalize you can also pass a 2-tuple of real numbers. In this case the numbers will be the lower and upper limits within which the vectors will be bound - see rescale!.
If you wish to rescale, use (-1, 1), since tangent vectors of SPD matrices have positive and negative elements. If 𝐏Tr is a feature matrix and the features are only positive, use (0, 1) instead.
If you pass nothing as argument normalize, no normalization will be carried out.
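As a minimal sketch (hypothetical data PTr, yTr), the documented normalization options read:
m = fit(ENLR(), PTr, yTr; normalize=standardize!)   # zero mean, unit standard deviation
m = fit(ENLR(), PTr, yTr; normalize=demean!)        # remove the mean only
m = fit(ENLR(), PTr, yTr; normalize=(-1, 1))        # rescale within [-1, 1]
m = fit(ENLR(), PTr, yTr; normalize=nothing)        # no normalization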
The remaining optional keyword arguments are:
- the arguments passed to the GLMNet.glmnet function for fitting the models. Those are always used;
- the λSelMeth argument and the arguments passed to the GLMNet.glmnetcv function for finding the best lambda hyperparameter by cross-validation. Those are used only if fitType = :best or = :all.
Optional keyword arguments for fitting the model(s) using GLMNet.jl
alpha: the hyperparameter in $[0, 1]$ to trade-off an elastic-net model. α=0 requests a pure ridge model and α=1 a pure lasso model. This defaults to 1.0, which specifies a lasso model, unless the input ENLR model has another value in the alpha field, in which case this value is used. If argument alpha is passed here, it will overwrite the alpha field of the input model.
weights: a vector of weights for each matrix (or feature vector) of the same size as yTr. It defaults to 1 for all matrices.
intercept: whether to fit an intercept term. The intercept is always unpenalized. Defaults to true.
If fitType = :best (default), a cross-validation procedure is run to find the best lambda hyperparameter for the given training data. This finds a single model that is written into the .best field of the ENLR structure that will be created.
If fitType = :path, the regularization path for several values of the lambda hyperparameter is found for the given training data. This creates several models, which are written into the .path field of the ENLR structure that will be created, none of which is optimal, in the cross-validation sense, for the given training data.
If fitType = :all, both the above fits are performed and all fields of the ENLR structure that will be created will be filled in.
penalty_factor: a vector of length n(n+1)/2, where n is the dimension of the original PD matrices on which the model is applied, of penalties for each predictor in the tangent vectors. This defaults to all ones, which weights each predictor equally. To specify that a predictor should be unpenalized, set the corresponding entry to zero.
constraints: an [n(n+1)/2] x 2 matrix specifying lower bounds (first column) and upper bounds (second column) on each predictor. By default, this is [-Inf Inf] for each predictor (each element of tangent vectors).
offsets: see the documentation of the original GLMNet package 🎓.
dfmax: The maximum number of predictors in the largest model.
pmax: The maximum number of predictors in any model.
nlambda: The number of values of λ along the path to consider.
lambda_min_ratio: The smallest λ value to consider, as a ratio of the value of λ that gives the null model (i.e., the model with only an intercept). If the number of observations exceeds the number of variables, this defaults to 0.0001, otherwise 0.01.
lambda: The λ values to consider for fitting. By default, this is determined from nlambda and lambda_min_ratio.
maxit: The maximum number of iterations of the cyclic coordinate descent algorithm. If convergence is not achieved, a warning is returned.
algorithm: the algorithm used to find the regularization path. Possible values are :newtonraphson (default) and :modifiednewtonraphson.
For further information on those arguments, refer to the resources on the GLMNet package 🎓.
The provided arguments penalty_factor, constraints, dfmax, pmax and lambda_min_ratio will be ignored if a pre-conditioning pipeline is passed as argument and the pipeline changes the dimension of the input matrices, thus of the tangent vectors. In this case they will be set to their default values using the new dimension. To force the use of the provided values instead, set checkArgs to false (true by default). Note however that in this case you must provide suitable values for all the above arguments.
Optional keyword arguments for finding the best model by cv
If λSelMeth = :sd1 (default), the best model is defined as the one allowing the highest cvλ.meanloss within one standard deviation of the minimum, otherwise it is defined as the one allowing the minimum cvλ.meanloss. Note that in selecting a model, the model with only the intercept term, if it exists, is ignored. See ENLRmodel for a description of the .cvλ field of the model structure.
Arguments nfolds, folds and parallel are passed to the GLMNet.glmnetcv function along with the ⏩ argument. Please refer to the resources on GLMNet for details 🎓.
tol: is the convergence criterion for both the computation of a mean for projecting onto the tangent space (if the metric requires an iterative algorithm) and for the GLMNet fitting algorithm. Defaults to 1e-5. In order to speed up computations, you may try relaxing tol; the convergence will be faster but coarser, with a possible drop of classification accuracy, depending on the signal-to-noise ratio of the input features.
If verbose is true (default), information is printed in the REPL.
The ⏩ argument (true by default) is passed to the tsMap function for projecting the matrices in 𝐏Tr onto the tangent space and to the GLMNet.glmnetcv function to run the inner cross-validation for finding the best model using multi-threading.
See: notation & nomenclature, the ℍVector type
Tutorial: Examples using the ENLR model
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
# Fit an ENLR lasso model and find the best model by cross-validation
m = fit(ENLR(), PTr, yTr)
# ... standardizing the tangent vectors
m = fit(ENLR(), PTr, yTr; normalize=standardize!)
# ... balancing the weights for tangent space mapping
m = fit(ENLR(), PTr, yTr; w=tsWeights(yTr))
# ... using the log-Euclidean metric for tangent space projection
m = fit(ENLR(logEuclidean), PTr, yTr)
# Fit an ENLR ridge model and find the best model by cv:
m = fit(ENLR(Fisher), PTr, yTr; alpha=0)
# Fit an ENLR elastic-net model (α=0.9) and find the best model by cv:
m = fit(ENLR(Fisher), PTr, yTr; alpha=0.9)
# Fit an ENLR lasso model and its regularization path:
m = fit(ENLR(), PTr, yTr; fitType=:path)
# Fit an ENLR lasso model, its regularization path
# and the best model found by cv:
m = fit(ENLR(), PTr, yTr; fitType=:all)
# Fit using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(ENLR(PosDefManifold.Euclidean), PTr, yTr; pipeline=p)
# Use a recentering pipeline and project the data
# onto the tangent space at the identity matrix.
# In this case the metric is irrelevant as the barycenter
# for determining the base point is not computed.
# Note that the previous call to 'fit' has modified `PTr`,
# so we generate new data.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(ENLR(), PTr, yTr; pipeline=p, meanISR=I)
function fit(model :: SVMmodel,
𝐏Tr :: Union{HermitianVector, Matrix{Float64}},
yTr :: IntVector=[];
# pipeline (data transformations)
pipeline :: Union{Pipeline, Nothing} = nothing,
# parameters for projection onto the tangent space
w :: Union{Symbol, Tuple, Vector} = Float64[],
meanISR :: Union{Hermitian, Nothing, UniformScaling} = nothing,
meanInit :: Union{Hermitian, Nothing} = nothing,
vecRange :: UnitRange = 𝐏Tr isa HermitianVector ? (1:size(𝐏Tr[1], 2)) : (1:size(𝐏Tr, 2)),
normalize :: Union{Function, Tuple, Nothing} = normalize!,
# SVM parameters
svmType :: Type = SVC,
kernel :: Kernel.KERNEL = Linear,
epsilon :: Float64 = 0.1,
cost :: Float64 = 1.0,
gamma :: Float64 = 1/_getDim(𝐏Tr, vecRange),
degree :: Int64 = 3,
coef0 :: Float64 = 0.,
nu :: Float64 = 0.5,
shrinking :: Bool = true,
probability :: Bool = false,
weights :: Union{Dict{Int, Float64}, Nothing} = nothing,
cachesize :: Float64 = 200.0,
checkArgs :: Bool = true,
# Generic and common parameters
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true)
Create and fit a 1-class or 2-class support vector machine (SVM) machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. The label vector can be omitted if the svmType is OneClassSVM (see SVM). Return the fitted model as an instance of the SVM structure.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
As for all ML models acting in the tangent space, fitting an SVM model involves computing a mean (barycenter) of all the matrices in 𝐏Tr, projecting all matrices onto the tangent space after parallel transporting them at the identity matrix and vectorizing them using the vecP operation. Once this is done, the support-vector machine is fitted.
Optional keyword arguments
For the following keyword arguments, see the documentation of the fit function for the ENLR (Elastic Net Logistic Regression) machine learning model:
- pipeline (pre-conditioning),
- w, meanISR, meanInit, vecRange (tangent space projection),
- normalize (tangent or feature vectors normalization).
ML models acting on the tangent space allow fitting a model passing as training data 𝐏Tr directly a matrix of feature vectors, where each feature vector is a row of the matrix. In this case none of the above keyword arguments are used.
Optional keyword arguments for fitting the model(s) using LIBSVM.jl
svmType and kernel allow choosing among several available SVM models. See the documentation of the SVM structure.
epsilon, with default 0.1, is the epsilon in the loss function of the epsilonSVR SVM model.
cost, with default 1.0, is the cost parameter C of SVC, epsilonSVR, and nuSVR SVM models.
gamma, defaulting to 1 divided by the length of the tangent (or feature) vectors, is the γ parameter for RadialBasis, Polynomial and Sigmoid kernels. The provided argument gamma will be ignored if a pre-conditioning pipeline is passed as argument and the pipeline changes the dimension of the input matrices, thus of the tangent vectors. In this case it will be set to its default value using the new dimension. To force the use of the provided gamma value instead, set checkArgs to false (true by default).
degree, with default 3, is the degree for Polynomial kernels.
coef0, zero by default, is a parameter for the Sigmoid and Polynomial kernels.
nu, with default 0.5, is the parameter ν of nuSVC, OneClassSVM, and nuSVR SVM models. It should be in the interval (0, 1].
shrinking, true by default, sets whether to use the shrinking heuristics.
probability, false by default, sets whether to train an SVC or SVR model allowing probability estimates.
If a Dict{Int, Float64} is passed as weights argument, it will be used to give weights to the classes. By default it is equal to nothing, implying equal weights to all classes.
cachesize for the kernel, 200.0 by default (in MB), can be increased for very large problems.
tol is the convergence criterion for both the computation of a mean for projecting onto the tangent space (if the metric requires an iterative algorithm) and for the LIBSVM fitting algorithm. Defaults to 1e-5.
If verbose is true (default), information is printed in the REPL. This option is included to allow repeated calls to this function without crowding the REPL.
The ⏩ argument (true by default) is passed to the tsMap function for projecting the matrices in 𝐏Tr onto the tangent space and to the LIBSVM functions that perform the fit, in order to run them in multi-threaded mode.
For further information on the LIBSVM arguments, refer to the resources on the LIBSVM package 🎓.
See: notation & nomenclature, the ℍVector type
Tutorial: Examples using SVM models
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1);
# Fit a SVC SVM model and find the best model by cross-validation:
m = fit(SVM(), PTr, yTr)
# ... balancing the weights for tangent space mapping
m = fit(SVM(), PTr, yTr; w=:b)
# ... using the log-Euclidean metric for tangent space projection
m = fit(SVM(logEuclidean), PTr, yTr)
# ... using the linear kernel
m = fit(SVM(logEuclidean), PTr, yTr, kernel=Linear)
# or
m = fit(SVM(logEuclidean; kernel=Linear), PTr, yTr)
# ... using the Nu-Support Vector Classification
m = fit(SVM(logEuclidean), PTr, yTr, kernel=Linear, svmType=NuSVC)
# or
m = fit(SVM(logEuclidean; kernel=Linear, svmType=NuSVC), PTr, yTr)
# N.B. all other keyword arguments must be passed to the fit function
# and not to the SVM constructor.
# Fit a SVC SVM model using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(SVM(PosDefManifold.Euclidean), PTr, yTr; pipeline=p)
# Use a recentering pipeline and project the data
# onto the tangent space at the identity matrix.
# In this case the metric is irrelevant as the barycenter
# for determining the base point is not computed.
# Note that the previous call to 'fit' has modified `PTr`,
# so we generate new data.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(SVM(), PTr, yTr; pipeline=p, meanISR=I)
StatsAPI.predict — Function

function predict(model :: MDMmodel,
𝐏Te :: ℍVector,
what :: Symbol = :labels;
pipeline :: Union{Pipeline, Nothing} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Given an MDM model trained (fitted) on z classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector:
if what is :labels or :l (default), return the predicted class labels for each matrix in 𝐏Te, as an IntVector. For MDM models, the predicted class 'label' of an unlabeled matrix is the serial number of the class whose mean is the closest to the matrix (minimum distance to mean). The labels are '1' for class 1, '2' for class 2, etc;
if what is :probabilities or :p, return the predicted probabilities for each matrix in 𝐏Te to belong to all classes, as a k-vector of z vectors holding reals in $[0, 1]$. The 'probabilities' are obtained passing to a softmax function the squared distances of each unlabeled matrix to all class means with inverted sign;
if what is :f or :functions, return the output function of the model as a k-vector of z vectors holding reals. The function of each element in 𝐏Te is the ratio of the squared distance from each class to the (scalar) geometric mean of the squared distances from all classes.
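As an illustration of the :probabilities option, here is a minimal sketch of the softmax computation on hypothetical squared distances (not the package's internal code):
softmax(x) = exp.(x) ./ sum(exp.(x))    # standard softmax
d2 = [0.8, 2.5, 1.1]                    # hypothetical squared distances to z=3 class means
probs = softmax(-d2)                    # 'probabilities' for one test matrix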
If verbose is true (default), information is printed in the REPL.
If ⏩ is true (default), the computation of distances is multi-threaded.
Note that if the field pipeline of the provided model is not nothing, implying that a pre-conditioning pipeline has been fitted, the pipeline is applied to the data before carrying out the prediction. If you wish to adapt the pipeline to the testing data, just fit the pipeline to the testing data overwriting the model pipeline. This is useful in a cross-session and cross-subject setting.
Be careful when adapting a pipeline; if a Recenter conditioner is included in the pipeline and dimensionality reduction was sought (parameter eVar different from nothing), then eVar must be set to an integer so that the dimension of the training and testing data is the same after adaptation. See the example here below.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# Create and fit an MDM model
m = fit(MDM(Fisher), PTr, yTr)
# Predict labels
yPred = predict(m, PTe, :l)
# Prediction error
predErr = predictErr(yTe, yPred)
# Predict probabilities
predict(m, PTe, :p)
# Output functions
predict(m, PTe, :f)
# Using and adapting a pipeline
# get some random data and labels as an example
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# For adaptation, we need to set `eVar` to an integer or to `nothing`.
# We will use the dimension determined on training data.
# Note that the adaptation does not work well if the class proportions
# of the training data are different from the class proportions of the test data.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
# Fit the model using the pre-conditioning pipeline
m = fit(MDM(), PTr, yTr; pipeline = p)
# Define the same pipeline with fixed dimensionality reduction parameter
p = @→ Recenter(; eVar=dim(m.pipeline)) Compress Shrink(Fisher; radius=0.02)
# Fit the pipeline to testing data (adapt):
predict(m, PTe, :l; pipeline=p)
# Suppose we want to adapt recentering, but not shrinking, which also has a
# learnable parameter. We would then use this pipeline instead:
p = deepcopy(m.pipeline)
p[1].eVar = dim(m.pipeline)
function predict(model :: ENLRmodel,
𝐏Te :: Union{ℍVector, Matrix{Float64}},
what :: Symbol = :labels,
fitType :: Symbol = :best,
onWhich :: Int = Int(fitType==:best);
pipeline :: Union{Pipeline, Nothing} = nothing,
meanISR :: Union{ℍ, Nothing, UniformScaling} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Given an ENLR model trained (fitted) on 2 classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector,
if what is :labels or :l (default), return the predicted class labels for each matrix in 𝐏Te, as an IntVector. Those labels are '1' for class 1 and '2' for class 2;
if what is :probabilities or :p, return the predicted probabilities for each matrix in 𝐏Te to belong to each class, as a k-vector of z vectors holding reals in [0, 1] (probabilities). The 'probabilities' are obtained passing to a softmax function the output of the ENLR model and zero;
if what is :f or :functions, return the output function of the model, which is the raw output of the ENLR model.
If fitType = :best (default), the best model that has been found by cross-validation is used for prediction.
If fitType = :path,
- if onWhich is a valid serial number for a model in the model.path, then this model is used for prediction,
- if onWhich is zero, all models in the model.path will be used for predictions, thus the output will be multiplied by the number of models in model.path.
Argument onWhich has no effect if fitType = :best.
Optional keyword argument meanISR can be used to specify the principal inverse square root (ISR) of a new mean to be used as base point for projecting the matrices in testing set 𝐏Te onto the tangent space. By default meanISR is equal to nothing, implying that the base point will be the mean used to fit the model. This corresponds to the classical 'training-test' mode of operation.
Passing with argument meanISR a new mean ISR allows the adaptation first described in Barachant et al. (2013)🎓. Typically meanISR is the ISR of the mean of the matrices in 𝐏Te or of a subset of them. Notice that this actually performs transfer learning by parallel transporting both the training and test data to the identity matrix, as defined in Zanini et al. (2018) and later taken up in Rodrigues et al. (2019)🎓. You can also pass meanISR=I, in which case the base point is taken as the identity matrix. This is possible if the set 𝐏Te is centered to the identity, for instance, if a recentering pre-conditioner is included in a pipeline and the pipeline is adapted as well (see the example below).
If verbose is true (default), information is printed in the REPL. This option is included to allow repeated calls to this function without crowding the REPL.
If ⏩ = true (default) and 𝐏Te is an ℍVector type, the projection onto the tangent space is multi-threaded.
Note that if the field pipeline of the provided model is not nothing, implying that a pre-conditioning pipeline has been fitted during the fitting of the model, the pipeline is applied to the data before carrying out the prediction. If you wish to adapt the pipeline to the testing data, just pass the same pipeline as argument pipeline in this function.
Be careful when adapting a pipeline; if a Recenter conditioner is included in the pipeline and dimensionality reduction was sought (parameter eVar different from nothing), then eVar must be set to an integer so that the dimension of the training and testing data is the same after adaptation. See the example here below.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# Fit an ENLR lasso model and find the best model by cv
m = fit(ENLR(Fisher), PTr, yTr)
# Predict labels from the best model
yPred = predict(m, PTe, :l)
# Prediction error
predErr = predictErr(yTe, yPred)
# Predict probabilities from the best model
predict(m, PTe, :p)
# Output functions from the best model
predict(m, PTe, :f)
# Fit a regularization path for an ENLR lasso model
m = fit(ENLR(Fisher), PTr, yTr; fitType=:path)
# Predict labels using a specific model
yPred = predict(m, PTe, :l, :path, 10)
# Predict labels for all models
yPred = predict(m, PTe, :l, :path, 0)
# Prediction error for all models
predErr = [predictErr(yTe, yPred[:, i]) for i=1:size(yPred, 2)]
# Predict probabilities from a specific model
predict(m, PTe, :p, :path, 12)
# Predict probabilities from all models
predict(m, PTe, :p, :path, 0)
# Output functions from specific model
predict(m, PTe, :f, :path, 3)
# Output functions for all models
predict(m, PTe, :f, :path, 0)
## Adapting the base point
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
m = fit(ENLR(Fisher), PTr, yTr)
predict(m, PTe, :l; meanISR=invsqrt(mean(Fisher, PTe)))
# Also using and adapting a pre-conditioning pipeline
# For adaptation, we need to set `eVar` to an integer or to `nothing`.
# We will use the dimension determined on training data.
# Note that the adaptation does not work well if the class proportions
# of the training data are different from the class proportions of the test data.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
# Fit the model using the pre-conditioning pipeline
m = fit(ENLR(), PTr, yTr; pipeline = p)
# Define the same pipeline with fixed dimensionality reduction parameter
p = @→ Recenter(; eVar=dim(m.pipeline)) Compress Shrink(Fisher; radius=0.02)
# Fit the pipeline to testing data (adapt) and use the identity matrix as base point:
predict(m, PTe, :l; pipeline=p, meanISR=I)
# Suppose we want to adapt recentering, but not shrinking, which also has a
# learnable parameter. We would then use this pipeline instead:
p = deepcopy(m.pipeline)
p[1].eVar = dim(m.pipeline)
function predict(model :: SVMmodel,
𝐏Te :: Union{ℍVector, Matrix{Float64}},
what :: Symbol = :labels;
meanISR :: Union{ℍ, Nothing, UniformScaling} = nothing,
pipeline:: Union{Pipeline, Nothing} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Compute predictions given an SVM model trained (fitted) on 2 classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector.
For the meaning of arguments what, meanISR, pipeline and verbose, see the documentation of the predict function for the ENLR model.
If ⏩ = true (default) and 𝐏Te is an ℍVector type, the projection onto the tangent space will be multi-threaded. Also, the LIBSVM.jl prediction function will be run in multi-threaded mode.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
See the examples for the predict function for the ENLR model; the syntax is identical, only the model used there has to be replaced with an SVM model.
PosDefManifoldML.crval — Function

function crval(model :: MLmodel,
𝐏 :: ℍVector,
y :: IntVector;
pipeline :: Union{Pipeline, Nothing} = nothing,
nFolds :: Int = min(10, length(y)÷3),
shuffle :: Bool = false,
scoring :: Symbol = :b,
hypTest :: Union{Symbol, Nothing} = :Bayle,
verbose :: Bool = true,
outModels :: Bool = false,
⏩ :: Bool = true,
fitArgs...)
Stratified cross-validation accuracy for a machine learning model given an ℍVector 𝐏 holding k Hermitian matrices and an IntVector y holding the k labels for these matrices. Return a CVres structure.
For each fold, a machine learning model is fitted on training data and labels are predicted on testing data. Summary classification performance statistics are stored in the output structure.
Optional keyword arguments
If a pipeline, of type Pipeline, is provided, the pipeline is fitted on training data and applied for predicting the testing data.
nFolds by default is set to the minimum between 10 and the number of observations ÷ 3 (integer division).
If scoring=:b (default) the balanced accuracy is computed. Any other value will make the function return the regular accuracy. Balanced accuracy is to be preferred for unbalanced classes. For balanced classes the balanced accuracy reduces to the regular accuracy, therefore there is no point in using regular accuracy if not to avoid a few unnecessary computations when the classes are balanced.
Note that this function computes the error loss for each fold (see CVres). The average error loss is the complement of accuracy, not of balanced accuracy. If the classes are balanced and you use scoring=:a (accuracy), the average error loss within each fold is equal to 1 minus the average accuracy, which is also computed by this function. However, this is not true if the classes are unbalanced and you use scoring=:b (default). In this case the returned error loss and accuracy may appear incoherent.
hypTest can be nothing or a symbol specifying the kind of statistical test to be carried out. At the moment, only :Bayle is a possible symbol and this test is performed by default. Bayle's procedure tests whether the average observed binary error loss is inferior to what is to be expected by the hypothesis of random chance, which is set to $1-\frac{1}{z}$, where $z$ is the number of classes (see testCV).
For the meaning of the shuffle argument (false by default), see function cvSetup, to which this argument is passed internally.
For the meaning of the seed argument (1234 by default), see function cvSetup, to which this argument is passed internally.
If verbose is true (default), information is printed in the REPL.
If outModels is true, return a 2-tuple holding a CVres structure and a nFolds-vector of the models fitted for each fold, otherwise (default), return only a CVres structure.
If ⏩ is true (default), the computations are multi-threaded across folds. Set it to false if there are problems in running this function and for debugging.
If you run the cross-validation with independent threads per fold setting ⏩=true (default), the fit! and predict functions that will be called within each fold will be run in single-threaded mode. Vice versa, if you pass ⏩=false, these two functions will be run in multi-threaded mode. This is done to avoid overcommitting the available threads.
fitArgs are optional keyword arguments that are passed to the fit function called for each fold of the cross-validation. For each machine learning model, all optional keyword arguments of their fit method are eligible to be passed here, however, the arguments listed in the following table for each model should not be passed. Note that if they are passed, they will be disabled:
MDM/MDMF | ENLR | SVM |
---|---|---|
verbose | verbose | verbose |
⏩ | ⏩ | ⏩ |
meanInit | meanInit | meanInit |
meanISR | fitType | |
offsets | | |
lambda | | |
folds | | |
If you pass the meanISR argument, this must be nothing (default) or I (the identity matrix). If you pass meanISR=I for a tangent space model, parallel transport of the points to the identity before projecting the points onto the tangent space will not be carried out. This can be used if a recentering conditioner is passed in the pipeline (see the fit method for the ENLR and SVM model).
Also, if you pass a w argument (weights for barycenter estimations), do not pass a vector of weights, just pass a symbol, e.g., w=:b for balancing weights.
See: notation & nomenclature, the ℍVector type
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
P, _dummyP, y, _dummyy = gen2ClassData(10, 60, 80, 30, 40, 0.2)
# Perform 10-fold cross-validation using the minimum distance to mean classifier
cv = crval(MDM(Fisher), P, y)
# Do the same applying a pre-conditioning pipeline
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
cv = crval(MDM(Fisher), P, y; pipeline = p)
# Apply a pre-conditioning pipeline and project the data
# onto the tangent space at I without recentering the matrices.
# Note that this makes sense only for tangent space ML models.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
cv = crval(ENLR(Fisher), P, y; pipeline = p, meanISR=I)
# Perform 10-fold cross-validation using the lasso logistic regression classifier
cv = crval(ENLR(Fisher), P, y)
# ...using the support-vector machine classifier
cv = crval(SVM(Fisher), P, y)
# ...with a Polynomial kernel of order 3 (default)
cv = crval(SVM(Fisher), P, y; kernel=Kernel.Polynomial)
# Perform 8-fold cross-validation instead
# (and see that you can go pretty fast if your PC has 8 threads)
cv = crval(SVM(Fisher), P, y; nFolds=8)
# ...balance the weights for tangent space projection
cv = crval(ENLR(Fisher), P, y; nFolds=8, w=:b)
# perform another cross-validation shuffling the folds
cv = crval(ENLR(Fisher), P, y; shuffle=true, nFolds=8, w=:b)
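# A further sketch (based on the data generated above): retrieve the models
# fitted at each fold along with the results and request the regular
# (non-balanced) accuracy.
cv, models = crval(MDM(Fisher), P, y; outModels=true, scoring=:a)
cv.avgAcc            # average accuracy across folds
length(models)       # one fitted model per fold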
PosDefManifoldML.cvSetup — Function

function cvSetup(y :: Vector{Int64},
nCV :: Int64;
shuffle :: Bool = false,
seed :: Int = 1234)
Given a vector of labels y and a parameter nCV, this function generates indices for nCV-fold cross-validation sets, organized by class.
The function performs a stratified cross-validation by maintaining the same class distribution across all folds. This ensures that each fold contains approximately the same proportion of samples from each class as in the complete dataset.
Each element is used exactly once as a test sample across all folds, ensuring that the entire dataset is covered.
The shuffle parameter controls whether the indices within each class are randomized. When shuffle is false (default), the original sequence of indices is preserved, ensuring consistent results across multiple executions.
When shuffle is true, the indices within each class are randomly permuted before creating the cross-validation folds. Randomization is controlled by the seed parameter (default: 1234). Using the same seed value generates identical cross-validation sets; using different seed values produces different random partitions.
This combination of shuffle and seed parameters allows you to generate reproducible random splits for consistent experimentation, create different random partitions to assess the robustness of your results, and maintain exact reproducibility of your cross-validation experiments.
This function is used in crval. It constitutes the fundamental basis to implement customized cross-validation procedures (see the sketch after the examples below).
Return the 2-tuple (indTr, indTe) where:
- indTr is an array of arrays where indTr[i][f] contains the training indices for class i in fold f
- indTe is an array of arrays where indTe[i][f] contains the test indices for class i in fold f
Each array is organized by class and then by fold, ensuring stratified sampling across the cross-validation sets.
Examples
using PosDefManifoldML, PosDefManifold
y = [1,1,1,1,2,2,2,2,2,2]
cvSetup(y, 2)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[3, 4], [1, 2]]
# Class 2: Array{Int64}[[4, 5, 6], [1, 2, 3]]
# Testing Arrays:
# Class 1: Array{Int64}[[1, 2], [3, 4]]
# Class 2: Array{Int64}[[1, 2, 3], [4, 5, 6]]
cvSetup(y, 2; shuffle=true, seed=1)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[1, 4], [2, 3]]
# Class 2: Array{Int64}[[1, 3, 4], [2, 5, 6]]
# Testing Arrays:
# Class 1: Array{Int64}[[2, 3], [1, 4]]
# Class 2: Array{Int64}[[2, 5, 6], [1, 3, 4]]
cvSetup(y, 3)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[2, 3], [1, 3, 4], [1, 2, 4]]
# Class 2: Array{Int64}[[3, 4, 5, 6], [1, 2, 5, 6], [1, 2, 3, 4]]
# Testing Arrays:
# Class 1: Array{Int64}[[1, 4], [2], [3]]
# Class 2: Array{Int64}[[1, 2], [3, 4], [5, 6]]
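# As a final sketch, cvSetup can serve as the basis for a customized cross-validation.
# The loop below is illustrative only: the variable names, the choice of an MDM model
# and the conversion of within-class indices to absolute indices are assumptions,
# not package code.
using PosDefManifoldML, PosDefManifold
P, _dummyP, y, _dummyy = gen2ClassData(10, 60, 80, 30, 40, 0.2)
nFolds = 5
indTr, indTe = cvSetup(y, nFolds)
z = length(unique(y))                          # number of classes
accs = zeros(nFolds)
for f in 1:nFolds
    # indices returned by cvSetup are within-class; map them to absolute indices
    trIdx = vcat([findall(y .== i)[indTr[i][f]] for i in 1:z]...)
    teIdx = vcat([findall(y .== i)[indTe[i][f]] for i in 1:z]...)
    m = fit(MDM(Fisher), P[trIdx], y[trIdx]; verbose=false)
    yPred = predict(m, P[teIdx], :l; verbose=false)
    accs[f] = sum(yPred .== y[teIdx]) / length(teIdx)
end
accs                                           # accuracy at each fold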