cv.jl
This unit implements cross-validation procedures for estimating the accuracy and balanced accuracy of machine learning models. It also reports the documentation of the fit and predict functions, as they are common to all models.
Content
struct | description |
---|---|
CVres | Encapsulates the results of cross-validation procedures for estimating accuracy |

function | description |
---|---|
fit | Fit a machine learning model with training data |
predict | Given a fitted model, predict labels, probabilities or scoring functions on test data |
crval | Perform a cross-validation and store accuracies, error losses, confusion matrices, the results of a statistical test and other information |
cvSetup | Generate indices for performing cross-validations |
PosDefManifoldML.CVres — Type

struct CVres <: CVresult
cvType :: String
scoring :: Union{String, Nothing}
modelType :: Union{String, Nothing}
predLabels :: Union{Vector{Vector{Vector{I}}}, Nothing} where I<:Int
losses :: Union{Vector{BitVector}, Nothing}
cnfs :: Union{Vector{Matrix{I}}, Nothing} where I<:Int
avgCnf :: Union{Matrix{T}, Nothing} where T<:Real
accs :: Union{Vector{T}, Nothing} where T<:Real
avgAcc :: Union{Real, Nothing}
stdAcc :: Union{Real, Nothing}
z :: Union{Real, Nothing}
p :: Union{Real, Nothing}
end
A call to crval results in an instance of this structure.
Fields:
.cvType
is the type of cross-validation technique, given as a string (e.g., "10-fold").
.scoring
is the type of accuracy that is computed, given as a string. This is controlled when calling crval. Currently accuracy and balanced accuracy are supported.
.modelType
is the type of the machine learning model used for performing the cross-validation, given as a string.
.predLabels
is an f-vector of z integer vectors holding the vectors of predicted labels. There is one vector for each fold (f) and each contains as many vectors as classes (z), each in turn holding the predicted labels for the trials.
.losses
is an f-vector holding BitVector types (vectors of booleans), each holding the binary loss for a fold.
.cnfs
is an f-vector of matrices holding the confusion matrices obtained at each fold of the cross-validation. These matrices hold frequencies (counts), that is, the sum of all elements equals the number of trials used for each fold.
.avgCnf
is the average confusion matrix of proportions across the folds of the cross-validation. This matrix holds proportions, that is, the sum of all elements equals 1.0.
.accs
is an f-vector of real numbers holding the accuracies obtained at each fold of the cross-validation.
.avgAcc
is the average accuracy across the folds of the cross-validation.
.stdAcc
is the standard deviation of the accuracy across the folds of the cross-validation.
.z
is the test statistic for the hypothesis that the observed average error loss is inferior to the specified expected value.
.p
is the p-value of the above hypothesis test.
See crval for more information.
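For instance, a minimal sketch of accessing the fields (assuming cv is the output of a call to crval, e.g., cv = crval(MDM(Fisher), P, y)):
cv.avgAcc               # average accuracy (or balanced accuracy, depending on scoring) across folds
cv.avgCnf               # average confusion matrix (proportions)
cv.accs[1]              # accuracy obtained at the first fold
cv.predLabels[1][2]     # labels predicted at fold 1 for the trials of class 2
cv.p                    # p-value of the statistical test, if performed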
StatsAPI.fit — Function

function fit(model :: MDMmodel,
𝐏Tr :: ℍVector,
yTr :: IntVector;
pipeline :: Union{Pipeline, Nothing} = nothing,
w :: Vector = [],
✓w :: Bool = true,
meanInit :: Union{ℍVector, Nothing} = nothing,
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true)
Fit an MDM machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. Return the fitted model.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
Fitting an MDM model involves only computing a mean (barycenter) of all the matrices in each class. Those class means are computed according to the metric specified by the MDM constructor.
Optional keyword arguments:
If a pipeline, of type Pipeline, is provided, all necessary parameters of the sequence of conditioners are fitted and all matrices are transformed according to the specified pipeline before fitting the ML model. The parameters are stored in the output ML model. Note that the fitted pipeline is automatically applied by any successive call to function predict to which the output ML model is passed as argument.
w is a vector of non-negative weights associated with the matrices in 𝐏Tr. These weights are used to compute the mean for each class. See method (3) of the mean function for the meaning of the arguments w, ✓w and ⏩, to which they are passed. Keep in mind that here the weights should sum up to 1 separately for each class, which is what is ensured by this function if ✓w is true.
tol is the tolerance required for those algorithms that compute the mean iteratively (they are those adopting the Fisher, logdet0 or Wasserstein metric). It defaults to 1e-5. For details on this argument see the functions that are called for computing the means (from package PosDefManifold.jl).
For those algorithms an initialization can be provided with optional keyword argument meanInit. If provided, this must be a vector of Hermitian matrices of the ℍVector type and must contain as many initializations as classes, in the natural order corresponding to the class labels (see above).
If verbose is true (default), information is printed in the REPL.
See: notation & nomenclature, the ℍVector type
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.25)
# Create and fit a model:
m = fit(MDM(Fisher), PTr, yTr)
# Create and fit a model using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(MDM(Fisher), PTr, yTr; pipeline=p)
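# A further sketch (hypothetical values): pass per-trial weights and one
# mean initialization per class. The log-Euclidean means serve here only
# as plausible starting points; this is not prescribed by the package.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.25)
w = ones(length(yTr))                              # uniform weights, for illustration
init = [mean(logEuclidean, PTr[yTr .== 1]),
        mean(logEuclidean, PTr[yTr .== 2])]        # one initialization per class (ℍVector)
m = fit(MDM(Fisher), PTr, yTr; w=w, meanInit=init, tol=1e-6)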
function fit(model :: ENLRmodel,
𝐏Tr :: Union{HermitianVector, Matrix{Float64}},
yTr :: IntVector;
# pipeline (data transformations)
pipeline :: Union{Pipeline, Nothing} = nothing,
# parameters for projection onto the tangent space
w :: Union{Symbol, Tuple, Vector} = Float64[],
meanISR :: Union{Hermitian, Nothing, UniformScaling} = nothing,
meanInit :: Union{Hermitian, Nothing} = nothing,
vecRange :: UnitRange = 𝐏Tr isa ℍVector ? (1:size(𝐏Tr[1], 2)) : (1:size(𝐏Tr, 2)),
normalize :: Union{Function, Tuple, Nothing} = normalize!,
# arguments for `GLMNet.glmnet` function
alpha :: Real = model.alpha,
weights :: Vector{Float64} = ones(Float64, length(yTr)),
intercept :: Bool = true,
fitType :: Symbol = :best,
penalty_factor :: Vector{Float64} = ones(Float64, _getDim(𝐏Tr, vecRange)),
constraints :: Matrix{Float64} = [x for x in (-Inf, Inf), y in 1:_getDim(𝐏Tr, vecRange)],
offsets :: Union{Vector{Float64}, Nothing} = nothing,
dfmax :: Int = _getDim(𝐏Tr, vecRange),
pmax :: Int = min(dfmax*2+20, _getDim(𝐏Tr, vecRange)),
nlambda :: Int = 100,
lambda_min_ratio:: Real = (length(yTr)*2 < _getDim(𝐏Tr, vecRange) ? 1e-2 : 1e-4),
lambda :: Vector{Float64} = Float64[],
maxit :: Int = 1000000,
algorithm :: Symbol = :newtonraphson,
checkArgs :: Bool = true,
# selection method
λSelMeth :: Symbol = :sd1,
# arguments for `GLMNet.glmnetcv` function
nfolds :: Int = min(10, div(size(yTr, 1), 3)),
folds :: Vector{Int} =
begin
n, r = divrem(size(yTr, 1), nfolds)
shuffle!([repeat(1:nfolds, outer=n); 1:r])
end,
parallel :: Bool=true,
# Generic and common parameters
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true,
)
Create and fit a 2-class elastic net logistic regression (ENLR) machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. Return the fitted model(s) as an instance of the ENLR structure.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
As for all ML models acting in the tangent space, fitting an ENLR model involves computing a mean (barycenter) of all the matrices in 𝐏Tr, projecting all matrices onto the tangent space after parallel transporting them at the identity matrix and vectorizing them using the vecP operation. Once this is done, the ENLR model is fitted.
The mean is computed according to the .metric field of the model, with optional weights w. The .metric field of the model is passed internally to the tsMap function. By default the metric is the Fisher metric. See the examples below for how to change the metric. See mdm.jl or check out directly the documentation of PosDefManifold.jl for the available metrics.
Optional keyword arguments
If a pipeline, of type Pipeline, is provided, all necessary parameters of the sequence of conditioners are fitted and all matrices are transformed according to the specified pipeline before fitting the ML model. The parameters are stored in the output ML model. Note that the fitted pipeline is automatically applied by any successive call to function predict to which the output ML model is passed as argument.
By default, uniform weights will be given to all observations for computing the mean to project the data in the tangent space. This is equivalent to passing as argument w=:uniform (or w=:u). You can also pass as argument:
- w=:balanced (or simply w=:b). If the two classes are unbalanced, the weights should better be inversely proportional to the number of examples for each class, in such a way that each class contributes equally to the computation of the mean. This is equivalent to passing w=tsWeights(yTr). See the tsWeights function for details.
- w=v, where v is a user-defined vector of non-negative weights for the observations, thus, v must contain the same number of elements as yTr. For example, w=[1.0, 1.0, 2.0, 2.0, ..., 1.0].
- w=t, where t is a 2-tuple of real weights, one weight for each class, for example w=(0.5, 1.5). This is equivalent to passing w=tsWeights(yTr; classWeights=[0.5, 1.5]), see the tsWeights function for details.
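For instance (a minimal sketch, assuming training data PTr and labels yTr as in the Examples below), the documented options can be passed as:
m = fit(ENLR(), PTr, yTr; w=:b)                  # balanced class weights
m = fit(ENLR(), PTr, yTr; w=ones(length(yTr)))   # one weight per observation
m = fit(ENLR(), PTr, yTr; w=(0.5, 1.5))          # one weight per class (2-tuple)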
By default meanISR=nothing and the inverse square root (ISR) of the mean used for projecting the matrices onto the tangent space (see tsMap) is computed. An Hermitian matrix or I (the identity matrix) can also be passed as argument meanISR and in this case this matrix will be used as the ISR of the mean. Passed or computed, it will be written in the .meanISR field of the model structure created by this function. Notice that, passing I, the matrices will be projected onto the tangent space at the identity without recentering them. This is possible if the matrices have been recentered by a pre-conditioning pipeline (see Pipeline).
If meanISR is not provided and the .metric field of the model is Fisher, logdet0 or Wasserstein, the tolerance of the iterative algorithm used to compute the mean is set to argument tol (default 1e-5). Also, in this case a particular initialization for those iterative algorithms can be provided as an Hermitian matrix with argument meanInit.
ML models acting on the tangent space allow fitting a model passing as training data 𝐏Tr directly a matrix of feature vectors, where each feature vector is a row of the matrix. In this case none of the above keyword arguments are used.
The following optional keyword arguments act on any kind of input, that is, tangent vectors and generic feature vectors:
If a UnitRange is passed with optional keyword argument vecRange, then if 𝐏Tr is a vector of Hermitian matrices, the vectorization of those matrices once they are projected onto the tangent space concerns only the rows (or columns) given in the specified range, else if 𝐏Tr is a matrix with feature vectors arranged in its rows, then only the columns of 𝐏Tr given in the specified range will be used. Argument vecRange will be ignored if a pre-conditioning pipeline is used and the pipeline changes the dimension of the input matrices. In this case it will be set to its default value using the new dimension; this behavior cannot be overridden.
With normalize the tangent (or feature) vectors can be normalized individually. Three functions can be passed, namely:
- demean! to remove the mean,
- normalize! to fix the norm (default),
- standardize! to fix the mean to zero and the standard deviation to 1.
As argument normalize you can also pass a 2-tuple of real numbers. In this case the numbers will be the lower and upper limits within which the vectors will be bound - see rescale!.
If you wish to rescale, use (-1, 1), since tangent vectors of SPD matrices have positive and negative elements. If 𝐏Tr is a feature matrix and the features are only positive, use (0, 1) instead.
If you pass nothing as argument normalize, no normalization will be carried out.
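As a minimal sketch (hypothetical data PTr, yTr), the documented normalization options read:
m = fit(ENLR(), PTr, yTr; normalize=standardize!)   # zero mean, unit standard deviation
m = fit(ENLR(), PTr, yTr; normalize=demean!)        # remove the mean only
m = fit(ENLR(), PTr, yTr; normalize=(-1, 1))        # rescale within [-1, 1]
m = fit(ENLR(), PTr, yTr; normalize=nothing)        # no normalization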
The remaining optional keyword arguments are:
- the arguments passed to the GLMNet.glmnet function for fitting the models. Those are always used;
- the λSelMeth argument and the arguments passed to the GLMNet.glmnetcv function for finding the best lambda hyperparameter by cross-validation. Those are used only if fitType = :best or = :all.
Optional keyword arguments for fitting the model(s) using GLMNet.jl
alpha: the hyperparameter in $[0, 1]$ to trade-off an elastic-net model. α=0 requests a pure ridge model and α=1 a pure lasso model. This defaults to 1.0, which specifies a lasso model, unless the input ENLR model has another value in the alpha field, in which case this value is used. If argument alpha is passed here, it will overwrite the alpha field of the input model.
weights: a vector of weights for each matrix (or feature vector) of the same size as yTr. It defaults to 1 for all matrices.
intercept: whether to fit an intercept term. The intercept is always unpenalized. Defaults to true.
If fitType = :best (default), a cross-validation procedure is run to find the best lambda hyperparameter for the given training data. This finds a single model that is written into the .best field of the ENLR structure that will be created.
If fitType = :path, the regularization path for several values of the lambda hyperparameter is found for the given training data. This creates several models, which are written into the .path field of the ENLR structure that will be created, none of which is optimal, in the cross-validation sense, for the given training data.
If fitType = :all, both the above fits are performed and all fields of the ENLR structure that will be created will be filled in.
penalty_factor: a vector of length n(n+1)/2, where n is the dimension of the original PD matrices on which the model is applied, of penalties for each predictor in the tangent vectors. This defaults to all ones, which weights each predictor equally. To specify that a predictor should be unpenalized, set the corresponding entry to zero.
constraints: an [n(n+1)/2] x 2 matrix specifying lower bounds (first column) and upper bounds (second column) on each predictor. By default, this is [-Inf Inf] for each predictor (each element of tangent vectors).
offsets: see the documentation of the original GLMNet package 🎓.
dfmax: The maximum number of predictors in the largest model.
pmax: The maximum number of predictors in any model.
nlambda: The number of values of λ along the path to consider.
lambda_min_ratio: The smallest λ value to consider, as a ratio of the value of λ that gives the null model (i.e., the model with only an intercept). If the number of observations exceeds the number of variables, this defaults to 0.0001, otherwise 0.01.
lambda: The λ values to consider for fitting. By default, this is determined from nlambda and lambda_min_ratio.
maxit: The maximum number of iterations of the cyclic coordinate descent algorithm. If convergence is not achieved, a warning is returned.
algorithm: the algorithm used to find the regularization path. Possible values are :newtonraphson (default) and :modifiednewtonraphson.
For further information on those arguments, refer to the resources on the GLMNet package 🎓.
The provided arguments penalty_factor, constraints, dfmax, pmax and lambda_min_ratio will be ignored if a pre-conditioning pipeline is passed as argument and the pipeline changes the dimension of the input matrices, thus of the tangent vectors. In this case they will be set to their default values using the new dimension. To force the use of the provided values instead, set checkArgs to false (true by default). Note however that in this case you must provide suitable values for all the above arguments.
Optional keyword arguments for finding the best model by cv
If λSelMeth = :sd1 (default), the best model is defined as the one allowing the highest cvλ.meanloss within one standard deviation of the minimum, otherwise it is defined as the one allowing the minimum cvλ.meanloss. Note that in selecting a model, the model with only the intercept term, if it exists, is ignored. See ENLRmodel for a description of the .cvλ field of the model structure.
Arguments nfolds, folds and parallel are passed to the GLMNet.glmnetcv function along with the ⏩ argument. Please refer to the resources on GLMNet for details 🎓.
tol: is the convergence criterion for both the computation of a mean for projecting onto the tangent space (if the metric requires an iterative algorithm) and for the GLMNet fitting algorithm. Defaults to 1e-5. In order to speed up computations, you may try relaxing tol; the convergence will be faster but coarser, with a possible drop of classification accuracy, depending on the signal-to-noise ratio of the input features.
If verbose is true (default), information is printed in the REPL.
The ⏩ argument (true by default) is passed to the tsMap function for projecting the matrices in 𝐏Tr onto the tangent space and to the GLMNet.glmnetcv function to run the inner cross-validation for finding the best model using multi-threading.
See: notation & nomenclature, the ℍVector type
Tutorial: Examples using the ENLR model
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
# Fit an ENLR lasso model and find the best model by cross-validation
m = fit(ENLR(), PTr, yTr)
# ... standardizing the tangent vectors
m = fit(ENLR(), PTr, yTr; normalize=standardize!)
# ... balancing the weights for tangent space mapping
m = fit(ENLR(), PTr, yTr; w=tsWeights(yTr))
# ... using the log-Euclidean metric for tangent space projection
m = fit(ENLR(logEuclidean), PTr, yTr)
# Fit an ENLR ridge model and find the best model by cv:
m = fit(ENLR(Fisher), PTr, yTr; alpha=0)
# Fit an ENLR elastic-net model (α=0.9) and find the best model by cv:
m = fit(ENLR(Fisher), PTr, yTr; alpha=0.9)
# Fit an ENLR lasso model and its regularization path:
m = fit(ENLR(), PTr, yTr; fitType=:path)
# Fit an ENLR lasso model, its regularization path
# and the best model found by cv:
m = fit(ENLR(), PTr, yTr; fitType=:all)
# Fit using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(ENLR(PosDefManifold.Euclidean), PTr, yTr; pipeline=p)
# Use a recentering pipeline and project the data
# onto the tangent space at the identity matrix.
# In this case the metric is irrelevant as the barycenter
# for determining the base point is not computed.
# Note that the previous call to 'fit' has modified `PTr`,
# so we generate new data.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(ENLR(), PTr, yTr; pipeline=p, meanISR=I)
function fit(model :: SVMmodel,
𝐏Tr :: Union{HermitianVector, Matrix{Float64}},
yTr :: IntVector=[];
# pipeline (data transformations)
pipeline :: Union{Pipeline, Nothing} = nothing,
# parameters for projection onto the tangent space
w :: Union{Symbol, Tuple, Vector} = Float64[],
meanISR :: Union{Hermitian, Nothing, UniformScaling} = nothing,
meanInit :: Union{Hermitian, Nothing} = nothing,
vecRange :: UnitRange = 𝐏Tr isa HermitianVector ? (1:size(𝐏Tr[1], 2)) : (1:size(𝐏Tr, 2)),
normalize :: Union{Function, Tuple, Nothing} = normalize!,
# SVM parameters
svmType :: Type = SVC,
kernel :: Kernel.KERNEL = Linear,
epsilon :: Float64 = 0.1,
cost :: Float64 = 1.0,
gamma :: Float64 = 1/_getDim(𝐏Tr, vecRange),
degree :: Int64 = 3,
coef0 :: Float64 = 0.,
nu :: Float64 = 0.5,
shrinking :: Bool = true,
probability :: Bool = false,
weights :: Union{Dict{Int, Float64}, Nothing} = nothing,
cachesize :: Float64 = 200.0,
checkArgs :: Bool = true,
# Generic and common parameters
tol :: Real = 1e-5,
verbose :: Bool = true,
⏩ :: Bool = true)
Create and fit a 1-class or 2-class support vector machine (SVM) machine learning model, with training data 𝐏Tr, of type ℍVector, and corresponding labels yTr, of type IntVector. The label vector can be omitted if the svmType is OneClassSVM (see SVM). Return the fitted model as an instance of the SVM structure.
Labels must be provided using the natural numbers, i.e., 1 for the first class, 2 for the second class, etc.
As for all ML models acting in the tangent space, fitting an SVM model involves computing a mean (barycenter) of all the matrices in 𝐏Tr, projecting all matrices onto the tangent space after parallel transporting them at the identity matrix and vectorizing them using the vecP operation. Once this is done, the support-vector machine is fitted.
Optional keyword arguments
For the following keyword arguments, see the documentation of the fit function for the ENLR (Elastic Net Logistic Regression) machine learning model:
- pipeline (pre-conditioning),
- w, meanISR, meanInit, vecRange (tangent space projection),
- normalize (tangent or feature vectors normalization).
ML models acting on the tangent space allow fitting a model passing as training data 𝐏Tr directly a matrix of feature vectors, where each feature vector is a row of the matrix. In this case none of the above keyword arguments are used.
Optional keyword arguments for fitting the model(s) using LIBSVM.jl
svmType and kernel allow choosing among several available SVM models. See the documentation of the SVM structure.
epsilon, with default 0.1, is the epsilon in the loss function of the epsilonSVR SVM model.
cost, with default 1.0, is the cost parameter C of SVC, epsilonSVR, and nuSVR SVM models.
gamma, defaulting to 1 divided by the length of the tangent (or feature) vectors, is the γ parameter for RadialBasis, Polynomial and Sigmoid kernels. The provided argument gamma will be ignored if a pre-conditioning pipeline is passed as argument and the pipeline changes the dimension of the input matrices, thus of the tangent vectors. In this case it will be set to its default value using the new dimension. To force the use of the provided gamma value instead, set checkArgs to false (true by default).
degree, with default 3, is the degree for Polynomial kernels.
coef0, zero by default, is a parameter for the Sigmoid and Polynomial kernels.
nu, with default 0.5, is the parameter ν of nuSVC, OneClassSVM, and nuSVR SVM models. It should be in the interval (0, 1].
shrinking, true by default, sets whether to use the shrinking heuristics.
probability, false by default, sets whether to train an SVC or SVR model allowing probability estimates.
If a Dict{Int, Float64} is passed as weights argument, it will be used to give weights to the classes. By default it is equal to nothing, implying equal weights to all classes.
cachesize for the kernel, 200.0 by default (in MB), can be increased for very large problems.
tol is the convergence criterion for both the computation of a mean for projecting onto the tangent space (if the metric requires an iterative algorithm) and for the LIBSVM fitting algorithm. Defaults to 1e-5.
If verbose is true (default), information is printed in the REPL. This option is included to allow repeated calls to this function without crowding the REPL.
The ⏩ argument (true by default) is passed to the tsMap function for projecting the matrices in 𝐏Tr onto the tangent space and to the LIBSVM functions that perform the fit, in order to run them in multi-threaded mode.
For further information on the LIBSVM arguments, refer to the resources on the LIBSVM package 🎓.
See: notation & nomenclature, the ℍVector type
Tutorial: Examples using SVM models
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1);
# Fit a SVC SVM model and find the best model by cross-validation:
m = fit(SVM(), PTr, yTr)
# ... balancing the weights for tangent space mapping
m = fit(SVM(), PTr, yTr; w=:b)
# ... using the log-Euclidean metric for tangent space projection
m = fit(SVM(logEuclidean), PTr, yTr)
# ... using the linear kernel
m = fit(SVM(logEuclidean), PTr, yTr, kernel=Linear)
# or
m = fit(SVM(logEuclidean; kernel=Linear), PTr, yTr)
# ... using the Nu-Support Vector Classification
m = fit(SVM(logEuclidean), PTr, yTr, kernel=Linear, svmType=NuSVC)
# or
m = fit(SVM(logEuclidean; kernel=Linear, svmType=NuSVC), PTr, yTr)
# N.B. all other keyword arguments must be passed to the fit function
# and not to the SVM constructor.
# Fit a SVC SVM model using a pre-conditioning pipeline:
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(SVM(PosDefManifold.Euclidean), PTr, yTr; pipeline=p)
# Use a recentering pipeline and project the data
# onto the tangent space at the identity matrix.
# In this case the metric is irrelevant as the barycenter
# for determining the base point is not computed.
# Note that the previous call to 'fit' has modified `PTr`,
# so we generate new data.
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80, 0.1)
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
m = fit(SVM(), PTr, yTr; pipeline=p, meanISR=I)
StatsAPI.predict — Function

function predict(model :: MDMmodel,
𝐏Te :: ℍVector,
what :: Symbol = :labels;
pipeline :: Union{Pipeline, Nothing} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Given an MDM model trained (fitted) on z classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector:
if what is :labels or :l (default), return the predicted class labels for each matrix in 𝐏Te, as an IntVector. For MDM models, the predicted class 'label' of an unlabeled matrix is the serial number of the class whose mean is the closest to the matrix (minimum distance to mean). The labels are '1' for class 1, '2' for class 2, etc;
if what is :probabilities or :p, return the predicted probabilities for each matrix in 𝐏Te to belong to all classes, as a k-vector of z vectors holding reals in $[0, 1]$. The 'probabilities' are obtained passing to a softmax function the squared distances of each unlabeled matrix to all class means with inverted sign;
if what is :f or :functions, return the output function of the model as a k-vector of z vectors holding reals. The function of each element in 𝐏Te is the ratio of the squared distance from each class to the (scalar) geometric mean of the squared distances from all classes.
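As an illustration of the :probabilities option, here is a minimal sketch of the softmax computation on hypothetical squared distances (not the package's internal code):
softmax(x) = exp.(x) ./ sum(exp.(x))    # standard softmax
d2 = [0.8, 2.5, 1.1]                    # hypothetical squared distances to z=3 class means
probs = softmax(-d2)                    # 'probabilities' for one test matrix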
If verbose is true (default), information is printed in the REPL.
If ⏩ is true (default), the computation of distances is multi-threaded.
Note that if the field pipeline of the provided model is not nothing, implying that a pre-conditioning pipeline has been fitted, the pipeline is applied to the data before carrying out the prediction. If you wish to adapt the pipeline to the testing data, just fit the pipeline to the testing data overwriting the model pipeline. This is useful in a cross-session and cross-subject setting.
Be careful when adapting a pipeline; if a Recenter conditioner is included in the pipeline and dimensionality reduction was sought (parameter eVar different from nothing), then eVar must be set to an integer so that the dimension of the training and testing data is the same after adaptation. See the example here below.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# Create and fit an MDM model
m = fit(MDM(Fisher), PTr, yTr)
# Predict labels
yPred = predict(m, PTe, :l)
# Prediction error
predErr = predictErr(yTe, yPred)
# Predict probabilities
predict(m, PTe, :p)
# Output functions
predict(m, PTe, :f)
# Using and adapting a pipeline
# get some random data and labels as an example
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# For adaptation, we need to set `eVar` to an integer or to `nothing`.
# We will use the dimension determined on training data.
# Note that the adaptation does not work well if the class proportions
# of the training data are different from the class proportions of the test data.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
# Fit the model using the pre-conditioning pipeline
m = fit(MDM(), PTr, yTr; pipeline = p)
# Define the same pipeline with fixed dimensionality reduction parameter
p = @→ Recenter(; eVar=dim(m.pipeline)) Compress Shrink(Fisher; radius=0.02)
# Fit the pipeline to testing data (adapt):
predict(m, PTe, :l; pipeline=p)
# Suppose we want to adapt recentering, but not shrinking, which also has a
# learnable parameter. We would then use this pipeline instead:
p = deepcopy(m.pipeline)
p[1].eVar = dim(m.pipeline)
function predict(model :: ENLRmodel,
𝐏Te :: Union{ℍVector, Matrix{Float64}},
what :: Symbol = :labels,
fitType :: Symbol = :best,
onWhich :: Int = Int(fitType==:best);
pipeline :: Union{Pipeline, Nothing} = nothing,
meanISR :: Union{ℍ, Nothing, UniformScaling} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Given an ENLR model trained (fitted) on 2 classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector,
if what is :labels or :l (default), return the predicted class labels for each matrix in 𝐏Te, as an IntVector. Those labels are '1' for class 1 and '2' for class 2;
if what is :probabilities or :p, return the predicted probabilities for each matrix in 𝐏Te to belong to each class, as a k-vector of z vectors holding reals in [0, 1] (probabilities). The 'probabilities' are obtained passing to a softmax function the output of the ENLR model and zero;
if what is :f or :functions, return the output function of the model, which is the raw output of the ENLR model.
If fitType = :best (default), the best model that has been found by cross-validation is used for prediction.
If fitType = :path,
- if onWhich is a valid serial number for a model in the model.path, then this model is used for prediction,
- if onWhich is zero, all models in the model.path will be used for predictions, thus the output will be multiplied by the number of models in model.path.
Argument onWhich has no effect if fitType = :best.
Optional keyword argument meanISR can be used to specify the principal inverse square root (ISR) of a new mean to be used as base point for projecting the matrices in testing set 𝐏Te onto the tangent space. By default meanISR is equal to nothing, implying that the base point will be the mean used to fit the model. This corresponds to the classical 'training-test' mode of operation.
Passing with argument meanISR a new mean ISR allows the adaptation first described in Barachant et al. (2013)🎓. Typically meanISR is the ISR of the mean of the matrices in 𝐏Te or of a subset of them. Notice that this actually performs transfer learning by parallel transporting both the training and test data to the identity matrix, as defined in Zanini et al. (2018) and later taken up in Rodrigues et al. (2019)🎓. You can also pass meanISR=I, in which case the base point is taken as the identity matrix. This is possible if the set 𝐏Te is centered to the identity, for instance, if a recentering pre-conditioner is included in a pipeline and the pipeline is adapted as well (see the example below).
If verbose is true (default), information is printed in the REPL. This option is included to allow repeated calls to this function without crowding the REPL.
If ⏩ = true (default) and 𝐏Te is an ℍVector type, the projection onto the tangent space is multi-threaded.
Note that if the field pipeline of the provided model is not nothing, implying that a pre-conditioning pipeline has been fitted during the fitting of the model, the pipeline is applied to the data before carrying out the prediction. If you wish to adapt the pipeline to the testing data, just pass the same pipeline as argument pipeline in this function.
Be careful when adapting a pipeline; if a Recenter conditioner is included in the pipeline and dimensionality reduction was sought (parameter eVar different from nothing), then eVar must be set to an integer so that the dimension of the training and testing data is the same after adaptation. See the example here below.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
# Fit an ENLR lasso model and find the best model by cv
m = fit(ENLR(Fisher), PTr, yTr)
# Predict labels from the best model
yPred = predict(m, PTe, :l)
# Prediction error
predErr = predictErr(yTe, yPred)
# Predict probabilities from the best model
predict(m, PTe, :p)
# Output functions from the best model
predict(m, PTe, :f)
# Fit a regularization path for an ENLR lasso model
m = fit(ENLR(Fisher), PTr, yTr; fitType=:path)
# Predict labels using a specific model
yPred = predict(m, PTe, :l, :path, 10)
# Predict labels for all models
yPred = predict(m, PTe, :l, :path, 0)
# Prediction error for all models
predErr = [predictErr(yTe, yPred[:, i]) for i=1:size(yPred, 2)]
# Predict probabilities from a specific model
predict(m, PTe, :p, :path, 12)
# Predict probabilities from all models
predict(m, PTe, :p, :path, 0)
# Output functions from specific model
predict(m, PTe, :f, :path, 3)
# Output functions for all models
predict(m, PTe, :f, :path, 0)
## Adapting the base point
PTr, PTe, yTr, yTe = gen2ClassData(10, 30, 40, 60, 80)
m = fit(ENLR(Fisher), PTr, yTr)
predict(m, PTe, :l; meanISR=invsqrt(mean(Fisher, PTe)))
# Also using and adapting a pre-conditioning pipeline
# For adaptation, we need to set `eVar` to an integer or to `nothing`.
# We will use the dimension determined on training data.
# Note that the adaptation does not work well if the class proportions
# of the training data are different from the class proportions of the test data.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
# Fit the model using the pre-conditioning pipeline
m = fit(ENLR(), PTr, yTr; pipeline = p)
# Define the same pipeline with fixed dimensionality reduction parameter
p = @→ Recenter(; eVar=dim(m.pipeline)) Compress Shrink(Fisher; radius=0.02)
# Fit the pipeline to testing data (adapt) and use the identity matrix as base point:
predict(m, PTe, :l; pipeline=p, meanISR=I)
# Suppose we want to adapt recentering, but not shrinking, which also has a
# learnable parameter. We would then use this pipeline instead:
p = deepcopy(m.pipeline)
p[1].eVar = dim(m.pipeline)
function predict(model :: SVMmodel,
𝐏Te :: Union{ℍVector, Matrix{Float64}},
what :: Symbol = :labels;
meanISR :: Union{ℍ, Nothing, UniformScaling} = nothing,
pipeline:: Union{Pipeline, Nothing} = nothing,
verbose :: Bool = true,
⏩ :: Bool = true)
Compute predictions given an SVM model trained (fitted) on 2 classes and a testing set of k positive definite matrices 𝐏Te of type ℍVector.
For the meaning of arguments what, meanISR, pipeline and verbose, see the documentation of the predict function for the ENLR model.
If ⏩ = true (default) and 𝐏Te is an ℍVector type, the projection onto the tangent space will be multi-threaded. Also, the LIBSVM.jl prediction function will be run in multi-threaded mode.
See: notation & nomenclature, the ℍVector type
See also: fit, crval, predictErr
Examples
See the examples for the predict function for the ENLR model; the syntax is identical, only the model used there has to be replaced with an SVM model.
PosDefManifoldML.crval — Function

function crval(model :: MLmodel,
𝐏 :: ℍVector,
y :: IntVector;
pipeline :: Union{Pipeline, Nothing} = nothing,
nFolds :: Int = min(10, length(y)÷3),
shuffle :: Bool = false,
scoring :: Symbol = :b,
hypTest :: Union{Symbol, Nothing} = :Bayle,
verbose :: Bool = true,
outModels :: Bool = false,
⏩ :: Bool = true,
fitArgs...)
Stratified cross-validation accuracy for a machine learning model given an ℍVector 𝐏 holding k Hermitian matrices and an IntVector y holding the k labels for these matrices. Return a CVres structure.
For each fold, a machine learning model is fitted on training data and labels are predicted on testing data. Summary classification performance statistics are stored in the output structure.
Optional keyword arguments
If a pipeline, of type Pipeline, is provided, the pipeline is fitted on training data and applied for predicting the testing data.
nFolds by default is set to the minimum between 10 and the number of observations ÷ 3 (integer division).
If scoring=:b (default) the balanced accuracy is computed. Any other value will make the function return the regular accuracy. Balanced accuracy is to be preferred for unbalanced classes. For balanced classes the balanced accuracy reduces to the regular accuracy, therefore there is no point in using regular accuracy if not to avoid a few unnecessary computations when the classes are balanced.
Note that this function computes the error loss for each fold (see CVres). The average error loss is the complement of accuracy, not of balanced accuracy. If the classes are balanced and you use scoring=:a (accuracy), the average error loss within each fold is equal to 1 minus the average accuracy, which is also computed by this function. However, this is not true if the classes are unbalanced and you use scoring=:b (default). In this case the returned error loss and accuracy may appear incoherent.
hypTest can be nothing or a symbol specifying the kind of statistical test to be carried out. At the moment, only :Bayle is a possible symbol and this test is performed by default. Bayle's procedure tests whether the average observed binary error loss is inferior to what is to be expected by the hypothesis of random chance, which is set to $1-\frac{1}{z}$, where $z$ is the number of classes (see testCV).
For the meaning of the shuffle argument (false by default), see function cvSetup, to which this argument is passed internally.
For the meaning of the seed argument (1234 by default), see function cvSetup, to which this argument is passed internally.
If verbose is true (default), information is printed in the REPL.
If outModels is true, return a 2-tuple holding a CVres structure and a nFolds-vector of the models fitted for each fold, otherwise (default), return only a CVres structure.
If ⏩ is true (default), the computations are multi-threaded across folds. Set it to false if there are problems in running this function and for debugging.
If you run the cross-validation with independent threads per fold setting ⏩=true (default), the fit! and predict functions that will be called within each fold will be run in single-threaded mode. Vice versa, if you pass ⏩=false, these two functions will be run in multi-threaded mode. This is done to avoid overcommitting the available threads.
fitArgs are optional keyword arguments that are passed to the fit function called for each fold of the cross-validation. For each machine learning model, all optional keyword arguments of their fit method are eligible to be passed here, however, the arguments listed in the following table for each model should not be passed. Note that if they are passed, they will be disabled:
MDM/MDMF | ENLR | SVM |
---|---|---|
verbose | verbose | verbose |
⏩ | ⏩ | ⏩ |
meanInit | meanInit | meanInit |
meanISR | fitType | |
offsets | | |
lambda | | |
folds | | |
If you pass the meanISR argument, this must be nothing (default) or I (the identity matrix). If you pass meanISR=I for a tangent space model, parallel transport of the points to the identity before projecting the points onto the tangent space will not be carried out. This can be used if a recentering conditioner is passed in the pipeline (see the fit method for the ENLR and SVM model).
Also, if you pass a w argument (weights for barycenter estimations), do not pass a vector of weights, just pass a symbol, e.g., w=:b for balancing weights.
See: notation & nomenclature, the ℍVector type
Examples
using PosDefManifoldML, PosDefManifold
# Generate some data
P, _dummyP, y, _dummyy = gen2ClassData(10, 60, 80, 30, 40, 0.2)
# Perform 10-fold cross-validation using the minimum distance to mean classifier
cv = crval(MDM(Fisher), P, y)
# Do the same applying a pre-conditioning pipeline
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
cv = crval(MDM(Fisher), P, y; pipeline = p)
# Apply a pre-conditioning pipeline and project the data
# onto the tangent space at I without recentering the matrices.
# Note that this makes sense only for tangent space ML models.
p = @→ Recenter(; eVar=0.999) Compress Shrink(Fisher; radius=0.02)
cv = crval(ENLR(Fisher), P, y; pipeline = p, meanISR=I)
# Perform 10-fold cross-validation using the lasso logistic regression classifier
cv = crval(ENLR(Fisher), P, y)
# ...using the support-vector machine classifier
cv = crval(SVM(Fisher), P, y)
# ...with a Polynomial kernel of order 3 (default)
cv = crval(SVM(Fisher), P, y; kernel=Kernel.Polynomial)
# Perform 8-fold cross-validation instead
# (and see that you can go pretty fast if your PC has 8 threads)
cv = crval(SVM(Fisher), P, y; nFolds=8)
# ...balance the weights for tangent space projection
cv = crval(ENLR(Fisher), P, y; nFolds=8, w=:b)
# perform another cross-validation shuffling the folds
cv = crval(ENLR(Fisher), P, y; shuffle=true, nFolds=8, w=:b)
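# A further sketch (based on the data generated above): retrieve the models
# fitted at each fold along with the results and request the regular
# (non-balanced) accuracy.
cv, models = crval(MDM(Fisher), P, y; outModels=true, scoring=:a)
cv.avgAcc            # average accuracy across folds
length(models)       # one fitted model per fold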
PosDefManifoldML.cvSetup — Function

function cvSetup(y :: Vector{Int64},
nCV :: Int64;
shuffle :: Bool = false,
seed :: Int = 1234)
Given a vector of labels y and a parameter nCV, this function generates indices for nCV-fold cross-validation sets, organized by class.
The function performs a stratified cross-validation by maintaining the same class distribution across all folds. This ensures that each fold contains approximately the same proportion of samples from each class as in the complete dataset.
Each element is used exactly once as a test sample across all folds, ensuring that the entire dataset is covered.
The shuffle parameter controls whether the indices within each class are randomized. When shuffle is false (default), the original sequence of indices is preserved, ensuring consistent results across multiple executions.
When shuffle is true, the indices within each class are randomly permuted before creating the cross-validation folds. Randomization is controlled by the seed parameter (default: 1234). Using the same seed value generates identical cross-validation sets; using different seed values produces different random partitions.
This combination of shuffle and seed parameters allows you to generate reproducible random splits for consistent experimentation, create different random partitions to assess the robustness of your results, and maintain exact reproducibility of your cross-validation experiments.
This function is used in crval. It constitutes the fundamental basis to implement customized cross-validation procedures (see the sketch after the examples below).
Return the 2-tuple (indTr, indTe) where:
- indTr is an array of arrays where indTr[i][f] contains the training indices for class i in fold f
- indTe is an array of arrays where indTe[i][f] contains the test indices for class i in fold f
Each array is organized by class and then by fold, ensuring stratified sampling across the cross-validation sets.
Examples
using PosDefManifoldML, PosDefManifold
y = [1,1,1,1,2,2,2,2,2,2]
cvSetup(y, 2)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[3, 4], [1, 2]]
# Class 2: Array{Int64}[[4, 5, 6], [1, 2, 3]]
# Testing Arrays:
# Class 1: Array{Int64}[[1, 2], [3, 4]]
# Class 2: Array{Int64}[[1, 2, 3], [4, 5, 6]]
cvSetup(y, 2; shuffle=true, seed=1)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[1, 4], [2, 3]]
# Class 2: Array{Int64}[[1, 3, 4], [2, 5, 6]]
# Testing Arrays:
# Class 1: Array{Int64}[[2, 3], [1, 4]]
# Class 2: Array{Int64}[[2, 5, 6], [1, 3, 4]]
cvSetup(y, 3)
# returns:
# Training Arrays:
# Class 1: Array{Int64}[[2, 3], [1, 3, 4], [1, 2, 4]]
# Class 2: Array{Int64}[[3, 4, 5, 6], [1, 2, 5, 6], [1, 2, 3, 4]]
# Testing Arrays:
# Class 1: Array{Int64}[[1, 4], [2], [3]]
# Class 2: Array{Int64}[[1, 2], [3, 4], [5, 6]]
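# As a final sketch, cvSetup can serve as the basis for a customized cross-validation.
# The loop below is illustrative only: the variable names, the choice of an MDM model
# and the conversion of within-class indices to absolute indices are assumptions,
# not package code.
using PosDefManifoldML, PosDefManifold
P, _dummyP, y, _dummyy = gen2ClassData(10, 60, 80, 30, 40, 0.2)
nFolds = 5
indTr, indTe = cvSetup(y, nFolds)
z = length(unique(y))                          # number of classes
accs = zeros(nFolds)
for f in 1:nFolds
    # indices returned by cvSetup are within-class; map them to absolute indices
    trIdx = vcat([findall(y .== i)[indTr[i][f]] for i in 1:z]...)
    teIdx = vcat([findall(y .== i)[indTe[i][f]] for i in 1:z]...)
    m = fit(MDM(Fisher), P[trIdx], y[trIdx]; verbose=false)
    yPred = predict(m, P[teIdx], :l; verbose=false)
    accs[f] = sum(yPred .== y[teIdx]) / length(teIdx)
end
accs                                           # accuracy at each fold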