riemannianGeometry.jl
This is the fundamental unit of PosDefManifold. It contains functions for manipulating points in the Riemannian manifold of Symmetric Positive Definite (SPD) or Hermitian Positive Definite (HPD) matrices. In Julia those are Hermitian
matrices, see typecasting matrices.
The functions are divided into six categories:
Category | Output |
---|---|
1. Geodesic equations | interpolation, extrapolation, weighted mean of two matrices, ... |
2. Distances | length of geodesics |
3. Graphs and Laplacians | inter-distance matrices, spectral embedding, eigenmaps, ... |
4. Means | mid-points of geodesics, Fréchet means of several points, midrange,... |
5. Tangent Space operations | maps from the manifold to the tangent space and vice versa, parallel transport,... |
6. Procrustes problems | data matching, transfer learning (domain adaptation), ... |
⋅
Geodesic equations
Function | Description |
---|---|
geodesic | Geodesic equations (weighted mean of two positive definite matrices) for any metric |
⋅
PosDefManifold.geodesic
— Function(1) geodesic(metric::Metric, P::ℍ{T}, Q::ℍ{T}, a::Real) where T<:RealOrComplex
(2) geodesic(metric::Metric, D::𝔻{S}, E::𝔻{S}, a::Real) where S<:Real
(1) Move along the geodesic from point $P$ to point $Q$ (two positive definite matrices) with arclength $0<=a<=1$, using the specified metric, of type Metric::Enumerated type.
For all metrics,
- with $a=0$ we stay at $P$,
- with $a=1$ we move up to $Q$,
- with $a=1/2$ we move to the mid-point of $P$ and $Q$ (mean).
Using the Fisher metric, argument $a$ can be any real number, for instance:
- with $0<a<1$ we move toward $Q$ (attraction),
- with $a>1$ we move over and beyond $Q$ (extrapolation),
- with $a<0$ we move back away from $Q$ (repulsion).
$P$ and $Q$ must be flagged by julia as Hermitian
. See typecasting matrices.
The Fisher geodesic move is computed by the Cholesky-Schur algorithm given in Eq. 4.2 by Iannazzo (2016)🎓. If $Q=I$, the Fisher geodesic move is simply $P^a$ (no need to call this function).
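This identity can be checked numerically; a minimal sketch (note that the identity matrix must itself be flagged as Hermitian):
using LinearAlgebra, PosDefManifold
P=randP(10)
Id=ℍ(Matrix(1.0I, 10, 10)) # the identity flagged as Hermitian
a=0.3
norm(geodesic(Fisher, P, Id, a)-P^a) # ≈ 0 up to numerical precision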
For the logdet zero and Jeffrey metrics no closed-form expression for the geodesic is available to the best of the authors' knowledge, so in these cases the geodesic is found as the weighted mean using the mean
function. For the Von Neumann metric not even an expression for the mean is available, so in this case the geodesic is not provided and a warning is printed.
(2) Like in (1), but for two real positive definite diagonal matrices $D$ and $E$.
Maths
For points $P$, $Q$ and arclength $a$, letting $b=1-a$, the geodesic equations for the supported metrics are:
Metric | geodesic equation |
---|---|
Euclidean | $bP + aQ$ |
invEuclidean | $\big(bP^{-1} + aQ^{-1}\big)^{-1}$ |
ChoEuclidean | $TT^*$, where $T=bL_P + aL_Q$ |
logEuclidean | $\text{exp}\big(b\hspace{2pt}\text{log}(P) + a\hspace{2pt}\text{log}(Q)\big)$ |
logCholesky | $TT^*$, where $T=S_P+a(S_Q-S_P)+D_P\hspace{2pt}\text{exp}\big(a(\text{log}D_Q-\text{log}D_P)\big)$ |
Fisher | $P^{1/2} \big(P^{-1/2} Q P^{-1/2}\big)^a P^{1/2}$ |
logdet0 | uses weighted mean algorithm logdet0Mean |
Jeffrey | uses weighted mean mean |
VonNeumann | N.A. |
Wasserstein | $b^2P+a^2Q +ab\big[(PQ)^{1/2} +(QP)^{1/2}\big]$ |
legend: $L_X$, $S_X$ and $D_X$ are the Cholesky lower triangle of $X$, its strictly lower triangular part and diagonal part, respectively (hence, $S_X+D_X=L_X$, $L_XL_X^*=X$).
See also: mean
.
Examples
using PosDefManifold
P=randP(10)
Q=randP(10)
# Wasserstein mean
M=geodesic(Wasserstein, P, Q, 0.5)
# extrapolate using the Fisher metric
E=geodesic(Fisher, P, Q, 2)
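The Euclidean row of the table above can also be checked directly; a quick sketch reusing the P and Q defined in this example:
using LinearAlgebra
a=0.3
# the Euclidean geodesic is the matrix linear interpolation
norm(geodesic(Euclidean, P, Q, a) - ((1-a)*P + a*Q)) # ≈ 0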
Distances
Function | Description |
---|---|
distanceSqr , distance² | Squared distance between positive definite matrices |
distance | Distance between positive definite matrices |
⋅
PosDefManifold.distanceSqr
— Function(1) distanceSqr(metric::Metric, P::ℍ{T}) where T<:RealOrComplex
(2) distanceSqr(metric::Metric, P::ℍ{T}, Q::ℍ{T}) where T<:RealOrComplex
(3) distanceSqr(metric::Metric, D::𝔻{S}) where S<:Real
(4) distanceSqr(metric::Metric, D::𝔻{S}, E::𝔻{S}) where S<:Real
alias: distance²
(1) Return $δ^2(P, I)$, the square of the distance (or divergence) of positive definite matrix $P$ from the identity matrix. See distance from the origin.
(2) Return $δ^2(P, Q)$, the square of the distance (or divergence) between two positive definite matrices $P$ and $Q$. See distance.
In both cases the distance function $δ$ is induced by the argument metric
of type Metric::Enumerated type.
$P$ in (1) and $P$, $Q$ in (2) must be flagged by julia as Hermitian
. See typecasting matrices.
(3) and (4) are specialized methods of (1) and (2), respectively, for real positive definite Diagonal
matrices. See ℍVector type and 𝔻Vector type.
Maths
For point $P$ the squared distances from the identity for the supported metrics are:
Metric | Squared Distance from the identity |
---|---|
Euclidean | $∥P-I∥^2$ |
invEuclidean | $∥P^{-1}-I∥^2$ |
ChoEuclidean | $∥L_P-I∥^2$ |
logEuclidean | $∥\textrm{log}P∥^2$ |
logCholesky | $∥S_P∥^2+∥\textrm{log}D_P∥^2$ |
Fisher | $∥\textrm{log}P∥^2$ |
logdet0 | $\textrm{logdet}\frac{1}{2}(P+I) - \frac{1}{2}\textrm{logdet}(P)$ |
Jeffrey | $\frac{1}{2}\textrm{tr}(P+P^{-1})-n$ |
VonNeumann | $\frac{1}{2}\textrm{tr}(P\textrm{log}P-\textrm{log}P)$ |
Wasserstein | $\textrm{tr}(P+I) -2\textrm{tr}(P^{1/2})$ |
For points $P$ and $Q$ their squared distances for the supported metrics are:
Metric | Squared Distance |
---|---|
Euclidean | $∥P-Q∥^2$ |
invEuclidean | $∥P^{-1}-Q^{-1}∥^2$ |
ChoEuclidean | $∥ L_P - L_Q ∥^2$ |
logEuclidean | $∥\textrm{log}P-\textrm{log}Q∥^2$ |
logCholesky | $∥S_P-S_Q∥^2+∥\textrm{log}D_P-\textrm{log}D_Q∥^2$ |
Fisher | $∥\textrm{log}(P^{-1/2}QP^{-1/2})∥^2$ |
logdet0 | $\textrm{logdet}\frac{1}{2}(P+Q) - \frac{1}{2}\textrm{logdet}(PQ)$ |
Jeffrey | $\frac{1}{2}\textrm{tr}(Q^{-1}P+P^{-1}Q)-n$ |
VonNeumann | $\frac{1}{2}\textrm{tr}(P\textrm{log}P-P\textrm{log}Q+Q\textrm{log}Q-Q\textrm{log}P)$ |
Wasserstein | $\textrm{tr}(P+Q) -2\textrm{tr}(P^{1/2}QP^{1/2})^{1/2}$ |
legend: $L_X$, $S_X$ and $D_X$ are the Cholesky lower triangle of $X$, its strictly lower triangular part and diagonal part, respectively (hence, $S_X+D_X=L_X$, $L_XL_X^*=X$).
See also: distanceSqrMat
.
Examples (1)
using PosDefManifold
P=randP(10)
d=distanceSqr(Wasserstein, P)
e=distanceSqr(Fisher, P)
metric=Metric(Int(logdet0)) # or metric=logdet0
s=string(metric) # check what is the current metric
f=distance²(metric, P) #using the alias distance²
Examples (2)
using PosDefManifold
P=randP(10)
Q=randP(10)
d=distanceSqr(logEuclidean, P, Q)
e=distance²(Jeffrey, P, Q)
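As a numerical sketch of the Euclidean rows of the tables above, the squared distance should equal the squared Frobenius norm of the difference:
using LinearAlgebra, PosDefManifold
P=randP(10); Q=randP(10)
distanceSqr(Euclidean, P, Q) ≈ norm(P-Q)^2 # true
distanceSqr(Euclidean, P) ≈ norm(P-I)^2    # true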
PosDefManifold.distance
— Function(1) distance(metric::Metric, P::ℍ{T}) where T<:RealOrComplex
(2) distance(metric::Metric, P::ℍ{T}, Q::ℍ{T}) where T<:RealOrComplex
(3) distance(metric::Metric, D::𝔻{S}) where S<:Real
(4) distance(metric::Metric, D::𝔻{S}, E::𝔻{S}) where S<:Real
(1) Return $δ(P, I)$, the distance between positive definite matrix $P$ and the identity matrix.
(2) Return $δ(P, Q)$, the distance between positive definite matrices $P$ and $Q$.
(3) and (4) are specialized methods of (1) and (2), respectively, for real positive definite Diagonal
matrices.
This is the square root of distanceSqr
and is invoked with the same syntax therein.
See also: distanceMat
.
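A minimal usage sketch (mirroring the distanceSqr examples above):
using PosDefManifold
P=randP(10)
Q=randP(10)
d1=distance(Fisher, P)    # distance from the identity
d2=distance(Fisher, P, Q) # distance between P and Q
d2 ≈ sqrt(distanceSqr(Fisher, P, Q)) # true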
Graphs and Laplacians
Function | Description |
---|---|
distanceSqrMat , distance²Mat | Lower triangular matrix of all squared inter-distances |
distanceMat | Lower triangular matrix of all inter-distances |
laplacian | Laplacian of a squared inter-distances matrix |
laplacianEigenMaps , laplacianEM | Eigen maps (eigenvectors) of a Laplacian |
spectralEmbedding , spEmb | Spectral Embedding (the above functions run in series) |
⋅
PosDefManifold.distanceSqrMat
— Function (1) distanceSqrMat(metric::Metric, 𝐏::ℍVector;
<⏩=true>)
(2) distanceSqrMat(type::Type{T}, metric::Metric, 𝐏::ℍVector;
<⏩=true>) where T<:AbstractFloat
alias: distance²Mat
Given a 1d array $𝐏$ of $k$ positive definite matrices ${P_1,...,P_k}$ of ℍVector type, create the $k⋅k$ real LowerTriangular
matrix comprising elements $δ^2(P_i, P_j)\textrm{, for all }i>=j$.
This is the lower triangular matrix holding all squared inter-distances (zero on diagonal), using the specified metric
, of type Metric::Enumerated type, giving rise to distance function $δ$. See distanceSqr
.
Only the lower triangular part is computed in order to optimize memory use.
By default, the result matrix is of type Float32
. The type can be changed to another real type
using method (2).
<optional keyword arguments>:
- if ⏩=true (default) the computation of inter-distances is multi-threaded.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
See: distance.
See also: laplacian
, laplacianEigenMaps
, spectralEmbedding
.
Examples
using PosDefManifold
# Generate a set of 8 random 10x10 SPD matrices
Pset=randP(10, 8) # or, using unicode: 𝐏=randP(10, 8)
# Compute the squared inter-distance matrix according to the log Euclidean metric.
# This is much faster as compared to the Fisher metric and in general
# it is a good approximation.
Δ²=distanceSqrMat(logEuclidean, Pset)
# return a matrix of type Float64
Δ²64=distanceSqrMat(Float64, logEuclidean, Pset)
# Get the full matrix of inter-distances
fullΔ²=Hermitian(Δ², :L)
PosDefManifold.distanceMat
— Function (1) distanceMat(metric::Metric, 𝐏::ℍVector;
<⏩=true>)
(2) distanceMat(type::Type{T}, metric::Metric, 𝐏::ℍVector;
<⏩=true>) where T<:AbstractFloat
Given a 1d array $𝐏$ of $k$ positive definite matrices ${P_1,...,P_k}$ of ℍVector type, create the $k⋅k$ real LowerTriangular
matrix comprising elements $δ(P_i, P_j)\textrm{, for all }i>=j$.
This is the lower triangular matrix holding all inter-distances (zero on diagonal), using the specified metric
, of type Metric::Enumerated type, giving rise to distance $δ$. See distance
.
Only the lower triangular part is computed in order to optimize memory use.
By default, the result matrix is of type Float32
. The type can be changed to another real type
using method (2).
The elements of this matrix are the square root of distanceSqrMat
.
<optional keyword arguments>:
- if ⏩=true (default) the computation of inter-distances is multi-threaded.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
See: distance.
Examples
using PosDefManifold
# Generate a set of 4 random 10x10 SPD matrices
Pset=randP(10, 4) # or, using unicode: 𝐏=randP(10, 4)
Δ=distanceMat(Fisher, Pset)
# return a matrix of type Float64
Δ64=distanceMat(Float64, Fisher, Pset)
# Get the full matrix of inter-distances
fullΔ=Hermitian(Δ, :L)
PosDefManifold.laplacian
— Functionlaplacian(Δ²::𝕃{S}, epsilon::Real=0;
<densityInvariant=false>) where S<:Real
Given a LowerTriangular
matrix of squared inter-distances $Δ^2$, return the lower triangular part of the so-called normalized Laplacian or density-invariant normalized Laplacian, which in both cases is a symmetric Laplacian. The elements of the Laplacian are of the same type as the elements of $Δ^2$. The result is a LowerTriangular
matrix.
The definition of Laplacian given by Lafon (2004)🎓 is implemented:
First, a Gaussian radial basis functions, known as Gaussian kernel or heat kernel, is applied to all elements of $Δ^2$, such as
$W_{ij} = exp\bigg(\frac{\displaystyle{-Δ^2_{ij}}}{\displaystyle{2ε}}\bigg)$,
where $ε$ is the bandwidth of the kernel.
If <optional keyword argument> densityInvariant=true
is used, then the density-invariant transformation is applied
$W \leftarrow E^{-1}WE^{-1}$
where $E$ is the diagonal matrix holding on the main diagonal the sum of the rows (or columns) of $W$.
Finally, the normalized Laplacian (density-invariant or not) is defined as
$Ω = D^{-1/2}WD^{-1/2}$,
where $D$ is the diagonal matrix holding on the main diagonal the sum of the rows (or columns) of $W$.
If you do not provide argument epsilon
, the bandwidth $ε$ is set to the median of the elements of squared distance matrix $Δ^2_{ij}$. Another educated guess is the dimension of the original data, that is, the data that has been used to compute the squared distance matrix. For positive definite matrices this is $n(n-1)/2$, where $n$ is the dimension of the matrices. Still another is the dimension of the ensuing spectralEmbedding
space. Keep in mind that by tuning the epsilon
parameter (which must be positive) you can control both the rate of compression of the embedding space and the spread of points in the embedding space. See Coifman et al. (2008)🎓 for a discussion on $ε$.
The Laplacian as here defined can be requested for any input matrix of squared inter-distances, for example, those obtained on scalars or on vectors using appropriate metrics. In any case, only the lower triangular part of the Laplacian is taken as input. See typecasting matrices.
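For example, a minimal sketch building such a matrix from scalar data (plain squared Euclidean inter-distances, an assumption made here only for illustration):
using PosDefManifold, LinearAlgebra
x=randn(20) # 20 scalar data points
r=length(x)
# lower triangular matrix of squared inter-distances (zero diagonal)
Δ²=LowerTriangular([i>j ? abs2(x[i]-x[j]) : 0.0 for i=1:r, j=1:r])
Ω=laplacian(Δ²)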
See also: distanceSqrMat
, laplacianEigenMaps
, spectralEmbedding
.
Examples
using PosDefManifold, Statistics
# Generate a set of 4 random 10x10 SPD matrices
Pset=randP(10, 4) # or, using unicode: 𝐏=randP(10, 4)
Δ²=distanceSqrMat(Fisher, Pset)
Ω=laplacian(Δ²)
# density-invariant Laplacian
Ω=laplacian(Δ²; densityInvariant=true)
# increase the bandwidth
r=size(Δ², 1)
myεFactor=0.1
med=Statistics.median([Δ²[i, j] for j=1:r-1 for i=j+1:r])
ε=2*myεFactor*med
Ω=laplacian(Δ², ε; densityInvariant=true)
PosDefManifold.laplacianEigenMaps
— Function laplacianEigenMaps(Ω::𝕃{S}, q::Int;
<
tol::Real=0.,
maxiter::Int=300,
verbose=false >) where S<:Real
alias: laplacianEM
Given the lower triangular part of a Laplacian $Ω$ (see laplacian
) return the eigen maps in $q$ dimensions, i.e., the $q$ eigenvectors of the Laplacian associated with the largest $q$ eigenvalues, excluding the first (which is always equal to 1.0). The eigenvectors are of the same type as $Ω$. They are all divided element-wise by the first eigenvector (see Lafon, 2004🎓).
The eigenvectors of the Laplacian are computed by the power iterations+modified Gram-Schmidt method (see powerIterations
), allowing the execution of this function for Laplacian matrices of very large size.
Return the 4-tuple $(Λ, U, iterations, convergence)$, where:
- $Λ$ is a $q⋅q$ diagonal matrix holding on diagonal the eigenvalues corresponding to the $q$ dimensions of the Laplacian eigen maps,
- $U$ holds in columns the $q$ eigenvectors holding the $q$ coordinates of the points in the embedding space,
- $iterations$ is the number of iterations executed by the power method,
- $convergence$ is the convergence attained by the power method.
Using the notion of Laplacian, spectral embedding seeks a low-dimensional representation of the data emphasizing local neighborhood information while neglecting long-distance information. The embedding is non-linear, however the embedding space is Euclidean. The columns of $U$ hold the coordinates of the points in the embedding space (typically two- or three-dimensional for plotting, or more for clustering). Spectral embedding is used for plotting data in low dimension, clustering, imaging, classification, following trajectories over time or other dimensions, and much more. For examples of applications see Rodrigues et al. (2018) 🎓 and references therein.
Arguments:
- $Ω$ is a real LowerTriangular normalized Laplacian obtained by the laplacian function,
- $q$ is the dimension of the Laplacian eigen maps;
- The following are <optional keyword arguments> for the power iterations:
  - tol is the tolerance for convergence (see below),
  - maxiter is the maximum number of iterations allowed,
  - if verbose is true, the convergence at all iterations will be printed.
The maximum value of $q$ that can be requested is $n-1$, where $n$ is the size of the Laplacian. In general, $q=2$ or $q=3$ is requested.
$tol$ defaults to the square root of Base.eps
of the (real) type of $Ω$. This corresponds to requiring equality for the convergence criterion over two successive power iterations of about half of the significant digits.
See also: distanceSqrMat
, laplacian
, spectralEmbedding
.
Examples
using PosDefManifold
# Generate a set of 4 random 10x10 SPD matrices
Pset=randP(10, 4)
Δ²=distanceSqrMat(Fisher, Pset)
Ω=laplacian(Δ²)
evalues, maps, iterations, convergence=laplacianEM(Ω, 2)
evalues, maps, iterations, convergence=laplacianEM(Ω, 2; verbose=true)
evalues, maps, iterations, convergence=laplacianEM(Ω, 2; verbose=true, maxiter=500)
PosDefManifold.spectralEmbedding
— Function (1) spectralEmbedding(metric::Metric, 𝐏::ℍVector, q::Int, epsilon::Real=0;
<
tol::Real=0.,
maxiter::Int=300,
densityInvariant=false,
verbose=false,
⏩=true >)
(2) spectralEmbedding(type::Type{T}, metric::Metric, 𝐏::ℍVector, q::Int, epsilon::Real=0;
< same optional keyword arguments as in (1) >) where T<:Real
alias: spEmb
Given a 1d array $𝐏$ of $k$ positive definite matrices ${P_1,...,P_k}$ (real or complex), compute its eigen maps in $q$ dimensions.
This function runs, one after the other, the following functions (a sketch of the equivalent manual chaining is given after this list):
- distanceSqrMat (compute the squared inter-distance matrix),
- laplacian (compute the normalized Laplacian),
- laplacianEigenMaps (get the eigen maps).
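For illustration, a sketch of this chaining done by hand (the result should agree with spectralEmbedding up to the sign of the eigenvectors and the convergence of the power iterations):
using PosDefManifold
Pset=randP(10, 8)
Δ²=distanceSqrMat(Fisher, Pset)
Ω=laplacian(Δ²)
Λ, U, iter, conv = laplacianEigenMaps(Ω, 2)
# equivalent call doing the three steps at once
Λ2, U2, iter2, conv2 = spectralEmbedding(Fisher, Pset, 2)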
By default all computations above are done with Float32
precision. Another real type can be requested using method (2), where the type
argument is defined.
Return the 4-tuple (Λ, U, iterations, convergence)
, where:
- $Λ$ is a $q⋅q$ diagonal matrix holding on diagonal the eigenvalues corresponding to the $q$ dimensions of the Laplacian eigen maps,
- $U$ holds in columns the $q$ eigenvectors holding the $q$ coordinates of the points in the embedding space,
- $iterations$ is the number of iterations executed by the power method,
- $convergence$ is the convergence attained by the power method.
Arguments:
- metric is the metric of type Metric::Enumerated type used for computing the inter-distances,
- $𝐏$ is a 1d array of $k$ positive definite matrices of ℍVector type,
- $q$ is the dimension of the Laplacian eigen maps,
- $epsilon$ is the bandwidth of the Laplacian (see laplacian);
- The following <optional keyword argument> applies to the computation of the inter-distances:
  - if ⏩=true (default) the computation of inter-distances is multi-threaded.
- The following <optional keyword argument> applies to the computation of the Laplacian by the laplacian function:
  - if densityInvariant=true the density-invariant Laplacian is computed (see laplacian).
- The following are <optional keyword arguments> for the power method iterative algorithm invoked by laplacianEigenMaps:
  - tol is the tolerance for convergence of the power method (see below),
  - maxiter is the maximum number of iterations allowed for the power method,
  - if verbose=true the convergence at all iterations will be printed;
$tol$ defaults to the square root of Base.eps
of the Float32
type (1) or of the type
passed as argument (2). This corresponds to requiring equality for the convergence criterion over two successive power iterations of about half of the significant digits.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
See also: distanceSqrMat
, laplacian
, laplacianEigenMaps
.
Examples
using PosDefManifold, LinearAlgebra
# Generate a set of k random 10x10 SPD matrices
k=10
Pset=randP(10, k)
evalues, maps, iter, conv=spectralEmbedding(Fisher, Pset, 2)
# show convergence information
evalues, maps, iter, conv=spectralEmbedding(Fisher, Pset, 2; verbose=true)
# use Float64 precision.
evalues, maps, iter, conv=spectralEmbedding(Float64, Fisher, Pset, 2)
using Plots
# check eigenvalues and eigenvectors
plot(diag(evalues))
plot(maps[:, 1])
plot!(maps[:, 2])
plot!(maps[:, 3])
# plot the data in the embedded space
plot(maps[:, 1], maps[:, 2], seriestype=:scatter, title="Spectral Embedding", label="Pset")
# try a different value of epsilon
evalues, maps, iter, conv=spEmb(Fisher, Pset, k-1, 0.01; maxiter=1000)
plot(maps[:, 1], maps[:, 2], seriestype=:scatter, title="Spectral Embedding", label="Pset")
# see the example in `Laplacian` function for more on this
Means
Function | Description |
---|---|
mean | Weighted Fréchet mean (wFm) of a scalar or matrix set using any metric |
means | As above for several sets at once |
generalizedMean | Generalized wFm of a matrix set |
geometricMean , gMean | wFm of a matrix set minimizing the dispersion according to the Fisher metric (iterative) |
geometricpMean , gpMean | robust wFm of a matrix set minimizing the p-dispersion according to the Fisher metric (iterative) |
logdet0Mean , ld0Mean | wFm of a matrix set according to the logdet0 metric (iterative) |
wasMean | wFm of a matrix set according to the Wasserstein metric (iterative) |
powerMean | Power wFm of a matrix set (iterative) |
inductiveMean , indMean | Recursive Fréchet mean of a matrix set (constructive) |
midrange | Geometric midrange of two matrices |
⋅
Statistics.mean
— Function (1) mean(metric::Metric, P::ℍ{T}, Q::ℍ{T}) where T<:RealOrComplex
(2) mean(metric::Metric, D::𝔻{T}, E::𝔻{T}) where T<:Real
(3) mean(metric::Metric, 𝐏::ℍVector;
<
w::Vector=[],
✓w=true,
init::Union{ℍ, Nothing}=nothing,
tol::Real=0.,
verbose=false,
⏩=true >)
(4) mean(metric::Metric, 𝐃::𝔻Vector;
< same optional keyword arguments as in (3) >)
(1) Mean of two positive definite matrices, passed in arbitrary order as arguments $P$ and $Q$, using the specified metric
of type Metric::Enumerated type. The order is arbitrary as all metrics implemented in PosDefManifold are symmetric. This is the midpoint of the geodesic. For the weighted mean of two positive definite matrices use instead the geodesic
function. $P$ and $Q$ must be flagged as Hermitian
. See typecasting matrices.
(2) Like in (1), but for two real diagonal positive definite matrices $D$ and $E$.
(3) Fréchet mean of an 1d array $𝐏$ of $k$ positive definite matrices $𝐏={P_1,...,P_k}$ of ℍVector type, with optional non-negative real weights $w={w_1,...,w_k}$ and using the specified metric
as in (1).
(4) Fréchet mean of an 1d array $𝐃$ of $k$ positive definite matrices $𝐃={D_1,...,D_k}$ of 𝔻Vector type, with optional non-negative real weights $w={w_1,...,w_k}$ and using the specified metric
as in (1).
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
Adopting the Fisher
, logdet0
and Wasserstein
metric in (3) and the logdet0
metric in (4), the mean is computed by means of an iterative algorithm. A particular initialization for these algorithms can be provided passing an Hermitian matrix as <optional keyword argument> init
. The convergence for these algorithms is required with a tolerance given by <optional keyword argument> tol
. If verbose=true
the convergence attained at each iteration is printed. Other information such as whether the algorithm has diverged is also printed. For more options in computing these means call directly functions geometricMean
, logdet0Mean
and wasMean
, which are called hereby. For the meaning of the tol
default value see the documentation of these functions. See also the robust mean function geometricpMean
, which cannot be called from here. Notice that arguments init
and tol
have an effect only for the aforementioned metrics in methods (3) and (4).
For (3) and (4), if ⏩=true
(default), the computation of the mean is multi-threaded for all metrics.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
Math
For a set of $k$ matrices ${P_1, P_2,..., P_k}$ weighted by ${w_1, w_2,..., w_k}:\sum_{i=1}^{k}w_i=1$, the weighted Fréchet means for the supported metrics with a closed-form expression are:
Metric | weighted Fréchet mean |
---|---|
Euclidean | $\sum_{i=1}^{k}w_i P_i$ |
invEuclidean | $\big(\sum_{i=1}^{k}w_i P_i^{-1}\big)^{-1}$ |
ChoEuclidean | $TT^*$, where $T=\sum_{i=1}^{k}w_iL_{P_i}$ |
logEuclidean | $\textrm{exp}\big(\sum_{i=1}^{k}w_i\hspace{1pt} \textrm{log}P_i \big)$ |
logCholesky | $TT^*$, where $T=\sum_{i=1}^{k}(w_iS_{P_i})+\textrm{exp}\big(\sum_{i=1}^{k}w_i\textrm{log}D_{P_i}\big)$ |
Jeffrey | $A^{1/2}\big(A^{-1/2}HA^{-1/2}\big)^{1/2}A^{1/2}$ |
and for those that are found by an iterative algorithm and that verify an equation:
Metric | equation verified by the weighted Fréchet mean |
---|---|
Fisher | $\sum_{i=1}^{k}w_i\textrm{log}\big(G^{-1/2} P_i G^{-1/2}\big)=0$ |
logdet0 | $\sum_{i=1}^{k}w_i\big(\frac{1}{2}P_i+\frac{1}{2}G\big)^{-1}=G^{-1}$ |
VonNeumann | N.A. |
Wasserstein | $G=\sum_{i=1}^{k}w_i\big( G^{1/2} P_i G^{1/2}\big)^{1/2}$ |
legend: $L_X$, $S_X$ and $D_X$ are the Cholesky lower triangle of $X$, its strictly lower triangular part and diagonal part, respectively (hence, $S_X+D_X=L_X$, $L_XL_X^*=X$). $A$ and $H$ are the weighted arithmetic and weighted harmonic mean, respectively.
See: geodesic, mean, Fréchet mean.
Examples
using LinearAlgebra, Statistics, PosDefManifold
# Generate 2 random 3x3 SPD matrices
P=randP(3)
Q=randP(3)
M=mean(logdet0, P, Q) # (1)
M=mean(Euclidean, P, Q) # (1)
# passing several matrices and associated weights listing them
# weights vector, does not need to be normalized
R=randP(3)
mean(Fisher, ℍVector([P, Q, R]); w=[1, 2, 3])
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4)
weights=[1, 2, 3, 1]
# passing a vector of Hermitian matrices (ℍVector type)
M=mean(Euclidean, Pset; w=weights) # (3) weighted Euclidean mean
M=mean(Wasserstein, Pset) # (3) unweighted Wasserstein mean
# display convergence information when using an iterative algorithm
M=mean(Fisher, Pset; verbose=true)
# run multi-threaded when the number of matrices is high
using BenchmarkTools
Pset=randP(20, 160)
@benchmark(mean(logEuclidean, Pset; ⏩=false)) # single-threaded
@benchmark(mean(logEuclidean, Pset)) # multi-threaded
mean(metric::Metric, ν::Vector{T}) where T<:RealOrComplex
Mean of $k$ real or complex scalars, using the specified metric
of type Metric::Enumerated type. Note that using the Fisher, logEuclidean and Jeffrey metric, the resulting mean is the scalar geometric mean. Note also that the code of this method is in unit statistics.jl, while the code for all the others is in unit riemannianGeometry.jl.
Examples
using PosDefManifold
# Generate 10 random numbers distributed as a chi-square with 2 df.
ν=[randχ²(2) for i=1:10]
arithmetic_mean=mean(Euclidean, ν)
geometric_mean=mean(Fisher, ν)
harmonic_mean=mean(invEuclidean, ν)
harmonic_mean<=geometric_mean<=arithmetic_mean # AGH inequality
PosDefManifold.means
— Function (1) means(metric::Metric, 𝒫::ℍVector₂;
<⏩=true>)
(2) means(metric::Metric, 𝒟::𝔻Vector₂;
<⏩=true>)
(1) Given a 2d array $𝒫$ of positive definite matrices as an ℍVector₂ type, compute the Fréchet mean for as many ℍVector type objects as held in $𝒫$, using the specified metric
of type Metric::Enumerated type. Return the means in a vector of Hermitian
matrices, that is, as an ℍVector
type.
(2) Given a 2d array $𝒟$ of real positive definite matrices as an 𝔻Vector₂ type, compute the Fréchet mean for as many 𝔻Vector type objects as held in $𝒟$, using the specified metric
of type Metric::Enumerated type. Return the means in a vector of Diagonal
matrices, that is, as a 𝔻Vector
type.
The weighted Fréchet mean is not supported in this function.
If ⏩=true
(default) the computation of the means is multi-threaded.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
See also: mean
.
Examples
using PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# Generate a set of 40 random 3x3 SPD matrices
Qset=randP(3, 40) # or, using unicode: 𝐐=randP(3, 40)
# listing directly ℍVector objects
means(logEuclidean, ℍVector₂([Pset, Qset])) # or: means(logEuclidean, ℍVector₂([𝐏, 𝐐]))
# note that [𝐏, 𝐐] is actually a ℍVector₂ type object
# creating and passing an object of ℍVector₂ type
sets=ℍVector₂(undef, 2) # or: 𝒫=ℍVector₂(undef, 2)
sets[1]=Pset # or: 𝒫[1]=𝐏
sets[2]=Qset # or: 𝒫[2]=𝐐
means(logEuclidean, sets) # or: means(logEuclidean, 𝒫)
# going multi-threaded
# first, create 20 sets of 200 50x50 SPD matrices
sets=ℍVector₂([randP(50, 200) for i=1:20])
# How much computing time do we save?
# (example min time obtained with 4 threads & 4 BLAS threads)
using BenchmarkTools
# non multi-threaded, mean with closed-form solution
@benchmark(means(logEuclidean, sets; ⏩=false)) # (6.196 s)
# multi-threaded, mean with closed-form solution
@benchmark(means(logEuclidean, sets)) # (1.897 s)
sets=ℍVector₂([randP(10, 200) for i=1:10])
# non multi-threaded, mean with iterative solution
# wait a bit
@benchmark(means(Fisher, sets; ⏩=false)) # (4.672 s )
# multi-threaded, mean with iterative solution
@benchmark(means(Fisher, sets)) # (1.510 s)
PosDefManifold.generalizedMean
— Function generalizedMean(𝐏::Union{ℍVector, 𝔻Vector}, p::Real;
<
w::Vector=[],
✓w=true,
⏩=true >)
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type or real positive definite diagonal matrices of 𝔻Vector type and optional non-negative real weights vector $w={w_1,...,w_k}$, return the weighted generalized mean $G$ with real parameter $p$, that is,
$G=\big(\sum_{i=1}^{k}w_iP_i^p\big)^{1/p}$.
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted generalized mean
$G=\big(\sum_{i=1}^{k}P_i^p\big)^{1/p}$.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the weights each time.
If <optional keyword argument> ⏩=true the computation of the generalized mean is multi-threaded.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
The following special cases for parameter $p$ are noteworthy:
- For $p=\frac{1}{2}$ the generalized mean is the modified Bhattacharyya mean.
- For $p=1$ the generalized mean is the Euclidean mean.
- For $p=-1$ the generalized mean is the inverse Euclidean mean.
- For (the limit of) $p=0$ the generalized mean is the log Euclidean mean, which is the Fisher mean when matrices in 𝐏 all pair-wise commute.
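These special cases can be checked numerically; a quick sketch:
using LinearAlgebra, Statistics, PosDefManifold
Pset=randP(3, 4)
norm(generalizedMean(Pset, 1) - mean(Euclidean, Pset))     # ≈ 0
norm(generalizedMean(Pset, -1) - mean(invEuclidean, Pset)) # ≈ 0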
Notice that when matrices in 𝐏 all pair-wise commute, for instance if the matrices are diagonal, the generalized means coincide with the power means for any $p∈[-1, 1]$ and for $p=0.5$ it coincides also with the Wasserstein mean. For this reason the generalized means are used as default initialization of both the powerMean
and wasMean
algorithm.
See: generalized means.
See also: powerMean
, wasMean
, mean
.
Examples
using LinearAlgebra, Statistics, PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# weights vector, does not need to be normalized
weights=[1, 2, 3, 1]
# unweighted mean
G = generalizedMean(Pset, 0.25) # or: G = generalizedMean(𝐏, 0.25)
# weighted mean
G = generalizedMean(Pset, 0.5; w=weights)
# with weights previously normalized we can set ✓w=false
weights=weights./sum(weights)
G = generalizedMean(Pset, 0.5; w=weights, ✓w=false)
# run multi-threaded when the number of matrices is high
using BenchmarkTools
Pset=randP(20, 160)
@benchmark(generalizedMean(Pset, 0.5; ⏩=false)) # single-threaded
@benchmark(generalizedMean(Pset, 0.5)) # multi-threaded
PosDefManifold.geometricMean
— Function geometricMean(𝐏::Union{ℍVector, 𝔻Vector};
<
w::Vector=[],
✓w=true,
init=nothing,
tol::Real=0.,
maxiter::Int=500,
adaptStepSize::Bool=true,
verbose=false,
⏩=true >)
alias: gMean
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type or diagonal matrices of 𝔻Vector type and optional non-negative real weights vector $w={w_1,...,w_k}$, return the 3-tuple $(G, iter, conv)$, where $G$ is the mean according to the Fisher metric and $iter$, $conv$ are the number of iterations and convergence attained by the algorithm. Mean $G$ is the unique positive definite matrix satisfying
$\sum_{i=1}^{k}w_i\textrm{log}\big(G^{-1/2} P_i G^{-1/2}\big)=0.$
For estimating it, this function implements the well-known gradient descent algorithm, but with an exponential decaying step size $ς$, yielding iterations
$G ←G^{1/2}\textrm{exp}\big(ς\sum_{i=1}^{k}w_i\textrm{log}(G^{-1/2} P_i G^{-1/2})\big)G^{1/2}.$
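The defining equation above can be checked numerically; a sketch for the unweighted case (the weights drop out), where the gradient at the solution should approximately vanish:
using PosDefManifold, LinearAlgebra
Pset=randP(3, 10)
G, iter, conv = geometricMean(Pset)
isqrtG=inv(sqrt(G)) # G^(-1/2)
norm(sum(log(ℍ(isqrtG*P*isqrtG)) for P in Pset)) # ≈ 0, within the convergence tolerance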
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted geometric mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
The following are more <optional keyword arguments>:
- init is a matrix to be used as initialization for the mean. If no matrix is provided, the log Euclidean mean will be used,
- tol is the tolerance for the convergence (see below),
- maxiter is the maximum number of iterations allowed,
- if verbose=true, the convergence attained at each iteration and the step size $ς$ are printed. Also, a warning is printed if convergence is not attained,
- if ⏩=true the iterations are multi-threaded (see below),
- if adaptStepSize=false the step size $ς$ is fixed to 1 at all iterations.
If the input is a 1d array of $k$ real positive definite diagonal matrices the solution is available in closed-form as the log Euclidean mean, hence the <optional keyword arguments> init
, tol
and verbose
have no effect and return the 3-tuple $(G, 1, 0)$. See the log Euclidean metric.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
In normal circumstances this algorithm converges monotonically. If the algorithm diverges and verbose
is true a warning is printed indicating the iteration when this happened.
The exponential decaying step size features a faster convergence rate as compared to the fixed step size $ς=1$ that is usually adopted. The decaying rate is inversely proportional to maxiter
, thus, increase/decrease maxiter
in order to set a slower/faster decaying rate. maxiter
should not be set too low though.
$tol$ defaults to the square root of Base.eps
of the nearest real type of data input $𝐏$. This corresponds to requiring the norm of the satisfying matrix equation divided by the number of elements to vanish for about half the significant digits.
See: Fisher metric.
See also: geometricpMean
, powerMean
, wasMean
, logdet0Mean
, mean
.
Examples
using LinearAlgebra, PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# unweighted mean
G, iter, conv = geometricMean(Pset) # or G, iter, conv = geometricMean(𝐏)
# weights vector, does not need to be normalized
weights=[1, 2, 3, 1]
# weighted mean
G, iter, conv = geometricMean(Pset, w=weights)
# print the convergence at all iterations
G, iter, conv = geometricMean(Pset; verbose=true)
# now suppose Pset has changed a bit, initialize with G to hasten convergence
Pset[1]=ℍ(Pset[1]+(randP(3)/100))
G, iter, conv = geometricMean(Pset; w=weights, ✓w=true, verbose=true, init=G)
# run multi-threaded when the number of matrices is high
using BenchmarkTools
Pset=randP(20, 120)
@benchmark(geometricMean(Pset; ⏩=false)) # single-threaded
@benchmark(geometricMean(Pset)) # multi-threaded
# show the mean and the input points using spectral embedding
using Plots
k=80
Pset=randP(20, k)
G, iter, conv = geometricMean(Pset)
push!(Pset, G)
Λ, U, iter, conv=spectralEmbedding(Fisher, Pset, 2; verbose=true)
plot(U[1:k, 1], U[1:k, 2], seriestype=:scatter, title="Spectral Embedding", label="Pset")
plot!(U[k+1:k+1, 1], U[k+1:k+1, 2], seriestype=:scatter, label="mean")
PosDefManifold.geometricpMean
— Function geometricpMean(𝐏::ℍVector, p::Real=0.5;
<
w::Vector=[],
✓w=true,
init=nothing,
tol::Real=0.,
maxiter::Int=500,
adaptStepSize=true,
verbose=false,
⏩=true >)
alias: gpMean
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type, a real parameter $0<p<=1$ and optional non-negative real weights vector $w={w_1,...,w_k}$, return the 3-tuple $(G, iter, conv)$, where $G$ is the p-mean, i.e., the mean according to the Fisher metric minimizing the p-dispersion (see below) and $iter$, $conv$ are the number of iterations and convergence attained by the algorithm.
This function implements the p-dispersion gradient descent algorithm with step-size $ς$ (to be published), yielding iterations
$G ←G^{1/2}\textrm{exp}\big(ς\sum_{i=1}^{k}pδ^2(G, P_i)^{p-1}w_i\textrm{log}(G^{-1/2} P_i G^{-1/2})\big)G^{1/2}.$
- if $p=1$ this yields the geometric mean (implemented specifically in
geometricMean
). - if $p=0.5$ this yields the geometric median (default).
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted geometric-p mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
The following are more <optional keyword arguments>:
- init is a matrix to be used as initialization for the mean. If no matrix is provided, the log Euclidean mean will be used,
- tol is the tolerance for the convergence (see below),
- maxiter is the maximum number of iterations allowed,
- if adaptStepSize=true (default) the step size $ς$ for the gradient descent is adapted at each iteration (see below),
- if verbose=true, the step size and convergence attained at each iteration are printed. Also, a warning is printed if convergence is not attained,
- if ⏩=true the iterations are multi-threaded (see below).
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
If the algorithm diverges and verbose
is true a warning is printed indicating the iteration when this happened. This algorithm may temporarily diverge and still reach convergence. Overall, while all other iterative algorithms implemented in PosDefManifold are very stable, this one is not.
The smaller the parameter $p$ is, the slower and less likely the convergence is. If the algorithm does not converge, try increasing $p$, initializing the algorithm with the output of geometricMean
and/or eliminating the outliers from the input set $𝐏$.
If adaptStepSize
is true (default) the step-size $ς$ is adapted at each iteration, otherwise a fixed step size $ς=1$ is used. Adapting the step size in general hastens convergence and improves the convergence behavior.
$tol$ defaults to the square root of Base.eps
of the nearest real type of data input $𝐏$. This corresponds to requiring the norm of the satisfying matrix equation divided by the number of elements to vanish for about half the significant digits.
See: Fisher metric.
See also: geometricMean
, powerMean
, wasMean
, logdet0Mean
, mean
.
Examples
using LinearAlgebra, PosDefManifold, Plots
# This example shows that this algorithm is more robust to outliers
# as compared to the standard geometric mean algorithm
# Generate a set of 100 random 10x10 SPD matrices
Pset=randP(10, 100)
# Get the usual geometric mean for comparison
G, iter1, conv1 = geometricMean(Pset, verbose=true)
# change p to observe how the convergence behavior changes accordingly
# Get the median (default)
H, iter2, conv2 = geometricpMean(Pset, verbose=true)
# Get the p-mean for p=0.25
H, iter2, conv2 = geometricpMean(Pset, 0.25, verbose=true)
println(iter1, " ", iter2); println(conv1, " ", conv2)
# move the first matrix in Pset to possibly create an outlier
Pset[1]=geodesic(Fisher, G, Pset[1], 3)
G1, iter1, conv1 = geometricMean(Pset, verbose=true)
H1, iter2, conv2 = geometricpMean(Pset, 0.25, verbose=true)
println(iter1, " ", iter2); println(conv1, " ", conv2)
# collect the geometric and p-means, before and after the
# introduction of the outlier, in the vector of Hermitian matrices `S`.
S=HermitianVector([G, G1, H, H1])
# check the inter-distance matrix Δ² to verify that the geometric mean
# after the introduction of the outlier (``G1``) is farther away from
# the geometric mean before (``G``) than ``H1`` is from ``H``,
# i.e., that element (4,3) is much smaller than element (2,1).
Δ²=distance²Mat(Float64, Fisher, S)
# how far are all these matrices from all the others?
fullΔ²=Hermitian(Δ², :L)
dist=[sum(fullΔ²[:, i]) for i=1:size(fullΔ², 1)]
# plot the matrices in `S` using spectral embedding.
using Plots
Λ, U, iter, conv = laplacianEM(laplacian(Δ²), 3; verbose=true)
plot([U[1, 1]], [U[1, 2]], seriestype=:scatter, label="g-mean")
plot!([U[2, 1]], [U[2, 2]], seriestype=:scatter, label="g-mean outlier")
plot!([U[3, 1]], [U[3, 2]], seriestype=:scatter, label="p-mean")
plot!([U[4, 1]], [U[4, 2]], seriestype=:scatter, label="p-mean outlier")
# estimate how much you gain running the algorithm in multi-threaded mode
using BenchmarkTools
Pset=randP(20, 120)
@benchmark(geometricpMean(Pset; ⏩=false)) # single-threaded
@benchmark(geometricpMean(Pset)) # multi-threaded
PosDefManifold.logdet0Mean
— Function logdet0Mean(𝐏::Union{ℍVector, 𝔻Vector};
<
w::Vector=[],
✓w=true,
init=nothing,
tol::Real=0.,
maxiter::Int=500,
verbose=false,
⏩=true >)
alias: ld0Mean
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type or real positive definite diagonal matrices of 𝔻Vector type and optional non-negative real weights vector $w={w_1,...,w_k}$, return the 3-tuple $(G, iter, conv)$, where $G$ is the mean according to the logdet zero metric and $iter$, $conv$ are the number of iterations and convergence attained by the algorithm. Mean $G$ is the unique positive definite matrix satisfying
$\sum_{i=1}^{k}w_i\big(\frac{1}{2}P_i+\frac{1}{2}G\big)^{-1}-G^{-1}=0$.
For estimating it, this function implements the fixed-point iteration algorithm suggested by (Moakher, 2012, p315)🎓, yielding iterations
$G ← \frac{1}{2}\big(\sum_{i=1}^{k}w_i(P_i+G)^{-1}\big)^{-1}$.
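The defining equation above can be checked numerically; a sketch for the unweighted case (weights $1/k$):
using PosDefManifold, LinearAlgebra
Pset=randP(3, 10)
G, iter, conv = logdet0Mean(Pset)
k=length(Pset)
# at the solution the fixed-point equation holds approximately
norm(sum(inv((P+G)/2) for P in Pset)/k - inv(G)) # ≈ 0, within the convergence tolerance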
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted logdet zero mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
The following are more <optional keyword arguments>:
- init is a matrix to be used as initialization for the mean. If no matrix is provided, the log Euclidean mean will be used,
- tol is the tolerance for the convergence (see below),
- maxiter is the maximum number of iterations allowed,
- if verbose=true, the convergence attained at each iteration is printed and a warning is printed if convergence is not attained,
- if ⏩=true the iterations are multi-threaded (see below).
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
In normal circumstances this algorithm converges monotonically. If the algorithm diverges and verbose
is true a warning is printed indicating the iteration when this happened.
$tol$ defaults to 100 times the square root of Base.eps
of the nearest real type of data input $𝐏$. This corresponds to requiring the square root of the relative convergence criterion over two successive iterations to vanish for about half the significant digits minus 2.
See: logdet zero metric, modified Bhattacharyya mean.
See also: powerMean
, wasMean
, logdet0Mean
, mean
.
Examples
using LinearAlgebra, PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# unweighted mean
G, iter, conv = logdet0Mean(Pset) # or G, iter, conv = logdet0Mean(𝐏)
# weights vector, does not need to be normalized
weights=[1, 2, 3, 1]
# weighted mean
G, iter, conv = logdet0Mean(Pset, w=weights)
# print the convergence at all iterations
G, iter, conv = logdet0Mean(Pset; w=weights, verbose=true)
# suppose Pset has changed a bit; initialize with G to hasten convergence
Pset[1]=ℍ(Pset[1]+(randP(3)/100))
G, iter, conv = logdet0Mean(Pset; w=weights, ✓w=false, verbose=true, init=G)
# estimate how much you gain running the algorithm in multi-threaded mode
using BenchmarkTools
Pset=randP(20, 120)
@benchmark(logdet0Mean(Pset; ⏩=false)) # single-threaded
@benchmark(logdet0Mean(Pset)) # multi-threaded
PosDefManifold.wasMean
— Function wasMean(𝐏::Union{ℍVector, 𝔻Vector};
<
w::Vector=[],
✓w=true,
init=nothing,
tol::Real=0.,
maxiter::Int=500,
verbose=false,
⏩=true >)
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type or real positive definite diagonal matrices of 𝔻Vector type and optional non-negative real weights vector $w={w_1,...,w_k}$, return the 3-tuple $(G, iter, conv)$, where $G$ is the mean according to the Wasserstein metric and $iter$, $conv$ are the number of iterations and convergence attained by the algorithm. Mean $G$ is the unique positive definite matrix satisfying
$G=\sum_{i=1}^{k}w_i\big( G^{1/2} P_i G^{1/2}\big)^{1/2}$.
For estimating it, this function implements the fixed-point iterative algorithm proposed by (Álvarez-Esteban et al., 2016)🎓:
$G ← G^{-1/2}\big(\sum_{i=1}^{k} w_i(G^{1/2}P_i G^{1/2})^{1/2}\big)^2 G^{-1/2}$.
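The defining equation above can be checked numerically; a sketch for the unweighted case:
using PosDefManifold, LinearAlgebra
Pset=randP(3, 10)
G, iter, conv = wasMean(Pset)
sqrtG=sqrt(G)
k=length(Pset)
norm(sum(sqrt(ℍ(sqrtG*P*sqrtG)) for P in Pset)/k - G) # ≈ 0, within the convergence tolerance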
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted Wasserstein mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and they should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
The following are more <optional keyword arguments>:
- init is a matrix to be used as initialization for the mean. If no matrix is provided, the instance of generalized means with $p=0.5$ will be used,
- tol is the tolerance for the convergence (see below),
- maxiter is the maximum number of iterations allowed,
- if verbose=true, the convergence attained at each iteration is printed and a warning is printed if convergence is not attained,
- if ⏩=true the iterations are multi-threaded (see below).
If the input is a 1d array of $k$ real positive definite diagonal matrices the solution is available in closed-form as the modified Bhattacharyya mean, hence the <optional keyword arguments> init
, tol
and verbose
have no effect and return the 3-tuple $(G, 1, 0)$. See modified Bhattacharyya mean.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
In normal circumstances this algorithm converges monotonically. If the algorithm diverges and verbose
is true a warning is printed indicating the iteration when this happened.
$tol$ defaults to the square root of Base.eps
of the nearest real type of data input $𝐏$. This corresponds to requiring the norm of the satisfying matrix equation divided by the number of elements to vanish for about half the significant digits.
See: Wasserstein metric.
See also: powerMean
, wasMean
, logdet0Mean
, mean
.
Examples
using LinearAlgebra, PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# unweighted mean
G, iter, conv = wasMean(Pset) # or: G, iter, conv = wasMean(𝐏)
# weights vector, does not need to be normalized
weights=[1, 2, 3, 1]
# weighted mean
G, iter, conv = wasMean(Pset; w=weights)
# print the convergence at all iterations
G, iter, conv = wasMean(Pset; w=weights, verbose=true)
# suppose 𝐏 has changed a bit; initialize with G to hasten convergence
Pset[1]=ℍ(Pset[1]+(randP(3)/100))
G, iter, conv = wasMean(Pset; w=weights, verbose=true, init=G)
# estimate how much you gain running the algorithm in multi-threaded mode
using BenchmarkTools
Pset=randP(20, 120)
@benchmark(wasMean(Pset; ⏩=false)) # single-threaded
@benchmark(wasMean(Pset)) # multi-threaded
PosDefManifold.powerMean
— Function powerMean(𝐏::Union{ℍVector, 𝔻Vector}, p::Real;
<
w::Vector=[],
✓w=true,
init=nothing,
tol::Real=0.,
maxiter::Int=500,
verbose=false,
⏩=true >)
Given a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type or real positive definite diagonal matrices of 𝔻Vector type, an optional non-negative real weights vector $w={w_1,...,w_k}$ and a real parameter p
$\in[-1, 1]$, return the 3-tuple $(G, iter, conv)$, where $G$ is Lim and Palfia (2012)'s power means of order $p$ and $iter$, $conv$ are the number of iterations and convergence attained by the algorithm, respectively. Mean $G$ is the unique positive definite matrix satisfying
$G=\sum_{i=1}^{k}(w_iG\textrm{#}_pP_i)$,
where $G\textrm{#}_pP_i$ is the Fisher geodesic equation. In particular:
- with $p=-1$ this is the harmonic mean (see the inverse Euclidean metric),
- with $p=+1$ this is the arithmetic mean (see the Euclidean metric),
- at the limit of $p$ evaluated at zero from both sides this is the geometric mean (see Fisher metric).
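The defining equation can be checked numerically; a sketch for the unweighted case, using the geodesic function for $G\textrm{#}_pP_i$:
using PosDefManifold, LinearAlgebra
Pset=randP(3, 10)
p=0.5
G, iter, conv = powerMean(Pset, p)
k=length(Pset)
norm(sum(geodesic(Fisher, G, P, p) for P in Pset)/k - G) # ≈ 0, within the convergence tolerance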
For estimating power means for $p\in(-1, 1)$, this function implements the fixed-point iterative algorithm of (Congedo et al., 2017b)🎓. For $p=0$ (geometric mean) this algorithm is run two times with a small positive and negative value of $p$ and the geometric mean of the two resulting means is returned, as suggested in (Congedo et al., 2017b)🎓. This way of estimating the geometric mean of a set of matrices is faster as compared to the usual gradient descent algorithm.
If you don't pass a weight vector with <optional keyword argument> $w$, return the unweighted power mean.
If <optional keyword argument> ✓w=true
(default), the weights are normalized so as to sum up to 1, otherwise they are used as they are passed and should be already normalized. This option is provided to allow calling this function repeatedly without normalizing the same weights vector each time.
The following are more <optional keyword arguments>:
- init is a matrix to be used as initialization for the mean. If no matrix is provided, the instance of generalized means with parameter $p$ will be used,
- tol is the tolerance for the convergence (see below),
- maxiter is the maximum number of iterations allowed,
- if verbose=true, the convergence attained at each iteration is printed and a warning is printed if convergence is not attained,
- if ⏩=true the iterations are multi-threaded.
If the input is a 1d array of $k$ real positive definite diagonal matrices the solution is available in closed-form as the generalized mean of order p
, hence the <optional keyword arguments> init
, tol
and verbose
have no effect and return the 3-tuple $(G, 1, 0)$. See generalized means.
Multi-threading is automatically disabled if Julia is instructed to use only one thread. See Threads.
In normal circumstances this algorithm converges monotonically. If the algorithm diverges and verbose
is true a warning is printed indicating the iteration when this happened.
$tol$ defaults to the square root of Base.eps
of the nearest real type of data input $𝐏$. This corresponds to requiring the norm of the difference of the matrix solution over two successive iterations divided by the number of elements in the matrix to vanish for about half the significant digits.
See: power means, generalized means, modified Bhattacharyya mean.
See also: generalizedMean
, wasMean
, logdet0Mean
, mean
.
Examples
using LinearAlgebra, PosDefManifold
# Generate a set of 4 random 3x3 SPD matrices
Pset=randP(3, 4) # or, using unicode: 𝐏=randP(3, 4)
# unweighted mean
G, iter, conv = powerMean(Pset, 0.5) # or G, iter, conv = powerMean(𝐏, 0.5)
# weights vector, does not need to be normalized
weights=[1, 2, 3, 1]
# weighted mean
G, iter, conv = powerMean(Pset, 0.5; w=weights)
# print the convergence at all iterations
G, iter, conv = powerMean(Pset, 0.5; w=weights, verbose=true)
# suppose 𝐏 has changed a bit; initialize with G to hasten convergence
Pset[1]=ℍ(Pset[1]+(randP(3)/100))
G, iter, conv = powerMean(Pset, 0.5; w=weights, verbose=true, init=G)
# estimate how much you gain running the algorithm in multi-threaded mode
using BenchmarkTools
Pset=randP(20, 120)
@benchmark(powerMean(Pset, 0.5; ⏩=false)) # single-threaded
@benchmark(powerMean(Pset, 0.5)) # multi-threaded
PosDefManifold.inductiveMean
— Function(1) inductiveMean(metric::Metric, 𝐏::ℍVector)
(2) inductiveMean(metric::Metric, 𝐏::ℍVector, q::Int, Q::ℍ)
alias: indMean
(1) Compute the Fréchet mean of a 1d array $𝐏={P_1,...,P_k}$ of $k$ positive definite matrices of ℍVector type with a law of large numbers inductive procedure (Ho et al., 2013; Lim and Palfia, 2019; Massart et al., 2018)🎓, such as
$G_1=P_1,$
$G_i=γ(i^{-1}, G_{(i-1)}, P_i), i=2,...,k,$
where $γ(i^{-1}, G_{(i-1)}, P_i)$ is a step on the geodesic joining $G_{(i-1)}$ to $P_i$ with arclength $i^{-1}$ using the specified metric
, of type Metric::Enumerated type.
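A sketch of this recursion written out explicitly with the geodesic function (it should reproduce the output of method (1)):
using PosDefManifold, LinearAlgebra
𝐏=randP(5, 20)
# fold the inductive rule G_i = γ(1/i, G_(i-1), P_i) over i=2,...,k
G=foldl((G, i) -> geodesic(Fisher, G, 𝐏[i], 1/i), 2:length(𝐏); init=𝐏[1])
norm(G - inductiveMean(Fisher, 𝐏)) # ≈ 0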
(2) Like (1), but for the set of matrices $𝐐 ∪ 𝐏$, assuming knowledge only of the set $𝐏$, the mean of $𝐐$ (Hermitian matrix argument Q
) and the number of matrices in $𝐐$ (integer argument q
). This method can be used, for example, for updating a block on-line algorithm, where $𝐏$ is the incoming block, Q
the previous mean estimation and q
the cumulative number of matrices on which the mean has been computed on-line.
For Fréchet means that do not have a closed form expression, this procedure features a computational complexity amounting to less than two iterations of gradient descent or fixed-point algorithms. This comes at the price of an approximation. In fact, the solution is not invariant to permutations of the matrices in array 𝐏 and convergence to the Fréchet mean for finite samples is not ensured (see Lim and Palfia, 2019; Massart et al., 2018)🎓.
Since the inductive mean uses the geodesic
function, it is not available for the Von Neumann metric.
Examples
using PosDefManifold, LinearAlgebra
# A set of 100 matrices for which we want to compute the mean
𝐏=randP(10, 100)
𝐏1=ℍVector(collect(𝐏[i] for i=1:50)) # first 50
𝐏2=ℍVector(collect(𝐏[i] for i=51:100)) # last 50
# inductive mean of the whole set 𝐏
G=inductiveMean(Fisher, 𝐏)
# mean using the usual gradient descent algorithm
H, iter, conv=geometricMean(𝐏)
# inductive mean of 𝐏 given only 𝐏2,
# the number of matrices in 𝐏1 and the mean of 𝐏1
G2=inductiveMean(Fisher, 𝐏2, length(𝐏1), mean(Fisher, 𝐏1))
# average error
norm(G-H)/(dim(G, 1)^2)
norm(G2-H)/(dim(G, 1)^2)
PosDefManifold.midrange
— Functionmidrange(metric::Metric, P::ℍ{T}, Q::ℍ{T}) where T<:RealOrComplex
Midrange (average of extremal values) of positive definite matrices $P$ and $Q$. Only the Fisher metric is supported, allowing the so-called geometric midrange. This has been defined in Mostajeran et al. (2019) 🎓 as
$P * Q = \frac{1}{\sqrt{\lambda_{min}}+\sqrt{\lambda_{max}}}\Big(Q+\sqrt{\lambda_{min}\lambda_{max}}P\Big)$,
where $\lambda_{min}$ and $\lambda_{max}$ are the extremal generalized eigenvalues of $P$ and $Q$.
Examples
using PosDefManifold
P=randP(3)
Q=randP(3)
M=midrange(Fisher, P, Q)
Tangent Space operations
Function | Description |
---|---|
logMap | Logarithmic map (from manifold to tangent space) |
expMap | Exponential map (from tangent space to manifold) |
vecP | vectorization of matrices in the tangent space |
matP | matrization of matrices in the tangent space (inverse of vecP) |
parallelTransport , pt | Parallel transport of tangent vectors and matrices |
⋅
PosDefManifold.logMap
— Function(1) logMap(metric::Metric, P::ℍ{T}, G::ℍ{T})
(2) logMap(metric::Metric, 𝐏::ℍVector, G::ℍ{T})
for all the above: where T<:RealOrComplex
(1) Logarithmic map: map a positive definite matrix $P$ from the SPD or Hermitian manifold into the tangent space at base-point $G$ using the Fisher metric.
$P$ and $G$ must be flagged as Hermitian
. See typecasting matrices.
The map is defined as
$Log_G(P)=S=G^{1/2}\textrm{log}\big(G^{-1/2}PG^{-1/2}\big)G^{1/2}$.
metric
is a metric of type Metric::Enumerated type.
The result is an Hermitian
matrix.
(2) Logarithmic map (1) at base-point $G$ at once for $k$ positive definite matrices in 1d array $𝐏={P_1,...,P_k}$ of ℍVector type.
The result is an ℍVector
.
Currently only the Fisher metric is supported for tangent space operations.
The inverse operation is expMap
.
See also: vecP
, parallelTransport
.
Examples
using PosDefManifold
(1)
P=randP(3)
Q=randP(3)
metric=Fisher
G=mean(metric, P, Q)
# projecting P at the base point given by the geometric mean of P and Q
S=logMap(metric, P, G)
(2)
Pset=randP(3, 4)
# projecting all matrices in Pset at the base point given by their geometric mean.
Sset=logMap(Fisher, Pset, mean(Fisher, Pset))
PosDefManifold.expMap
— Function(1) expMap(metric::Metric, S::ℍ{T}, G::ℍ{T})
(2) expMap(metric::Metric, 𝐒::ℍVector, G::ℍ{T})
for all the above: where T<:RealOrComplex
(1) Exponential Map: map a tangent vector (a matrix) $S$ from the tangent space at base-point $G$ into the SPD or Hermitian manifold (using the Fisher metric).
$S$ and $G$ must be flagged as Hermitian
. See typecasting matrices.
The map is defined as
$Exp_G(S)=P=G^{1/2}\textrm{exp}\big(G^{-1/2}SG^{-1/2}\big)G^{1/2}$.
metric
is a metric of type Metric::Enumerated type.
The result is an Hermitian
matrix.
(2) Exponential map (1) at base-point $G$ at once for $k$ tangent vectors (matrices) in 1d array $𝐒={S_1,...,S_k}$ of ℍVector type.
The result is an ℍVector
.
Currently only the Fisher metric is supported for tangent space operations.
The inverse operation is logMap
.
Examples
(1)
using PosDefManifold, LinearAlgebra
P=randP(3)
Q=randP(3)
G=mean(Fisher, P, Q)
# projecting P on the tangent space at the Fisher mean base point G
S=logMap(Fisher, P, G)
# projecting back onto the manifold
P2=expMap(Fisher, S, G)
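# added sanity check: the round trip must recover P (up to numerical precision)
P2 ≈ P ? println(" ⭐ ") : println(" ⛔ ")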
(2)
Pset=randP(3, 4)
# projecting all matrices in Pset at the base point given by their geometric mean.
G=mean(Fisher, Pset)
Sset=logMap(Fisher, Pset, G)
# projecting back onto the manifold
Pset2=expMap(Fisher, Sset, G)
PosDefManifold.vecP
— FunctionvecP(S::Union{ℍ{T}, Symmetric{R}};
range::UnitRange=1:size(S, 2)) where T<:RealOrComplex where R<:Real
Vectorize a tangent vector (which is an Hermitian
or Symmetric
matrix) $S$: mat ↦ vec.
It gives weight $1$ to diagonal elements and $√2$ to off-diagonal elements so as to preserve the norm (Barachant et al., 2012)🎓, such that
$∥S∥_F=∥vecP(S)∥_F$.
The result is a vector holding $n(n+1)/2$ elements, where $n$ is the size of $S$.
$S$ must be flagged as Hermitian
or Symmetric
. See typecasting matrices.
The reverse operation is provided by matP
, which always returns an Hermitian
matrix.
If an optional keyword argument range
is provided, the vectorization concerns only the rows (or columns, since the input matrix is symmetric or Hermitian) in that range. Note that in this case the operation cannot be reverted by matP
, that is, the matrix is 'stuck' in the tangent space.
Examples
using PosDefManifold
P=randP(3)
Q=randP(3)
G=mean(Fisher, P, Q)
# projecting P at the base point given by the geometric mean of P and Q
S=logMap(Fisher, P, G)
# vectorize S
v=vecP(S)
# vectorize only the first two columns of S
v=vecP(S; range=1:2)
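The norm-preservation property and the length of the vectorized form stated above can be verified directly (a short sketch; norm is from LinearAlgebra):
using PosDefManifold, LinearAlgebra
P=randP(3)
G=randP(3)
S=logMap(Fisher, P, G)
v=vecP(S)
# v holds n(n+1)/2 elements and the Frobenius norm is preserved
println(length(v) == size(S, 1)*(size(S, 1)+1)÷2)
println(norm(S) ≈ norm(v))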
PosDefManifold.matP
— FunctionmatP(ς::Vector{T}) where T<:RealOrComplex
Matrizize a tangent vector (a vector) $ς$: vec ↦ mat.
This is the function reversing the vecP
function, thus the weighting applied therein is reversed as well.
If $ς=vecP(S)$ and $S$ is a $n⋅n$ Hermitian or Symmetric matrix, $ς$ is a tangent vector of size $n(n+1)/2$. The result of calling matP(ς)
is then the $n⋅n$ matrix $S$. $S$ is always returned flagged as Hermitian
.
To Do: This function may be rewritten more efficiently.
Examples
using PosDefManifold
P=randP(3)
Q=randP(3)
G=mean(Fisher, P, Q)
# projecting P onto the tangent space at the Fisher mean base point
S=logMap(Fisher, P, G)
# vectorize S
v=vecP(S)
# rotate the vector by a random orthogonal matrix
n=Int(size(S, 1)*(size(S, 1)+1)/2)
U=randU(n)
z=U*v
# Get the point in the tangent space
S=matP(z)
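Since matP reverses vecP (including its weighting), a quick round-trip check can be added (a minimal sketch):
using PosDefManifold
P=randP(3)
G=randP(3)
S=logMap(Fisher, P, G)
# vectorize and matrizize back: the result must equal S
matP(vecP(S)) ≈ S ? println(" ⭐ ") : println(" ⛔ ")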
PosDefManifold.parallelTransport
— Function(1) parallelTransport(S::ℍ{T}, P::ℍ{T}, Q::ℍ{T})
(2) parallelTransport(S::ℍ{T}, P::ℍ{T})
(3) parallelTransport(𝐒::ℍVector, P::ℍ{T}, Q::ℍ{T})
(4) parallelTransport(𝐒::ℍVector, P::ℍ{T})
for all the above: where T<:RealOrComplex
alias: pt
(1) Parallel transport of tangent vector $S$ (a matrix) lying on the tangent space at base-point $P$ to the tangent space at base-point $Q$.
$S$, $P$ and $Q$ must all be Hermitian
matrices. Return an Hermitian
matrix. The transport is defined as:
$∥_{(P→Q)}(S)=\big(QP^{-1}\big)^{1/2}S\big(QP^{-1}\big)^{H/2}$.
If $S$ is a positive definite matrix in the manifold (and not a tangent vector) it will be 'transported' from $P$ to $Q$, amounting to (Yair et al., 2019🎓):
- project $S$ onto the tangent space at base-point $P$,
- parallel transport it to the tangent space at base-point $Q$,
- project it back onto the manifold at base-point $Q$.
(2) Parallel transport as in (1), but to the tangent space at the identity matrix taken as base-point.
The transport reduces in this case to:
$∥_{(P→I)}(S)=P^{-1/2}SP^{-1/2}$.
(3) Parallel transport as in (1) at once for $k$ tangent vectors (matrices) in 1d array $𝐒={S_1,...,S_k}$ of ℍVector type.
(4) Parallel transport as in (2) at once for $k$ tangent vectors (matrices) in 1d array $𝐒={S_1,...,S_k}$ of ℍVector type.
Currently only the Fisher metric is supported for parallel transport.
See also: logMap
, expMap
, vecP
, matP
.
Examples
using PosDefManifold
(1)
P=randP(3)
Q=randP(3)
G=mean(Fisher, P, Q)
# i. projecting P onto the tangent space at base-point G
S=logMap(Fisher, P, G)
# ii. parallel transport S to the tangent space at base-point Q
S_=parallelTransport(S, G, Q)
# iii. projecting back into the manifold at base-point Q
P_=expMap(Fisher, S_, Q)
# i., ii. and iii. can be done simply by
PP_=parallelTransport(P, G, Q)
# check
P_≈PP_ ? println(" ⭐ ") : println(" ⛔ ")
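# added sketch: direct check of the closed form (QG⁻¹)^½ S ((QG⁻¹)^½)ᴴ given above
# (assumes that expression; sqrt here is the principal matrix square root)
E=sqrt(Q*inv(G))
E*S*E' ≈ S_ ? println(" ⭐ ") : println(" ⛔ ")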
(2)
P=randP(3)
Q=randP(3)
G=mean(Fisher, P, Q)
# transport to the tangent space at base-point the identity
PP_=parallelTransport(P, G)
(3)
Pset=randP(3, 4)
Q=randP(3)
G=mean(Fisher, Pset)
# transport at once all matrices in Pset
Pset2=parallelTransport(Pset, G, Q)
(4)
Pset=randP(3, 4)
G=mean(Fisher, Pset)
# recenter all matrices so that their mean is the identity
Pset2=parallelTransport(Pset, G)
# check
mean(Fisher, Pset2) ≈ I ? println(" ⭐ ") : println(" ⛔ ")
Procrustes problems
Function | Description |
---|---|
procrustes | Solution to the Procrustes problem in the manifold of positive definite matrices |
⋅
PosDefManifold.procrustes
— Functionprocrustes(P::ℍ{T}, Q::ℍ{T}, extremum="min") where T<:RealOrComplex
Given two positive definite matrices $P$ and $Q$, return by default the solution of problem
$\textrm{argmin}_Uδ(P,U^HQU)$,
where $U$ varies over the set of unitary matrices and $δ(.,.)$ is a distance or divergence function.
In physics, $U^HQU$ is named the unitary orbit of $Q$.
If the argument extremum
is passed as "max", it returns instead the solution of
$\textrm{argmax}_Uδ(P,U^HQU)$.
$P$ and $Q$ must be flagged as Hermitian
. See typecasting matrices.
As shown in Bhatia and Congedo (2019)🎓, using any of the Fisher, logdet zero and Wasserstein metrics, as well as the Kullback-Leibler divergence (see logdet α), the best approximant to $P$ from the unitary orbit of $Q$ commutes with $P$ and, surprisingly, has the same closed-form expression, namely
$U_Q^↓U_P^{↓H}$ for the argmin and $U_Q^↑U_P^{↓H}$ for the argmax,
where $U^↓$ denotes the eigenvector matrix of the subscript argument with eigenvectors (in columns) sorted by decreasing order of the corresponding eigenvalues, and $U^↑$ the analogous matrix with eigenvectors sorted by increasing order.
The same solutions have long been known from solving the extremal problem above using the Euclidean metric (Umeyama, 1988).
The generalized Procrustes problem
$\textrm{argmin}_U\sum_{i=1}^{k}δ(P_i,U^HQ_iU)$
can be solved using the Julia package Manopt.
Examples
using PosDefManifold
P=randP(3)
Q=randP(3)
# argmin problem
U=procrustes(P, Q)
# argmax problem
V=procrustes(P, Q, "max")
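As a sketch of the closed-form solution stated above, the following recomputes it from the eigenvector matrices sorted by decreasing eigenvalues. Because of the sign (phase) indeterminacy of eigenvectors, the recomputed matrix may differ from the output of procrustes, but the attained distance must coincide:
using PosDefManifold, LinearAlgebra
P=randP(3)
Q=randP(3)
U=procrustes(P, Q)
# eigvecs returns eigenvectors in ascending order of eigenvalues; reverse the columns
UPd=reverse(eigvecs(P), dims=2)
UQd=reverse(eigvecs(Q), dims=2)
U2=UQd*UPd'
d1=distance(Fisher, P, Hermitian(U'*Q*U))
d2=distance(Fisher, P, Hermitian(U2'*Q*U2))
d1 ≈ d2 ? println(" ⭐ ") : println(" ⛔ ")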