Diagonalizations.jl

dependencies

  • standard Julia packages: LinearAlgebra, Statistics, StatsBase
  • external packages: CovarianceEstimation, PosDefManifold

types

abstract type LinearFilters end

This is the abstract type for all filters.

All filters are instances of the following immutable structure:

LinearFilter

struct LinearFilter <: LinearFilters
   F     :: AbstractArray
   iF    :: AbstractArray
   D     :: Diagonal
   eVar  :: Float64
   ev    :: Vec
   arev  :: Vec
   name  :: String
end

Fields:

.F: for simple filters (PCA, Whitening, CSP, AJD), this is an $n⋅p$ matrix, where $n$ is the number of variables in the data the filter has been derived from and $p$ is the subspace dimension. For composite filters this is a vector of $m$ such $n⋅p$ matrices, where $m$ is the number of datasets.

.iF: the $p⋅n$ left-inverse of the filter(s) in .F, that is, multiplying each matrix in .iF on the right by the corresponding matrix in .F yields the $p⋅p$ identity matrix.

The following fields are populated by default, but they may all be set to nothing by any constructor using the simple optional keyword argument:

.D: a $p⋅p$ Diagonal matrix holding the (generalized) eigenvalues or singular values, depending on the filter, of the last diagonalization that has been used to derive the filter. Since .D is in diagonal form, this matrix can be used directly in algebraic computations. For example, if a is a PCA filter computed from covariance matrix $C$ and $p=n$, then C≈a.F*a.D*a.iF is true. In a similar way, if b is an MCA filter computed from cross-covariance matrix $C_{xy}$ and $p=n$, then C_xy≈b.F[1]*b.D*b.F[2]' is true.

.eVar: the actual explained variance or variance ratio of the filter(s) in .F. This depends on the filter, see the documentation of each filter for details.

.ev: a vector holding all the $n$ (generalized) eigenvalues or singular values, depending on the filter, of the last diagonalization that has been used to derive the filter. If $p=n$, this is a vector holding the diagonal of .D, otherwise .D holds in diagonal form only the first $p$ elements of .ev.

.arev: a vector holding an accumulated regularized function of .ev used to find the subspace dimension. This depends on the filter, see the documentation of each filter for details.
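
For example, the fields can be inspected and used directly. The following minimal sketch assumes a PCA filter computed keeping the full subspace dimension ($p=n$):

using Diagonalizations, LinearAlgebra

X=randn(100, 30)
C=Symmetric((X'*X)/100)
a=pca(C; eVar=30)             # keep the full dimension, p=n=30

a.iF*a.F≈Matrix(I, 30, 30)    # left-inverse property of .iF (true)
C≈a.F*a.D*a.iF                # reconstruction from the diagonalization (true)
a.eVar                        # actual explained variance (1.0, since p=n)
a.arev[end]                   # the last element of .arev is always 1.0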

LinearFilter methods

The LinearFilter structure supports the following methods:

size(f::LF)

Return the size of f.F if it is a matrix, an iterator over the sizes of all matrices in f.F if it is a vector of matrices.

length(f::LF)

Return 1 if f.F is a matrix, the number of matrices in f.F if it is a vector of matrices. Referring to Table 1 and Fig. 1 (see Overview), this is the number of datasets $m$.

eltype(f::LF)

Return the element type of the matrix(ces) in f.F.

==(f::LF, g::LF), ≈(f::LF, g::LF)

Return true if all fields of LinearFilter f and g are equivalent, false otherwise. All fields but .F and .iF are required to be equal, where for vector fields approximate equality is ascertained using the ≈ function. For the equivalence of the matrices in fields .F and .iF, it is required that the mean spForm index of the matrices in f.iF times the corresponding matrices in g.F and of the matrices in g.iF times the corresponding matrices in f.F is smaller than 0.05.

≠(f::LF, g::LF), ≉(f::LF, g::LF)

The negations of == and ≈, respectively.

cut(f::LinearFilter, p::Int64)

Create another LinearFilter object with a smaller subspace dimension given by argument p. This applies to the matrix(ces) in fields .F, .iF and .D. All other fields remain the same.
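
For example (a minimal sketch using a PCA filter computed from random data):

using Diagonalizations

X=randn(100, 30)
f=pca(X)

size(f)      # size of f.F, e.g., (30, p)
length(f)    # 1: PCA is a simple (single-dataset) filter
eltype(f)    # Float64
f≈f          # true

g=cut(f, 2)  # a new LinearFilter with subspace dimension p=2
size(g.F)    # (30, 2)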

data input

All filter constructors take as input either data matrices or covariance matrices. Covariance matrices must be flagged by Julia as either Symmetric, if they are real, or Hermitian, if they are real or complex, e.g.,

X=randn(100, 30)
C=(X'*X)/100
p=pca(Symmetric(C))
# p=pca(C) will throw an ArgumentError

The above call to the pca constructor is equivalent to

X=randn(100, 30)
p=pca(X)

Some methods take as input a vector of Hermitian matrices, of type ℍVector, see typecasting matrices.
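
For example, a set of covariance matrices can be typecast to an ℍVector as follows (a sketch; the last line assumes that the ajd constructor accepts such a set):

using Diagonalizations, LinearAlgebra, PosDefManifold

Xset=[randn(100, 30) for _=1:5]
Cset=ℍVector([Hermitian((X'*X)/100) for X ∈ Xset])

a=ajd(Cset)  # approximate joint diagonalization of the set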

Using real data matrices as input, shrinkage covariance matrix estimators can be used for several filters (e.g., PCA, Whitening, CSP, CSTP). See here below.

covariance matrix estimations

By default, when data matrices are used as data input, Diagonalizations.jl computes covariance matrices along the larger dimension of the data matrices. That is, for $r⋅c$ data matrix $X$, if $r>c$ $\frac{1}{r}X^{T}X$ is computed, otherwise $\frac{1}{c}XX^{T}$ is computed. Hence, the default behavior assumes that the number of observations is larger than the number of variables, as it is usually appropriate. Covariance matrices can be computed along a specific dimension using optional keyword argument dims, as in StatsBase.

Many filter constructors allow using shrinkage covariance matrix estimators (only for real data) by means of the CovarianceEstimation package. The following constants are provided to allow quick access to the most popular choices among the many estimators implemented therein:

SCM=SimpleCovariance()

This is the default for all constructors and corresponds to the standard sample covariance matrix (SCM) estimation.

LShrLW=LShr(ConstantCorrelation())

This corresponds to the SCM estimator shrunk by the linear method of Ledoit and Wolf (2004) 🎓.

LShr=LinearShrinkage

This is a shortcut for requesting other types of linear shrinkage. See the CovarianceEstimation package for details.

NShrLW=AnalyticalNonlinearShrinkage()

This corresponds to the SCM estimator shrunk by the analytical non-linear method of Ledoit and Wolf (2018) 🎓.

Also, several filter constructors accept mean and w keyword arguments:

mean can be used to subtract the mean from the variables of data matrices (e.g., data matrix X).

  • if mean=0, the mean will not be subtracted (default);
  • if mean=nothing, the mean will be computed and subtracted;
  • mean can be a vector of means to be subtracted:
    • it must have length=size(X, 2) if dims=1, length=size(X, 1) if dims=2;
  • mean can also be a matrix of means to be subtracted:
    • it must have size=(1, size(X, 2)) if dims=1, size=(size(X, 1), 1) if dims=2.
Nota Bene

For filter constructors taking as input sets of data matrices, the mean argument can be set only to 0 (default) or nothing.

w can be nothing (default) or a StatsBase.AbstractWeights object to weight the samples of data matrices (e.g., data matrix X). It must have length=size(X, dims), where dims by default is set to the larger dimension of X. For some constructors w can also be a function. This is documented in the concerned constructors.

Note that if several data matrices can be given as input to filter constructors, for example X, Y,..., then you will find arguments named meanX, meanY,... and wX, wY,... to differentiate the means and weights of the several input data matrices.

Examples:

using Diagonalizations

X=randn(100, 30) # X is 'tall'
p=pca(X)

The call here above uses the default SCM estimator and computes the PCA from the $30⋅30$ covariance matrix $\frac{1}{100}X^{T}X$. The 'filter' p.F is $30⋅p$, where $p$ is the subspace dimension. For complex data the call is the same:

Xc=randn(ComplexF64, 100, 30)
pc=pca(Xc)

This call

p=pca(X; covEst=LShrLW)

uses the linear shrinkage estimator of Ledoit and Wolf (2004).

The call

p=pca(X; dims=2)

uses the default SCM estimator and computes the PCA from the $100⋅100$ (rank-deficient) covariance matrix $\frac{1}{30}XX^{T}$. The 'filter' p.F is in this case $100⋅p$.
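
The mean and w arguments can be used in a similar way. The following call is a hypothetical sketch; the actual keyword names (mean and w, or meanX and wX, etc.) depend on the constructor, see its documentation:

using StatsBase

w=Weights(rand(100))         # one weight per observation (length=size(X, 1), since here dims=1)
p=pca(X; mean=nothing, w=w)  # subtract the mean and weight the samples (keyword names may vary)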

mean covariance matrix estimations

Some filters can take as input data a set of data matrices (a vector of matrices). In this case a covariance matrix is estimated for each data matrix in the set and then a mean of these covariance matrices is estimated.

If the covariance matrices are actually cross-covariance matrices, no option is provided and the usual arithmetic mean is computed. If they are Symmetric or Hermitian covariance matrices, the mean function of package PosDefManifold.jl is used, since those matrices may be positive definite by construction; in that case a mean based on a metric acting on the Riemannian manifold of positive definite matrices can be used.

The constructors using this feature employ the following optional keyword arguments for regulating the computation of the mean:

metric: the metric used to compute the mean. PosDefManifold.jl supports 10 metrics, nine of which can be used here (all but the VonNeumann metric). Of particular interest are the following

  1. Fisher: the natural affine invariant metric, possessing all good properties of a mean.
  2. logEuclidean, Jeffrey: computationally cheaper alternatives to 1), but not possessing all good properties of a mean.
  3. invEuclidean: leading to the matrix harmonic mean.
  4. Euclidean: (default) leading to the usual matrix arithmetic mean, thus applying also if the input matrices are not positive-definite.
  5. Wasserstein: a metric widely adopted in statistics, optimal transport and quantum physics (also known as Bures-Hellinger), also applying if the input matrices are not positive-definite.

Note that, since by default covariance matrices are computed along the larger dimension of data matrices, the covariance matrices will be positive definite as long as the number of observations is sufficiently larger than the number of variables.

See the documentation of the mean function for arguments w, ✓w, init, tol and verbose. Note that the name of the w (and init) arguments may actually be wCx, w₁ (initCx, init₁) and similar. This is to allow using them for different data matrices used to construct the filter.
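
As a minimal sketch of what such constructors compute internally, a mean of covariance matrices can be obtained directly with the mean function of PosDefManifold.jl:

using LinearAlgebra, Statistics, PosDefManifold

Xset=[randn(100, 30) for _=1:5]
Cset=ℍVector([Hermitian((X'*X)/100) for X ∈ Xset])

G=mean(Euclidean, Cset)   # arithmetic mean (the default metric)
H=mean(Fisher, Cset)      # affine-invariant (Fisher) mean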

subspace dimension

For a 'wide' $n⋅t$ data matrix $X$, where $n$ is the number of variables and $t>n$ the number of samples, with $n⋅n$ covariance matrix $C$, the transformed data is given by $F^{H}X$ and the transformed covariance matrix by $F^{H}CF$.

In the above, $F$ is the $n⋅p$ filter matrix and $p$ is named the subspace dimension. The data filtered in this subspace is given by

$\widetilde{X}=F^{-H}F^{H}X$

and the filtered covariance by

$\widetilde{C}=F^{-H}F^{H}CFF^{-1}$.

If the matrix $X$ is available in the 'tall' form $t⋅n$ (default), the transformed data is given by $XF$ and the data filtered in the subspace is given by

$\widetilde{X}=XFF^{-1}$.

The expressions for the transformed and filtered covariance matrix are the same as before.
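
For example, for a 'tall' data matrix and a PCA filter (a sketch assuming default settings):

using Diagonalizations

X=randn(100, 30)     # 'tall' data: t=100 samples, n=30 variables
f=pca(X; eVar=0.75)

T=X*f.F              # transformed data, 100⋅p
Xf=X*f.F*f.iF        # data filtered in the subspace, 100⋅30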

For all filters Diagonalizations.jl allows setting the subspace dimension $p$ using the eVar and eVarMeth optional keyword arguments. Ultimately, $p$ will be an integer $∈[1, n]$ representing the subspace dimension. The user may set $p$:

  • manually,

either setting eVar explicitly to an integer $∈[1, n]$, or setting eVar to the desired explained variance in the subspace filtered data, as a float $∈(0, 1]$, where $1$ corresponds to the total variance.

  • automatically (default),

according to the .arev (accumulated regularized eigenvalues) vector that is computed by the filter constructors. This vector is non-decreasing and the last element is always $1.0$. The way it is computed depends on the filter. Please refer to the documentation of each filter for details on how the .arev vector is defined.

When eVar is given as a float or when such a float is allowed to be chosen automatically (default), the function passed as the eVarMeth argument determines the subspace dimension $p$ so as to explain an amount of variance as close as possible to the desired eVar. In fact .arev holds only $n$ discrete possible values of explained variance. Therefore, the .arev vector is passed to the eVarMeth function. By default, eVarMeth is set to the Julia standard searchsortedfirst function, which will select the smallest $p$ allowing at least eVar explained variance. This amounts to rounding up the desired eVar variance. Another useful choice is the Julia standard searchsortedlast function, which will select the largest $p$ allowing at most eVar explained variance. This amounts to rounding down the desired eVar variance.

You can pass a user-defined function as eVarMeth. The function you define will take the .arev vector computed by the filter constructor as input and will return an integer, which will be automatically clamped to be $∈[1, n]$.

Note that once the filter has been constructed, its .eVar field will hold the actual explained variance, not the desired one that has been passed to the constructor using the eVar argument.
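
For example (a sketch using the pca constructor):

using Diagonalizations

X=randn(100, 30)

p1=pca(X; eVar=0.9)                             # ≈90% explained variance, rounded up (default)
p2=pca(X; eVar=5)                               # subspace dimension fixed to 5
p3=pca(X; eVar=0.9, eVarMeth=searchsortedlast)  # round down instead of up

p1.eVar                                         # the explained variance actually attained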

Note also that for some filter constructors you will find the eVar optional keyword argument along with other arguments with similar names, such as eVarCx and eVarCy. These arguments act like the main eVar argument, but determine the subspace dimension of intermediate diagonalization procedures, typically pre-whitening procedures. See also notation & nomenclature and covariance matrix estimations.

scale and permutation

Let $F$ be a diagonalizer of matrix $C$, i.e.,

$F^{H}CF=Λ$

with $Λ$ a diagonal matrix. Let $P$ be a permutation matrix and $O$ a diagonal matrix whose entries are either $1$ or $-1$. It is then easy to verify that any matrix $FPO$ is an identical diagonalizer of $C$, since $OP^{H}F^{H}CFPO=Λ$. This implies that the filter matrices found by exact diagonalization procedures are arbitrary up to sign and permutation of their columns.

If $D$ is a generic diagonal matrix, it is then easy to verify that any matrix $FPD$ is an equivalent diagonalizer of $C$ (Belouchrani et al., 1997 🎓), since $DP^{H}F^{H}CFPD$ is also diagonal, albeit different from $Λ$. This implies that there exist infinitely many equivalent exact diagonalizers and that the solution is arbitrary up to scale and permutation of the columns. Of course, the scale ambiguity implies the sign ambiguity, but not vice versa. All exact diagonalization procedures implicitly constrain the solution so as to find $P$ and $D$ such that $Λ$ possesses a desired property. For example, in principal component analysis the elements of $Λ$ are the maximum values that can be attained constraining $F$ to be orthogonal.
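
For instance, the orthogonality constraint of PCA can be verified directly (a sketch on random data):

using Diagonalizations, LinearAlgebra

X=randn(100, 30)
f=pca(X; eVar=30)             # full-dimensional PCA filter
f.F'*f.F≈Matrix(I, 30, 30)    # true: F is constrained to be orthogonal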

The same ambiguity applies to approximate joint diagonalization. Let $F$ be an approximate joint diagonalizer of the matrix set $\{C_1,...,C_K\}$, i.e.,

$F^{H}C_kF≈Λ_k$ for $k∈[1, K]$

and let $D$ be a diagonal matrix; then it is easy to verify that any matrix $FPD$ is an equivalent approximate joint diagonalizer of the set. To check whether two diagonalizers are equivalent, you can use the spForm function.
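
A minimal sketch of such a check, comparing two full-dimensional PCA filters obtained with different covariance estimators:

using Diagonalizations, PosDefManifold

X=randn(100, 30)
f=pca(X; eVar=30)                  # full-dimensional PCA, sample covariance
g=pca(X; eVar=30, covEst=LShrLW)   # full-dimensional PCA, linear shrinkage

spForm(f.iF*g.F)                   # close to 0: equivalent up to scale and permutation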

notation & nomenclature

Throughout the code and documentation of this package the following notation is followed:

  • scalars and vectors are denoted using lower-case letters, e.g., x, y,
  • matrices using upper case letters, e.g., X, Y,
  • sets (vectors) of matrices using bold upper-case letters, e.g., 𝐗, 𝐘.
  • superscripts H and T denote matrix complex conjugate-transpose and transpose.

The following nomenclature is used consistently:

  • $X$, $Y$: data matrices
  • $𝐗$, $𝐘$: vectors of data matrices
  • $C$: a covariance matrix
  • $𝐂$: a vector of covariance matrices
  • $C_x$: the covariance matrix of data matrix $X$
  • $C_{xy}$: the cross-covariance matrix of $X$ and $Y$
  • $U$, $V$: orthogonal matrices of eigenvectors or the left and right singular vectors
  • $λ$: vector of eigenvalues, singular values or a function thereof
  • $Λ$: diagonal matrix of eigenvalues, singular values or a function thereof
  • $B$, $F$: non-singular matrices

In the examples, bold upper-case letters are replaced by upper case letters in order to allow reading in the REPL.

acronyms

  • AJD: Approximate Joint Diagonalization (Cardoso & Souloumiac, 1996; Flury & Gautschi, 1986)
  • AJEVD: Approximate Joint Eigenvalue-Eigenvector Decomposition
  • AJSVD: Approximate Joint Singular Value Decomposition (Congedo et al., 2011)
  • AMUSE: Algorithm for Multiple Source Extraction (Molgedey & Schuster, 1994; Tong et al., 1991)
  • BSS: Blind Source Separation
  • CCA: Canonical Correlation Analysis (Hotelling, 1936)
  • CSP: Common Spatial Pattern (Fukunaga, 1990)
  • CSTP: Common Spatio-Temporal Pattern (Congedo et al., 2016)
  • EEG: Electroencephalography
  • ERP: Event-Related Potentials
  • FOBI: Fourth-Order Blind Identification (Cardoso, 1989)
  • GAJD: Gauss AJD algorithm (Frobenius cost function), unpublished, by the author of this package.
  • GLogLike: Gauss AJD algorithm (log-likelihood cost function), still experimental.
  • gCCA: generalized CCA
  • gMCA: generalized MCA
  • JADE: Joint Diagonalization of Eigenmatrices AJD algorithm (Cardoso & Souloumiac, 1993)
  • JADEmax: JADE using the Riemannian gradient to order the rotations (Usevich et al., 2020)
  • LogLike: AJD algorithm optimizing the Log-Likelihood cost function (Pham, 2000)
  • LogLikeR: another implementation of LogLike for real data only.
  • LShrLW: Linear Shrinkage of Ledoit and Wolf (2004)
  • NShrLW: Non-linear Shrinkage of Ledoit and Wolf (2018)
  • mAJD: multiple AJD (jointly on several data sets)
  • MCA: Maximum Covariance Analysis
  • NoJoB: Non-orthogonal mAJD algorithm (Congedo et al., 2012)
  • OJoB: Orthogonal mAJD algorithm (Congedo et al., 2012)
  • PCA: Principal Component Analysis (Pearson, 1901)
  • QNLogLike: Quasi-Newton LogLike (Ablin et al., 2019)
  • SCM: Sample Covariance Matrix
  • SOBI: Second-Order Blind Identification (Belouchrani et al., 1997)

For the references see 🎓.