CSP

The Common Spatial Pattern (CSP) is a set of filters obtained by generalized eigenvalue-eigenvector decomposition. It corresponds to the situation $m=1$ (one dataset) and $k=2$ (two observations). The goal of a CSP filter is to maximize a variance ratio. Let $(X_1, X_2)$ be $(n⋅t_1, n⋅t_2)$ data matrices, where $n$ is their number of variables and $(t_1, t_2)$ their number of samples. Let $(C_1, C_2)$ be the $n⋅n$ covariance matrices of data matrices $(X_1, X_2)$ and $C=C_1+C_2$. The CSP consists in the joint diagonalization of $C$ and $C_1$ or, equivalently, of $C$ and $C_2$ (Fukunaga, 1990, p. 31-33 🎓). The joint diagonalizer $B$ is scaled such as to verify (see scale and permutation)

$\left \{ \begin{array}{rl}B^TCB=I\\B^TC_1B=Λ\\B^TC_2B=I-Λ \end{array} \right.$, $\hspace{1cm}$ [csp.1]

That is to say, $λ_1≥\ldots≥λ_n$ are the diagonal elements of $B^TC_1B$ and $1-λ_1≤\ldots≤1-λ_n$ the diagonal elements of $B^TC_2B$. The CSP maximizes the ratio between the variances of the corresponding components of the transformed processes $B^TX_1$ and $B^TX_2$, constraining their sum to unity. The ratios are sorted in descending order, such that

$σ=[λ_1/(1-λ_1)≥\ldots≥λ_n/(1-λ_n)]$, $\hspace{1cm}$ [csp.2]

The CSP has two use cases, which can be selected using the selMeth optional keyword argument of its constructors:

  • a) Separating two classes. In this case $C_1$ and $C_2$ are the covariance matrices of two distinct classes. The constructors will retain a filter of the form $\widetilde{B}=[B_1 B_2]$, where we have defined the partition $B=[B_1 B_0 B_2]$. $B_1$ and $B_2$ are the first $p_1$ and last $p_2$ vectors of $B$, corresponding to high and low values of the variance ratio [csp.2], that is, explaining variance that is useful for separating the classes, while $B_0$ holds the components with variance ratios close to $1$, which are not useful for separating the classes. The subspace dimension in this use case is given by $p=p_1+p_2$. Notice that in Diagonalizations.jl we allow $p_1≠p_2$.
  • b) Enhance the signal-to-noise ratio. In this case $C_1$ and $C_2$ are the covariance matrices of the 'signal' and of the 'noise', respectively. The constructors will retain a filter $\widetilde{B}=[b_1 \ldots b_p]$ holding the first $p$ vectors of $B$, corresponding to the highest values of the variance ratio [csp.2]. The discarded $n-p$ vectors correspond to components that explain progressively more variance related to the noise process. In this case we have a natural landmark for selecting the subspace dimension $p$ automatically: the largest dimension whose variance ratio is greater than $1$. Both retention rules are sketched here below.
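As an illustration, the following minimal, hypothetical sketch (our own code, not the package's internals) selects the retained columns of $B$ from the $λ$ values of [csp.1]:

# hypothetical sketch of the two retention rules (not the package's code)
λ = [0.95, 0.90, 0.80, 0.55, 0.52, 0.45, 0.20, 0.10]  # diagonal of BᵀC₁B, sorted
σ = λ ./ (1 .- λ)                                     # variance ratios [csp.2]

# use case a): retain the p₁ leading and the p₂ trailing vectors (extremal ratios)
p₁, p₂ = 2, 2
keep_a = vcat(1:p₁, length(λ)-p₂+1:length(λ))         # column indices of B̃=[B₁ B₂]

# use case b): retain the first p vectors, using σ>1 as the natural landmark
p = findlast(>(1), σ)
keep_b = 1:p                                          # column indices of B̃=[b₁ … bₚ]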

In order to retrieve the appropriate partitions of $B$ to construct a filter given a subspace dimension $p$, we need a way to measure the distance of the ratios [csp.2] from $1$. For this purpose we make use of the Fisher distance on the Riemannian manifold of positive definite matrices, in its scalar form, yielding

$δ_i=\textrm{log}^2(σ_i)$, for $i=[1 \ldots n]$. $\hspace{1cm}$ [csp.3]

After this transformation, extreme values of the ratio become high, that is, the function $δ_i$ assumes the shape of a (non-symmetric) wine cup and has a minimum close to zero.

Now let $δ_{TOT}=\sum_{i=1}^nδ_i$ be the total distance and define the explained variance of the CSP for dimension $p$ as

$v_p=\frac{\sum_{j=1}^pδ_j}{δ_{TOT}}$, $\hspace{1cm}$ [csp.4]

where the $δ_j$'s are given by [csp.3]. Note that for use case a) described above, the accumulated sums in [csp.4] are computed after sorting the $δ_i$ values in descending order.
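To make [csp.3] and [csp.4] concrete, the following hypothetical lines (not the package's code) compute the $δ_i$ and the accumulated explained variance from a vector of variance ratios:

# hypothetical sketch computing [csp.3] and [csp.4] (not the package's code)
λ = [0.95, 0.90, 0.80, 0.55, 0.52, 0.45, 0.20, 0.10]
σ = λ ./ (1 .- λ)                      # variance ratios [csp.2]
δ = log.(σ).^2                         # scalar Fisher distances from 1 [csp.3]
δTOT = sum(δ)
# for use case a) the δ values are sorted in descending order before accumulating
v = cumsum(sort(δ; rev=true)) ./ δTOT  # v_p for p=1…n [csp.4]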

The .arev field of a CSP filter is defined as the accumulated variance ratio, given by

$[v_1≤\ldots≤v_n]$, $\hspace{1cm}$ [csp.5]

where the $v_i$'s are defined in [csp.4].

For setting the subspace dimension $p$ manually, set the eVar optional keyword argument of the CSP constructors either to an integer or to a real number (see subspace dimension). If you don't, by default $p$ is chosen (see Fig. 2):

  • in use case a), as the largest dimension $p$ whose sorted $δ_p$ value [csp.3] is larger than the geometric mean of $δ$. $n÷2$ is taken as an upper bound.
  • in use case b), as the largest dimension $p$ whose $σ_p$ value [csp.2] is larger than $1$.

In both cases the dimension corresponding to the minimum of [csp.3] is taken as an upper bound and $p$ can be at most $n-1$.
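The following hypothetical sketch summarizes this default selection logic (illustration only, not the package's code):

# hypothetical sketch of the default choice of p (not the package's code)
λ = [0.95, 0.90, 0.80, 0.55, 0.52, 0.45, 0.20, 0.10]
σ = λ ./ (1 .- λ)                          # [csp.2]
δ = log.(σ).^2                             # [csp.3]
n = length(λ)

# use case a): largest p whose sorted δ value exceeds the geometric mean of δ,
# bounded above by n÷2
δs = sort(δ; rev=true)
gm = exp(sum(log, δ)/n)                    # geometric mean of δ
p_a = min(something(findlast(>(gm), δs), 1), n÷2)

# use case b): largest p whose σ value exceeds 1, at most n-1
p_b = min(something(findlast(>(1), σ), 1), n-1)
# (the additional bound at the minimum of δ, mentioned above, is omitted for brevity)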

Figure 2: Illustration of the way CSP constructors automatically determine the subspace dimension $p$ under use cases a) and b). On the left, we are interested in the $p_1$ and $p_2$ vectors of $B$ associated with extremal values of the variance ratios [csp.2], which are the high values of $δ$ [csp.3] enclosed in the shaded boxes of the figure. The threshold is set to the geometric mean of $δ$. Since Diagonalizations.jl always sorts the explained variance in descending order, $\widetilde{B}$ is defined as $[b_1, b_{10}, b_9, b_2, b_8, b_3]$. On the right, we are interested in the first $p$ vectors of $B$ associated with values of the variance ratios [csp.2] greater than $1$, which are the high values to the left of the minimum of $δ$ enclosed in the shaded box of the figure. The threshold is set in this case to the first value of $σ$ [csp.2] smaller than $1$, which roughly coincides with the minimum of $δ$ reported in the figure. Assuming in this example they do coincide, $\widetilde{B}$ would be defined as $[b_1 \ldots b_5]$.

Solution

The CSP solution $B$ is given by the generalized eigenvalue-eigenvector decomposition of the pair $(C, C_1)$.

A numerically preferable solution is the following two-step procedure:

  1. get a whitening matrix $\hspace{0.1cm}W\hspace{0.1cm}$ such that $\hspace{0.1cm}W^TCW=I\hspace{0.1cm}$.
  2. do $\hspace{0.1cm}\textrm{EVD}(W^TC_{1}W)=UΛU^{T}$

The solution is $\hspace{0.1cm}B=WU$.
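For instance (a minimal numerical sketch of both procedures on random positive-definite matrices; not the package's internal code):

using LinearAlgebra

n = 4
A₁ = randn(n, n); C1 = Symmetric(A₁*A₁'/n)
A₂ = randn(n, n); C2 = Symmetric(A₂*A₂'/n)
C = Symmetric(C1+C2)

# one-step solution: generalized EVD of the pair (C, C₁)
Λ, B = eigen(C1, C)                           # solves C1*b = λ*C*b, with B'*C*B = I

# two-step solution
ew = eigen(C)
W = ew.vectors * Diagonal(ew.values .^ -0.5)  # 1. whitening: W'*C*W = I
U = eigen(Symmetric(W'*C1*W)).vectors         # 2. EVD of the whitened C1
B2 = W*U

# both satisfy the constraints of [csp.1] (up to sign and ordering)
D = B2'*C1*B2
@assert B2'*C*B2 ≈ I
@assert norm(D-Diagonal(D))+1 ≈ 1.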

Constructors

Three constructors are available (see here below). The constructed LinearFilter object holding the CSP will have fields:

.F: matrix $\widetilde{B}$ as defined above. This is just $B$ of [csp.1] if optional keyword argument simple=true is passed to the constructors (see below).

.iF: the left-inverse of .F

.D: the $p⋅p$ diagonal matrix with the elements of $Λ$ [csp.1] corresponding to the vectors of $\widetilde{B}$.

.eVar: the explained variance for the chosen value of $p$, given by the $p^{th}$ value of [csp.5]

.ev: the vector holding all $n$ generalized eigenvalues, i.e., the diagonal elements of matrix $Λ$ [csp.1]. Notice that if selMeth=:extremal these elements are sorted differently than in .D and their positions do not correspond to the vectors of .F.

.arev: the accumulated regularized eigenvalues, defined in [csp.5].

Nota Bene

.eVar and .arev are computed on sorted $δ_i$ values in use case a), see [csp.5], [csp.4] and Fig. 2.
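For example (a usage sketch; field meanings as listed above):

using Diagonalizations, LinearAlgebra

t, n = 100, 8
X₁, X₂ = genDataMatrix(n, t), genDataMatrix(n, t)
c = csp(Symmetric(X₁*X₁'/t), Symmetric(X₂*X₂'/t))
c.F              # filter matrix B̃
c.iF*c.F ≈ I     # .iF is the left-inverse of .F
c.D              # p⋅p diagonal matrix with the retained elements of Λ
c.eVar           # explained variance for the chosen p
c.ev             # all n generalized eigenvalues
c.arev           # accumulated explained variance [csp.5]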

Diagonalizations.csp — Function
(1)
function csp(C₁ :: SorH, C₂ :: SorH;
             eVar     :: TeVaro = ○,
             eVarC    :: TeVaro = ○,
             eVarMeth :: Function = searchsortedfirst,
             selMeth  :: Symbol = :extremal,
             simple   :: Bool = false)

(2)
function csp(X₁ :: Mat, X₂ :: Mat;
             covEst   :: StatsBase.CovarianceEstimator = SCM,
             dims     :: Into = ○,
             meanX₁   :: Tmean = 0,
             meanX₂   :: Tmean = 0,
             wX₁      :: Tw = ○,
             wX₂      :: Tw = ○,
             eVar     :: TeVaro = ○,
             eVarC    :: TeVaro = ○,
             eVarMeth :: Function = searchsortedfirst,
             selMeth  :: Symbol = :extremal,
             simple   :: Bool = false)

(3)
function csp(𝐗₁::VecMat, 𝐗₂::VecMat;
             covEst   :: StatsBase.CovarianceEstimator = SCM,
             dims     :: Into = ○,
             meanX₁   :: Into = 0,
             meanX₂   :: Into = 0,
             eVar     :: TeVaro = ○,
             eVarC    :: TeVaro = ○,
             eVarMeth :: Function = searchsortedfirst,
             selMeth  :: Symbol = :extremal,
             simple   :: Bool = false,
             metric   :: Metric = Euclidean,
             w₁       :: Vector = [],
             w₂       :: Vector = [],
             ✓w       :: Bool = true,
             init₁    :: SorHo = nothing,
             init₂    :: SorHo = nothing,
             tol      :: Real = 0.,
             verbose  :: Bool = false)

Return a LinearFilter object:

(1) Common spatial pattern with covariance matrices C_1 and C_2 of dimension $n⋅n$ as input. The subscript of the covariance matrices refers to the dims used to compute it (see above).

eVar, eVarC and eVarMeth are keyword optional arguments for defining the subspace dimension $p$. Particularly:

  • By default, the two-step procedure described above is used to find the solution. In this case eVarC is used for defining the subspace dimension of the whitening step. If eVarC=0.0 is passed (not to be confused with eVarC=0), the solution will be found by the generalized eigenvalue-eigenvector procedure.
  • eVar is the keyword optional argument for defining the subspace dimension $p$ using the .arev vector given by [csp.5].
  • eVarMeth applies to both eVarC and eVar. The default value is eVarMeth=searchsortedfirst.

If selMeth=:extremal (default), use case a) Separating two classes described above is considered. Any other symbol for selMeth will instruct the constructor to consider instead use case b) Enhance the signal-to-noise ratio.

If simple is set to true, $p$ is set equal to $n$ and only the fields .F and .iF are written in the constructed object. This option is provided for low-level work when you don't need to define a subspace dimension or you want to define it by your own methods.
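For example (usage sketches of the options just described; C₁ and C₂ are covariance matrices generated as in the Examples below):

using Diagonalizations, LinearAlgebra

t, n = 100, 8
X₁, X₂ = genDataMatrix(n, t), genDataMatrix(n, t)
C₁, C₂ = Symmetric(X₁*X₁'/t), Symmetric(X₂*X₂'/t)

c = csp(C₁, C₂)                # defaults: two-step solution, selMeth=:extremal
c = csp(C₁, C₂; eVarC=0.0)     # one-step generalized EVD solution instead
c = csp(C₁, C₂; eVar=4)        # set the subspace dimension p manually to 4
c = csp(C₁, C₂; eVar=0.75)     # define p through the .arev vector [csp.5]
c = csp(C₁, C₂; simple=true)   # p=n; only .F and .iF are written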

(2) Common spatial pattern with data matrices X₁ and X₂ as input.

X₁ and X₂ are real or complex data matrices.

covEst, dims, meanX₁, meanX₂, wX₁ and wX₂ are optional keyword arguments to regulate the estimation of the covariance matrices $(C_1, C_2)$ of (X₁, X₂). Particularly (see covariance matrix estimations),

  • meanX₁ is the mean argument for data matrix X₁.
  • meanX₂ is the mean argument for data matrix X₂.
  • wX₁ is the w argument for estimating a weighted covariance matrix for X₁.
  • wX₂ is the w argument for estimating a weighted covariance matrix for X₂.
  • covEst applies to the estimations of both covariance matrices.

Once the two covariance matrices $C_1$ and $C_2$ have been estimated, method (1) is invoked with optional keyword arguments eVar, eVarC, eVarMeth, selMeth and simple. See method (1) for details.

(3) Common spatial pattern with two vectors of data matrices 𝐗₁ and 𝐗₂ as input.

𝐗₁ and 𝐗₂ do not need to hold the same number of matrices and the number of samples in the matrices they contain is arbitrary.

covEst, dims, meanX₁ and meanX₂ are optional keyword arguments to regulate the estimation of the covariance matrices for all matrices in 𝐗₁ and 𝐗₂. See method (2) and covariance matrix estimations.

A mean covariance matrix is computed separately from the covariance matrices computed from the data matrices in 𝐗₁ and 𝐗₂, using optional keyword arguments metric, w₁, w₂, ✓w, init₁, init₂, tol and verbose. Particularly (see mean covariance matrix estimations),

  • w₁ are the weights for the covariance matrices computed from 𝐗₁,
  • w₂ are the weights for the covariance matrices computed from 𝐗₂,
  • init₁ is the initialization for the mean of the covariance matrices computed from 𝐗₁,
  • init₂ is the initialization for the mean of the covariance matrices computed from 𝐗₂.

By default, the arithmetic mean is computed.
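For example (a usage sketch; Fisher is one of the metrics provided by PosDefManifold):

using Diagonalizations, PosDefManifold

t, n, k = 50, 10, 10
𝐗₁ = [genDataMatrix(n, t) for i = 1:k]
𝐗₂ = [genDataMatrix(n, t) for i = 1:k]

# use the Fisher (geometric) mean of the covariance matrices
# instead of the default arithmetic mean
c = csp(𝐗₁, 𝐗₂; metric=Fisher, verbose=true)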

See also: CSTP, PCA, AJD, mAJD.

Examples:

using Diagonalizations, LinearAlgebra, PosDefManifold, Test

# Method (1) real
t, n=50, 10
X1=genDataMatrix(n, t)
X2=genDataMatrix(n, t)
Cx1=Symmetric((X1*X1')/t)
Cx2=Symmetric((X2*X2')/t)
C=Cx1+Cx2
cC=csp(Cx1, Cx2; simple=true)
Dx1=cC.F'*Cx1*cC.F
@test norm(Dx1-Diagonal(Dx1))+1≈1.
Dx2=cC.F'*Cx2*cC.F
@test norm(Dx2-Diagonal(Dx2))+1≈1.
@test cC.F'*C*cC.F≈I
@test norm(Dx1-(I-Dx2))+1≈1.

# Method (1) complex
t, n=50, 10
X1c=genDataMatrix(ComplexF64, n, t)
X2c=genDataMatrix(ComplexF64, n, t)
Cx1c=Hermitian((X1c*X1c')/t)
Cx2c=Hermitian((X2c*X2c')/t)
Cc=Cx1c+Cx2c
cCc=csp(Cx1c, Cx2c; simple=true)
Dx1c=cCc.F'*Cx1c*cCc.F
@test norm(Dx1c-Diagonal(Dx1c))+1. ≈ 1.
Dx2c=cCc.F'*Cx2c*cCc.F
@test norm(Dx2c-Diagonal(Dx2c))+1. ≈ 1.
@test cCc.F'*Cc*cCc.F≈I
@test norm(Dx1c-(I-Dx2c))+1. ≈ 1.


# Method (2) real
c12=csp(X1, X2, simple=true)
Dx1=c12.F'*Cx1*c12.F
@test norm(Dx1-Diagonal(Dx1))+1≈1.
Dx2=c12.F'*Cx2*c12.F
@test norm(Dx2-Diagonal(Dx2))+1≈1.
@test c12.F'*C*c12.F≈I
@test norm(Dx1-(I-Dx2))+1≈1.
@test cC==c12

# Method (2) complex
c12c=csp(X1c, X2c, simple=true)
Dx1c=c12c.F'*Cx1c*c12c.F
@test norm(Dx1c-Diagonal(Dx1c))+1. ≈ 1.
Dx2c=c12c.F'*Cx2c*c12c.F
@test norm(Dx2c-Diagonal(Dx2c))+1. ≈ 1.
@test c12c.F'*Cc*c12c.F≈I
@test norm(Dx1c-(I-Dx2c))+1. ≈ 1.
@test cCc==c12c


# Method (3) real
# CSP of the average covariance matrices
k=10
Xset=[genDataMatrix(n, t) for i=1:k]
Yset=[genDataMatrix(n, t) for i=1:k]

c=csp(Xset, Yset)

# ... selecting subspace dimension allowing an explained variance = 0.9
c=csp(Xset, Yset; eVar=0.9)

# ... subtracting the mean from the matrices in Xset and Yset
c=csp(Xset, Yset; meanX₁=nothing, meanX₂=nothing, eVar=0.9)

# csp on the average of the covariance matrices
# computed along dims 1
c=csp(Xset, Yset; dims=1, eVar=0.9)

# name of the filter
c.name

using Plots
# plot regularized accumulated eigenvalues
plot(c.arev)


# plot the original covariance matrices and the transformed counterpart
# example when argument `selMeth` is `extremal` (default): 2-class separation
 cC=csp(Cx1, Cx2)
 Cx1Max=maximum(abs.(Cx1));
 h1 = heatmap(Cx1, clim=(-Cx1Max, Cx1Max), title="Cx1", yflip=true, c=:bluesreds);
 h2 = heatmap(cC.F'*Cx1*cC.F, clim=(0, 1), title="F'*Cx1*F", yflip=true, c=:amp);
 Cx2Max=maximum(abs.(Cx2));
 h3 = heatmap(Cx2, clim=(-Cx2Max, Cx2Max), title="Cx2", yflip=true, c=:bluesreds);
 h4 = heatmap(cC.F'*Cx2*cC.F, clim=(0, 1), title="F'*Cx2*F", yflip=true, c=:amp);
 CMax=maximum(abs.(C));
 h5 = heatmap(C, clim=(-CMax, CMax), title="Cx1+Cx2", yflip=true, c=:bluesreds);
 h6 = heatmap(cC.F'*C*cC.F, clim=(0, 1), title="F'*(Cx1+Cx2)*F", yflip=true, c=:amp);
 📈=plot(h1, h3, h5, h2, h4, h6, size=(800,400))
# savefig(📈, homedir()*"/Documents/Code/julia/Diagonalizations/docs/src/assets/FigCSP1.png")

Figure CSP1

# example when argument `selMeth` is different from `extremal`: enhance snr
 cC=csp(Cx1, Cx2; selMeth=:enhanceSNR)
 Cx1Max=maximum(abs.(Cx1));
 h1 = heatmap(Cx1, clim=(-Cx1Max, Cx1Max), title="Cx1", yflip=true, c=:bluesreds);
 h2 = heatmap(cC.F'*Cx1*cC.F, clim=(0, 1), title="F'*Cx1*F", yflip=true, c=:amp);
 Cx2Max=maximum(abs.(Cx2));
 h3 = heatmap(Cx2, clim=(-Cx2Max, Cx2Max), title="Cx2", yflip=true, c=:bluesreds);
 h4 = heatmap(cC.F'*Cx2*cC.F, clim=(0, 1), title="F'*Cx2*F", yflip=true, c=:amp);
 CMax=maximum(abs.(C));
 h5 = heatmap(C, clim=(-CMax, CMax), title="Cx1+Cx2", yflip=true, c=:bluesreds);
 h6 = heatmap(cC.F'*C*cC.F, clim=(0, 1), title="F'*(Cx1+Cx2)*F", yflip=true, c=:amp);
 📉=plot(h1, h3, h5, h2, h4, h6, size=(800,400))
# savefig(📉, homedir()*"/Documents/Code/julia/Diagonalizations/docs/src/assets/FigCSP2.png")

Figure CSP2

# Method (3) complex
# CSP of the average covariance matrices
k=10
Xsetc=[genDataMatrix(ComplexF64, n, t) for i=1:k]
Ysetc=[genDataMatrix(ComplexF64, n, t) for i=1:k]

cc=csp(Xsetc, Ysetc)

# ... selecting subspace dimension allowing an explained variance = 0.9
cc=csp(Xsetc, Ysetc; eVar=0.9)

# ... subtracting the mean from the matrices in Xset and Yset
cc=csp(Xsetc, Ysetc; meanX₁=nothing, meanX₂=nothing, eVar=0.9)

# csp on the average of the covariance matrices
# computed along dims 1
cc=csp(Xsetc, Ysetc; dims=1, eVar=0.9)

# name of the filter
cc.name