Create your own test

This page illustrates how you can create univariate and multiple comparisons permutation tests beyond those already supported by PermutationTests.jl.

A good knowledge and understanding of the whole documentation, including the Extras pages, is required before continuing with this page.

There are two ways you can extend the capability of PermutationTests.jl.

  • The first, very simple but limited, consists in using existing tests with particular data input.

  • The second, which requires a little development but is much more general, allows you to create new tests by defining your own test statistics.

The illustration proceeds by examples of increasing complexity.

Index of examples

  1. Univariate autocorrelation test
  2. Post-hoc tests for 1-way repeated-measures ANOVA
  3. Univariate t-test for independent samples
  4. Multiple comparisons correlation test
  5. Univariate Chatterjee correlation
  6. Multiple comparisons Chatterjee correlation
  7. Univariate distance correlation
  8. Multiple comparisons distance correlation

Using existing tests

The approach for constructing univariate and multiple comparisons tests is illustrated by creating tests for the autocorrelation, which are not in the API of the package.


Example 1: univariate autocorrelation test

As an example, let us detect the presence of autocorrelation at lag 1, similar to what the Durbin-Watson test does.

PermutationTests.jl implements the Pearson product moment correlation statistic and a test for it. We can then test the null hypothesis that the autocorrelation at lag 1 is equal to zero by performing a correlation test on a data vector x and a lagged version of itself y.

This is simply achieved by calling the _permTest! function, which ultimately performs all univariate tests implemented in the package.

First, let us create some data autocorrelated at lag 1 to get a working example:

using PermutationTests

N=300
data=randn(N) 
for i=2:length(data)
  data[i-1]=data[i-1]+data[i]
end

# Data and lagged version of the data
x=data[1:end-1]
y=data[2:end]

Our univariate test on autocorrelation is obtained with

t = _permTest!(copy(x), y, N-1, PearsonR(), PearsonR())

NB: we pass a copy of x as this vector will be permuted.

If we wish to wrap the test in a friendly function, we can write

autocorTest(x; kwargs...) = 
  _permTest!(x[1:end-1], x[2:end], length(x)-1, PearsonR(), PearsonR(); kwargs...)

The test can then be executed simply as

t2 = autocorTest(data)

By default the test is bi-directional, thus the above call is a test for the autocorrelation regardless of its sign. Our function autocorTest accepts all optional keyword arguments of the _permTest! function, thus we can easily obtain exact or approximate tests with a specified number of permutations, left- or right-directional tests, etc., as shown below.
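
For example, assuming the autocorTest wrapper defined above, an approximate right-directional test using 5000 random permutations could be obtained as:

t3 = autocorTest(data; fstat=identity, switch2rand=1, nperm=5000)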

Nota Bene

A multiple comparisons test for autocorrelation, that is, one testing the autocorrelation at several lags simultaneously, cannot be obtained trivially by data permutation; see for example Romano and Tirlea (2022). For this reason such a test is not given as an example below. Reference: J.P. Romano, M.A. Tirlea (2022) Permutation testing for dependence in time series. Journal of Time Series Analysis, 43(5), 781-807.


Example 2: Post-hoc tests for 1-way repeated-measures ANOVA

When a 1-way ANOVA for repeated measures is significant, it is most often of interest to examine the differences between the treatments, all taken pair-wise, to understand where the significance comes from. Formally, given $K$ treatments, the null hypothesis of the ANOVA is stated as

$H_0: μ_1= ... =μ_K$.

Post-hoc tests take the form

$H_0: μ_i=μ_j, i≠j=1,...,K$

and can be tested as a series of t-tests for repeated measures.

The post-hoc tests are to be done simultaneously and a correction for multiple comparisons is to be applied. A control of the family-wise error rate in the strong sense is achieved with a multiple-comparison permutation test, as shown in the example below:

using PermutationTests

N=12 # Number of observation units per treatment
K=3 # Number of treatments

# Some random Gaussian data for example
# The last (third) treatment will have mean higher than the other two
yvec = [randn(K) for n=1:N];  
for n=1:N
    yvec[n][K]+=2.0
end
t = anovaTestRM(yvec) # ANOVA test

# Let us create the K(K-1)/2 all-pair-wise difference vectors
NK=N*K
y=vcat(yvec...)
d12=y[1:K:NK].-y[2:K:NK]
d13=y[1:K:NK].-y[3:K:NK]
d23=y[2:K:NK].-y[3:K:NK]

# post-hoc tests
pht = studentMcTestRM([d12, d13, d23])

Verify that the second and third p-values in pht.p are significant. Inspecting the means of d12, d13 and d23 will reveal the form of the effect across treatments.
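
For instance, a quick way to inspect those means (using the Statistics standard library):

using Statistics: mean
mean.([d12, d13, d23]) # the last two means should be around -2, reflecting the higher mean of the third treatment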

A general code for the post-hoc tests (for any K) is:

d=[y[i:K:NK].-y[j:K:NK] for i=1:K, j=1:K if i≠j && j>i]
pht2 = studentMcTestRM(d)

Nota Bene

Note that, in contrast to the repeated-measures case illustrated above, post-hoc tests for a 1-way ANOVA for independent samples cannot be performed trivially by data permutation.


Defining new test statistics

You can create your own permutation tests based on a custom test statistic, as long as the permutation scheme for your test is supported by PermutationTests.jl. The ability to create a test for virtually any test statistic is a distinctive feature of PermutationTests.jl and will be illustrated below with examples of increasing complexity.

The procedure to follow is the same whether you want to create your own univariate or multiple comparisons test. It involves two preliminary steps:

  1. define a new Statistic type to name the test statistic you want to use
  2. write a method for the testStatistic function to compute the test statistic for both the observed and permuted data

You then obtain the test by simply calling the _permTest! function for a univariate test or the _permMcTest! function for a multiple comparisons test, as we will show. Two arguments of these functions are particularly important:

  • Argument asStat, which determines the permutation scheme of the test.
keep in mind

Four permutation schemes are implemented in the package, corresponding to four implemented groups of test-statistics. The asStat argument can be one of the following built-in test statistics or any test statistic belonging to the same group:

  • PearsonR()
  • AnovaF_IS()
  • AnovaF_RM()
  • StudentT_1S()

Refer to the documentation of genPerms and the documentation linked therein.

  • Argument fstat, which is a function applied to all observed and permuted statistics.
keep in mind

Argument fstat of _permTest! is by default the julia abs function, which yields a bi-directional test for a test statistic that is distributed symmetrically around zero. For such test statistics you will use fstat=identity for a right-directional test and fstat=flip for a left-directional test. Note that the default may not be appropriate for your own test statistic. For example, if your test statistic is not symmetric and a right-tailed test is of interest, you must use fstat=identity. See below for examples.

important

The permutation vector, which is always the first argument of the _permTest! function for a univariate test and of the _permMcTest! function for a multiple comparisons test, must correspond to the permutation yielding the observed statistic (i.e., no data permutation).


Example 3: univariate t-test for independent samples

We create a new test for the difference of the means of two independent samples. Such a test is equivalent to the Student's t-test for independent samples, since the mean difference is the numerator of the t statistic and the denominator is invariant with respect to data permutation.

The permutation scheme is the same as the permutation scheme of the AnovaF_IS() (or StudentT_IS()) test statistic, hence it is supported by PermutationTests.jl and we can reuse it.

First, let us import testStatistic from PermutationTests.jl since we will write a new method for it:

using PermutationTests
import PermutationTests: testStatistic

(1) Define a new Statistic type to name the test statistic you want to use

struct MeanDiff <: Statistic end

(2) Write a method for the testStatistic function to compute the mean difference; it does not matter whether the data is observed or permuted.

We will input the data exactly as for the AnovaF_IS() (or StudentT_IS()) statistic. Our observations are divided into two groups. In this case, for this and equivalent statistics, data input is given as a single vector y of length $N_1+N_2$, with the N1 observations followed by the N2 observations (see studentTestIS).

The first argument of the testStatistic method is always the permutation vector x, holding in this case 1 for subjects of group 1 and 2 for subjects of group 2, that is, x=[repeat([1], N1); repeat([2], N2)]. For the observed data, x reflects the actual group assignment; for permuted data it is a permutation thereof.

A simple implementation could be:

function testStatistic(x, y, stat::MeanDiff; kwargs...)
    # get the number of subjects in each group
    ns=[count(j->j==i, x) for i=1:length(unique(x))]

    s=[0., 0.]
    @simd for i ∈ eachindex(x, y) 
        @inbounds (s[x[i]] += y[i])
    end
    return (s[1]/ns[1]) - (s[2]/ns[2]) # mean difference
end

That's it.

Let's test our own test. First, create some data:

y=[4., 5., 6., 7., 1., 2., 3.]
ns=[4, 3] # N1=4, N2=3

For the permutation vector, the dedicated function membership is available:

x=membership(StudentT_IS(), ns) # = [repeat([1], N1); repeat([2], N2)]

A two-directional exact test, as performed by PermutationTests.jl, is obtained with

println("t-test independent samples (bi-directional) using PermutationsTest.jl: ");
t1 = studentTestIS(y, ns)

Let us run the test we have created. Notice that argument stat of _permTest! is MeanDiff(), while argument asStat is AnovaF_IS(). The default fstat is the julia abs function, thus the test will be bi-directional, as for test t1 above.

println("t-test independent samples (bi-directional) using our own statistic: ");
t2 = _permTest!(copy(x), y, ns, MeanDiff(), AnovaF_IS())

NB: we pass a copy of x as this vector will be permuted.


We now illustrate, with a simple example, an opportunity we have to write more efficient code for computing test statistics: the above implementation of the testStatistic function computes the group numerosity (ns=[N1, N2]) at each permutation, although ns is invariant to permutations.

Let us then rewrite the function using the cpcd argument, which can be passed to the _permTest! function in order to (optionally) pre-compute ns:

function testStatistic(x, y, stat::MeanDiff; cpcd=nothing, kwargs...)
    if cpcd===nothing 
        ns=[count(j->j==i, x) for i=1:length(unique(x))]
    else
        ns=cpcd
    end

    s=[0., 0.]
    @simd for i ∈ eachindex(x, y) 
        @inbounds (s[x[i]] += y[i])
    end
    return (s[1]/ns[1]) - (s[2]/ns[2]) # mean difference
end

We can now run the test passing the cpcd keyword argument to avoid redundant computations:

t3 = _permTest!(copy(x), y, ns, MeanDiff(), StudentT_IS(); cpcd=ns) 

You can check that the p-values obtained by all the above tests are identical:

t1.p≈t2.p≈t3.p ? (@info "It works!") : 
                 (@error "It does not work!")

To run a right-directional test:

t4 = _permTest!(copy(x), y, ns, MeanDiff(), StudentT_IS(); fstat = identity)

To run a left-directional approximate test (see _permTest!):

t5 = _permTest!(copy(x), y, ns, MeanDiff(), StudentT_IS(); fstat = flip, switch2rand=1)

Etc.

As an exercise, wrap the test we have created in a friendly function; one possible wrapper is sketched below.
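
For instance, a minimal sketch of such a wrapper (meanDiffTest is a name introduced here; y is the concatenated data vector and ns the group numerosity vector):

meanDiffTest(y, ns; kwargs...) =
    _permTest!(membership(StudentT_IS(), ns), y, ns, MeanDiff(), StudentT_IS(); cpcd=ns, kwargs...)

t6 = meanDiffTest(y, ns) # should yield the same p-value as t1, t2 and t3 above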

Nota Bene

We are not restricted to the use of a cpcd argument. In later examples we will show how to use an arbitrary number of specially defined keywords in order to obtain efficient tests.


Example 4: multiple comparisons correlation test

We create another version of the Pearson product-moment correlation test between a fixed variable given as a vector x and $M$ variables given as a vector of $M$ vectors Y. We want to test simultaneously all correlations between x and Y[m], for $m=1...M$.

The procedure to follow is the same, with two differences: the testStatistic function will now compute each one of the $M$ test statistics and we will finally call the _permMcTest! function instead of the _permTest! function.

The permutation scheme for the correlation test is the one of the PearsonR() statistic, hence it is supported by PermutationTests.jl and we can reuse it (using the appropriate asStat argument of _permMcTest!).

First, let us import testStatistic from PermutationTests.jl and what we need for the computations:

using PermutationTests
import PermutationTests: testStatistic

using LinearAlgebra: dot, ⋅ # we are going to use the dot product

(1) Define a new Statistic type to name the test statistic you want to use, such as

struct MyPearsonR <: Statistic end

(2) Write a method for the testStatistic function to compute the $i^{th}$ observed and permuted test statistic. In order to obtain a faster test we will work with standardized variables, thus the Pearson correlation is given simply by the cross-product divided by N. We can then write

testStatistic(x, Y, i, stat::MyPearsonR; kwargs...) = (x ⋅ Y[i])/length(x)

That's it.

Let's test our own test. First, create some example data:

N, M = 7, 100 # N=7 observations, M=100 hypotheses
x=randn(N);
Y=[randn(N) for m=1:M];

A two-directional exact test, as performed by PermutationTests.jl, is obtained by

println("correlation test using PermutationsTest.jl: ");
t1 = rMcTest!(x, Y)

Let's run our own test. Remember that our test works only with standardized variables, thus we have to input standardized variables. We can use the function μ0σ1:

println("correlation test using our own test: ");
t2 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y ∈ Y], length(x), MyPearsonR(), PearsonR()) 

Check that the p-values obtained by the two tests are identical:

abs(sum(t1.p-t2.p))<1e-8 ?  (@info "my test works!") : 
                            (@error "my test does not work!")

Check also the observed statistics:

abs(sum(t1.obsstat-t2.obsstat))<1e-8 ?  (@info "my test works!") : 
                                        (@error "my test does not work!")

To run a right-directional test:

t3 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y ∈ Y], length(x), MyPearsonR(), PearsonR(); 
        fstat = identity) 

To run a left-directional approximate test:

t4 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y ∈ Y], length(x), MyPearsonR(), PearsonR(); 
        fstat = flip, switch2rand=1)

As an exercise, wrap our test in a friendly function that standardizes the input data before running the test; a possible solution is sketched below.
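
For instance, a minimal sketch (rMcPermTest is a name introduced here):

rMcPermTest(x, Y; kwargs...) =
    _permMcTest!(μ0σ1(x), [μ0σ1(y) for y ∈ Y], length(x), MyPearsonR(), PearsonR(); kwargs...)

t5 = rMcPermTest(x, Y) # should yield the same p-values as t1 and t2 above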


Example 5: univariate Chatterjee correlation

Examples (3) and (4) are not very interesting per se, as they are just equivalent forms of tests already implemented in the API of PermutationTests.jl. Creating genuinely new tests is not necessarily more complicated though, as shown in this example.

The Chatterjee (2021) correlation $ξ_{x→y}$ is an asymmetric, asymptotically consistent estimator of how much a random variable $y$ is a measurable function of $x$. Thus, we can use it to test a null hypothesis of the form

$H_0: y≠f(x)$.

The function $f$ does not have to be monotonic, nor linear. In the following, $x$ and $y$ are assumed continuous with no ties.

Let $r$ be the ranks of variable $y$ computed after sorting $y$ by ascending order of $x$. The coefficient is given by

$ξ_{x→y}= 1-\frac{3δ_{x→y}(r)}{n^2-1}$,

where

$δ_{x→y}(r)=\sum_{i=1}^{n-1}\left| r_{i+1} - r_i \right|$.

$ξ_{x→y}$ takes value in the interval $[0, 1]$ asymptotically. The limits with finite sampling are $[-1/2+O(1/n), (n-2)/(n+1)]$.

$ξ$ has a simple asymptotic Gaussian distribution, given by $\sqrt{n}\,ξ \sim N(0, 2/5)$, and can be computed in $O(n\log n)$.

Reference: S. Chatterjee (2021) A new coefficient of correlation. Journal of the American Statistical Association, 116(536), 506-515.
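
To make the definition concrete, here is a direct, non-optimized computation of $ξ_{x→y}$ following the formula above (a sketch; xicor is a name introduced here and ties are assumed absent):

using StatsBase: ordinalrank

# Direct computation of the Chatterjee coefficient ξ (no ties assumed)
function xicor(x, y)
    n = length(x)
    r = ordinalrank(y[sortperm(x)]) # ranks of y after sorting by ascending order of x
    return 1 - 3 * sum(abs, diff(r)) / (n^2 - 1)
end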


Let us now construct a univariate permutation test for $ξ_{x→y}$. The following code block defines a function to create it:

using PermutationTests
using StatsBase: ordinalrank # install StatsBase.jl if you don't have it
import PermutationTests: testStatistic

struct Chatterjeeξ <: Statistic end

testStatistic(r, y, stat::Chatterjeeξ; kwargs...) = sum(abs, diff(r))

xiCorPermTest(x, y; kwargs...) =
   _permTest!(ordinalrank(y[sortperm(x)]), [], length(x), Chatterjeeξ(), PearsonR(); 
                fstat = flip, kwargs...)

As in example 3, we import testStatistic and we define a new Statistic type for the test statistic of interest, which we have named Chatterjeeξ. Notice that we use $δ_{x→y}(r)$ as an equivalent test statistic for $ξ_{x→y}$. This is possible because $δ_{x→y}(r)$ and $ξ_{x→y}$ are monotonic functions of each other; however, their relationship is decreasing, which will be taken care of later.

Then, we declare a testStatistic method for computing the $δ_{x→y}(r)$ quantity defined above for the observed and permuted data.

Finally, we invoke the _permTest! function.

The test works this way: _permTest! takes as input the $n$ ranks and lists all $n!$ permutations of the rank indices for an exact test, or a random number of them for an approximate test. As the ranks are themselves a permutation of $(1,...,n)$, at each permutation the testStatistic function is invoked and the test statistic is computed directly on the permutation vector, which is always passed by _permTest! to testStatistic as the first argument.

Notice that the test of interest is meaningful only as a right-directional test for $ξ_{x→y}$, but this is a left-directional test for $δ_{x→y}(r)$, therefore we have used keyword argument fstat = flip. This takes care of the aforementioned inverse relationship of our equivalent statistic.

Let's use the test we have created:

n = 100
x = randn(n)
y = randn(n)
t = xiCorPermTest(x, y) # is not significant

y = x.^2+randn(length(x))/5
t1 = xiCorPermTest(x, y) # y is a function of x with little noise

Example 6: multiple comparisons Chatterjee correlation


Next, let us define the multiple comparisons permutation test for $ξ_{x→y}$. We wish to test simultaneously

$H_0(m): y_m≠f(x), m=1...M$

The code could be:

using PermutationTests
using StatsBase: ordinalrank # install StatsBase.jl if you don't have it
import PermutationTests: testStatistic

struct Chatterjeeξ <: Statistic end

testStatistic(p, Y, m::Int, stat::Chatterjeeξ; kwargs...) = 
    1.0-((3*sum(abs, diff(Y[m][p])))/(abs2(length(p))-1))

# Take as input x and Y=y1,...,yM
function xiCorPermMTest(x, Y; kwargs...) 
   p = sortperm(x)
   n = length(x)
   x_ = collect(Base.OneTo(n))
   return _permMcTest!(x_, [ordinalrank(y[p]) for y ∈ Y], n, Chatterjeeξ(), PearsonR(); 
                  fstat = identity, kwargs...)
end                

The first four lines are to be omitted if we continue writing in the same unit where we have written the univariate test.

Again, we define a method for the testStatistic function and we invoke the _permMcTest! function.

Here we have to adopt a slightly different strategy: testStatistic takes as input the permutation vector and the ranks of the $y_1,...,y_M$ vectors. It computes the $ξ_{x→y}$ statistic (note: not an equivalent statistic) for these vectors, after sorting them all at each permutation by the same sorting key, which is the permutation vector passed by _permMcTest! to testStatistic as the x_ vector.

Note that $ξ_{x→y}$ can take negative values, thus we pass to _permMcTest! keyword argument fstat = identity; this will result in the correct right-directional test we sought.

Let's use the test we have created:

n = 100
m = 10
x = randn(n)
Y = [randn(n) for i=1:m]
t = xiCorPermMTest(x, Y) # is not significant

Y = [x.^2+randn(length(x))/5  for i=1:m]
t1 = xiCorPermMTest(x, Y) # y1,...,ym are a function of x with little noise

Example 7: univariate distance correlation

This and the next example illustrate how we can create a test for a complex test statistic, still avoiding redundant computations, as if we wrote the code for a permutation test from scratch.

The distance correlation (Székely, Rizzo and Bakirov, 2007) extends the concept of correlation to distances among objects living in metric spaces, yielding a measure of dependence that is sensitive to both linear and non-linear dependence. We will here limit ourselves to scalars, vectors and matrices, either real or complex.

Let $\{(X_k, Y_k): k=1,...,K\}$ be $K$ pairs of observations (which may be scalars, vectors or matrices). The elements within $X$ must all have the same size, and so must the elements within $Y$, but the size of the elements in $X$ may differ from the size of those in $Y$.

Let $D_x$ and $D_y$ be the distance matrices with elements

$D_x^{(ij)}=\left| X_i - X_j \right|_p : i,j={1,...,K}$

and

$D_y^{(ij)}=\left| Y_i - Y_j \right|_p : i,j={1,...,K}$,

where $\left|\cdot\right|_p$ is the p-norm, with $p∈(0, 2]$.

Let

$H=I-K^{-1}\textbf{1}\textbf{1}^T$

be the centering matrix, where $I$ is the identity matrix and $\textbf{1}$ is the vector of ones.

Let

$A=HD_xH'$

and

$B=HD_yH'$

be the double-centered distance matrices. Then the distance variance is defined as

$\nu(X)=\frac{1}{K^2}\sum_{i,j=1}^{K}a_{ij}^2$

and the distance covariance as

$\nu(X, Y)=\frac{1}{K^2}\sum_{i,j=1}^{K}a_{ij}b_{ij}$.

Finally, the distance correlation is defined as

$\rho(X, Y)=\frac{\nu(X, Y)}{\sqrt{\nu(X)\nu(Y)}}$,

where by convention it takes value zero if the denominator is zero.

Reference: G.J. Székely, M.L. Rizzo, N.K. Bakirov (2007). Measuring and testing dependence by correlation of distances. Ann. Statist., 35(6), 2769-2794.


Having given the definitions, let us see how we can create a univariate permutation test for the distance correlation.

First, let us define some functions we will need.


using LinearAlgebra: I, norm, Hermitian, LinearAlgebra

# Distance matrix for an array `X` of scalars, vectors, matrices, etc.
# The `n` elements of `X` can be real or complex.
# `pnorm` is the norm used to get distances d_ij=norm(X[i]-X[j], pnorm),
# see the julia LinearAlgebra.norm function.
# Return the distance matrix filled only in the upper triangle.
# Note: this is always real.
function dm(x, n; pnorm=2)
    D = Matrix{Float64}(undef, n, n)
    for i=1:n-1 
        @simd for j=i+1:n
            @inbounds D[i, j] = norm(x[i]-x[j], pnorm)
        end
    end
    @simd for i=1:n 
        @inbounds D[i, i] = 0. 
    end
    return D
end    

# Distance variance. Eq. (2.9) in Székely, Rizzo and Bakirov (2007). 
# D is a distance matrix
dVar(D) = sum(x->abs2(x), D)/size(D, 1)^2  

# Distance covariance. Eq. (2.8) in Székely, Rizzo and Bakirov (2007)
# Dx and Dy are two distance matrices of equal size
dCov(Dx, Dy) = sum(Dx.*Dy)/size(Dx, 1)^2  

# Distance correlation. Eq. (2.10) in Székely, Rizzo and Bakirov (2007)
# Dx and Dy are two distance matrices of equal size
function dCor(Dx, Dy)
    den = dVar(Dx) * dVar(Dy)
    return den > 0 ? sqrt(dCov(Dx, Dy)/sqrt(den)) : 0.  
end
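
As a quick sanity check of these functions (a sketch; we apply dCor to double-centered distance matrices, as will be done inside the test below), the distance correlation should be small for independent data and clearly larger when y depends on x:

n = 50
x = randn(n)
y = x.^2 .+ randn(n)./10                         # y depends non-linearly on x
H = Matrix{Float64}(I, n, n) - fill(1/n, n, n)   # centering matrix
A = H * Hermitian(dm(x, n)) * H'                 # double-centered distance matrix of x
B = H * Hermitian(dm(y, n)) * H'                 # double-centered distance matrix of y
dCor(A, B)                                       # should be clearly larger than...
dCor(A, H * Hermitian(dm(randn(n), n)) * H')     # ...this value, obtained with independent data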

We are now ready to create the univariate test.

First, let us import testStatistic:

using PermutationTests
import PermutationTests: testStatistic

Given two vectors of elements x and y with distance matrices Dx and Dy, the test is obtained by permuting the indices of x. Instead of recomputing distance matrix Dx at each permutation, the strategy is to permute the rows and columns of Dx by a permutation matrix P that is updated at each permutation according to the permutation vector x.

Let us then write the function to do this:

# Overwrite P with a permutation matrix given a vector x holding a 
# permutation of n indices. P must exist, be square and have dimension length(x) x length(x)
function permMatrix!(P, x)
   fill!(P, 0.)
   @simd for i∈axes(P, 1)
         @inbounds P[i, x[i]] = 1. 
   end
   return P
end

As in the previous examples, we need to define a type for the distance correlation statistic:

struct Dcor <: Statistic end

Next, as in the previous example, we need to define a method for testStatistic. This function takes as arguments a permutation vector (x) and HDyH, the double-centered distance matrix of y, which is fixed across permutations.

Furthermore, we will use four specially defined keyword arguments:

  • H : the centering matrix
  • P : the permutation matrix (to be updated at each call of the function)
  • Dx: the original distance matrix for data x
  • dVarHDyH: the distance variance of the double-centered distance matrix of y

At each permutation, then, we permute and double-center Dx in order to compute the distance correlation. The quantities that are invariant under permutation are passed to the function so that we do not need to recompute them.

function testStatistic(x, HDyH, stat::Dcor; H, P, Dx, dVarHDyH, kwargs...)
   permMatrix!(P, x)
   # Hx * P * Dx * P' * Hx', Dx with rows and columns permuted
   HP = H*P
   HxPDxPHx = HP * Dx * HP' # permuted and double centered Dx
   den = dVar(HxPDxPHx) * dVarHDyH # dVar(Dx) * dVar(Dy), square of the denominator of dCor 
   return den > 0 ? sqrt(dCov(HxPDxPHx, HDyH)/sqrt(den)) : 0. # dCor
end

Finally, we write a function preparing the data and calling the _permTest! function. The preparation involves computing the distance matrices (once and for all), making some checks, creating the permutation vector p and the keyword arguments that _permTest! will pass internally to the testStatistic method we have created.

Importantly, the vector p is created so as to correspond to the permutation yielding the observed statistic (i.e., no data permutation), that is, in this case, the vector with the indices in the natural order $1,...,n$.

# The test takes as input two vectors of elements, which may be scalars, vectors or matrices.
function dCorPermTest(x, y; pnorm=2, kwargs...)
    n = length(x)
    Dx = Hermitian(dm(x, n; pnorm)) 
    Dy = Hermitian(dm(y, n; pnorm)) 
   
    size(Dx, 1) == size(Dx, 2) || throw(ArgumentError("Function dCorPermTest: Did you want to call dCorPermMTest intead? The `Dx` and `Dy` distance matrices that have been computed are not square"))
    size(Dx) == size(Dy) || throw(ArgumentError("Function dCorPermTest: Did you want to call dCorPermMTest intead? The `Dx` and `Dy` distance matrices that have been computed are not square or do not have the same size"))
    p = collect(Base.OneTo(n)) # permutation vector
   
    #  the centering matrix, the identity matrix, HDyH, dVar(HDyH)
    Id = Matrix{Float64}(I, n, n)
    H = Id - fill(1/n, n, n) # the centering matrix
    P = copy(Id) # the permutation matrix
    HDyH = H * Dy * H'
    dVarHDyH = dVar(HDyH) # the distance variance of HDyH (invariant to permutations)

    return _permTest!(p, HDyH, n, Dcor(), PearsonR(); fstat=identity, H, P, Dx, dVarHDyH, kwargs...)
end

We are done.

Let us use the function. We make an example where the elements of x and y are vectors. You can verify yourself that the test works in the same way if they are scalars or matrices.

# Vectors
n=10
l=20
x=[randn(l) for i=1:n]
y=[randn(l) for i=1:n]
perm = dCorPermTest(x, y; pnorm=2, switch2rand=1)

# non-linear relationship (could be detected)
y=[x_.^2+randn(length(x_))./10 for x_ in x]
perm = dCorPermTest(x, y; switch2rand=1)
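
The test runs in the same way when the elements of x and y are scalars, for example (a sketch):

# Scalars
n = 10
x = randn(n)
y = x.^3 .+ randn(n)./10 # non-linear relationship (could be detected)
perm = dCorPermTest(x, y; switch2rand=1)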

Example 8: multiple comparisons distance correlation

This example reuses the code we have already written for the previous example 7.

The strategy here is the same as in that example, however here we have to be more careful as there are several keyword arguments that will be updated at each permutation.

The additional code we need to create a multiple comparisons permutation test is given below. As compared to the previous example, note that:

  • we ask function _permMcTest! to pass internally to the testStatistic function the product HxPDxPHx as a keyword argument, because we do not want to compute it for every call of testStatistic, but only when it is called for the first hypothesis;
  • for the same reason, we similarly pass dVarHDxH, the distance variance of the permuted and double-centered Dx, as a keyword argument.

Note also

  • the [:] syntax used to update variables passed as keyword arguments;
  • the fact that dVarHDxH is passed as a vector of one element so that it can be updated thanks to the [:] syntax.

function testStatistic(x, HDYH, m::Int, stat::Dcor; H, P, Dx, dVarHDYH, HxPDxPHx, dVarHDxH, kwargs...)
    if m==1
        permMatrix!(P, x)
        # Hx * P * Dx * P' * Hx', Dx with rows and columns permuted 
        HP = H * P
        HxPDxPHx[:] = HP * Dx * HP'  # notice the [:] syntax; this kwarg is updated when m=1
        dVarHDxH[:] = [dVar(HxPDxPHx)] # notice the [:] syntax; this kwarg is updated when m=1
    end
    
    den = dVarHDxH[1] * dVarHDYH[m] # dVar(Dx) * dVar(Dy), square of the denominator of dCor 
    return den > 0 ? sqrt(dCov(HxPDxPHx, HDYH[m])/sqrt(den)) : 0. # dCor
end
 

function dCorPermMTest(x, Y; pnorm=2, kwargs...)
    n = length(x)
    Dx = Hermitian(dm(x, n; pnorm)) 
    DY = [Hermitian(dm(y, n; pnorm)) for y ∈ Y] 
    size(Dx, 1) == size(Dx, 2) || throw(ArgumentError("Function dCorPermMTest: The `Dx` and `Dy` distance matrices that have been computed are not square"))
    size(Dx) == size(DY[1]) || throw(ArgumentError("Function dCorPermMTest: The `Dx` and `Dy` distance matrices that have been computed are not square or do not have the same size"))
    length(unique(size.(DY, 1))) == 1 || throw(ArgumentError("Function dCorPermMTest: All elements of second data input must have equal size"))
    n = size(Dx, 1)
    p = collect(Base.OneTo(n)) # permutation vector

    # keyword arguments that are not updated
    Id = Matrix{Float64}(I, n, n)
    H = Id - fill(1/n, n, n)
    P = copy(Id) 
    HDYH = [H*Dy*H' for Dy ∈ DY]
    dVarHDYH = dVar.(HDYH)

    # Initialize keyword arguments that will be updated
    HxPDxPHx = Matrix{Float64}(undef, size(Dx)...)
    dVarHDxH = [0.0] # NB, cannot pass a scalar as kwargs if it is to be updated!

    return _permMcTest!(p, HDYH, n, Dcor(), PearsonR(); 
                        fstat=identity, threaded=false, H, P, Dx, dVarHDYH, HxPDxPHx, dVarHDxH, kwargs...)
end

Let us use the multiple comparisons test we have just created.

Example where x and Y hold vectors:

# Vectors
n=10
l=20
m=30
x=[randn(l) for i=1:n]
Y=[[randn(l) for i=1:n] for j=1:m]
# random data. No hypothesis should be significant
perm = dCorPermMTest(x, Y)

# The y1,...,ym variables are noisy copies of x.  
# The p-values should be significant for about the first m÷2 of the m hypotheses.
# The actual number of rejected hypotheses could be slightly different from m÷2.
for i=1:m÷2
    Y[i]=[x[j].+randn(l)./2 for j=1:n]
end
perm = dCorPermMTest(x, Y)

Example where x and Y hold matrices:

# Matrices
n=10
l=20
m=30
x=[randn(l, l) for i=1:n]
Y=[[randn(l, l) for i=1:n] for j=1:m]
# random data. No hypothesis should be significant
perm = dCorPermMTest(x, Y)

# The y1,...,ym variables are noisy copies of x.  
# The p-values should be significant for about the first m÷2 of the m hypotheses.
# The actual number of rejected hypotheses could be slightly different from m÷2.
for i=1:m÷2
    Y[i]=[x[j].+randn(l, l)./2 for j=1:n]
end
perm = dCorPermMTest(x, Y)

In conclusion, the above examples have illustrated how to create permutation tests of increasing complexity. The last two examples concern tests that are rather complex and very different from any of the tests implemented in PermutationTests.jl, for which even the way the permutations are applied to the data had to be defined differently. The procedures exposed above can be adapted to new problems in order to create countless new permutation tests.

Useful functions for creating your own tests


PermutationTests._permTest! - Function
function _permTest!(x, y, ns::nsType, stat::Stat, asStat::AsStat;
                    standardized::Bool=false, centered::Bool=false, 
                    nperm::Int = 20000, 
                    fstat::Function = abs,
                    compfunc::Function = >=,
                    switch2rand::Int = Int(1e8),
                    seed::Int = 1234,
                    verbose::Bool = true,
                    cpcd = nothing,
                    kwargs...) where {Stat<:Statistic, AsStat<:Statistic}

This function ultimately performs all univariate permutation tests implemented in PermutationTests.jl, both exact and approximate (Monte Carlo).

For running tests use the univariate test functions. You need this function only for creating your own tests.

Rewrite x and/or y, depending on the test performed.

For the ns argument see ns.

stat can be a singleton of the Statistic type or a user-defined singleton of this type to be used as argument of a function implemented by the user to compute both the observed and permuted test statistics, see create your own test.

asStat is a singleton of the Statistic type. It is used to determine the permutation scheme and for this purpose it will be internally passed to genPerms and nrPerms.

asStat also determines the input data format if you declare your own stat type. In this case the function you write for computing the observed and permuted test statistic will take the x and y arguments as follows:

For Stat belonging to group

  • BivStatistic : x, y are the two vectors of $N$ elements each for which the bivariate statistic is to be tested. ns is ignored.
  • IndSampStatistic : we have $K$ groups and $N=N_1+...+N_K$ total observations; y holds all observations in a single vector such as [y1;...;yK] and x is the membership(::IndSampStatistic) vector. For example, for $K=2$, $N_1=2$ and $N_2=3$, x=[1, 1, 2, 2, 2].
  • RepMeasStatistic : we have $K$ measures (e.g., treatments) and $N$ subjects; y holds the $K*N$ observations in a single vector such as [y1;...;yN], where each vector $y_i$, for $i=1...N$, holds the observations at the $K$ treatments and x=collect(1:K*N) (see membership(::RepMeasStatistic)).
  • OneSampStatistic : We have $N$ observations (e.g., subjects); y holds the $N$ observations and x=ones(Int, N) (see membership(::OneSampStatistic)).
Nota Bene

In all cases x is treated as the permutation vector that will be permuted before calling the testStatistic function.

For length(x)>30 the approximate test is performed in all cases,

otherwise,

if the number of systematic permutations exceeds switch2rand the approximate test is performed using nperm random permutations (default 20000),

otherwise,

the exact test is performed.

switch2rand defaults to 1e8. To perform approximate tests in all cases, set switch2rand, for example, to 1.

If stat is a BivStatistic, the function optionally uses the standardized or centered keyword arguments to compute the statistic faster, see for example correlationTest.

seed is the initial seed for generating random permutations (not used for exact tests). To use a random seed, pass seed=0. For seed any natural number, the test will be reproducible.

fstat is a function applied to the test statistic. By default this is the julia abs function, which takes the absolute value, hence yields a bi-directional test for a test statistic distributed symmetrically around zero. For a right-directional test using such test statistics pass here identity. For a left-directional test using such test statistics pass here flip.

compfunc is the function to compare the observed statistics to the permuted statistics. The default function is >=. Don't change it unless you have studied the code of the function.

If verbose is true, print some information in the REPL while running the test. Set to false if you run benchmarks. The default is true.

For the cpcd and kwargs... arguments, see create your own test.

Return a UniTest structure.

Examples

using PermutationTests
x=randn(6)
y=randn(6) 
# bi-directional exact test of the correlation between x and y
t8 = _permTest!(μ0(x), μ0(y), length(x), Covariance(), Covariance(); centered=true) 
t8.p
t8.stat
#...

# make a left-directional test and standardize the data
t8_2 = _permTest!(μ0σ1(x), μ0σ1(y), length(x), Covariance(), Covariance(); 
        standardized=true, fstat=flip) 

# the same but force an approximate test
t8_3 = _permTest!(μ0σ1(x), μ0σ1(y), length(x), Covariance(), Covariance(); 
        standardized=true, fstat=flip, switch2rand=1) 

# the same using 5000 random permutations
t8_4 = _permTest!(μ0σ1(x), μ0σ1(y), length(x), Covariance(), Covariance(); 
        standardized=true, fstat=flip, switch2rand=1, nperm=5000) 

To check more examples, see the uniTests_API.jl unit located in the src github folder and function test_unitests() in the runtests.jl unit located in the test github folder.

PermutationTests._permMcTest! - Function
function _permMcTest!(x, Y, ns::nsType, stat::Stat, asStat::AsStat;
            standardized::Bool=false, centered::Bool=false, 
            stepdown::Bool = true,
            fwe::Float64 = 0.05,
            nperm::Int = 20000, 
            fstat::Function = abs,
            compfunc::Function = >=,
            switch2rand::Int = Int(1e8),
            seed::Int = 1234,
            threaded::Bool = Threads.nthreads()>=4,
            verbose::Bool = true,
            cpcd = nothing) where {Stat<:Statistic, AsStat<:Statistic}

This function ultimately performs all multiple comparisons permutation tests implemented in PermutationTests.jl, both exact and approximate (Monte Carlo).

For running tests use the multiple comparisons test functions. You need this function only for creating your own tests.

The step-down version of the test is performed if stepdown is true (default). In this case the fwe (family-wise error) rate is used for rejection at each step (default=0.05).

Rewrite x and/or Y, depending on the test performed.

For the ns argument see ns.

stat can be a singleton of the Statistic type or a user-defined singleton of this type to be used as argument of two functions implemented by the user to compute the observed and permuted test statistics, see create your own test.

Warning

In contrast to univariate tests, equivalent statistics are not possible for multiple comparison tests, with the exception of CrossProd() if the data has been standardized and Covariance() if the data has been centered and those only for correlation-like tests.

The test statistics that must be used as Stat for the other kinds of test if a singleton of the Statistic type is used are AnovaF_IS(), StudentT_IS(), AnovaF_RM(), and StudentT_1S().

asStat is a singleton of the Statistic type. It is used to determine the permutation scheme and for this purpose it will be passed to genPerms and nrPerms.

asStat also determines the input data format if you declare your own stat type. In this case the two functions you write for computing the observed and permuted statistics will take the x and Y arguments as follows.

For Stat belonging to group

  • BivStatistic : x is a fixed vector with $N$ elements and Y is an $M$-vector of $N$-vectors. The $M$ bivariate statistics between x and the $y_m$ vectors of Y are tested simultaneously. ns is ignored.
  • IndSampStatistic : we have $K$ groups and $N=N_1+...+N_K$ total observations; Y is an $M$-vector, each one holding all observations for the $m^{th}$ hypothesis in a single vector. The $m^{th}$ vector $y_m$ concatenates the observations for all groups such as [y[m][1];...;y[m][K]]. x is the membership(::IndSampStatistic) vector, common to all hypotheses. For example, for $K=2$, $N_1=2$ and $N_2=3$, x=[1, 1, 2, 2, 2].
  • RepMeasStatistic : we have $K$ measures (e.g., treatments) and $N$ subjects; Y is an $M$-vector, each element holding the $K*N$ observations for the $m^{th}$ hypothesis in a single vector. The $m^{th}$ vector $y_m$ is such as [Y[m][1];...;Y[m][N]], where each vector Y[m][n], for $n=1…N$, holds the observations for the $K$ treatments, and x=collect(1:K*N) (see membership(::RepMeasStatistic)).
  • OneSampStatistic : We have $N$ observations (e.g., subjects); Y is an $M$-vector, each one holding the $N$ observations and x=ones(Int, N) (see membership(::OneSampStatistic)).
Nota Bene

In all cases x is treated as the permutation vector that will be permuted before calling the testStatistic function for each of the elements in Y.

Optional keyword arguments switch2rand, nperm, standardized, centered, seed, fstat, compfunc and verbose have the same meaning as in the _permTest! function.

If threaded is true (default) the function is multi-threaded if the product of the number of hypotheses, observations, and permutations exceeds 500 million. If you have unexpected problems with the function, try setting threaded to false.

For the cpcd and kwargs... arguments, see create your own test.

Return a MultcompTest structure.

The number of executed steps $S$ can be retrieved as the length of the .rejections field of the returned structure.
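
For instance, for a structure t returned by _permMcTest! (a sketch):

S = length(t.rejections) # number of step-down steps that have been executed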

Examples

using PermutationTests
N, M = 8, 100 # 100 hypotheses, N=8
x=randn(N)
Y=[randn(N) for m=1:M]
# bi-directional exact test of the correlation between x and 
# all the M vector in Y.
T12 = _permMcTest!(x, Y, N, PearsonR(), PearsonR())
T12.p
T12.stat
#...

# bi-directional exact test. Faster test by data standardization 
T12_2 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y in Y], N, CrossProd(), PearsonR(); 
                    standardized=true) 

# left-directional exact test.
T12_3 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y in Y], N, CrossProd(), PearsonR(); 
                    standardized=true, fstat=flip)

# as above, but force an approximate test
T12_4 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y in Y], N, CrossProd(), PearsonR(); 
                    standardized=true, fstat=flip, switch2rand=1)

# the same using 5000 random permutations
T12_5 = _permMcTest!(μ0σ1(x), [μ0σ1(y) for y in Y], N, CrossProd(), PearsonR(); 
                    standardized=true, fstat=flip, switch2rand=1, nperm=5000)

To check more examples, see the multcompTests_API.jl unit located in the src github folder and function test_multicompTests() in the runtests.jl unit located in the test github folder.

PermutationTests.flip - Function
flip(x::Bool)

flip(x::Union{R, I}) where {R<:Real, I<:Int}

Invert the sign of a real number or integer, or negate a boolean.

This function may be needed for argument fstat to call _permTest! or _permMcTest! if you create your own test - see the example on how to create a test for the Chatterjee correlation.

It can also be useful when you create a new test in the code you develop for computing your own test statistics.

PermutationTests.membership - Function
function membership(stat::IndSampStatistic, ns::Vector{Int})

Create the appropriate argument x to be used by functions _permTest! and _permMcTest! when you create your own test using the permutation scheme of test statistics belonging to the IndSampStatistic group.

For stat an IndSampStatistic, ns is a group numerosity vector, i.e., a vector of positive integers [N1,...,NK], where $K$ is the number of groups and $N_k$ is the number of observations for the $k^{th}$ group (see ns).

Return the group membership vector [repeat([1], N1);...; repeat([K], NK)].

If rev=reverse is passed as keyword argument, return instead the group membership vector [repeat([K], N1);...; repeat([1], NK)]. This is used to run t-tests for independent samples using the PearsonR() Statistic, see for example studentTestIS.

Examples

using PermutationTests
membership(AnovaF_IS(), [3, 4])
# return the vector [1, 1, 1, 2, 2, 2, 2]
function membership(stat::RepMeasStatistic, ns::@NamedTuple{n::Int, k::Int})

Create the appropriate argument x to be used by functions _permTest! and _permMcTest! when you create your own test using the permutation scheme of test statistics belonging to the RepMeasStatistic group.

For stat a RepMeasStatistic, ns is a named tuple, such as (n=N, k=K), where $N$ is the number of observations (e.g., subjects) and $K$ the number of measurements (or treatments, times, etc.), see ns.

Return collect(1:N*K).

Examples

using PermutationTests
membership(AnovaF_RM(), (n=2, k=4))
# return the vector [1, 2, 3, 4, 5, 6, 7, 8]
function membership(stat::OneSampStatistic, ns::Int)

Create the appropriate argument x to be used by functions _permTest! and _permMcTest! when you create your own test using the permutation scheme of test statistics belonging to the OneSampStatistic group.

For stat a OneSampStatistic, ns is the number of observations (e.g., subjects) given as an integer, see ns.

Return ones(Int, ns).

Examples

using PermutationTests
membership(Sum(), 5)
# return the vector [1, 1, 1, 1, 1]
PermutationTests.testStatistic - Function

# METHOD 1
function testStatistic(x, y, stat::mystat, fstat::Function; 
                        cpcd=nothing, kwargs...)

# METHOD 2
function testStatistic(x, Y, i::Int, stat::mystat, fstat::Function; 
                        cpcd=nothing, kwargs...)
                        
        where mystat<:Statistic 

Compute the observed and permuted test statistic for univariate tests (Method 1) or the $i^{th}$ observed and permuted test statistic for the $i^{th}$ hypothesis, with $i=1...M$, for multiple comparisons permutation tests (Method 2).

If you create your own test you will write new methods for these functions taking as stat a test statistic of type Statistic you have declared.

If not, you never need these functions.

Y is a vector of elements (typically, vectors themselves) and the test statistic is to be computed on Y[i], using the permutation vector x.

For the fstat and cpcd arguments, see how to create your own test.

source