======GLM statistics in mrTools ======
This wiki page provides some more background about the v2.0 implementation of the GLM code by Julien Besle (julienbesle@gmail.com) done at Nottingham, as well as a [[#Parameters |complete documentation of all parameters]]. For the original version of the GLM analysis, on which some of the code was based, see [[http://justingardner.net/doku.php?id=mrTools:analyses#glm_analysis | original GLM analysis]].
In addition, you may want to read through other introductions to the GLM model, eg at the FMRIB in Oxford [[http://fsl.fmrib.ox.ac.uk/fsl/feat5/]] or the online textbook at [[http://www.statsoft.com/textbook/general-linear-models/ | statsoft]].
There is also a [[:mrtools:tutorialsglm_v2 | step-by-step tutorial]] on this version of the GLM analysis.
======A quick primer on the General Linear Model ======
The goal of the **General Linear Model** (GLM)((In fact the type of single-subject analysis described here uses multiple regression, which is a particular type of GLM)) analysis is to explain an fMRI **timeseries** //at a given voxel// with a limited set of **explanatory variables** (EVs), sometimes also called **regressors**.
EVs are physiological or experimental **variables** ((variable meaning in this context that they are a function of time)) with a known temporal structure, but unknown amplitude.
For example a sensory stimulation in an fMRI experiment has a known temporal structure but we don't know how much it contributes to the fMRI variation with time.
The goal of the analysis is to find out //**how much each EV contributes to the time-series**//.
The GLM model makes the //simplistic assumption// that the fMRI time-series Y(t) //at a given voxel// can be expressed as a **linear combination** of p known time-series X1(t),...,Xp(t) or EVs ((In everything that follows, we assume that the mean of the timeseries is 0, which is done by de-meaning and/or converting the time-series to percent signal change.)):
{{:mrtools:glm:glm_continuous.png|}}
where β1,...,βp are real values called **beta weights** or **beta estimates** and ε(t) is a **noise** term that we are not able to explain
((for all statistical purposes it is assumed to have a Gaussian distribution)). The beta estimates β1,...,βp represent how much each EV contributes to the fMRI signal Y(t).
This model is often denoted in matrix notation:
{{:mrtools:glm:glm_matrix.png|}}
Where Y is a column vector of n recorded fMRI time-samples, β is a column vector of p beta estimates, X is the **design matrix**, i.e. a matrix of size n×p where each column represents an EV, and ε is a column vector of n random noise values.
The goal of the GLM analysis is to find the values of the p beta weights that best explain the fMRI time-series, i.e. maximize the contribution of EVs (**explained variance**) and minimize the contribution of noise (**unexplained variance**) in the model.
The most common method to estimate these β weights is called **Ordinary Least Squares** (OLS).
There are at least two conditions required to use OLS estimation:
- //The **assumption** that the noise has no temporal structure// (meaning that at each time point the level of noise is independent from the level of noise at other time-points)
- //The **condition** that EVs are linearly independent from each other// (no EV can be expressed as a linear combination of the other EVs)
Under these conditions, it can be shown that there is a unique set of beta weights that minimizes the contribution of the noise and that they are computed as
{{:mrtools:glm:beta_ols.png|}}
The requirement for independent EVs can be understood intuitively:
if some EVs were not independent (i.e. if they had similar temporal structure, or one was a linear combination of other EVs), then it would be impossible to decide which EV contributes to a particular temporal structure in the data because there would be an infinite number of linear combinations that "best" explain the time-series.
In other words, the design matrix would in this case be ambiguous. This would happen, for example, if two stimuli were always presented together: it would be impossible to decide which stimulus contributes to the recorded fMRI signal, or how much each one contributes.
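As a concrete illustration, here is a minimal numpy sketch of OLS estimation on simulated data (mrTools itself is written in MATLAB; all variable names and values below are illustrative, not taken from the mrTools code):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 3                      # n time-samples, p EVs
X = rng.standard_normal((n, p))    # design matrix, one column per EV
true_beta = np.array([2.0, -1.0, 0.5])
Y = X @ true_beta + 0.1 * rng.standard_normal(n)  # time-series = model + noise

# OLS solution: beta = (X'X)^-1 X'Y
beta = np.linalg.solve(X.T @ X, X.T @ Y)
```

With linearly independent (full-rank) columns in X, X'X is invertible and the solution is unique; with collinear EVs, X'X becomes singular and the beta weights are not identifiable, which is exactly the ambiguity described above.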
Note that in the GLM analysis, each voxel time-series is analysed independently from the other voxels. There is no attempt to include a //spatial// structure in the model.
===== HRF models =====
It is well known that the hemodynamic response to a very short stimulus (aka **hemodynamic response function** or HRF) is not itself a very short increase in the fMRI signal:
the signal variation following a sensory stimulation has a certain shape and a certain delay,
and these temporal characteristics have to be taken into account in the GLM in order to obtain a better fit of the model to the data.
This is done by **convolving** the design matrix, i.e. each EV of the model, with a model of the HRF to obtain a new model that reflects the variation of the fMRI signal rather than the variation of the experimental design.
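The convolution step can be sketched in numpy/scipy as follows; the difference-of-Gammas shape parameters and stimulus onsets below are made up for illustration and are not the mrTools defaults:

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0                            # sampling period in seconds
t = np.arange(0, 30, TR)            # HRF support: 30 s

# Difference-of-Gammas HRF (illustrative shape parameters)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
hrf /= hrf.max()                    # scale the peak to 1

# One EV: a few brief stimulus presentations over a 100-TR scan
n = 100
ev = np.zeros(n)
ev[[10, 30, 50, 70]] = 1

# Convolved EV: this column of the design matrix now reflects the
# expected fMRI signal rather than the stimulus timing itself
ev_conv = np.convolve(ev, hrf)[:n]
```

Each column of the design matrix is transformed this way, so the model is fit to delayed, smoothed responses instead of instantaneous stimulus events.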
Three models are currently implemented:
* A **double Gamma function** model: This models the HRF as a difference of 2 Gamma distribution functions with three parameters:
* Shape of the positive Gamma distribution function (determines the peak latency of the HRF: panel A)
* Shape of the negative Gamma distribution function (determines the undershoot latency of the HRF: panel B)
* Scaling ratio of the positive to negative Gamma components (see panel D)
{{:mrtools:glm:hrfdiffgamma.png?1000}}
Because the parameters are fixed for all voxels, this model cannot accommodate variations in HRF delay across voxels (or conditions or subjects).
However, there is an option to add the temporal derivative of the double gamma function as an additional **component** to the EV.
Each EV is convolved not only with the HRF, but also its orthogonalized temporal derivative.
This in effect doubles the number of EVs (the number of columns in the design matrix).
Fitting the GLM for this design matrix produces 2 beta values per EV (columns in your original design matrix).
Computing a linear combination of columns of the HRF model using a set of 2 beta values as weights would give you the estimated shape of the HRF for this EV.
* **Basis set**: The HRF is modeled using a number (3 or more) of independent components, or vectors, that span a plausible space of shapes for the HRF.
In this case each EV is convolved with as many components as there are in the HRF model, resulting in a design matrix with (number of HRF components × number of EVs) columns.
Fitting the GLM for this design matrix would produce (number of EVs) sets of (number of HRF components) beta estimates.
Again, computing a linear combination of columns of the HRF model using a set of (number of HRF components) beta values as weights would give you the estimated shape of the HRF for this EV.
HRF basis sets must be constructed using FSL's Linear Optimal Basis Set ([[ http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLOBS |FLOBS]]). The default is to use their default 3-component basis set.
* **Deconvolution**: This is a special case where there is no attempt to explicitly model the HRF.
Rather, as many dimensions as necessary are provided in the HRF "model", in order to capture/estimate the shape of the HRF on a given number of TRs.
In effect the HRF model in this case consists of an identity matrix whose size defines how many TRs you want to estimate the HRF on.
If for example you wanted to estimate the HRF for a 20 s period and your TR is 2s, you would use a 10×10 identity matrix.
Each EV in your model would be convolved with this identity matrix to produce a design matrix with (10×number of EVs) columns.
Fitting the GLM in this case would produce (number of EVs) sets of 10 beta values which would each have the shape of the HRF in response to each EV.
Again, computing a linear combination of columns of the HRF model (the 10×10 identity matrix)
using a set of 10 beta values as weights would give you the estimated shape of the HRF for this EV.
This is equivalent to running the event-related analysis in mrTools, but allows for statistical analysis (for example an F-test on a subset of beta estimates at different time points).
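The deconvolution case can be sketched in numpy; the onset times, scan length, and "true" response shape below are invented for illustration:

```python
import numpy as np

n_trs, n_lags = 100, 10   # scan length; estimate the HRF over 10 TRs (20 s at TR = 2 s)

# Stimulus onsets for a single EV (arbitrary example onsets)
ev = np.zeros(n_trs)
ev[[5, 25, 45, 65, 85]] = 1

# Convolving the EV with a 10x10 identity matrix amounts to stacking
# 10 lagged copies of the EV, one column per estimated time point
X = np.zeros((n_trs, n_lags))
for lag in range(n_lags):
    X[lag:, lag] = ev[:n_trs - lag]

# Simulate a time-series with a known response shape, then fit the GLM:
# the 10 beta estimates trace out the estimated HRF for this EV
rng = np.random.default_rng(1)
true_hrf = np.exp(-0.5 * ((np.arange(n_lags) - 3) / 1.5) ** 2)  # toy bump
Y = X @ true_hrf + 0.05 * rng.standard_normal(n_trs)
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Because no HRF shape is assumed, the recovered betas are themselves the HRF estimate, one value per TR of the estimation window.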
===== Statistics in the GLM =====
**Descriptive** statistics in mrTools currently consist solely of the **coefficient of determination** R2 which measures how well the model explains the variance in the timeseries.
**Inference** tests are based on the analysis of variance of the EVs of interest and residual noise.
All statistics (at least implicitly) require the computation of an estimation of the noise, aka **residuals**.
The estimated noise is the difference between the actual timeseries Y and the model Xβ
{{:mrtools:glm:epsilon_ols.png|ε = Y - X×β = (I - X×(X'×X)^-1×X')×Y}}
==== Coefficient of determination ====
The coefficient of determination R2 is the ratio of the variance explained by the model to the total variance of the data
{{:mrtools:glm:r2_ols.png|R2 = MSS/TSS = (TSS - RSS)/TSS = 1 - RSS/TSS = 1 - (ε'×ε)/(Y'×Y)}}
Where
* TSS is the total variance or **total sum of squares**, and is computed as the sum of the squared time-series values.
* RSS is the residual variance or **residual sum of squares**, and is computed as the sum of the squared residual values.
* MSS is the variance explained by the model or **model sum of squares**
The R2 value will be close to 0 if the time-series is dominated by noise and close to 1 if the model explains the time-series well.
Note that R2 also depends on the number of independent EVs in the model: the more EVs, the higher the value. So while R2 values are useful to compare voxels that are analysed with the same model, they shouldn't be used to compare model fits with different numbers of EVs.
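Using the definitions above, R2 follows directly from the residuals; a small numpy sketch (assuming, as elsewhere on this page, that Y has already been de-meaned; not the actual mrTools code):

```python
import numpy as np

def r_squared(Y, X):
    """Coefficient of determination of the OLS fit of design X to time-series Y."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    rss = residuals @ residuals   # residual sum of squares
    tss = Y @ Y                   # total sum of squares (Y assumed de-meaned)
    return 1 - rss / tss
```

A time-series perfectly explained by the design gives 1, while one dominated by noise unrelated to the EVs gives a value near 0.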
==== Inference Tests ====
=== Contrasts and T-tests ===
**Contrasts** are linear combinations of EVs that are of particular interest. They can consist of a single EV, a difference of EVs, a sum of EVs, etc.
They are denoted as row vectors of length p, where p is the total number of EVs.
Let's say you have 3 EVs in your model and you're interested in the contribution of EV1 to the hemodynamic response, the relevant contrast would be denoted:
//C = [1 0 0]//
If you wanted to look at the difference in contributions of EV2 and EV3 to the time-series,
the relevant contrast would be //C = [0 1 -1]// or //C = [0 -1 1]// (one gives a contrast value of opposite sign to the other).
And your contrast value would be
//Contrast value = C×β//
Any contrast can be tested for significance using a **T-test**.
T-tests return the probability of observing a contrast value at least as extreme as the one measured, under the null hypothesis that the true contrast value is 0
((only comparisons to 0 are currently implemented)).
The null hypothesis is the hypothesis that there is no effect, i.e. it describes the conditions under which the random distribution of contrasts (the null distribution) has a mean of 0.
For parametric tests, we assume that this null distribution is Gaussian with mean 0 and variance σ2.
**T statistics** are computed as:
{{:mrtools:glm:student_t_ols.png|T = C×β / (C×(X'×X)^-1×C' × s^2)^0.5}}
where s2 is an estimate of σ2, computed as
{{:mrtools:glm:s2_ols.png|s^2 = RSS/(n-p-1)}}
Under the **null hypothesis**, T values follow a **Student T distribution** with (n-p-1) **degrees of freedom**.
T-values can be converted to probability values or to a Z value (the value of the standard normal distribution that would return the same probability).
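The T statistic, its p value, and the equivalent Z value can be sketched in a few lines of numpy/scipy, following this page's (n-p-1) degrees-of-freedom convention (an illustrative sketch, not the actual mrTools implementation):

```python
import numpy as np
from scipy import stats

def t_test(Y, X, C):
    """T statistic, two-tailed p value, and equivalent Z for contrast row-vector C."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    df = n - p - 1
    s2 = residuals @ residuals / df               # noise variance estimate s^2
    T = (C @ beta) / np.sqrt(s2 * (C @ np.linalg.inv(X.T @ X) @ C))
    p_value = 2 * stats.t.sf(abs(T), df)          # two-tailed probability
    z = np.sign(T) * stats.norm.isf(p_value / 2)  # Z value with the same probability
    return T, p_value, z
```

A strong true effect on the contrasted EV yields a large T and a correspondingly small p value; the Z value is simply the standard-normal quantile with the same tail probability.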
=== F-tests ===
This section is based on **generalized hypothesis tests** as described in [[http://dx.doi.org/10.1002/1097-0193(200012)11:4<249::AID-HBM20>3.0.CO;2-5|Burock & Dale 2000]]
Contrary to T-tests, **F-tests** do not test whether a given contrast is significantly different from 0 but whether any contrast in a set of contrasts is significantly different from 0.
F-tests are denoted as restriction matrices of size q×p (**restriction matrix**), where p is the total number of EVs and q