====== Introduction ======
The key to understanding the analysis of any data, including BOLD imaging data, is to remember that all analyses are models. One designs and runs an experiment, collects data, and then models the results, examining the goodness-of-fit and the parameters of the model. If you have a good model of the data, then your goodness-of-fit will be good and you will get fit parameters that can be meaningfully interpreted. Models contain multiple assumptions about the processes that generated the data, some of which may not be immediately obvious. Those assumptions may be wrong, and if they are, the results that you obtain may not be meaningful. This is true whether you are doing a simple t-test or fitting a complex non-linear model to your data. So, with that in mind, we will use this tutorial to go through a few commonly used models for analyzing BOLD imaging data.
To make things concrete, we will analyze human cortical BOLD responses to visual stimuli, knowing that neurons in primary visual cortex (V1) respond to visual stimuli in a particular location in space called a receptive field. We will go through the most common ways to look at BOLD imaging data that interrogate these receptive fields with two types of analyses, an "event-related analysis" and an "encoding model". An event-related analysis can be thought of as a trial-triggered average response, which assumes that overlapping responses sum linearly. Encoding models can take many shapes and forms, but generally mean that you assume a model that encodes the stimulus (in this case a receptive field) whose parameters are adjusted to give rise to the response measured in a voxel. For this tutorial, we will use a population receptive field (pRF) encoding model [[http://www.sciencedirect.com/science/article/pii/S1053811907008269|Dumoulin and Wandell (2008)]] which has been used extensively to make retinotopic measurements.
Note that this tutorial is meant as a quick and practical introduction to these types of analyses and assumes that you have already learned about the biological processes that give rise to the BOLD signal and some of its basic properties.
====== Learning goals ======
After finishing this tutorial, you should be able to...
//...compute an event-related analysis of BOLD data to recover the response to a trial or stimulus.//\\
//...compute a pRF analysis which allows you to recover the position and size of a visual receptive-field of a voxel.//\\
//...know the assumptions (particularly linearity) of each of these models.//\\
====== Setup ======
This tutorial is written in Matlab (though there are equivalent instructions in Python for the pRF part).
To set up for Matlab, you just need to do the following:
- **Set up your Matlab environment** For those working at Stanford you can get an [[https://uit.stanford.edu/service/softwarelic/matlab|individual license]]. If that fails, you can try to use the [[:nepr207:corn|corn server]], but we don't recommend that anymore. If you use the corn server, then you do not need to do the rest of these instructions.
- **Start matlab**
- **Get datafiles** You can get the data from [[https://drive.google.com/file/d/1T2VqjQQoEbBZ8pTtORBoQC90t1f2Q90s/view?usp=sharing|here]]. This should download a matlab file to your Downloads folder (if you are on a Mac). To load the data in, you can do:
cd ~/Downloads
load boldmodels
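If the load worked, listing the workspace contents is a quick sanity check - you should see the variables used below, such as timeSeries and stimulusTiming (the exact list depends on the file):
% list the variables that were loaded from the boldmodels file
whos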
====== BOLD Response ======
Ok, let's say we do a totally simple experiment in which we flash a visual stimulus at different times and measure the BOLD response in visual cortex. In the diagram below each vertical tick is meant to represent a time at which we flashed the visual stimulus on.
{{:tutorials:modeltut_stimulus_times.png|}}
What would we expect the BOLD response to look like? Well, recall that after brief neural activity, the "hemodynamic impulse response function" typically looks like the following:
{{:tutorials:screen_shot_2017-05-09_at_1.16.24_pm.png|}}
The BOLD response is also (approximately) linear in time (see [[http://www.sciencedirect.com/science/article/pii/S1053811912000997|Boynton GM, Engel SA, Heeger DJ (2012)]] for an excellent review). What does that mean? It means that it obeys linear superposition - if you know the response to each of two stimuli presented alone, then you can predict the response to the combination of the two stimuli as the sum (or superposition) of the two responses. In this case, the two stimuli would be stimuli presented at different times, each of which would be expected to evoke the above hemodynamic response. The response to two stimuli presented in succession would be the sum of the individual responses. So, imagine in your head a hemodynamic response function placed at every vertical tick mark in the stimulus timing diagram above, with any overlap in response (when stimuli are closer together than the length of the hemodynamic response) summing together (assume that each tick happens about 5-10 seconds after the last one). Add a little bit of noise to that mental image. Got it? You should be imagining something like the following (click below to reveal the expected response):
++++Predicted BOLD response|
{{:tutorials:boldtut_predicted_bold.png|}}
++++
Looks like what you got? Good. You've just done a convolution in your head then! That's the linear systems model for a BOLD response, and there's pretty good evidence that if you make your stimulus presentation times long enough (a couple of seconds), the BOLD response is temporally linear (see the review cited above). Pretty much every BOLD analysis technique rests on this temporally linear model of BOLD responses!
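If you want to check the convolution you just did in your head against actual code, here is a minimal Matlab sketch; the spike times, the gamma-shaped impulse response and the noise level are all made up just for illustration:
% made-up stimulus timing: one stimulus roughly every 8 seconds (1 s sampling)
n = 200;
stim = zeros(n,1);
stim(10:8:end) = 1;
% a made-up gamma-shaped hemodynamic impulse response, 30 s long
t = (0:29)';
hirf = (t/6).^2 .* exp(-t/2);
hirf = hirf/max(hirf);
% linear prediction: convolve and keep the first n time points
predicted = conv(stim,hirf);
predicted = predicted(1:n);
% add a little noise and plot
plot(predicted + 0.2*randn(n,1));
xlabel('Time (sec)');
ylabel('Response (a.u.)');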
====== Event-related analysis ======
Ok. So, now let's go backwards. Imagine that you have measured that BOLD response time series and you want to know what the average BOLD response was to each stimulus presentation. Well then your model would look something like this:
{{:tutorials:boldtut_linear_bold_model.png|}}
In words, our model is that the stimulus timing convolved with an unknown hemodynamic response plus noise gives the measured bold response. We want to know what that unknown hemodynamic response looks like, so we need to somehow "unconvolve" the stimulus timing from both sides to solve for the hemodynamic response. How is this done? Well, the above can be written in matrix form. It looks like this:
{{:tutorials:boldtut_matrix_bold_model.png|}}
The matrix on the left is just the way that you write down a convolution of the stimulus times with the hemodynamic response. Each column is the stimulus timing shifted downwards by one (note that each column's entries are zeros, with ones at every time point at which a stimulus occurred). Why does this compute a convolution in which overlapping responses sum? Try to puzzle through the matrix multiplication by multiplying one row at a time of the convolution matrix with the hemodynamic response to get each time point of the measured BOLD response, and you should see that it does the right thing. Make sense? If not, we will go through a concrete example with Matlab in a sec, so try to figure it out there. Ok, what about the dimensions? In the above, we have n = number of timepoints in the experiment and k = number of timepoints in the estimated response. Note that we have a choice over k - how many time points to compute out the hemodynamic response for. Typically you want to choose k long enough to recover the full hemodynamic response, but not too long, since every point you estimate adds to the number of unknowns you are solving for, making your estimates worse.
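If the matrix picture isn't clicking yet, here is a tiny numerical sketch (toy numbers, not the tutorial data) showing that multiplying the shifted-column matrix by h gives exactly a convolution truncated to n time points:
% toy stimulus timing (n = 10) with stimuli at time points 1, 4 and 8
stim = [1 0 0 1 0 0 0 1 0 0]';
% toy "hemodynamic response" of length k = 3
h = [1 2 1]';
n = length(stim); k = length(h);
% build the convolution matrix: column j is the stimulus timing shifted down by j-1
S = zeros(n,k);
for iColumn = 1:k
S(:,iColumn) = [zeros(iColumn-1,1); stim(1:n-iColumn+1)];
end
% the matrix product matches conv() truncated to n points (the two columns below are identical)
predicted = conv(stim,h);
[S*h predicted(1:n)]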
So, how do we solve this? Well, typically one assumes that the noise is white (IID - independent at each time point, identically distributed - which is wrong, btw, since BOLD data typically have temporal auto-correlation; that is, timepoints near each other are correlated). But, going with that assumption, this is just a matrix equation and you can use the least-squares solution (i.e. solve for the unknown hemodynamic response that minimizes the squared error between the left and right sides of the equation). Note that the noise here isn't something that we are estimating - it is just the leftover residual after the model fit. If we have S as the stimulus convolution matrix, h as the unknown response and B the measured response, that's just an equation like:
S x h = B
The least-squares solution is obtained by multiplying both sides by the transpose of S (in matlab notation S'):
S' x S x h = S' x B
Then multiplying by the inverse of S' x S (that's an important matrix, by the way, called the stimulus variance/covariance matrix, which can be used to diagnose problems with your design, since more variance in the design gives you more power to estimate parameters - in fact, the inverse of the variance/covariance matrix is used to project the residual variance back to get error bars around the estimates). That gives us:
h = (S' x S) ^ -1 x S' x B
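In Matlab notation that whole formula is one line (a sketch assuming S and B are already defined as above); note that in practice you would usually let Matlab solve the least-squares problem with the backslash operator rather than forming the inverse explicitly - both should give essentially the same h:
% direct translation of the formula above
h = inv(S' * S) * S' * B;
% numerically preferable: mldivide solves the least-squares problem directly
h = S \ B;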
And that's it, that's an event-related analysis (sometimes people call this deconvolution or finite impulse response estimation). So, how about trying it on real data?
In Matlab, you should have two variables: timeSeries and stimulusTiming. timeSeries is the BOLD response from a single voxel in a part of primary visual cortex. The data were collected every 1s. Typically the data need to be high-pass filtered to get rid of slow drifts in the signal. The time series has also been divided by its mean and multiplied by 100 so that everything is in percent signal change relative to the average pixel intensity. Finally, the mean has been subtracted (this is actually really important, since the model we are fitting here doesn't have a term to account for the mean response, so if you don't subtract the mean things won't work!).
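The preprocessing described above has already been applied to timeSeries, but for reference, a minimal sketch of the percent-signal-change conversion and mean subtraction might look like this (rawTimeSeries is a hypothetical unprocessed voxel time series; the high-pass filtering step is omitted):
% convert to percent signal change relative to the mean
pscTimeSeries = 100 * rawTimeSeries / mean(rawTimeSeries);
% subtract the mean so the model doesn't need a constant term
pscTimeSeries = pscTimeSeries - mean(pscTimeSeries);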
Go ahead and plot these two arrays and see what they look like.
++++Matlab coding hint|Use the plot function++++
++++Matlab cheat sheet|
% clear the figure
clf;
% plot the time series
plot(timeSeries);
hold on
% plot the stimulus timing
plot(stimulusTiming,'k');
% label axis
xlabel('Time (sec)');
ylabel('Response (% signal change)');
++++
++++Answer|
{{:tutorials:modtut_timeseries_and_stim.png|}}
++++
Any hint of what the response is? Let's try to form the stimulus-convolution matrix. Each column of the matrix should be a shifted copy of the stimulusTiming array as we discussed above. Let's compute the response out for 30 seconds. So, that means 30 columns (since each data point is 1s).
++++Matlab coding hint|Use circshift to shift the stimulusTiming array by one volume (note that circshift will wrap values from the end of the array around to the front, which is not exactly what we want, but it won't matter in this case since the end of the array has 0's in it).++++
++++Matlab cheat sheet|
% number of volumes
n = length(timeSeries);
% number of volumes to compute out the response for
k = 30;
% pre-allocate the stimulus convolution matrix
s = zeros(n,k);
for iColumn = 1:k
s(:,iColumn) = circshift(stimulusTiming,iColumn-1);
end
++++
If you got that right, you should be able to look at a small part of the matrix (say the first 300 rows) and see that the matrix has the slanted line structure that we talked about above. Go ahead and take a look to confirm that.
++++Matlab hint|Use imagesc++++
++++Matlab cheat sheet|
clf;
% display the first 300 rows of the stimulus convolution matrix
imagesc(s(1:300,:));
++++
++++Answer|
{{:tutorials:boldtut_stimconv.png|}}
++++
Ok. Cool. So, now to compute the event-related responses, just plug into the formula from above for getting the least-squares estimate and plot what you get.
++++Matlab cheat sheet|
% calculate least-squares estimate of response
h = (((s' * s) ^ -1 ) * s') * timeSeries;
% plot what you get
plot(h);
xlabel('Time (sec)');
ylabel('Response (% signal change)');
++++
Should look like the following:
++++Answer|
{{:tutorials:modeltut_h.png|}}
++++
Easy-peasy. Right? One thing - how good a model is this of the response we measured? Whenever you fit a model you should ask whether you have a statistic that tells you that. One that [[http://www.sciencedirect.com/science/article/pii/S0896627305006100|we have used]] is to compute the variance accounted for by the event-related model. To do this, you can take the hemodynamic response you estimated and multiply it back through the stimulus convolution matrix. That will form the ideal response as if every trial created exactly the same response and response overlaps summed linearly. That's your model. Subtract that from the time series to get the residual. Now compute r2 as one minus the ratio of the residual variance to the original variance of the time series.
++++Matlab hint|var computes variance++++
++++Matlab cheat sheet|
residual = timeSeries - s*h;
r2 = 1 - var(residual)/var(timeSeries);
disp(sprintf('r2 = %f',r2));
So about 10% of the variance accounted for - that might sound low, but consider how long the time series is that we are estimating over. The time series has over 4000 time points, many of which are times when the stimulus is not modulating the response, and we are summarizing it with a response of length 30 - so it's actually not as bad as it sounds. You can confirm that these values are statistically significant by shuffling the stimulus times, recomputing the r2 values that would have happened by chance, and comparing the actual value to this null distribution.
++++
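If you want to actually run the shuffle test suggested in the cheat sheet above, here is a minimal sketch (it assumes n, k, stimulusTiming, timeSeries and r2 from the code above; shifting the stimulus timing by a random amount is just one simple way to build a null distribution):
++++Matlab sketch - shuffle test|
% build a null distribution of r2 values from randomly shifted stimulus times
nShuffles = 1000;
nullR2 = zeros(nShuffles,1);
for iShuffle = 1:nShuffles
% shift the stimulus timing by a random number of volumes
shuffledTiming = circshift(stimulusTiming,randi(n));
% rebuild the stimulus convolution matrix for the shuffled timing
sShuffled = zeros(n,k);
for iColumn = 1:k
sShuffled(:,iColumn) = circshift(shuffledTiming,iColumn-1);
end
% refit and recompute r2
hShuffled = (((sShuffled' * sShuffled) ^ -1) * sShuffled') * timeSeries;
residualShuffled = timeSeries - sShuffled * hShuffled;
nullR2(iShuffle) = 1 - var(residualShuffled)/var(timeSeries);
end
% p-value: fraction of shuffles with an r2 at least as large as the real one
p = mean(nullR2 >= r2);
disp(sprintf('p = %f',p));
++++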
===== Challenge question =====
Ok. Challenge question here. The data from above are actually from a paradigm in which we showed a stimulus either in the left or right visual field. Of course we know that primary visual cortex should only respond to a stimulus in the contra-lateral visual field. So, let's say I give you the stimulus timing for the times when the stimulus was in the left visual field, and for the times when it was in the right visual field. You should get a nice response to the contra-lateral stimulus and a flat line for the ipsi-lateral stimulus. Right? But, how do I compute the response for two different stimuli? Well, same way. You just assume that the responses to the two stimuli sum linearly! Try to draw out how to make the matrix equation do that. Be careful of matrix dimensions - make sure they match up correctly. How would you do it?
++++Hint 1|Instead of computing one unknown response as above, you will compute two unknown responses. In the matrix equation, this will become one array that has the first unknown response in the first part of the array and the second unknown response in the second part of the array++++
++++Hint 2|If the unknown response is now 2k x 1 (two responses each of length k), what should the dimensions of the stimulus convolution matrix be?++++
++++Answer|
{{:tutorials:modeltut_event2.png|}}
Note how in the middle part of the stimulus convolution matrix we have switched the stimulus timing to the timing for the second stimulus.
++++
Got it? Ok. Then here's your challenge. There's a time series called mysteryTimeSeries. Make the stimulus convolution matrix for stimulusTimingLeft and stimulusTimingRight and compute the response for the mysteryTimeSeries (it will be one long vector with the concatenated responses to the two different types of stimulus).
Which hemisphere did this voxel come from?
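If you get stuck, a minimal Matlab sketch is hidden below (try it yourself first!). It assumes mysteryTimeSeries, stimulusTimingLeft and stimulusTimingRight are in your workspace, and it deliberately stops short of answering the hemisphere question:
++++One possible Matlab sketch|
% number of volumes and length of each estimated response
n = length(mysteryTimeSeries);
k = 30;
% build the two stimulus convolution matrices and concatenate them side by side
sLeft = zeros(n,k); sRight = zeros(n,k);
for iColumn = 1:k
sLeft(:,iColumn) = circshift(stimulusTimingLeft,iColumn-1);
sRight(:,iColumn) = circshift(stimulusTimingRight,iColumn-1);
end
s = [sLeft sRight];
% least-squares estimate: the first k points are the response to the left stimulus,
% the next k points are the response to the right stimulus
h = (((s' * s) ^ -1) * s') * mysteryTimeSeries;
% plot the two responses
clf; hold on;
plot(h(1:k),'b');
plot(h(k+1:2*k),'r');
legend('Left stimulus','Right stimulus');
xlabel('Time (sec)');
ylabel('Response (% signal change)');
++++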
===== General Linear Model =====
Quick note here: the event-related analysis is a form of "general linear model" in which the measured response is modeled as a weighted sum of different variables (in this case, contributions of events from different time points). The event-related analysis shows you something pretty close to a trial-triggered average (assuming that response overlaps sum linearly), so it stays fairly close to the original data, but it can require a lot of data to fit well since you have many unknowns (every time point in the estimated response is an unknown). If you knew the shape of the hemodynamic response function, then you could simplify the computation down to a single number - the magnitude of the evoked response. That would look like the following:
{{:tutorials:modeltut_glm1.png|}}
All we have done here is insert a "canonical" hemodynamic response of unit size (typically you set either the amplitude or the area under the curve equal to one), and now we only have to estimate a single number to get the magnitude of the response (traditionally this number is called a beta weight). This can be effective because it means there are fewer unknowns to compute, but it can be problematic if the canonical hemodynamic response function you have assumed is wrong. Typically, the convolution is precomputed so that each column of the "design matrix" represents the effect of a different stimulus type (you can also put in other columns to represent things like head motion or linear drift) and a set of beta weights is computed.
{{:tutorials:modeltut_glm2.png|}}
The least squares solution for the GLM above is computed in the same way as we did for the event-related analysis.
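As a concrete sketch of this (assuming the canonical hemodynamic response provided with the data in the variable //canonical// - used in the pRF section below - is sampled at the same 1 s rate as timeSeries and stimulusTiming), you would precompute the convolution to form a one-column design matrix and then solve for a single beta weight:
% design matrix: stimulus timing convolved with a canonical HRF, trimmed to the data length
designColumn = conv(stimulusTiming,canonical);
designColumn = designColumn(1:length(timeSeries));
X = designColumn(:);
% least-squares solution for the beta weight, same formula as before
beta = (((X' * X) ^ -1) * X') * timeSeries;
disp(sprintf('beta = %f',beta));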
====== pRF Encoding model ======
If we are interested in where visual receptive fields are, we could continue the approach from above by testing stimuli at different locations of the visual field and then computing event-related responses. But, there is a much better way to do this. That takes us to the idea of encoding models!
Encoding models of various forms have become a powerful tool for analyzing functional imaging data because they can build on knowledge of how different parts of the brain respond to stimuli. A simple, but really useful, encoding model is the population receptive field model (see [[http://www.sciencedirect.com/science/article/pii/S1053811907008269|Dumoulin and Wandell (2008)]]), which is used routinely to make topographic maps of visual areas in human visual cortex.
First, take a few minutes to go through Dan Birman's excellent [[https://dbirman.github.io/learn/prf/prf.html|conceptual introduction to the population receptive field]].
Got the idea? Cool. A few key points to take away from this. Encoding models are useful because they allow you to predict what responses will be like to stimuli that you have never presented before, which also allows you to do decoding - given a response, what stimulus caused it? The idea is very general and powerful - while we do this for a simple receptive field, an encoding model can be anything - it could be a sophisticated receptive field model selective for different visual features, the output of a deep convolutional network, or a model of language.
So, the basic idea of any encoding model is to make a model that, well, encodes the stimulus. This model is a hypothesis for how you think the area responds. For visual cortex, the simplest model is that neurons respond to a localized position in space, i.e. have a receptive field. As the BOLD measurement picks up hemodynamic changes related to a population of neurons, the hypothesis will be that there is a "population receptive field" associated with each voxel. That looks something like the following:
|{{ :mrtools:bars.gif|}}|{{:tutorials:modeltut_gaussianprf.png|}} |
Here the stimulus is high-contrast bars moving through the visual field (while the subject fixates at a single position in the center of the screen) and the population receptive field encoding model is a gaussian. The predicted response is simply the projection of the stimulus onto the receptive field - which means that for every time frame of the visual stimulus you multiply the stimulus image with the gaussian receptive field and sum that all up to get the predicted neural response at that time point. With that predicted neural response in hand, you can get a prediction for the BOLD response similar to what we did with the event-related response:
{{:tutorials:modeltut_prf_model_convolve.png|}}
That is, we simply convolve with a hemodynamic response. That's it. That's a prediction for the voxel response for a given receptive field at a particular location and width. Of course, if the receptive field is in a different location - or has a different size - it will predict a different BOLD response. Here are two different receptive fields with the neural prediction in black and the BOLD prediction in red. You should be able to see that they make different predictions as the bar stimulus crosses their receptive fields at different times.
| {{:mrtools:prfbars1x400f.gif|}}| {{:mrtools:prfbars2x.gif|}}|
So, to fit the pRF encoding model, you just look for the receptive field location and size that best predicts the BOLD measurement you made at each voxel. You can assume a canonical hemodynamic response function or you can allow that to change to best predict the data as well. Here's a non-linear fitting algorithm searching for the best parameters for a particular voxel. Black is the measured time series, red is the fit and the lower right shows you the population receptive field that gives that prediction.
{{:mrtools:prffit.gif|}}
Now, we'll try and fit this model to real data. Note that we will just do a very rough fit using a grid search so that you get the idea of how the fit is done. For those interested, a more thorough tutorial which includes non-linear fitting, projecting data onto cortical surfaces and decoding is available [[:tutorials:pRF|here]]. For a tutorial on how to do this analysis and mark retinotopic boundaries for your own data using the GUI in mrTools, see [[:mrtools:tutorialsprf|here]].
===== Compute the model neural response =====
==== Examine the stimulus matrix ====
Let's start by just taking a look at the stimulus that was presented to the subject. Remember, the subject was looking at the center of the screen, where there was a fixation cross on which they had to perform a task (just to keep their attentional state and fixation under control), and bars moving in different directions were shown. Each frame of the stimulus is stored as a 2D matrix that has the pixel values for each position on the screen; for this purpose, the matrix just contains 0's and 1's. These 2D frames are stacked into a 3D matrix where the third dimension is time.
Since they are just images, let's go ahead and display an arbitrary one, say the 36th image, and see what we get.
++++Matlab coding hint|Use the command imagesc++++
++++Matlab Cheat sheet|
% Display the 36th image
imagesc(stimImage(:,:,36));
% set the colormap to grayscale so that you get a black and white image
colormap(gray);
++++
++++Python cheat sheet|
# assumes the usual imports: import numpy; import matplotlib.pyplot as plt
# Display the 36th image, with the gray colormap
plt.imshow(stimImage[:,:,36],cmap="gray");
++++
++++Answer|You should get something like this
{{:tutorials:prf01_horizontal_bar.png|}}
++++
Now that you can display one image, you might want to go ahead and look at the whole sequence. You can write a for loop to display one image at a time and see what it looks like.
++++Matlab coding hint|Do //help for// if you don't know how to specify a for loop. Check the size of the stim array to see how many frames there are, by using the //size// command and loop from 1 to that number. To make sure that the image is drawn each round of the for loop, use the command //drawnow//++++
++++Matlab Cheat sheet|
% loop over all frames of the image
for iFrame = 1:size(stimImage,3)
% draw it
imagesc(stimImage(:,:,iFrame));
% and flush graphics events
drawnow
end
++++
++++Python Cheat sheet|
# This for some reason is excruciatingly slow on my machine
# so only go 30 images instead of showing the whole thing...
# get figure
fig = plt.figure()
# loop over all frames of the image
for iFrame in range(0,30):
# draw it
plt.imshow(stimImage[:,:,iFrame],cmap="gray");
# flush
fig.canvas.draw()
++++
Good, you should have seen a bar move in various directions across the screen. That was what was shown to the subject. Run again, and count how many times a bar crossed the screen - we'll need that number as a consistency check in a following section!
++++Answer|8 times++++
==== Compute a gaussian receptive field ====
Now we need to make our encoding model. In this case it is a simple gaussian receptive field. Let's start by making an arbitrary one at the location (-12, -3) degrees (i.e. 12 degrees to the left and 3 degrees below center). We also need to set the size of the receptive field; let's do one that has a standard deviation of 3 degrees. So, first, you're going to need to know the equation of a gaussian, right?
++++Equation|
gaussian(x,y) = e^{-((x-x_{center})^2 + (y-y_{center})^2)/{2\sigma^2}}
++++
The x and y coordinates of every point in the stimulus image are in stimX and stimY, so using the 2D equation with those input points, you should be able to compute the gaussian receptive field.
++++Matlab coding hint|The exponential function is //exp// and the power function is //.^// Note that the . before the ^ makes it an element-by-element (rather than matrix) operation++++
++++Matlab cheat sheet|
% set variables for the center and width
xCenter = -12;yCenter = -3;rfstd = 3;
% compute the receptive field
rf = exp(-((stimX-xCenter).^2 + (stimY-yCenter).^2)/(2*rfstd^2));
% and display
imagesc(rf);
% use the "hot" colormap to make it prettier
colormap hot
++++
++++Python cheat sheet|
# set variables for the center and width
xCenter = -12
yCenter = -3
rfstd = 3
# compute the receptive field
rf = numpy.exp(-(numpy.power(stimX-xCenter,2) + numpy.power(stimY-yCenter,2))/(2*rfstd**2));
# and display
plt.imshow(rf,cmap="hot");
++++
++++Answer|You should get something like this
{{:tutorials:prf02_gaussian.png|}}
++++
==== Compute the model neural response ====
Now that we have the receptive field and the stimulus, it's time to compute the expected "neural response". The amount of overlap between the stimulus and the receptive field will govern the neural response. (For our purposes we won't consider any of the dynamics of the neural responses themselves - e.g. transients, adaptation, offset responses, etc.) To compute the overlap, we just take a point-by-point multiplication of the stimulus image with the receptive field and sum across all points. You should now try to compute that for every time point in the stimulus image.
++++Matlab coding hint|To compute the point-by-point multiplication between two matrices, use the operator .* ; to sum across all points in the matrix, use the function sum. Note that to sum over all points, you need to use the sum operator twice (since each time it will sum over one dimension). Try to first compute the response for one frame, and once you have that working, do a for loop++++
++++Matlab cheat sheet|
% pre-allocate the model neural response
modelNeuralResponse = zeros(1,size(stimImage,3));
% loop over frames
for iFrame = 1:size(stimImage,3)
% compute overlap
rfStimulusOverlap = stimImage(:,:,iFrame) .* rf;
% sum over all points
modelNeuralResponse(iFrame) = sum(sum(rfStimulusOverlap));
end
% plot what we got
plot(modelNeuralResponse);
++++
++++Python cheat sheet|
# init array
modelNeuralResponse = numpy.zeros(stimImage.shape[2])
# loop over frames
for iFrame in range(0,stimImage.shape[2]):
# compute overlap
rfStimulusOverlap = numpy.multiply(stimImage[:,:,iFrame],rf);
# sum over all points
modelNeuralResponse[iFrame] = numpy.sum(rfStimulusOverlap);
# plot what we got
plt.clf()
plt.plot(modelNeuralResponse);
++++
How many peaks in this simulated neural response do you see? Does it match the number of times there was a bar moving across the visual field that you counted above?
++++Answer|You should see something like this
{{:tutorials:prf03_response.png|}}
++++
===== Compute the model BOLD response =====
Now we have to convert this model neural response into a model BOLD response. The model that we use for this is that the BOLD response is a convolution of a canonical hemodynamic response function with the neural response. So, go ahead and compute the convolution of the model neural response with the canonical hemodynamic response function (we have provided one in the variable called //canonical//).
++++Matlab coding hint|Do a help on the function //conv//++++
++++Matlab Cheat sheet|
% do the convolution
modelBoldResponse = conv(modelNeuralResponse,canonical);
% and display
plot(modelBoldResponse);
++++
++++Python Cheat sheet|
# do the convolution
modelBoldResponse = numpy.convolve(modelNeuralResponse,canonical);
# and display
plt.clf()
plt.plot(modelBoldResponse);
++++
++++Answer|Should look like the following
{{:tutorials:prf04_bold.png|}}
++++
Ok. Sanity check (always good to check that things are making sense as you go along as much as possible). Plot the neural response on the same graph as the bold response and see if they make sense. In particular, the bold response should start **after** the neural response and last longer, right?
++++Matlab coding hint|Use //clf// to clear the figure. To plot two things on the same graph, use //hold on//. The plot command can be used to plot in different colors. Do a help on plot++++
++++Matlab Cheat sheet|
clf;
hold on;
plot(modelNeuralResponse,'k-');
plot(modelBoldResponse,'r-');
++++
++++Python Cheat sheet|
plt.clf();
plt.plot(modelNeuralResponse,'k-');
plt.plot(modelBoldResponse,'r-');
++++
++++Answer|Should look like the following
{{:tutorials:prf05_bold_check.png|}}
++++
Ok. Looks good? What about at the very end? Do you notice that the BOLD response lasts longer than the neural response? That's because the conv command returns a vector longer than the original response. We don't want that last piece, so remove it from the model BOLD response (otherwise, dimensions won't match for the calculations we do later). Also, let's take the transpose of this modelBoldResponse (also important for matching dimensions up later).
++++Matlab coding hint|//length// can be used to find the length of a vector. To get the subset of a vector, use the techniques we used above to get a portion of the stim image. To take the transpose of a vector use ' ++++
++++Matlab cheat sheet|
modelBoldResponse = modelBoldResponse(1:length(modelNeuralResponse));
% take the transpose (make it a column vector) for the later calculations
modelBoldResponse = modelBoldResponse(:);
++++
++++Python cheat sheet|
modelBoldResponse = numpy.transpose(modelBoldResponse[0:modelNeuralResponse.shape[0]]);
++++
===== Compare the model to measured responses =====
Ok. So, we can compute the model BOLD response for a specific receptive field - now we want to know how to compare that to measured responses. We're going to use the time-point by time-point correlation between the model and the measured responses. Correlation is a nice measure to work with, since it isn't affected by the magnitude of responses. Remember that we haven't really done anything to make the units of the model match the units of the response (look back at the units you have for the model BOLD response - typical BOLD responses are about 1 or 2% signal change - is that the size of the fluctuations you have in the model? No, right!). We'll get back to that in a bit below, but here we'll just sidestep the whole issue by using correlation. Ok, we have two actual time series for you to work with, //tSeries1// and //tSeries2// (they were measured from a subject viewing the stimuli). Go ahead and compute the correlation between these time series and the model BOLD response from above. What do you get? For which tSeries is this rf a good model?
++++Matlab coding hint|Do a help on //corr// and make sure that both the tSeries and modelBoldResponse are vertical arrays++++
++++Matlab cheat sheet|
corr(tSeries1(:),modelBoldResponse(:))
corr(tSeries2(:),modelBoldResponse(:))
++++
++++Python cheat sheet|
numpy.corrcoef(tSeries1,modelBoldResponse)[0,1]
numpy.corrcoef(tSeries2,modelBoldResponse)[0,1]
++++
++++Answer|
The correlation for tSeries1 is: 0.839672
The correlation for tSeries2 is: 0.1711145
So, the model RF is best for tSeries1.
++++
Ok. Let's confirm that visually by plotting the model and the time series together and seeing that they match. Now, we have to worry a little about the magnitudes, which are going to be different between the model and the actual BOLD responses. Let's handle that by finding the appropriate scale factor. How do we do that? (Hint - what if I call that scale factor a "beta weight"?) Also, since our time series have the mean subtracted off, we need to subtract the mean off of the model BOLD response as well (do that first and then compute the scale factor). General maxim here: if you do something to your data, you need to do the same thing to the model.
++++Matlab cheat sheet|
% make the tSeries and the bold model into vertical arrays so that the computation is the same as the GLM above
tSeries1 = tSeries1(:);
modelBoldResponse = modelBoldResponse(:);
% subtract the mean off of the model
modelBoldResponse = modelBoldResponse-mean(modelBoldResponse);
% calculate the least-squares solution for how to weight the modelBoldResponse to get the tSeries1
scaleFactor = ((modelBoldResponse' * modelBoldResponse)^-1)*modelBoldResponse'*tSeries1;
% and rescale the modelBoldResponse (note that this is scaled to best match tSeries1 and not tSeries2 -
% but you should see that it doesn't really matter - the modelBoldResponse is clearly not a good
% fit for tSeries2
modelBoldResponse = modelBoldResponse*scaleFactor;
% now plot
clf;
hold on;
plot(tSeries1,'k-');
plot(tSeries2,'b-');
plot(modelBoldResponse,'r-');
% label stuff
xlabel('Volume number');
ylabel('Model and time series response');
legend('tSeries1','tSeries2','modelBoldResponse')
++++
++++Python cheat sheet|
# normalize time series
tSeries1 = tSeries1-numpy.mean(tSeries1);
tSeries1 = tSeries1/numpy.sqrt(numpy.sum(numpy.power(tSeries1,2)));
tSeries2 = tSeries2-numpy.mean(tSeries2);
tSeries2 = tSeries2/numpy.sqrt(numpy.sum(numpy.power(tSeries2,2)));
# everything that is done to the data should also be done to the model
modelBoldResponse = modelBoldResponse-numpy.mean(modelBoldResponse);
modelBoldResponse = modelBoldResponse/numpy.sqrt(numpy.sum(numpy.power(modelBoldResponse,2)));
# now plot
plt.clf()
plt.plot(tSeries1,'k-')
plt.plot(tSeries2,'b-')
plt.plot(modelBoldResponse,'r-')
# label stuff
plt.xlabel('Volume number')
plt.ylabel('Model and time series response')
plt.legend(['tSeries1','tSeries2','modelBoldResponse'])
++++
So, should be a great match for tSeries1 and a pretty bad match for tSeries2.
++++Answer|
{{:tutorials:prf06_response_model.png|}}
++++
One thing about the correlation values: they are a measure of goodness-of-fit. If you square the value, it is mathematically equivalent to the r2 value we calculated above for the event-related model (this holds here because we mean-subtracted the model and rescaled it with the least-squares beta weight). Really? Check it. Compute the r2 like we did above: compute the ratio of the residual variance (subtract the model from the time series) to the original variance; r2 is one minus that ratio. Is that the same as the squared correlation?
++++Matlab cheat sheet|
% compute r2 as one minus the ratio of the residual variance to the original variance
1 - var(tSeries1-modelBoldResponse)/var(tSeries1)
% compare to the square of the correlation
corr(tSeries1,modelBoldResponse)^2
++++
Come out the same? Math. Kinda amazing isn't it. Anyway, this is super important because if your fit is bad (in this case low correlation) then the parameters you fit cannot be interpreted (i.e. the x, y and std of the population receptive field)! So, always (whether you are doing an encoding model or a GLM or whatever) check goodness of fit before trying to extrapolate anything from what parameters you got. To use technical terms, if goodness-of-fit is crap then your parameter estimates will be crap too.
===== Challenge question =====
Ok, so of course the example above for tSeries1 was totally cooked - I gave you the parameters that would pretty closely match the response. How can we find the best parameters for tSeries2 (and in general for any time series for which we don't know the receptive field parameters)? After all, that's the whole goal of what we are doing, right? To figure out the receptive field model that best explains the responses of each voxel. We'll do a quick and dirty way to get close to the right parameters by doing a grid search. What's a grid search? It's just forming a grid of parameters and testing to see which parameters on that grid give a model response that best fits the voxel. In this case, our grid will be over the x and y position of the receptive field and its width (standard deviation). Let's try x and y from -15 to 15 degrees in steps of 1 and standard deviations from 1 to 5 in steps of 1. If you iterate over all combinations of these parameters, compute the predicted model responses and compute the correlation with tSeries2, you should be able to figure out the best receptive field for tSeries2!
What are the best receptive field parameters for tSeries2?
++++pseudo code hint|
init current best r to -inf
Loop over x, y and s
compute gaussian rf for this x, y and s value
compute model neural response
compute model bold response
compute the correlation of the model bold response with tSeries2
if this correlation is better than the current best r, then
update r
keep the params
print out the best r and best params
++++
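And if you want to check your implementation, one possible Matlab version of that pseudo code is hidden below (try writing your own first!). It assumes stimImage, stimX, stimY, canonical and tSeries2 from above, and it is slow since it brute-forces every parameter combination:
++++One possible Matlab sketch|
% grid search over receptive field position and size
bestR = -inf;
nFrames = size(stimImage,3);
for xCenter = -15:15
for yCenter = -15:15
for rfstd = 1:5
% compute the gaussian receptive field for these parameters
rf = exp(-((stimX-xCenter).^2 + (stimY-yCenter).^2)/(2*rfstd^2));
% compute the model neural response frame by frame
modelNeuralResponse = zeros(nFrames,1);
for iFrame = 1:nFrames
modelNeuralResponse(iFrame) = sum(sum(stimImage(:,:,iFrame) .* rf));
end
% convolve with the canonical HRF and trim to the original length
modelBoldResponse = conv(modelNeuralResponse,canonical);
modelBoldResponse = modelBoldResponse(1:nFrames);
% compare to the measured time series
r = corr(tSeries2(:),modelBoldResponse(:));
% keep the parameters if this is the best fit so far
if r > bestR
bestR = r;
bestParams = [xCenter yCenter rfstd];
end
end
end
end
disp(sprintf('Best r = %f at x = %i, y = %i, std = %i',bestR,bestParams(1),bestParams(2),bestParams(3)));
++++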