Diffusion model

Signal Detection Theory can be used to get bias-free measures of subject's sensitivity, predict performance on a two alternative forced choice and has been quite useful both in psychophysics and in cognitive neuroscience where it is often used to link physiology to behavior. But, one thing it completely sweeps under the carpet is time. There is no notion of how the decision process evolves over time and so therefore it has nothing to say about reaction times. So, diffusion models.

There are many variations of diffusion models, but the basic flavor is to think of a decision variable which accumulates evidence for a choice (generically, let's call that “A”) against another choice (“B”). In the Newsome et al., experiments these would be a choice for, say, leftwards vs rightwards motion. The decision variable gets momentary evidence that is corrupted by noise (can be anything, but typically gaussian) which is integrated across time. This causes the decision variable to move randomly (brownian motion) until it hits (gets absorbed by) one of the decision criterion (bounds). Whichever bound it hits first is the decision that wins. When it hits gives you the reaction time. You should have the following picture in your head of a single trial:

The evolution of the decision variable is literally governed by integrating random draws from a gaussian distribution whose mean governs how much evidence is acquired in each time interval and whose standard deviation is associated with how noise that evidence is. So, if you have strong evidence for choice “A”, then you would have a large positive mean. For example, if a neuron in MT fires a lot in its preferred direction, then it will contribute a large mean and the trial to trial variability will correspond to the standard deviation of the evidence. Sometimes, due to the randomness of each sample from the evidence distribution, the decision variable may take a random walk to the other boundary and a wrong decision will be made. Note, that we are, for the purpose of this tutorial, always simulating a decision process in which the correct decision is “A” (upper bound), since we are adding a positive mean to evidence.

Across many trials then, having a small amount of moment-by-moment evidence (mean of the gaussian variable) relative to variability (standard deviation of the gaussian variable) accumulation will more often go to the correct bound then the incorrect bound. The time it takes to get to each bound (RT) forms a distribution and you can see that the distributions have a bit of a skew to them.

Ok. Get the basic idea? Let's try it out ourselves.

Simulate a trial

So, the idea is to have a decision variable that over time integrates information from a random gaussian process with some mean and standard deviation. Let's start by simulating that (arbitrarily) for a process with mean = 0.1 and standard deviation = 1. Let's have the decision variable, d, start at 0. We will stop the process when it hits either the top bound, A, or the bottom bound -B. For this case, we will just have those bounds equidistant from the starting point. So A=B, and we'll just stop the process when the absolute value of d gets bigger than A. Let's make A for this simulation 10. Got it?

If you know how to program this go ahead and do so. Otherwise, check the hints:

hint 1
To generate samples from a gaussian noise distribution you can use the matlab function random using the option 'norm'. Try help random to see if you can figure out how to use that. On each step of the simulation, you will add one pull from this random distribution to the current value of d until the absolute value of d gets greater than A (which we set to 10). You can do this in a while loop. Note, you should keep around the whole history of d, not just the last value that it was set at, so that we can plot it later.

answer

d = 0;
A = 10;
% check the last value of d (we use matlab's end keyword 
% to get the last value in the d array - when we start this
% will just be the first value, but on each iteration of the
% loop this will increment)
while abs(d(end)) < A
  % put a new value at the end of the array for d which is
  % the last value of d plus a pull from the random distribution
  d(end+1) = d(end) + random('norm',0.1,1);
end

Now, go ahead and plot it.

cheat sheet

plot(d)
% let's also set the axis so that we can see the bounds
axis([0 length(d) -A-2 A+2]);
% and let's draw the bounds
hold on
plot([0 length(d)],[A A],'k-');
plot([0 length(d)],[-A -A],'k-');
% and label the graph
xlabel('Time');
ylabel('Decision variable');

Compare

Simulate multiple trials

Ok, good so far? Let's try to run that process for multiple trials (say n=100) and plot each trial as we go along. We should also keep track of what decision was made (i.e. which bound was chosen) and when that decision was made. That way at the end we can look at the predicted percent correct and the reaction time distribution - these are the basic predictions of the model. So, go ahead and try to modify the code from above to do that.

hint 1
First off, you may want to put the code for above into your text editor so that it is easy for you to modify. Now wrap that code in a for loop that goes from 1 to n (n=100). On each iteration if the process stops at the upper bound increment a counter for correct, otherwise increment a counter for incorrect. In separate arrays keep the length of the process when it terminated - these are the reaction times. And, make sure to plot the d on each loop.

cheat sheet

% set up the simulation variables
A = 10;
n = 100;
nCorrect = 0;
nError = 0;
correctRT = [];
errorRT = [];
% loop over trials
for i = 1:n
  d = 0;
  % simulate each trial
  while abs(d(end)) < A
    % put a new value at the end of the array for d the last value
    % of d plus a pull from the random distribution
    d(end+1) = d(end) + random('norm',0.1,1);
  end
  % plot the trial
  plot(d);drawnow
  % if it was correct
  if d(end)>A
    % then keep score
    nCorrect = nCorrect+1;
    correctRT(end+1) = length(d);
  else
    nError = nError + 1;
    errorRT(end+1) = length(d);
  end
end

Compare

Ok. So, let's see what we got. Calculate the percentage of correct trials vs the percentage of incorrect trials (what do you expect these to be like?)

hint 1
This can be easily done if you have the counts from above, just divide nCorrect by the number of trials n and same for error.

cheat sheet 1

nCorrect/n
nError/n

And, let's look at the reaction time distribution (this might be more meaningful for the correct trials in which you have enough trials to be able to see what is going on.

hint 2
Try using the function hist

cheat sheet 2

% clear the figure
clf
% Let's plot 20 bins
hist(correctRT,20);

What does the distribution look like? Is it gaussian? Is it skewed? Why does it have the shape that it does?

If you have that all working, then try the same simulation, but don't give any sensory evidence. Decisions are just based on chance fluctuations. You should expect percent correct to equal percent errors. Try it.

hint 3
Modify the code from above to pull from a random distribution where the mean is 0.

cheat sheet 3

% clear the figure, so we can start over
clf;hold on;
% set up the simulation variables
A = 10;
n = 100;
nCorrect = 0;
nError = 0;
correctRT = [];
errorRT = [];
% loop over trials
for i = 1:n
  d = 0;
  % simulate each trial
  while abs(d(end)) < A
    % put a new value at the end of the array for d the last value
    % of d plus a pull from the random distribution
    d(end+1) = d(end) + random('norm',0,1);
  end
  % plot the trial
  plot(d);drawnow
  % if it was correct
  if d(end)>A
    % then keep score
    nCorrect = nCorrect+1;
    correctRT(end+1) = length(d);
  else
    nError = nError + 1;
    errorRT(end+1) = length(d);
  end
end
% draw the bounds
a = axis;
plot([a(1) a(2)],[A A],'k-');
plot([a(1) a(2)],[-A -A],'k-');

Compare

Ok, what percent correct and incorrect did you get. What happened to reaction times? Why?

cheat sheet

nCorrect/n
nError/n
figure;
hist(correctRT,20);
xlabel('RT');ylabel('count');
figure
hist(errorRT,20);
xlabel('RT');ylabel('count');

The thing to note about this, is that even thought the gaussian generating process had 0 mean - you still ended up on each trial meandering to one of the bounds. Random walks don't just jitter around the starting place and never go anywhere - by integration of random values they end up, by chance, going somewhere.

Try playing around with the model in different ways - what does changing the height of the bounds do? Or changing the mean? Or, the standard deviation. What about using a different distribution, not gaussian (why might you want to do that?). What about the starting point of the process?

Ok, so you should now have a rough idea of how changing different parameters of the model can affect the accuracy and reaction times. So, er, Pop quiz, what parameters can you change to affect the accuracy and reaction time?

Answer 1 The mean of the random process will cause you to accumulate larger increments of evidence for the correct decision. It will cause you to more often reach the correct upper bound “A” and will make you do that faster. You could also change the standard deviation of the gaussian process. If it had 0 standard deviation, then there would be no noise at all and you would always go predictably in a straight line up to the bound. You could also make the bounds farther away from the starting point. This will make the process have to traverse farther to get to the decision, making it less likely that you will get false alarms at the other boundary. But, this comes at a cost - it will make the decisions take a longer time to get to the bound. This is your classic speed-accuracy trade-off. If you haven't tried to play with these ideas yet with the simulation, then do so now!

Fit behavioral data

Ok, so that's all great, but how do you fit this beast to behavioral data? Well, one answer is to do just what you have been doing - simulate many variations of the parameters until you get something that matches by some fit criteria (usually maximizes the likelihood of your data). A non-linear search algorithm can adjust the parameters for you to search the space of parameters more efficiently and you just let it go at it and come back some time later to see what it gets. While this may seem inefficient, there are times when you have to do this. In particular, if you make the model more complicated (various versions have been proposed in which you add randomness not just do the moment to moment evidence, but also to the starting point for instance) or if you want to fit the whole shape of your reaction time distributions you may wish to do a fit like this.

But, if you just want to fit your percent correct and mean reaction times. There are closed-form solutions that give you equations for how to do this.

P_{correct} = \frac{1}{1+e^{-2{\mu}A}}

This first equations tells you what to expect for the probability correct, given the bounds (assuming symmetrical bounds) A and the mean of the process μ. Where does this come from? Well it comes from a mathematical derivation which is a bit complicated (see: here for a helpful primer). But, you can think of it as the solution to the following problem. Given bounds A and a gaussian process with some mean and standard deviation, what is the probability that the process will end up at the top bound instead of the bottom bound. The solution is the above equation.

Let's take a look at the form of the equation and see if it makes sense. How does it change as you make μ bigger and bigger? You know what to expect, right? So, punch the equation into matlab and see what it does for different values.

hint Make an array mu of values from 0 to, say, 10 in steps of, say, 0.1. Then plot mu against the equation above. In matlab, the exponential function is called exp. Also, you will need to know that when you write the equation the division symbol needs to be preceded by a . this tells matlab to do the equation for each element of the array mu that you make, rather than treat the expression as a matrix multiplication

answer

A = 1;
mu = 0:0.1:10;
plot(mu,1./(1+exp(-mu*A)));
xlabel('mu');
ylabel('P');

What does it look like? Where does it start and where does it end? Does the shape make sense?

Answer When mu is 0, that means there is no evidence, just noise, so you should get chance performance = 0.5. It should rise up to 1 as you get more and more evidence (i.e. make mu bigger).

But, wait a second. What happened to the standard deviation? You know from above that it matters. But, it's not in the equation, why? Well, it actually should be. The full equation should read:

P_{correct} = \frac{1}{1+e^{\frac{-2{\mu}A}{\sigma^{2}}}}

Trouble is, changing A and changing sigma amount to the same thing, just in opposite directions. They are both unobserved variables, so you cannot estimate both of them. So, you just estimate one and say that you estimate the ratio of the two. You might say the same thing about μ and A, but take a look first at the solution for when on average you hit the bound:

RT_{mean} = \frac{A}{\mu}tanh(\mu A)

Ok, what shape does that give you? Well, plot it as a function of μ just like you did for the percent correct equation and see what it looks like.

hint tanh is the hyperbolic tangent function and matlab has that function build in. Again, be careful to put . before your multiplications and divisions so that you do the operation for each value of mu

answer

A = 1;
mu = 0:0.1:10;
plot(mu,(A./mu).*tanh(mu*A));
xlabel('mu');
ylabel('RT');

What do you get for that? Does the shape of it make sense? What happens as mu gets larger and larger?

Answer At mu=0, there is no evidence, so you get the longest reaction times (it takes a long time on average) for the process to meander to one of the bounds. As mu gets larger and larger, there is more sensory evidence and you reach the bounds quicker

One thing to note here. This is the average reaction time. Remember that the distributions from above look skewed? This does not tell you anything about the skew of the distribution, just the central tendency. The model actually predicts a particular skewed shape of the reaction time distribution and that may be important. It is not fit, when you use the above equation.

Another thing. What are the units here? We want them to be in milliseconds, for example. If the monkey takes, say, 800ms to make the decision, then this equation needs to be fit such that you get a value of 800. Right? So, change A and see what happens. Can you get a value of 800 out of the equation?

Hint Just use the code from above, and try different values of A

Answer

A = 100;
mu = 0:0.1:10;
plot(mu,(A./mu).*tanh(mu*A));
xlabel('mu');
ylabel('RT');

You see that you get some value of mu that will give you 800 ms? You can now imagine playing around with values of mu and A to fit your reaction time data. But, what's the longest reaction time you should expect for some A and mu? Is it actually plotted on the graph?

Answer No. The computation A/mu results in a NaN when mu is 0, so it is not showing up on the graph

But, we may want to know what to expect for a reaction time when mu is 0 and there is no sensory evidence (this is a very important case, right? Remember how choice probabilities were calculated in the Newsome experiment?) So, to do that you need to take limits:

RT_{mean} = \lim_{\mu \to 0}\frac{A}{\mu}tanh(\mu A) = A^{2}

Ok. Good, so now we have an expression for the reaction times as a function of all possible μ

But, hey, wait a second. Notice something else here? What value is the shortest reaction time that the model predicts?

answer In the limit, 0. If you make mu really large (lots of evidence), the equation predicts that you could reach the bound ever faster, in no time at all!

But, this isn't really true for behavioral data. There must be some inherent delays associated with the time visual input is transduced and motor delays, at least, right? So, to fix that people put in, a, well, fudge factor - an independent “non decision” time. So, you get another free parameter to fit:

RT_{mean} = \frac{A}{\mu}tanh(\mu A) + \tau_{non decision}

Fit behavioral data from motion task

From the above, we now know that to fit behavioral data, we have two free parameters μ and A. How do Shadlen et al., fit their behavioral data? Well they know that as you increase motion coherence, that neurons in MT increase their response approximately linearly Britten KH, Shadlen MN, Newsome WT, Movshon JA (1993) Vis Neurosci 10:1157–1169 <pdf>. Here are a few examples of MT neurons responding increasing (or decreasing) as motion coherence is increased in their preferred (or anti-preferred) direction:

So, they assume that increasing coherence (C) increases the μ of the evidence process linearly. You need a scaling parameter, k, to get the parameters right. So, the equations they use, are simply the ones from above where μ has been replaced with kC (remember that C is known to them - it is the motion coherence that they used).

Pop quiz. What would you do if you were trying to model decisions and you changed image contrast, not coherence?

Answer You should make mu change according to the sigmoidal equation (called a Naka-Rushton equation) that is used to model neurons response as a function of contrast

So, let's take some values from fits to monkey behavioral data from Huk & Shadlen, 2005, Figure 3. Where k = 0.426, A = 22.811 and tau (non decision time) = 417ms. Coherence of motion is varied from 0 to .50. What do we get?

Answer

% parameters
k = 0.426;
A = 22.811;
tau = 417;
C = 0:0.01:0.5;
% percent correct
p = 1./(1+exp(-2*k*C*A));
rt = (A./(k*C)).*tanh(k*C*A) + tau;
% get the reaction time for when coherence = 0
rt(1) = A^2 + tau;
% now plot as a function of coherence
figure;
subplot(1,2,1);
plot(C,p);
xlabel('Motion coherence');
ylabel('Percent correct');
subplot(1,2,2);
plot(C,rt);
xlabel('Motion coherence');
ylabel('RT (msec)');

It is convenient to plot on the same graph, the performance for the opposite motion direction as well (typically because we are studying a neuron and we want to know about choices made in the preferred direction vs the anti-preferred direction). When we do this, by convention, we call negative motion coherence more evidence for motion in the anti-preferred direction and we plot as a function of choices to the preferred direction. Try and plot the full curve this way.

hint Use the same p as before, but take 1-p since that will give you the percent choice in the preferred direction. The RT is the same as above. Add each of these, plotting against -C, to the graph you just made

answer

subplot(1,2,1);hold on
plot(-C,1-p);
subplot(1,2,2);hold on
plot(-C,rt);

compare

So, that is the solution using the equations. Does it actually match our simulation? Well, try it out. Set the parameters from above, for a low coherence, say 0.05, and see if you get the expectation from the equations. BTW, what is the expectation?

answer Plug the values into the equation for p and RT from above. You should get (rounding the value a bit): 0.73 for probability correct and for RT, you should get: 899.9

So, run the simulation now with the parameters we have, and see what pCorrect we get.

solution

% clear figure
clf;hold on
% set up the simulation variables
k = 0.426;
A = 22.811;
tau = 417;
C = 0.05;
n = 100;
nCorrect = 0;
nError = 0;
correctRT = [];
errorRT = [];
% loop over trials
for i = 1:n
  d = 0;
  % simulate each trial
  while abs(d(end)) < A
    % put a new value at the end of the array for d the last value
    % of d plus a pull from the random distribution
    d(end+1) = d(end) + random('norm',k*C,1);
  end
  % plot the trial
  plot(d);drawnow;
  % if it was correct
  if d(end)>A
    % then keep score
    nCorrect = nCorrect+1;
    correctRT(end+1) = length(d);
  else
    nError = nError + 1;
    errorRT(end+1) = length(d);
  end
end
% now see if the values match our expectations
nCorrect/n
tau+mean(correctRT)

What does this imply about the underlying decision process? Well, let's see what on average the process looks like for. To do this, simulate the decision process from above, but this time keep a copy of each actual decision process (just for correct trials), so that we can average and look at it afterwords. That means instead of one array d, we should make a matrix d with rows equal to trials of the decision process and columns equal to time points. Note this is a bit tricky, since each decision process can last a different amount of time. In the end, we will put the value of the bound into each decision process for the time that it terminates to the end of the simulation time (since the assumption is that the process is just absorbed into the bound). This all can be done with a few modifications of the code from above. If you know what to do, go ahead and do that.

solution

% set up the simulation variables
k = 0.426;
A = 22.811;
tau = 417;
C = 0.05;
n = 100;
nCorrect = 0;
nError = 0;
correctRT = [];
errorRT = [];
correctTrials = [];
incorrectTrials = [];
% initialize d to something reasonably sized (it will expand as needed)
d = nan(n,2000);
% loop over trials
for i = 1:n
  % display i just so that we know things are working
  i
  % initialize decision variable
  d(i,1) = 0;t = 1;
  % simulate each trial
  while abs(d(i,t)) < A
    % put a new value at the end of the array for d the last value
    % of d plus a pull from the random distribution
    d(i,t+1) = d(i,t) + random('norm',k*C,1);
    t = t + 1;
  end
  % keep all reaction times
  rt(i) = t;
  % if it was correct
  if d(i,t)>A
    % then keep score
    nCorrect = nCorrect+1;
    correctRT(end+1) = t;
    % keep track of which trials are correct
    correctTrials(end+1) = i;
  else 
    % same for incorrect
    nError = nError + 1;
    errorRT(end+1) = t;
    incorrectTrials(end+1) = i;
  end
end
% now fill in all the values past the end of decision time
for i = 1:n
  if any(correctTrials==i)
    d(i,rt(i)+1:end) = A;
  else 
    d(i,rt(i)+1:end) = -A;
  end
end
% ok, now we have a full matrix with simulation runs. Take the average over correct trials
meanCorrect = mean(d(correctTrials,:));
% plot it
plot(meanCorrect);
xlabel('Time (ms)');
ylabel('Decision Variable');
hold on

compare

Now, try running the same simulation, but set the coherence higher, to say 0.2, which should make the task much easier and plot that on the same graph. What does it look like?

compare

If you do that across the coherences used in Roitman & Shadlen, 2002 pdf, would it look like what they recorded from LIP?

Gardner Lab

Stanford University

Table of Contents

Diffusion model

Simulate a trial

Simulate multiple trials

Fit behavioral data

Fit behavioral data from motion task

Gardner Lab

Stanford University

Site Tools

Table of Contents

Diffusion model

Simulate a trial

Simulate multiple trials

Fit behavioral data

Fit behavioral data from motion task

Page Tools