Change of Measure, or Girsanov’s Theorem, is a very important theorem in probability theory and quantitative finance. Unfortunately, I never really understood it until long after I had left school. I blamed the professors and the textbook authors, of course. The textbook version usually goes like this.

Let (\Omega,\mathcal{F},P) be a probability space, and let Z be a non-negative random variable satisfying \mathbb{E}(Z) = 1 (why 1?). We then define a new probability measure Q by the following formula, for all A in \mathcal{F}:

Q(A) = \int_A Z(\omega)\,dP(\omega)

Any random variable X (that is, any \mathcal{F}-measurable function on \Omega) now has two expectations: one under the original probability measure P, denoted \mathbb{E}_P(X), and the other under the new probability measure Q, denoted \mathbb{E}_Q(X). They are related to each other by the formula

\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)
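To make the definition concrete, here is a minimal sketch on a finite sample space, where the integral becomes a sum, Q(A) = \sum_{\omega \in A} Z(\omega)P(\omega). The fair die, the hand-picked weights Z and the random variable X are illustrative assumptions, not part of the textbook statement. The sketch also hints at the "why 1?": \mathbb{E}_P(Z) = 1 is exactly what makes Q(\Omega) = 1.

```python
from fractions import Fraction

# Finite sample space: the six faces of a fair die, each with P-probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

# Illustrative non-negative Z, hand-picked so that E_P(Z) = 1.
Z = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1), 4: Fraction(1),
     5: Fraction(3, 2), 6: Fraction(3, 2)}


def Q(event):
    """Q(A) = sum of Z(w) P(w) over w in A, the finite version of the integral."""
    return sum(Z[w] * P[w] for w in event)


def expect(measure, f):
    """Expectation of f under a measure given by point masses {w: mass}."""
    return sum(f(w) * measure[w] for w in omega)


def X(w):
    """The random variable X(w) = face value."""
    return w


Q_point = {w: Q([w]) for w in omega}   # Q expressed as point masses

E_P_Z = expect(P, lambda w: Z[w])
E_Q_X = expect(Q_point, X)
E_P_XZ = expect(P, lambda w: X(w) * Z[w])

print(E_P_Z, Q(omega))   # 1 1: E_P(Z) = 1 is what makes Q(Omega) = 1
print(E_Q_X, E_P_XZ)     # 25/6 25/6: E_Q(X) = E_P(XZ)
```

Exact fractions are used so that the identity holds with equality rather than up to floating-point rounding.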

If P(Z > 0) = 1, then P and Q agree on the null sets. We say Z is the Radon-Nikodym derivative of Q with respect to P, and we write Z = \frac{dQ}{dP}. To remove the mean μ of a normal random variable, let X be standard normal under P and define

Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )

Then under the probability measure Q, the random variable Y = X + μ, which has mean μ under P, is standard normal. In particular, \mathbb{E}_Q(Y) = 0 (so what?).
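The claim can be checked by completing the square. Assuming, as in the textbook setting this example comes from, that X is standard normal under P with density \varphi(x) = e^{-x^2/2}/\sqrt{2\pi}, multiplying by Z gives the density of X under Q (this short derivation is an added illustration, not part of the quoted text):

```latex
\begin{aligned}
Z(x)\,\varphi(x) &= \exp\left(-\mu x - \tfrac{1}{2}\mu^2\right)\frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}x^2\right) \\
                 &= \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}(x+\mu)^2\right) = \varphi(x+\mu)
\end{aligned}
```

So under Q, X is normal with mean -μ and variance 1, which means Y = X + μ is standard normal. Integrating both sides over x also shows \mathbb{E}_P(Z) = 1.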

This text made no sense to me when I first read it in school. I was very frustrated that the text was filled with unfamiliar terms like probability space and adaptation, and scary symbols like integrals and \frac{dQ}{dP}. (I knew what \frac{dy}{dx} meant when y was a function and x a variable. But what on earth was dQ over dP?)

Now that I have become a professor teaching students in finance and financial math, I would get rid of all the jargon and rigor. I would focus on the intuition rather than the mathematical details (traders are not mathematicians). Here is my layman’s version.

Suppose we are given a probability measure P. A probability measure is just a function that assigns probabilities to events, e.g., 0.5 to heads and 0.5 to tails for a fair coin. There could be another measure Q that assigns different numbers to heads and tails, say, 0.6 and 0.4 (an unfair coin)! Assume P and Q are equivalent, meaning that they agree on which events are possible (positive probability) and which events have probability 0. Is there a relation between P and Q? It turns out the answer is a resounding yes!

Let’s define Z=\frac{Q}{P}. Z here is a function, since P and Q are just functions. For the coin, Z evaluates to 0.6/0.5 = 1.2 on heads and 0.4/0.5 = 0.8 on tails. Then we have

\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)

This is intuitively true by symbol cancellation. Forget about the proof, even though it is quite easy (about two lines). We traders don’t care about proofs. Therefore, the distribution of X under Q is given by plugging the indicator function into the last equation:
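For readers who do want the two lines, here is the cancellation written out for a discrete outcome space such as the coin, where Z(\omega) = Q(\omega)/P(\omega) and the sum runs over outcomes with P(\omega) > 0 (by equivalence, those are exactly the outcomes Q can produce):

```latex
\mathbb{E}_P(XZ) = \sum_{\omega} X(\omega)\,\frac{Q(\omega)}{P(\omega)}\,P(\omega)
                 = \sum_{\omega} X(\omega)\,Q(\omega)
                 = \mathbb{E}_Q(X)
```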

Q(X \in A) = \mathbb{E}_Q(I(X \in A)) = \mathbb{E}_P(I(X \in A)Z)

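Moreover, setting X = 1 (the constant random variable that always equals 1), we have (Z here is a random variable):

\mathbb{E}_Q(1) = 1 = \mathbb{E}_P(Z)

This answers the "(why 1?)" in the textbook version: Z must have P-expectation 1 so that Q assigns total probability 1.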
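A minimal numerical check of these identities on the coin from the text, using exact fractions so the equalities hold exactly. The bet X (win 2 on heads, lose 1 on tails) and the event A = {X > 0} are hypothetical choices for illustration.

```python
from fractions import Fraction

# Fair coin P and unfair coin Q from the text.
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
Q = {"H": Fraction(3, 5), "T": Fraction(2, 5)}

# Z = Q/P outcome by outcome; well defined because P and Q are equivalent.
Z = {w: Q[w] / P[w] for w in P}          # H -> 6/5, T -> 4/5

# Hypothetical bet: win 2 on heads, lose 1 on tails.
X = {"H": 2, "T": -1}


def E(measure, f):
    """Expectation of f(outcome) under a discrete measure {outcome: prob}."""
    return sum(f(w) * p for w, p in measure.items())


def in_A(w):
    """Indicator I(X in A) for the event A = {X > 0}."""
    return 1 if X[w] > 0 else 0


# E_Q(X) = E_P(XZ)
print(E(Q, lambda w: X[w]), E(P, lambda w: X[w] * Z[w]))   # 4/5 4/5
# Q(X in A) = E_P(I(X in A) Z)
print(E(Q, in_A), E(P, lambda w: in_A(w) * Z[w]))           # 3/5 3/5
# X = 1: E_Q(1) = 1 = E_P(Z)
print(E(Q, lambda w: 1), E(P, lambda w: Z[w]))              # 1 1
```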

These results hold in general, in particular for Gaussian random variables and hence for Brownian motion. Suppose we have a random (i.e., stochastic) process generated by (adapted to) a Brownian motion, and it has a drift μ under a probability measure P. We can find an equivalent measure Q under which this random process has zero drift. Wikipedia has a picture that shows the same random process under the two different measures: each of the 30 paths in the picture has a different probability under P and Q.

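The change of measure, Z, is a function of the original drift (as you might guess). If the process is Y_t = X_t + μt, where X is a standard Brownian motion under P, then at time t it is given by:

Z=\exp \left ( -\mu X_t - \frac{1}{2} \mu^2 t \right )

At t = 1 this reduces to the formula in the textbook version.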
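Below is a minimal Monte Carlo sketch of drift removal, written under stated assumptions: the process is Y_t = X_t + μt with X a standard Brownian motion under P, observed up to a horizon T, and the weight is Z = exp(-μX_T - ½μ²T), which for T = 1 is the Z formula above with X = X_1. Paths are simulated under P and each one is reweighted by Z, so weighted averages estimate expectations under Q. The values μ = 0.5, T = 2 and the path counts are arbitrary; the function name `drift_removal_check` is hypothetical.

```python
import math
import random


def drift_removal_check(mu=0.5, T=2.0, n_steps=10, n_paths=40_000, seed=1):
    """Simulate Y_t = X_t + mu*t under P (X a standard Brownian motion) and
    estimate Q-expectations by weighting each path with
    Z = exp(-mu * X_T - 0.5 * mu**2 * T)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    mid = n_steps // 2                  # time mid*dt, i.e. T/2 when n_steps is even
    sum_z = sum_zy_mid = sum_zy_end = sum_y_end = 0.0
    for _ in range(n_paths):
        x = 0.0
        y_mid = 0.0
        for k in range(1, n_steps + 1):
            x += rng.gauss(0.0, sd)     # Brownian increment under P
            if k == mid:
                y_mid = x + mu * k * dt
        y_end = x + mu * T
        z = math.exp(-mu * x - 0.5 * mu**2 * T)   # dQ/dP for paths up to time T
        sum_z += z
        sum_zy_mid += z * y_mid
        sum_zy_end += z * y_end
        sum_y_end += y_end
    return {
        "E_P(Z)": sum_z / n_paths,           # close to 1
        "E_P(Y_T)": sum_y_end / n_paths,     # close to mu*T: the drift under P
        "E_Q(Y_T/2)": sum_zy_mid / n_paths,  # close to 0: no drift under Q
        "E_Q(Y_T)": sum_zy_end / n_paths,    # close to 0: no drift under Q
    }


print(drift_removal_check())
```

The same simulated paths serve both measures; only the weights differ, which mirrors the picture of one set of paths carrying different probabilities under P and Q.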

For a zero-drift process, there is no expected increment, so the expected future value of the process equals its current value (a layman’s way of saying that the process is a martingale). Therefore, with the ability to remove the drift of such a random process (by finding a suitable Q using the Z formula), we are ready to price options.
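As a hedged illustration of pricing with the drift removed, the sketch below assumes zero interest rates and an arithmetic (Bachelier-type) price model S_t = S_0 + μt + σW_t under P; this toy model is my own simplifying choice, not something the post specifies. With θ = μ/σ and Z = exp(-θW_T - ½θ²T), S is a martingale under Q, so the call price is \mathbb{E}_Q((S_T - K)^+) = \mathbb{E}_P((S_T - K)^+ Z). The Monte Carlo estimate is compared with the standard zero-rate Bachelier formula (S_0 - K)\Phi(d) + \sigma\sqrt{T}\varphi(d), d = (S_0 - K)/(\sigma\sqrt{T}). Function names and parameter values are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bachelier_call(s0, k, sigma, T):
    """Zero-rate Bachelier call price E_Q((S_T - K)^+), S_T ~ N(s0, sigma^2 T) under Q."""
    sd = sigma * math.sqrt(T)
    d = (s0 - k) / sd
    return (s0 - k) * norm_cdf(d) + sd * norm_pdf(d)


def girsanov_mc_call(s0, k, sigma, mu, T, n=200_000, seed=7):
    """Simulate S_T under P (drift mu), then reweight each sample by Z = dQ/dP
    so that the average estimates a Q-expectation."""
    rng = random.Random(seed)
    theta = mu / sigma                      # drift per unit of volatility in this toy model
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(T))    # W_T under P
        s_T = s0 + mu * T + sigma * w       # drifted price under P
        z = math.exp(-theta * w - 0.5 * theta**2 * T)
        total += max(s_T - k, 0.0) * z
    return total / n


exact = bachelier_call(100.0, 100.0, 10.0, 1.0)
mc = girsanov_mc_call(100.0, 100.0, 10.0, mu=5.0, T=1.0)
print(exact, mc)   # exact is about 3.989; mc should land close to it
```

Dropping the weight Z would average the payoff under P instead, giving roughly 7 with these parameters: the drift inflates the price, which is exactly what the change of measure removes.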

Now, if you understand my presentation and go back to the textbook version, I hope you will find it much easier to read and understand.


17 Comments

  1. Thank you. This instantly stopped my 6 hours of struggling to understand this subject. After reading this article it is clear.

  2. Thanks a lot! Your students must love you for your sense of pedagogy.

  3. Thanks for the explanation. However I am confused about this:

    Moreover, setting X = 1, we have (Z here is a random variable):
    \mathbb{E}_Q(X) = 1 = \mathbb{E}_P(Z)

    Why is this true? \mathbb{E}_Q(X) = 1

    • Hi,

      I completely agree. I think it is a misleading statement. Obviously, the expectation of a random variable X (e.g., a price) is not 1 in general. I guess this was referring to the integral (continuous RV) or sum (discrete RV) of its pdf/pmf.

  4. Thank you so much for writing this. Incredibly helpful, great explanation. I wish I had found it sooner!

  5. Didn’t you forget the indicator function on the left-hand side of the 3rd equation (from the bottom)?

  6. Thank you so much!! I could finally understand change of measure intuitively 🙂

  7. A student goes to the beach, a party, or a lecture if a coin shows tails T, heads H, or falls on its edge E. Enjoyment is X = {T = 1, H = 2, E = -10}. The probabilities for a fair coin C are P(C) = {0.5, 0.5, 0}.
    For a specially manufactured unfair coin U, favoring sport over drinking, they are P(U) = {0.6, 0.4, 0}. We still keep zero for E, making the probability measures P(C) and P(U) “equivalent”, which helps us later to avoid division by zero. The events and values X are the same, but their probabilities change. The average enjoyment is
    E(X with C) = 1 * 0.5 + 2 * 0.5 + (-10) * 0 = 1.5 or E(X with U) = 1 * 0.6 + 2 * 0.4 + (-10) * 0 = 1.4. We define Z = P(U)/P(C) = {0.6/0.5 = 1.2, 0.4/0.5 = 0.8, ignore}. The values of Z are not probabilities (they do not sum to one) but ratios of the probabilities of the corresponding events or values X. We ignore the undefined value of Z here, but we could have dropped the zero-probability events earlier: adding values of impossible events to a random variable changes nothing for us. Here T, H, E, E(C), E(U); X, P(C), P(U); and Z are deterministic values, sets, and a function. Only the coins C and U (how they fall) are random.

    We can reproduce E(X with U) = 1.4 using P(C) if we replace X with X * Z = {1 * 0.6/0.5 = 1.2, 2 * 0.4/0.5 = 1.6}. The values X are multiplied by the transforming function. Indeed, E(XZ with C) = 1.2 * 0.5 + 1.6 * 0.5 = 1.4. This is the mean of the product XZ under a different measure, P(C) rather than P(U).

    Using a finite number of discrete events instead of a continuous random variable gives simpler explanations, including of the theorem on changing the probability measure. But the probability of any single value of a continuous random variable is exactly zero, so one cannot apply the same explanation to a Gaussian variable without facing 0/0 everywhere. For continuous variables, non-zero measures are assigned to intervals of values. While point probabilities in the continuous case are zero, the ratio dP(U)/dP(C) over corresponding shrinking intervals can converge to a finite number in the limit, and then we reach similar conclusions and can use this useful technique.

    Igor Vladimirovich Girsanov, a talented mathematician and a pupil of Dynkin and Kolmogorov, died suddenly in the Sayan Mountains on March 16, 1967, at the age of 32. Like the coin U, he favored sport: alpinism. That was 50 years ago. One of his contributions, Girsanov, Igor, “On Transforming a Certain Class of Stochastic Processes by Absolutely Continuous Substitution of Measures”, Theory of Probability and its Applications, Volume 5, No 3, pp. 314–330, 1960, has been translated into several languages and is famous. Best Regards, Valerii

  8. I believe there is a mistake in the above, most likely because the example of removing drift was copied from Shreve, whose book has the same mistake. When you remove the drift in that example, both Z and Y are missing a “t” in their second term: Y = X + mu*t, and Z = exp(-mu*X - 1/2 * mu^2 * t).

  9. Very helpful, I’ve needed this explanation for 15 years!

