Change of Measure/Girsanov’s Theorem Explained

Change of Measure, or Girsanov’s Theorem, is an important theorem in real analysis and quantitative finance. Unfortunately, I never really understood it until long after I had left school. I blamed it on the professors and the textbook authors, of course. The textbook version usually goes like this.

Given a probability space (\Omega,\mathcal{F},P) and a non-negative random variable Z satisfying \mathbb{E}(Z) = 1 (why 1?), we define a new probability measure Q by the formula, for all A in \mathcal{F},

Q(A) = \int_A Z(\omega)\,dP(\omega)

Any random variable X (measurable with respect to \mathcal{F}) now has two expectations: one under the original probability measure P, denoted \mathbb{E}_P(X), and the other under the new probability measure Q, denoted \mathbb{E}_Q(X). They are related by the formula

\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)

If P(Z > 0) = 1, then P and Q agree on the null sets. We say Z is the Radon-Nikodym derivative of Q with respect to P, and we write Z = \frac{dQ}{dP}. To remove the mean, μ, of the shifted random variable Y = X + μ, where X is standard normal under P, we define

Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )

Then under the probability measure Q, the random variable Y = X + μ is standard normal. In particular, \mathbb{E}_Q(Y) = 0 (so what?).
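Taking X standard normal under P, a one-line density calculation (not spelled out in the textbook excerpt, but it is the whole content of the claim) shows why Y = X + μ is standard normal under Q. For any bounded function f,

```latex
\mathbb{E}_Q\left[f(Y)\right]
= \mathbb{E}_P\left[f(X+\mu)\,Z\right]
= \int_{-\infty}^{\infty} f(x+\mu)\, e^{-\mu x - \frac{1}{2}\mu^2}\,
  \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\,dx
= \int_{-\infty}^{\infty} f(y)\,
  \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\,dy
```

The last step completes the square, -\mu x - \frac{1}{2}\mu^2 - \frac{1}{2}x^2 = -\frac{1}{2}(x+\mu)^2, and substitutes y = x + \mu, leaving exactly the standard normal density.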

This text made no sense to me when I first read it in school. It was very frustrating that the text was filled with unfamiliar terms like probability space and adapted process, and scary symbols like integration and \frac{dQ}{dP}. (I knew what \frac{dy}{dx} meant when y was a function and x a variable. But what on earth was dQ over dP?)

Now that I have become a professor teaching students in finance and financial math, I get rid of all the jargon and rigor. I focus on the intuition rather than the mathematical details (traders are not mathematicians). Here is my layman’s version.

Start with a probability measure P. A probability measure is just a function that assigns numbers to outcomes, e.g., 0.5 to head and 0.5 to tail for a fair coin. There could be another measure Q that assigns different numbers to the head and tail, say, 0.6 and 0.4 (an unfair coin)! Assume P and Q are equivalent, meaning that they agree on which events are possible (positive probability) and which events have probability 0. Is there a relation between P and Q? It turns out the answer is a resounding yes!

Let’s define Z=\frac{Q}{P}. Z here is a function, as P and Q are just functions. In the coin example, Z evaluates to 0.6/0.5 = 1.2 on head and 0.4/0.5 = 0.8 on tail. Then we have

\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)

This is intuitively true if you do some symbol cancellation. Forget about the proof, even though it is quite easy, about two lines; we traders don’t care about proofs. The distribution of X under Q follows by plugging the indicator function into the last equation:

Q(X \in A) = \mathbb{E}_P(I(X \in A)Z)

Moreover, setting X = 1, we have (remember that Z is a random variable):

\mathbb{E}_Q(1) = 1 = \mathbb{E}_P(Z)
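The coin arithmetic above is easy to verify numerically. Here is a minimal sketch in Java (the payoff values 10 on head and 2 on tail are made up for illustration; the measures are the ones from the text):

```java
public class CoinChangeOfMeasure {
    // expectation of a payoff x under a two-state measure m (head, tail)
    static double expect(double[] m, double[] x) {
        return m[0] * x[0] + m[1] * x[1];
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};                 // P: the fair coin from the text
        double[] q = {0.6, 0.4};                 // Q: the unfair coin from the text
        double[] z = {q[0] / p[0], q[1] / p[1]}; // Z = Q/P: 1.2 on head, 0.8 on tail

        double[] x = {10.0, 2.0};                // an arbitrary payoff (made up)
        double[] xz = {x[0] * z[0], x[1] * z[1]};

        System.out.println(expect(q, x));  // E_Q(X)
        System.out.println(expect(p, xz)); // E_P(XZ): the same number
        System.out.println(expect(p, z));  // E_P(Z) = 1
    }
}
```

The first two printed numbers agree, which is the change-of-measure formula; the third is 1, which answers the textbook’s “why 1?”.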

These results hold in general, in particular for Gaussian random variables and hence Brownian motion. Suppose we have a random (i.e., stochastic) process generated by (adapted to) a Brownian motion, and it has a drift μ under a probability measure P. We can find an equivalent measure Q so that, under Q, this process has 0 drift. Wiki has a picture that shows the same random process under the two different measures: each of the 30 paths in the picture has a different probability under P and Q.

The change of measure, Z, is a function of the original drift (as one would guess). Over a horizon [0, T], with W the driving Brownian motion, it is given by:

Z=\exp \left ( -\mu W_T - \frac{1}{2} \mu^2 T \right )

For a 0-drift process, increments have 0 expectation, so the expectation of the future value of the process equals the current value (a layman’s way of saying that the process is a martingale). Therefore, with the ability to remove the drift of any such process (by finding a suitable Q using the Z formula), we are ready to do option pricing.
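To see the drift removal concretely, here is a small Monte Carlo sketch in Java (my own illustration, not from any textbook). We simulate X_T = μT + W_T under P and check that weighting each path by Z = \exp(-\mu W_T - \frac{1}{2}\mu^2 T), the Z formula with the horizon T made explicit (T = 1 here), pulls the average back to 0:

```java
import java.util.Random;

public class DriftRemoval {
    static final double MU = 0.5, T = 1.0;

    // Monte Carlo estimate of E[X_T * weight], where X_T = MU*T + W_T under P;
    // weighted = true applies Z = exp(-MU*W_T - MU^2*T/2), giving E_Q(X_T)
    static double estimate(boolean weighted, int n, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double w = Math.sqrt(T) * rng.nextGaussian(); // W_T ~ N(0, T) under P
            double x = MU * T + w;                        // the drifted process at T
            double z = weighted ? Math.exp(-MU * w - 0.5 * MU * MU * T) : 1.0;
            sum += x * z;
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.println(estimate(false, 1000000, 42L)); // ~ MU*T = 0.5 under P
        System.out.println(estimate(true, 1000000, 42L));  // ~ 0: drift removed
    }
}
```

The unweighted average recovers the drift μT; the Z-weighted average is (up to Monte Carlo noise) 0, i.e., under Q the process is a martingale.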

Now, if you understand my presentation and go back to the textbook version, I hope you will find it a much easier read.


Java vs C++ Performance

It is very unfortunate that some people are still not aware that Java performance is comparable to that of C++. This blog piece collects evidence to support the claim.

The wrong perception of Java slowness persists by and large because Java 1 in 1995 was indeed slower than C++. Java has improved a lot since then, e.g., the HotSpot JIT compiler. It is now at version 6 and will soon be at version 7. Java is now a competitive technology compared to C/C++. In fact, to realistically optimize C/C++, you need to find the “right” programmer to code it. This programmer needs to be aware of all the performance issues of C/C++, profiling, and code optimizations such as loop unrolling, and may even need to write code snippets in assembly. An average Joe coding in C/C++ probably produces code no faster than the Java equivalent.

(I am in general against such code optimization techniques because they make the code unreadable to humans, hence unmaintainable, like a lot of the FORTRAN/C/C++ code found in Netlib and Statlib.)

More importantly, most modern software runs on multiple cores. Code optimization techniques are dwarfed by parallel computing technologies. It is significantly easier, more efficient, and more enjoyable to write concurrent code in Java than in C++. Therefore, to build high-performance software, I personally prefer to code for multi-core, multi-CPU, and cloud in Java rather than doing code optimization in C/C++.
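As a taste of how little ceremony Java concurrency requires, here is a minimal sketch (my own toy example, not a benchmark): an ExecutorService splits a CPU-bound summation across all available cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {

    // a stand-in for any CPU-bound numerical kernel: sum of 1/i^2 over [lo, hi]
    static double partialSum(int lo, int hi) {
        double s = 0.0;
        for (int i = lo; i <= hi; i++) {
            s += 1.0 / ((double) i * i);
        }
        return s;
    }

    public static void main(String[] args) throws Exception {
        final int n = 10000000;
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // split the range into one chunk per core and submit each as a task
        List<Future<Double>> parts = new ArrayList<Future<Double>>();
        int chunk = n / cores;
        for (int c = 0; c < cores; c++) {
            final int lo = c * chunk + 1;
            final int hi = (c == cores - 1) ? n : (c + 1) * chunk;
            parts.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    return partialSum(lo, hi);
                }
            }));
        }

        double total = 0.0;
        for (Future<Double> f : parts) {
            total += f.get(); // blocks until each task finishes
        }
        pool.shutdown();

        System.out.println(total); // converges to pi^2/6 ~ 1.644934
    }
}
```

The thread pool, the task submission, and the result collection are all standard library calls; the equivalent C++ of that era would need hand-rolled threading and synchronization.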


Briefly, my conclusion is that, among general-purpose programming languages (hence excluding Scala, etc.), we should use Java instead of C/C++, FORTRAN, assembly, etc. whenever possible, because Java is the easiest programming language to learn and work with, without a sacrifice in performance.

(For me, Java is an easier language than C# because Java IDE technology is far better than the C# counterparts.)

The evidence I have collected is listed below. Please feel free to expand the list.

  1. Java has recently won some major benchmark competitions.
  2. Recent independent studies seem to show that Java performance for high performance computing (HPC) is similar to FORTRAN on computation intensive benchmarks.


SuanShu Introduction

Numerical Method Inc. publishes SuanShu, a Java numerical and statistical library. The objective of SuanShu is to make programming engineering applications very easy. Programmers can write mathematics in source code that is solidly object-oriented and individually testable. SuanShu source code adheres to the strictest coding standard so that it is readable, maintainable, and easy to modify and extend.

SuanShu revolutionizes how numerical computing is traditionally done, e.g., Netlib, GSL. The repositories of these most popular and somewhat “standard” libraries are rather collections of ad-hoc source code in obsolete languages, e.g., FORTRAN and C. The biggest problem with this code is that it is not readable (to most modern programmers), hence unmaintainable. For example, it is quite a challenge to understand AS 288, let alone improve it. Other problems include, but are not limited to, the lack of data structures, duplicated code, an entirely procedural style, very bad variable naming, abuse of GOTO, the lack of test cases, insufficient documentation, the lack of IDE support, inconvenient linking to modern languages such as Java, and unfriendliness to parallel computing.

To address these problems, SuanShu provides a framework of reusable math components (not procedures) so that programmers can put components together like Lego bricks to build more complex algorithms. SuanShu is written from scratch so that it conforms to modern programming practice: variable naming, code structuring, reusability, readability, maintainability, and sound software engineering procedure. To ensure very high code quality and very few bugs, SuanShu has a few thousand unit test cases that run daily.

The basics of SuanShu cover the following.

– numerical differentiation and integration
– polynomials and the Jenkins-Traub algorithm
– root finding
– unconstrained and constrained optimization for univariate and multivariate functions
– linear algebra: matrix operations and factorization
– sparse matrix
– descriptive statistics
– random sampling from distributions

Compared to competing products, SuanShu, we believe, has the most extensive coverage in statistics. SuanShu covers the following.

– Ordinary Least Square (OLS) regression
– Generalized Linear Model (GLM) regression
– a full suite of residual analysis
– Stochastic Differential Equation (SDE) simulation
– a comprehensive library of hypothesis testing: Kolmogorov-Smirnov, D’Agostino, Jarque-Bera, Lilliefors, Shapiro-Wilk, One-way ANOVA, T, Kruskal-Wallis, Siegel-Tukey, Van der Waerden, Wilcoxon rank sum, Wilcoxon signed rank, Breusch-Pagan, ADF, Bartlett, Brown-Forsythe, F, Levene, Pearson’s Chi-square, Portmanteau
– time series analysis, univariate and multivariate
– ARIMA, GARCH modelling, simulation, fitting, and prediction
– sample and theoretical auto-correlation
– cointegration
– hidden Markov chain
– Kalman filter

For the full article, please read “SuanShu Introduction”.

Class Parsimony

Hello World!

Today I have my first post ever on the Numerical Method blog. I’d like to discuss a design decision we made when writing our library, to give our friends some insight into how the SuanShu library works. Please leave me feedback and comments so that we can continue to improve the product for you.

We have decided to make the class library as parsimonious as possible to avoid method pollution. This is inspired by the jMatrices white paper. The challenge is to organize the methods into minimal and correct packages.

I will illustrate this with SuanShu’s Matrix packages.

The Matrix class has only 26 methods, of which 9 are constructors and related methods, 3 are overrides for the AbstractMatrix interface, and 8 are overrides for the MatrixSpace interface. Only 6 are class-specific, included to make calling these methods convenient for the user. The dozens of other matrix operations, such as the various factorizations, properties like rank, and transformations like inverse, are grouped into multiple classes and packages. In most cases, each of these operations is a class of its own.

For instance, the inverse operation itself is a class inheriting from Matrix. The constructor takes as input the Matrix to invert. For example, to find the inverse of

A = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \\ 8 & 7 & 9 \end{bmatrix},

we code

Matrix A = new Matrix(new double[][]{
    {1, 2, 3},
    {6, 5, 4},
    {8, 7, 9}
});

Matrix Ainv = new Inverse(A);

SuanShu computes

A^{-1} = \begin{bmatrix} -0.809524 & -0.142857 & 0.333333 \\ 1.047619 & 0.714286 & -0.666667 \\ -0.095238 & -0.428571 & 0.333333 \end{bmatrix}

It is important to note that Ainv is a Matrix, created by the keyword new, not by a method call.

In summary, we choose to have hundreds of classes rather than one class with hundreds of methods. Each class is kept deliberately short. This class parsimony principle is a key design decision guiding the development of the whole library.
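To make the idiom concrete, here is a deliberately tiny, self-contained sketch of the pattern with made-up names (Mat, Inv), so as not to be confused with the actual SuanShu source; the math is limited to 2x2 for brevity.

```java
// a toy sketch of the "operation as a class" idiom: the inverse is itself
// a matrix, constructed from the matrix to invert
public class Parsimony {

    static class Mat {
        final double[][] data;
        Mat(double[][] data) { this.data = data; }
        double get(int i, int j) { return data[i][j]; }
    }

    // Inv IS-A Mat: `new Inv(a)` yields a matrix usable wherever Mat is expected
    static class Inv extends Mat {
        Inv(Mat a) { super(invert2x2(a)); }

        private static double[][] invert2x2(Mat a) {
            double det = a.get(0, 0) * a.get(1, 1) - a.get(0, 1) * a.get(1, 0);
            return new double[][]{
                { a.get(1, 1) / det, -a.get(0, 1) / det},
                {-a.get(1, 0) / det,  a.get(0, 0) / det}
            };
        }
    }

    public static void main(String[] args) {
        Mat a = new Mat(new double[][]{{4, 7}, {2, 6}});
        Mat ainv = new Inv(a); // an object created by `new`, not a method call
        System.out.println(ainv.get(0, 0)); // 0.6
    }
}
```

Mat stays small, and each new operation arrives as its own short class instead of yet another method on Mat; that is the parsimony principle in miniature.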