Quasi-Newton methods is a framework of algorithims that is built on the Newton-Rhason method and the line search in this method is based on a $n*n$ matrix S, which serves as the same purpose as the inverse Hessian matrix in the Newton-Rhapson Method. The Quasi-Newton method is yet another framework of algorithims that do not rquire the Hessian. The expression in Quasi-Newton method can be given as

$x_{k+1}=x_k-\alpha_kS_kg_k$

When $S_K=I$, it is the algorithim for the steepest-descent method. When $S_k=H^{-1}$, the algorithim is for the Newton-Rhapson method. The above equation of $x_{k+1}$ converges fastest when $S_k=H^{-1}$. Quasi-Newton methods combines the advantages of the steepest-descent and the conjugate-direction methods. They are among the most efficient methods and are used extensively in many applications.

Assume that the difference in successive $x_k$ is $\delta_k=x_{k+1}-x_k$ and the difference in successive gradient is $g_k$ is $\gamma_k=g_{k+1}-g_k$, then we have $\gamma_k=H\delta_k$. The Newton-Rhapson method determines the direction vector as $d_k=-S_kg_k$. The magnitude $\alpha_k$ can be determined by a line search minimizing the function $f(x_k+\alpha d_k)$. Now here we want to avoid inverting a matrix such as computing $S_k=H^{-1}$ and checking that $S_k$ is positive definite in each iteration, rather we simply want an updating rule for $S_{k+1}$ such as

$S_{k+1}S_k+C_k$

Here $C_k$ is a $n*n$ correction matrix that can be computed form available data. In any definition of $C_k$, $S_{k+1}$ must satisfy these properties:

1. $S_{k+1}\gamma_k=\delta_k$.

2. Vectors $\{\delta_0,\delta_1,...,\delta_{n-1}\}$ must be conjugate direction vectors.

3. A positive definite $S_k$ must produce a positive definite $S_{k+1}$.

The second property states that the advantages of tyhe conjugat-direction method method applies to Quasi-Newton method. The third method states that the direction vector is valid in each iteration. There are many ways to define $C_k$ and they give rise to different variants of the Quasi-Newton method.

Rank-One Method

The Rank-Method method is featured by the correction matrix $C_k$ has a unity rank. Now we assume that

$S_{k+1}\gamma_k=\delta_k$.

Let

$S_{k+1}=S_k\gamma_k+\beta\xi_k\xi_k^T\gamma_k$

And

$\gamma^T_k(\delta_k-S_k\gamma_k)=\beta_k\gamma^T_k\xi_k\xi_k^T\gamma_k=\beta_k(\xi^T_k\gamma_k)^2$

$(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T=\beta_k(\xi_k^T\gamma_k)^2\beta_k\xi_k\xi_k^T$

Then we have

$\displaystyle \beta_k\xi_k\xi_k^T=\frac{(\delta_k-S_K\gamma_k)(\delta_k-S_k\gamma_k)^T}{\gamma_k^T(\delta_k-S_k\gamma_k)}$

Therefore,

$S_{k+1}=S_k+\beta_k\xi_k\xi_k^T$.

There are two problems with this method. First, the positive definite $S_k$ may not give a positive definite $S_{k+1}$. In such a case, the next direction is not a good direction. Second, the denominator in the correction formula may approach or become equal to zero. If that happens, the method breaks down as $S_{k+1}$ is undefined.

Davidon-Fletcher-Powell Method

The Davidon-Fletcher-Powell (DPF) method is similar to the rank-one method but it has one important advantage, that if the initial matrix $S_0$ s positive definite, then the subsequent matrices are always positive definite. Unlike the Raank-One method, every new direction is a descent direction.

The DPF formula is

$\displaystyle S_{k+1}=S_k+\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k\gamma_k^Ts_k}{\gamma_k^TS_k\gamma_k}$

The correction is an $n*n$ symmetric matrix of rank two.

Broyden-Fletcher-Goldfarb-Shanno Method

The BFGS method formula is

$\displaystyle S_{k+1}=S_k+(1+\frac{\gamma^T_kS_k\gamma_k}{\gamma_k^T\delta_k})\frac{\delta_k\delta_k^T}{\gamma^T_k\delta_k}-\frac{\delta_k\gamma_k^TS_k+S_k\gamma_k\delta_k^T}{\gamma_k^T\delta_k}$

The BFGS method has the following properies:

1. $S_{k+1}$ becomes identical to $H^{-1}$ for $=n-1$.

2. Direction vectors $\{\delta_i\}_{i1,2,...,n-1}$ form a conjugate set.

3. $S_{k+1}$ is the positive definite if $S_k$ is positive definite.

4. $\delta_k^T\gamma_k=\delta^T_kg_{k+1}-\delta^T_kg_k>0$.

BFGS method is the best method among all.

Huang Family Method

Huang method is a general formula that includes the rank-one, DFP, BFGS, Pearson and McCormick ethods. It has the following form

$\displaystyle S_{k+1}=S_k+\frac{\delta_k(\theta\delta_k+\phi S^T_k\gamma_k)^T}{(\theta\delta_k+\phi S^T_k\gamma_k)^T\gamma_k}-\frac{S_k\gamma_k(\psi\delta_k+\omega S^T_k\gamma_k)^T}{(\psi\delta_k+\omega S^T_k\gamma_k)^T\gamma_k}$

$\theta,\, \phi,\, \psi$ and $\omega$ are independent parameters.

By letting $theta=1,\, \phi=-1,\, \psi=1,\, \omega=-1,$ we have the Rank-One formula.

By letting $theta=1,\, \phi=0,\, \psi=0,\, \omega=1,$ we have the DFP formula.

By letting $theta=1,\, \phi=0,\, \psi=1,\, \omega=0,$ we have the McCormick formula:

$\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\delta^T_k}{\delta_k^T\gamma_k}$

By letting $theta=0,\, \phi=1,\, \psi=0,\, \omega=1,$ we have the Pearson Formula:

$\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}$

By letting $\displaystyle \frac{\phi}{\theta}=\frac{-\gamma_k\delta^T_k}{\gamma_k\delta_k^T+\gamma^T_kS_k\gamma_k},\,\psi=1,\, \omega=0,$ we have the BFGS formula.

The following code minimizes the Himmelblau function $f(x_1,x_2)=(x_1^2+x_2-11)^2+(x_1+x_2^2-7)^2$ using different Quasi-Newton methods.

				
//The Himmelblau function:
// f(x) = (x1^2 + x2 - 11)^2 + (x1 + x2^2 - 7)^2

val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),

double result = pow(x1 * x1 + x2 - 11, 2),
result += 12 * pow(x1 + x2 * x2 - 7, 2),
return result
}
override public int dimensionofDomain() {
return 2
}
override public int dimensionofRange() {
return 1
}
}

val g: RealVectorFunction g = object : new RealVectorFunction() {
override fun evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),

double w1 = x1 * x1 + x2 - 11,
double w2 = x1 + x2 * x2 -7,

double[] result = new double[2],
result[0] = 4 * w1 * x1 + 2 * w2,
result[1] = 2 * w1 + 4 * w2 * x2,
return new DenseVector (result)
}
override public int dimensionofDomain() {
return 2
}
override public int dimensionofRange() {
return 2
}
}

val problem: C2OptimProblem = new C2OptimProblemImpl(f, g)
val solver : QuasiNewtonMinimizer RankOneMinimizer = new RankOneMinimizer(
1e-16,
15)
val soln1: RankOneMinimizer.solve(problem)
val xmin1: soln1.search(new DenseVector(new double[]{6, 6}))
val fmin1= f.evaluate(xmin1)
System.out.println(String.format("f(%s) = %.16f", xmin1.toString(), fmin1))

val solver : QuasiNewtonMinimizer DFPMinimizer = new DFPMinimizer(
1e-16,
15)
val soln2: DFPMinimizer.solve(problem)
val xmin2: soln2.search(new DenseVector(new double[]{6, 6}))
val fmin2= f.evaluate(xmin2)
System.out.println(String.format("f(%s) = %.16f", xmin2.toString(), fmin2))

val solver : QuasiNewtonMinimizer BFGSMinimizer = new BFGSMinimizer(
false,
1e-16,
15)
val soln3: BFGSMinimizer.solve(problem)
val xmin3: soln3.search(new DenseVector(new double[]{6, 6}))
val fmin3= f.evaluate(xmin3)
System.out.println(String.format("f(%s) = %.16f", xmin3.toString(), fmin3))

val solver : QuasiNewtonMinimizer HuangMinimizer = new HuangMinimizer(
0,
1,
0,
1,
1e-16,
15)
val soln4: HuangMinimizer.solve(problem)
val xmin4: soln4.search(new DenseVector(new double[]{6, 6}))
val fmin4= f.evaluate(xmin4)
System.out.println(String.format("f(%s) = %.16f", xmin4.toString(), fmin4))

val solver : QuasiNewtonMinimizer PearsonMinimizer = new PearsonMinimizer(
1e-16,
15)
val soln5: PearsonMinimizer.solve(problem)
val xmin5: soln5.search(new DenseVector(new double[]{6, 6}))
val fmin5= f.evaluate(xmin5)
System.out.println(String.format("f(%s) = %.16f", xmin5.toString(), fmin5))



The output is:

				
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000



The Quasi-Newton methods are much more efficient than any of the Conjugate-Drection methods. They take less iterations than the Conjugate-Direction methods and give etter accuracy and precision.