Quasi-Newton methods are a family of algorithms built on the Newton-Raphson method that do not require the Hessian. Each iteration performs a line search along a direction determined by an  n\times n  matrix  S_k , which serves the same purpose as the inverse Hessian matrix in the Newton-Raphson method. The quasi-Newton iteration can be written as

 x_{k+1}=x_k-\alpha_kS_kg_k

When  S_k=I , the iteration reduces to the steepest-descent method. When  S_k=H^{-1} , it is the Newton-Raphson method, and the iteration above converges fastest in that case. Quasi-Newton methods combine the advantages of the steepest-descent and conjugate-direction methods. They are among the most efficient optimization methods and are used extensively in many applications.
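For instance, for a quadratic objective  f(x)=\frac{1}{2}x^THx+b^Tx  with gradient  g_k=Hx_k+b , choosing  S_k=H^{-1}  and  \alpha_k=1  reaches the minimizer in a single step:

 x_{k+1}=x_k-H^{-1}(Hx_k+b)=-H^{-1}b 

which is exactly the stationary point satisfying  \nabla f(x)=Hx+b=0 .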

Let the difference between successive iterates be  \delta_k=x_{k+1}-x_k  and the difference between successive gradients be  \gamma_k=g_{k+1}-g_k ; for a quadratic objective these satisfy  \gamma_k=H\delta_k . The direction vector is determined as  d_k=-S_kg_k , in analogy with the Newton-Raphson method. The step size  \alpha_k  is then determined by a line search minimizing  f(x_k+\alpha d_k) . We want to avoid explicitly inverting a matrix, that is, computing  S_k=H^{-1}  and checking that  S_k  is positive definite in each iteration; instead we look for an updating rule for  S_{k+1}  of the form

 S_{k+1}=S_k+C_k

Here  C_k  is an  n\times n  correction matrix that can be computed from available data. Whatever the definition of  C_k ,  S_{k+1}  must satisfy these properties:

1.  S_{k+1}\gamma_k=\delta_k .

2. Vectors  \{\delta_0,\delta_1,...,\delta_{n-1}\} must be conjugate direction vectors.

3. A positive definite  S_k must produce a positive definite  S_{k+1} .

The second property ensures that the advantages of the conjugate-direction methods carry over to quasi-Newton methods. The third property ensures that the direction vector is a valid descent direction in each iteration. There are many ways to define  C_k , and they give rise to different variants of the quasi-Newton method.
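To make the framework concrete, the following is a minimal, self-contained Kotlin sketch of a generic quasi-Newton loop on plain arrays with a crude backtracking line search. It is illustrative only and is not the library code used later in this section; the names quasiNewton and add, the default tolerances, and the halving line search are assumptions made for the sketch.

// A minimal quasi-Newton loop (illustrative sketch only).
// f: objective, grad: gradient of f, update: rule that builds S_{k+1} from (S_k, delta_k, gamma_k).
fun quasiNewton(
    f: (DoubleArray) -> Double,
    grad: (DoubleArray) -> DoubleArray,
    x0: DoubleArray,
    update: (Array<DoubleArray>, DoubleArray, DoubleArray) -> Array<DoubleArray>,
    maxIter: Int = 100,
    tol: Double = 1e-10
): DoubleArray {
    val n = x0.size
    var x = x0.copyOf()
    var g = grad(x)
    var S = Array(n) { i -> DoubleArray(n) { j -> if (i == j) 1.0 else 0.0 } } // S_0 = I
    repeat(maxIter) {
        // direction d_k = -S_k g_k
        val d = DoubleArray(n) { i -> -(0 until n).sumOf { j -> S[i][j] * g[j] } }
        // crude backtracking line search for alpha_k
        var alpha = 1.0
        while (f(add(x, d, alpha)) > f(x) && alpha > 1e-12) alpha /= 2.0
        val xNew = add(x, d, alpha)
        val gNew = grad(xNew)
        val delta = DoubleArray(n) { i -> xNew[i] - x[i] } // delta_k = x_{k+1} - x_k
        val gamma = DoubleArray(n) { i -> gNew[i] - g[i] } // gamma_k = g_{k+1} - g_k
        S = update(S, delta, gamma)                        // S_{k+1} = S_k + C_k
        x = xNew
        g = gNew
        if (g.map { it * it }.sum() < tol * tol) return x  // stop when the gradient is small
    }
    return x
}

// x + alpha * d
fun add(x: DoubleArray, d: DoubleArray, alpha: Double) =
    DoubleArray(x.size) { i -> x[i] + alpha * d[i] }

Any update rule of the form  S_{k+1}=S_k+C_k  can be passed as the update argument; the rank-one and BFGS corrections are sketched in the corresponding sections below.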

Rank-One Method

The rank-one method is characterized by a correction matrix  C_k  of unit rank. We require that  S_{k+1}  satisfy the first property,

 S_{k+1}\gamma_k=\delta_k .

Let the correction be the rank-one matrix  \beta_k\xi_k\xi_k^T , where  \xi_k  is a vector and  \beta_k  a scalar, so that  S_{k+1}=S_k+\beta_k\xi_k\xi_k^T . Substituting this into the condition above gives

 \delta_k-S_k\gamma_k=\beta_k\xi_k\xi_k^T\gamma_k

Premultiplying both sides by  \gamma_k^T  gives

 \gamma^T_k(\delta_k-S_k\gamma_k)=\beta_k\gamma^T_k\xi_k\xi_k^T\gamma_k=\beta_k(\xi^T_k\gamma_k)^2 

and forming the outer product of  \delta_k-S_k\gamma_k=\beta_k\xi_k(\xi_k^T\gamma_k)  with itself gives

 (\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T=\beta_k(\xi_k^T\gamma_k)^2\,\beta_k\xi_k\xi_k^T 

Dividing the second equation by the first, we have

\displaystyle \beta_k\xi_k\xi_k^T=\frac{(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T}{\gamma_k^T(\delta_k-S_k\gamma_k)}

Therefore,

\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)(\delta_k-S_k\gamma_k)^T}{\gamma_k^T(\delta_k-S_k\gamma_k)}

There are two problems with this method. First, a positive definite  S_k  may not produce a positive definite  S_{k+1} ; in that case the next direction is not guaranteed to be a descent direction. Second, the denominator in the correction formula may approach or become equal to zero. If that happens, the method breaks down because  S_{k+1}  is undefined.
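The breakdown can be handled by skipping the update when the denominator is tiny. Below is a minimal sketch of the rank-one update on plain arrays, usable as the update rule in the generic loop sketched earlier; the function name rankOneUpdate and the 1e-12 guard are assumptions made for illustration.

// Rank-one update: S_{k+1} = S_k + r r^T / (gamma^T r), where r = delta - S * gamma.
fun rankOneUpdate(S: Array<DoubleArray>, delta: DoubleArray, gamma: DoubleArray): Array<DoubleArray> {
    val n = delta.size
    // r = delta - S * gamma
    val r = DoubleArray(n) { i -> delta[i] - (0 until n).sumOf { j -> S[i][j] * gamma[j] } }
    val denom = (0 until n).sumOf { i -> gamma[i] * r[i] } // gamma^T (delta - S gamma)
    if (kotlin.math.abs(denom) < 1e-12) return S           // skip the update to avoid breakdown
    return Array(n) { i -> DoubleArray(n) { j -> S[i][j] + r[i] * r[j] / denom } }
}

For example, quasiNewton(f, grad, x0, ::rankOneUpdate) runs the loop from the earlier sketch with this correction.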

Davidon-Fletcher-Powell Method

The Davidon-Fletcher-Powell (DFP) method is similar to the rank-one method but has one important advantage: if the initial matrix  S_0  is positive definite, then all subsequent matrices are positive definite. Unlike the rank-one method, every new direction is therefore a descent direction.

The DPF formula is

\displaystyle S_{k+1}=S_k+\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}

The correction is an  n\times n  symmetric matrix of rank two.
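It is straightforward to check that the DFP update satisfies the first required property,  S_{k+1}\gamma_k=\delta_k :

 \displaystyle S_{k+1}\gamma_k=S_k\gamma_k+\frac{\delta_k(\delta_k^T\gamma_k)}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k(\gamma_k^TS_k\gamma_k)}{\gamma_k^TS_k\gamma_k}=S_k\gamma_k+\delta_k-S_k\gamma_k=\delta_k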

Broyden-Fletcher-Goldfarb-Shanno Method

The BFGS method formula is

 \displaystyle S_{k+1}=S_k+(1+\frac{\gamma^T_kS_k\gamma_k}{\gamma_k^T\delta_k})\frac{\delta_k\delta_k^T}{\gamma^T_k\delta_k}-\frac{\delta_k\gamma_k^TS_k+S_k\gamma_k\delta_k^T}{\gamma_k^T\delta_k}

The BFGS method has the following properties:

1.  S_{k+1}  becomes identical to  H^{-1}  for  k=n-1 .

2. Direction vectors  \{\delta_i\}_{i=0,1,...,n-1}  form a conjugate set.

3.  S_{k+1}  is positive definite if  S_k  is positive definite.

4.  \delta_k^T\gamma_k=\delta^T_kg_{k+1}-\delta^T_kg_k>0 .

The BFGS method is generally regarded as the most effective of the quasi-Newton methods.
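For completeness, here is a minimal sketch of the BFGS correction on plain arrays, again usable as an update rule in the generic loop sketched earlier; it is illustrative only and not the library's implementation.

// BFGS update:
// S_{k+1} = S + (1 + g^T S g / g^T d) d d^T / (g^T d) - (d g^T S + S g d^T) / (g^T d),
// where d = delta_k, g = gamma_k, and S is assumed symmetric.
fun bfgsUpdate(S: Array<DoubleArray>, delta: DoubleArray, gamma: DoubleArray): Array<DoubleArray> {
    val n = delta.size
    val Sg = DoubleArray(n) { i -> (0 until n).sumOf { j -> S[i][j] * gamma[j] } } // S gamma
    val gd = (0 until n).sumOf { i -> gamma[i] * delta[i] }                        // gamma^T delta
    val gSg = (0 until n).sumOf { i -> gamma[i] * Sg[i] }                          // gamma^T S gamma
    val c = (1.0 + gSg / gd) / gd
    return Array(n) { i ->
        DoubleArray(n) { j ->
            S[i][j] + c * delta[i] * delta[j] - (delta[i] * Sg[j] + Sg[i] * delta[j]) / gd
        }
    }
}

Property 4 above guarantees that the divisor  \gamma_k^T\delta_k  is positive, so the update is well defined when the line search is sufficiently accurate.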

Huang Family Method

The Huang family is a general formula that includes the rank-one, DFP, BFGS, Pearson, and McCormick methods as special cases. It has the following form

 \displaystyle S_{k+1}=S_k+\frac{\delta_k(\theta\delta_k+\phi S^T_k\gamma_k)^T}{(\theta\delta_k+\phi S^T_k\gamma_k)^T\gamma_k}-\frac{S_k\gamma_k(\psi\delta_k+\omega S^T_k\gamma_k)^T}{(\psi\delta_k+\omega S^T_k\gamma_k)^T\gamma_k}

where  \theta,\, \phi,\, \psi  and  \omega  are independent parameters.

By letting  \theta=1,\, \phi=-1,\, \psi=1,\, \omega=-1 , we have the rank-one formula.

By letting  \theta=1,\, \phi=0,\, \psi=0,\, \omega=1 , we have the DFP formula.

By letting  \theta=1,\, \phi=0,\, \psi=1,\, \omega=0 , we have the McCormick formula:

\displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\delta^T_k}{\delta_k^T\gamma_k}

By letting  \theta=0,\, \phi=1,\, \psi=0,\, \omega=1 , we have the Pearson formula:

 \displaystyle S_{k+1}=S_k+\frac{(\delta_k-S_k\gamma_k)\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}

By letting  \displaystyle \frac{\phi}{\theta}=\frac{-\gamma_k^T\delta_k}{\gamma_k^T\delta_k+\gamma^T_kS_k\gamma_k},\,\psi=1,\, \omega=0 , we have the BFGS formula.
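As a check, substituting the DFP choice  \theta=1,\, \phi=0,\, \psi=0,\, \omega=1  given above into the general formula, and using the symmetry of  S_k , recovers the DFP update:

 \displaystyle S_{k+1}=S_k+\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k(S_k^T\gamma_k)^T}{(S_k^T\gamma_k)^T\gamma_k}=S_k+\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}-\frac{S_k\gamma_k\gamma_k^TS_k}{\gamma_k^TS_k\gamma_k}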

The following code minimizes the Himmelblau function  f(x_1,x_2)=(x_1^2+x_2-11)^2+(x_1+x_2^2-7)^2  using different quasi-Newton methods.

// The Himmelblau function:
// f(x) = (x1^2 + x2 - 11)^2 + (x1 + x2^2 - 7)^2
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)

        var result = Math.pow(x1 * x1 + x2 - 11, 2.0)
        result += Math.pow(x1 + x2 * x2 - 7, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int {
        return 2
    }

    override fun dimensionOfRange(): Int {
        return 1
    }
}

// the gradient of the Himmelblau function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)

        val w1 = x1 * x1 + x2 - 11
        val w2 = x1 + x2 * x2 - 7

        return DenseVector(
            4 * w1 * x1 + 2 * w2, // df/dx1
            2 * w1 + 4 * w2 * x2) // df/dx2
    }

    override fun dimensionOfDomain(): Int {
        return 2
    }

    override fun dimensionOfRange(): Int {
        return 2
    }
}

val problem: C2OptimProblem = C2OptimProblemImpl(f, g)

// the rank-one method
val solver1: QuasiNewtonMinimizer = RankOneMinimizer(
    1e-16, // precision
    15) // maximum number of iterations
val soln1 = solver1.solve(problem)
val xmin1 = soln1.search(DenseVector(6.0, 6.0))
val fmin1 = f.evaluate(xmin1)
println(String.format("f(%s) = %.16f", xmin1.toString(), fmin1))

// the DFP method
val solver2: QuasiNewtonMinimizer = DFPMinimizer(
    1e-16, // precision
    15) // maximum number of iterations
val soln2 = solver2.solve(problem)
val xmin2 = soln2.search(DenseVector(6.0, 6.0))
val fmin2 = f.evaluate(xmin2)
println(String.format("f(%s) = %.16f", xmin2.toString(), fmin2))

// the BFGS method
val solver3: QuasiNewtonMinimizer = BFGSMinimizer(
    false,
    1e-16, // precision
    15) // maximum number of iterations
val soln3 = solver3.solve(problem)
val xmin3 = soln3.search(DenseVector(6.0, 6.0))
val fmin3 = f.evaluate(xmin3)
println(String.format("f(%s) = %.16f", xmin3.toString(), fmin3))

// the Huang family with theta = 0, phi = 1, psi = 0, omega = 1
val solver4: QuasiNewtonMinimizer = HuangMinimizer(
    0.0, // theta
    1.0, // phi
    0.0, // psi
    1.0, // omega
    1e-16, // precision
    15) // maximum number of iterations
val soln4 = solver4.solve(problem)
val xmin4 = soln4.search(DenseVector(6.0, 6.0))
val fmin4 = f.evaluate(xmin4)
println(String.format("f(%s) = %.16f", xmin4.toString(), fmin4))

// the Pearson method
val solver5: QuasiNewtonMinimizer = PearsonMinimizer(
    1e-16, // precision
    15) // maximum number of iterations
val soln5 = solver5.solve(problem)
val xmin5 = soln5.search(DenseVector(6.0, 6.0))
val fmin5 = f.evaluate(xmin5)
println(String.format("f(%s) = %.16f", xmin5.toString(), fmin5))

The output is:

f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000
f([3.000000, 2.000000]) = 0.000000000000000

The quasi-Newton methods are much more efficient than the conjugate-direction methods: they take fewer iterations and give better accuracy and precision.