Steepest-descent methods are a family of algorithms that use gradient and, in some variants, Hessian information to search in the direction that gives the largest reduction in the function value. The gradient is denoted by g and the Hessian by H. We first discuss the first-order methods, in which only the gradient is used.

Now we consider the optimization problem

 \min F=f(x), \quad \text{for } x\in E^n

Taylor expansion of the above gives

 F+\Delta F=f(x+\delta)\approx f(x)+g^T\delta+O(\delta^T\delta)

where  g=\nabla f(x)  is the gradient evaluated at  x . Keeping only the first-order term, the change in the objective is

 \Delta F\approx \sum^{n}_{i=1}g_i\delta_i=g^T\delta=||g||\,||\delta||\,\cos\theta

For a step  \delta  of given length,  \Delta F  is largest when  \theta=0 , which implies that  \delta  points in the same direction as  g ;  \Delta F  is smallest (most negative) when  \theta=\pi , which implies that  \delta  points in the opposite direction of  g , that is, in the direction of  -g . These are called the steepest-ascent and steepest-descent directions, respectively.
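For example, if  g=(3,4)^T  so that  ||g||=5 , a step  \delta=-\alpha g/||g||  of length  \alpha  gives  \Delta F\approx ||g||\,||\delta||\,\cos\pi=-5\alpha , the largest decrease achievable with a step of that length.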

The general framework of a steepest-descent algorithm is as follows:

1. Specify the initial point  x_0 and the tolerance  \epsilon .

2. Calculate the gradient  g_k  and set the steepest-descent direction vector  d_k=-g_k .

3. Minimize  f(x_k+\alpha d_k)  with respect to  \alpha  to compute the step size  \alpha_k .

4. Set the next point  x_{k+1}=x_k+\alpha_k d_k  and compute the next function value  f_{k+1}=f(x_{k+1}) .

5. If  ||\alpha_k d_k||<\epsilon , then:

      a. Done.  x^*=x_{k+1}  and  f(x^*)=f_{k+1} .

      b. Otherwise, go back to step 2.

In step 3,  \alpha_k  can be computed by a line search using a univariate optimization algorithm. Alternatively,  \alpha_k  can be computed analytically.
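As an illustration of steps 2 to 4, here is a minimal Kotlin sketch of a single steepest-descent iteration with a backtracking (Armijo) line search. It is a sketch only: `f` and `grad` are hypothetical placeholders for the objective and its gradient, and the constants `beta` and `c` are common default choices rather than values prescribed by the method.

// one steepest-descent iteration with a backtracking (Armijo) line search
fun steepestDescentStep(
    x: DoubleArray,
    f: (DoubleArray) -> Double,
    grad: (DoubleArray) -> DoubleArray,
    beta: Double = 0.5, // factor by which the step size is shrunk
    c: Double = 1e-4    // Armijo sufficient-decrease constant
): DoubleArray {
    val g = grad(x)                                 // step 2: gradient at x_k
    val d = DoubleArray(x.size) { -g[it] }          // steepest-descent direction d_k = -g_k
    val gTd = g.indices.sumOf { g[it] * d[it] }     // directional derivative g^T d (non-positive)
    val fx = f(x)
    var alpha = 1.0
    // step 3: backtrack until f(x + alpha*d) <= f(x) + c * alpha * g^T d
    while (f(DoubleArray(x.size) { x[it] + alpha * d[it] }) > fx + c * alpha * gTd) {
        alpha *= beta
    }
    // step 4: x_{k+1} = x_k + alpha_k * d_k
    return DoubleArray(x.size) { x[it] + alpha * d[it] }
}

The Armijo test accepts the first step length whose decrease is at least a fixed fraction of what the linear model  g^T d  predicts, which keeps each iteration cheap while still ensuring a sufficient decrease.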

The following code minimizes the given function using the first-order steepest-descent method.

 f(x_1,x_2,x_3,x_4)=(x_1-4x_2)^4+12(x_3-x_4)^4+3(x_2-10x_3)^2+55(x_1-2x_4)^2
				
// the objective function
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        var result = pow(x1 - 4 * x2, 4.0)
        result += 12 * pow(x3 - x4, 4.0)
        result += 3 * pow(x2 - 10 * x3, 2.0)
        result += 55 * pow(x1 - 2 * x4, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 1
}

// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val result = DoubleArray(4)
        result[0] = 4 * pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(result)
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 4
}

// construct an optimization problem
val problem: C2OptimProblem = C2OptimProblemImpl(f, g) // only gradient information
val solver: SteepestDescentMinimizer = FirstOrderMinimizer(
    1e-8,  // tolerance
    40000) // maximum number of iterations
val soln = solver.solve(problem)

val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
				
			

The output is:

				
					f([0.046380, 0.0016330, 0.001634, 0.023189]) = 0.000003
				
			

It takes about 40,000 iterations, which is very slow.

Newton-Raphson Method

Second-order methods use a nonsingular Hessian, in addition to the gradient, to determine the direction vector  d_k  and the step size  \alpha_k  in the steepest-descent framework. One such method is the Newton-Raphson method, which expands the Taylor series to the second order. Let  \delta  be the change in  x , so we have:
 \displaystyle f(x+\delta)\approx f(x)+\sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\delta_i+\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 f}{\partial x_i \partial x_j}\delta_i\delta_j
Differentiating the above equation with respect to  \delta_k  to minimize  f(x+\delta) , we have
 \displaystyle \frac{\partial f}{\partial x_k}+\sum_{i=1}^{n}\frac{\partial^2 f}{\partial x_i \partial x_k}\delta_i=0, \quad \text{for } k=1,2,\dots,n
Rewriting the above equation in matrix form, we have
 g=-H\delta
The optimal change in x is given by
 \delta=-H^{-1}g
To obtain the maximum reduction in  f(x)  along this direction, a line search is used. The next point  x_{k+1}  is:
 x_{k+1}=x_k+\delta_k=x_k+\alpha_k d_k
where,  d_k=-H^{-1}_k g_k and  \alpha_k is the  \alpha that minimizes  f(x_k+\alpha d_k) .
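For intuition on why the Hessian helps, consider the quadratic  f(x)=\frac{1}{2}x^TAx-b^Tx  with  A  symmetric positive definite. Then  g=Ax-b  and  H=A , so a full Newton step ( \alpha_k=1 ) from any  x_k  gives

 x_{k+1}=x_k-H^{-1}g=x_k-A^{-1}(Ax_k-b)=A^{-1}b

which is the exact minimizer in a single iteration. For a general nonlinear  f , convergence is not one-step, but it is typically much faster than gradient-only descent once  x_k  is near the solution.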
Compared with the first-order methods, the Newton-Raphson method uses the Hessian information to speed up convergence, especially when the iterate is close to the solution. The following code minimizes the same function using the Newton-Raphson method.

				
// the objective function
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        var result = pow(x1 - 4 * x2, 4.0)
        result += 12 * pow(x3 - x4, 4.0)
        result += 3 * pow(x2 - 10 * x3, 2.0)
        result += 55 * pow(x1 - 2 * x4, 2.0)
        return result
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 1
}

// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val result = DoubleArray(4)
        result[0] = 4 * pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(result)
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 4
}

// construct an optimization problem
val problem: C2OptimProblem = C2OptimProblemImpl(f, g) // uses numerical Hessian
val solver: SteepestDescentMinimizer = NewtonRaphsonMinimizer(
    1e-8, // tolerance
    20)   // maximum number of iterations
val soln = solver.solve(problem)

val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
				
			

The output is:

				
					f([0.000134, -0.000009, 0.000067]) = 0.000000
				
			

The Newton-Raphson method converges much faster than the first-order method: it takes only 20 iterations and achieves much better accuracy than the 40,000 iterations of the first-order method.

Gauss-Newton Method

The Gauss-Newton method minimizes an objective function of the following form,

 f=[f_1(x)\; f_2(x)\; \cdots \; f_m(x)]^T

The solution of the above equation is a point  x  such that all  f_p(x)  are reduced to zero simultaneously. We can construct a real-valued function  F  such that when  F  is minimized, the functions  f_p(x)  are also minimized in the least-squares sense:

 \displaystyle F=\sum_{p=1}^{m}f_p(x)^2=f^Tf

Differentiating the above equation with respect to  x_i , we have

 \displaystyle \frac{\partial F}{\partial x_i}=\sum_{p=1}^{m}2f_p(x)\frac{\partial f_p}{\partial x_i}

In matrix form, this is  g_F=2J^Tf , where  J  is the  m\times n  Jacobian of  f  with entries  J_{pi}=\partial f_p/\partial x_i .

Now differentiating again, we have

 \displaystyle \frac{\partial^2 F}{\partial x_i \partial x_j}\approx 2\sum_{p=1}^{m}\frac{\partial f_p}{\partial x_i}\frac{\partial f_p}{\partial x_j}

This approximation drops the term  2\sum_{p=1}^{m}f_p\,\frac{\partial^2 f_p}{\partial x_i \partial x_j} , which is small when the residuals  f_p  are small near the solution. The Hessian  H_F  needs to be nonsingular and positive definite and is given by  H_F\approx 2J^TJ .

The next point is given by the recursive relation:
 x_{k+1}=x_k-\alpha_k(2J^TJ)^{-1}(2J^Tf) .
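For the example objective used above, one decomposition (the one assumed in the code below) is

 f_1=(x_1-4x_2)^2,\quad f_2=\sqrt{12}\,(x_3-x_4)^2,\quad f_3=\sqrt{3}\,(x_2-10x_3),\quad f_4=\sqrt{55}\,(x_1-2x_4)

so that  F=f^Tf=(x_1-4x_2)^4+12(x_3-x_4)^4+3(x_2-10x_3)^2+55(x_1-2x_4)^2  recovers the original function.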

The following code solves the same minimization problem using the Gauss-Newton method.

 

				
// the vector-valued objective function
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val fx = DoubleArray(4)
        fx[0] = pow(x1 - 4 * x2, 2.0)
        fx[1] = sqrt(12.0) * pow(x3 - x4, 2.0)
        fx[2] = sqrt(3.0) * (x2 - 10 * x3)
        fx[3] = sqrt(55.0) * (x1 - 2 * x4)
        return DenseVector(fx)
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 4
}

// the Jacobian
val J: RntoMatrix = object : RntoMatrix {
    override fun evaluate(x: Vector): Matrix {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)

        val Jx = DenseMatrix(4, 4)

        var value = 2 * (x1 - 4 * x2)
        Jx.set(1, 1, value)

        value = -8 * (x1 - 4 * x2)
        Jx.set(1, 2, value)

        value = 2 * sqrt(12.0) * (x3 - x4)
        Jx.set(2, 3, value)
        Jx.set(2, 4, -value)

        Jx.set(3, 2, sqrt(3.0))
        Jx.set(3, 3, -10 * sqrt(3.0))

        Jx.set(4, 1, sqrt(55.0))
        Jx.set(4, 4, -2 * sqrt(55.0))

        return Jx
    }

    override fun dimensionOfDomain(): Int = 4

    override fun dimensionOfRange(): Int = 1
}

val solver: GaussNewtonMinimizer = GaussNewtonMinimizer(
    1e-8, // tolerance
    10)   // maximum number of iterations
val soln = solver.solve(f, J) // analytical Jacobian

val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
println(String.format("f(%s) = %s", xmin.toString(), f.evaluate(xmin).toString()))
				
			

The output is:

				
					f([0.000007, -0.000000, -0.000000, 0.000003]) = [0.000000, 0.000000, -0.000000, -0.000000]
				
			

The Gauss-Newton method converges even faster than the Newton-Raphson method: it takes only 10 iterations and achieves much better accuracy than the 40,000 iterations of the first-order method and the 20 iterations of the Newton-Raphson method.