Steepest-descent method algorithms are family of algorithms that use the gradient and Hessian information to search in a direction that gives the biggest reduction in the function value. The gradients is denoted by g and Hessia is denoted by H. We will first discuss the first-order methods in which only the gradient is used.

Now we consider the optimization problem

$minF=f(x)\,\,\, for \, x\epsilon E^n$

Taylor expansion of the above gives

$F+\Delta F=f(x+\delta)\approx f(x)+g^T\delta+0(\delta^T\delta)$

$\Delta F\approx \sum^{n}_{i=1}g_i\delta_i=||g||\,||\delta||\,cos\theta$

For any vector $\delta,\, \Delta F$ is maximum when $\theta=0$, which implies that $\delta$ is in the same direction as g. $\Delta F$ is maximum when $\theta=\pi$, which implies that $\delta$ is in the opposite direction of g or simply we can say that it is in the direction of -g. They both are defined as steepest-ascent and steepest-descent directions respectively.

The general framework of a steepest-descent algorithim is as follows:

1. Specify the initial point $x_0$ and the tolerance $\epsilon$.

2. Calculate the gradient $g_k$ and set the Newton direction vector $d_k=-g_k$.

3. Minimize $f(x_k+ad_k)$ with respect to $\alpha$ to compute the increment $\alpha_k$.

4. Set the next point $x_{k+1}=x_k+a_kd_k$ and compute the next function value $f_{k+1}=f(x_{k+1})$.

5. If $||a_kd_k||<\varepsilon$, then,

a. Done. $x^*=x_{k+1}$ and $f(x^*)=f_{k+1}$.

b. Else, repeat step 2.

In step 3, $a_k$ can be computed by using a line search using a univariate optimization alogorithim. Alternatively, $a_k$ can be computed analitically.

The following code minimizes the given function using the first-order steepest-descent method.

$f(x_1,x_2,x_3,x_4)=(x_1-4x_2)^4+12(x_3-x_4)^4+3(x_2-10x_3)^2+55(x_1-2x_4)^2$
				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double result = pow(x1- 4 * x2, 4),
result += 12 * pow(x3 - 4 * x2, 4),
result += 3 * pow(x2 -10 * x3, 2),
result += 55 * pow(x1 - 2 * x4, 2),
return result
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val g: RealVectorFunction g = object : new RealVectorFunction() {
override fun evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),
double[] result = new double[4],
result[0] = 4 * pow(x1 - 4 * x2, 3) + 110 * (x1 - 2 * x4),
result[1] = -16 * pow(x1 - 4 * x2, 3) + 6 * (x2 - 10 * x3),
result[2] = 48 * pow(x3 - x4, 3) + 60 * (x2 - 10 * x3),
result[3] = -48 * pow(x3 - x4, 3) - 220 * (x1 - 2 * x4),
return new DenseVector (result)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// construct an optimization problem
val problem: C2OptimProblem = new C2OptimProblemImpl(f, g); // only gradient information
val solver : SteepestDescentMinimizer firstOrderMinimizer = new FirstOrderMinimizer(
1e-8,
40000)
val soln: firstOrderMinimizer.solve(problem)

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
val fmin= f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))



The output is:

				
f([0.046380, 0.0016330, 0.001634, 0.023189]) = 0.000003



It takes about 40,000 iterations, which is very slow.

## Newton-Rapshon Method

The second order methods use a nonsingular Hessian function in additionto the gradient to determine the vector $d_k$ and the the increment $a_k$ in the steepest-descent methods. One such method is the Newton-Rhapson method. The newton-Rhapson method expands the Taylor series to the second order. Let $\delta$ be the change in $x$, so we have:
$\displaystyle f(x+\delta)\approx fx)+\sum_{j=1}^{n}\frac{\partial f}{\partial x_i}\delta_i+\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 f}{\partial x_i \partial x_j}\delta_i\delta_j$
Now diffrentiating the above equation w.r.t. $\delta_k$ to minimize $f(x+\delta)$,we have
$\displaystyle \frac{\partial f}{\partial x_k}+\sum_{i=1}^{n}\frac{\partial^2 f}{\partial x_i \partial _j}\delta_i=0,\,\,\, for\, k=1,2,...,n$
Now rewritin the above equation in matrix form, we have
$g=-H\delta$
The optimal change in x is given by
$\delta=-H^{-1}g$
Now to compute the maximum reduction in $f(x)$ along the direction, a line search is used. The next point $x_{k+1}$ is:
$x_{k+1}=x_k+\delta_k=x_k+\alpha_k d_k$
where, $d_k=-H^{-1}_k g_k$ and $\alpha_k$ is the $\alpha$ that minimizes $f(x_k+\alpha d_k)$.
The Newton-Rhapson method compared with the first-order methods, uses the Hessian information to speed up the convergence, especially when it is close to the solution. The following code minimizes the same function using the Newton-Rhapson method.

				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double result = pow(x1 - 4 * x2, 4),
result += 12 * pow(x3 - 4 * x2, 4),
result += 3 * pow(x2 -10 * x3, 2),
result += 55 * pow(x1 - 2 * x4, 2),

return result
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val g: RealVectorFunction g = object : new RealVectorFunction() {
override fun evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),
double[] result = new double[4],
result[0] = 4 * pow(x1 - 4 * x2, 3) + 110 * (x1 - 2 * x4),
result[1] = -16 * pow(x1 - 4 * x2, 3) + 6 * (x2 - 10 * x3),
result[2] = 48 * pow(x3 - x4, 3) + 60 * (x2 - 10 * x3),
result[3] = -48 * pow(x3 - x4, 3) - 220 * (x1 - 2 * x4),
return new DenseVector (result)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// construct an optimization problem
val problem: C2OptimProblem = new C2OptimProblemImpl(f, g) // uses numerical Hessian
val solver : SteepestDescentMinimizer NewtonRhapsonMinimizer = new NewtonRhapsonMinimizer(
1e-8,
20)
val soln: NewtonRhapsonMinimizer.solve(problem)

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
val fmin= f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))



The output is:

				
f([0.000134, -0.000009, 0.000067]) = 0.000000



The Newton-Rhapson method converges much faster than the first-order method. It only uses 20 iterations with much better accuracy compared to the 40000 iterations in the first-order method.

## Gauss-Newton Method

The Gauss-Newton minimizes an objective function in the following form,

$f=[f_1(x)\, f_2(x)\, .....\, f_m(x)]^T$

The solution of the above equation is a point $x$ such that all $f_p(x)$ are reduced to zero simultaneously. We can construct a real-valued function $F [/latex such that F$ is minimized, so are functions $f_p(x)$ in the least-square sense such that,

$\displaystyle F=\sum_{p=1}^{m}f_p(x)^2=ff^TF$

Now differentiating the above equation w.r.t., $x_i$, we have,

$\displaystyle \frac{\partial F}{\partial x_i}=\sum_{p=1}^{m}2f_p(x)\frac{\partial f_p}{\partial x_i}$

Otherwise, in the matrix form, it can be given as, $g_F=2J^Tf$

Now differentiating again, we have

$\displaystyle \frac{\partial^2 F}{\partial x_i \partial x_j}\approx 2\sum_{p=1}^{m}\frac{\partial f_p}{\partial x_i}\frac{\partial f_p}{\partial x_j}$

The Hessian $H_F$ needs to be nonsingular and positive definite and is given as $F_F\approx 2J^TJ$

The next point is given by the recursive relation:
$x_{k+1}=x_k-\alpha_k(2J^TJ)^{-1}(2J^Tf)$.

The following code solves the same minimization problem using the Gauss-Newton method.

				
// the objective function
// the global minimizer is at x = (0,0,0,0)
val f: RealScalarFunction f= object : new RealScalarFunction() {
override fun Double evaluate(Vector x) {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

double[] fx= new double[4],
fx[0] = pow(x1 - 4 * x2, 2),
fx[1] = sqrt(12) * pow(x3 - x4, 2),
fx[3] = sqrt(3) * (x2 - 10 * x3),
fx[4] = sqrt(55) * (x1 - 2 * x4),
return new DenseVector (fx)
}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 4
}
}

// the Jacobian
val J: RntoMatrix J = object : new RntoMatrix() {
override Matrix evaluate(Vector x): {
double x1 = x.get(1),
double x2 = x.get(2),
double x3 = x.get(3),
double x4 = x.get(4),

Matrix Jx = new DenseMatrix(4, 4),

double value = 2 * (x1 - 4 * x2),
Jx.set(1, 1, value),

value = -8 * (x1 - 4 * x2),
Jx.set(1, 2, value),

value = 2 * sqrt(12) * (x3 - x4),
Jx.set(2, 3, value),
Jx.set(2, 4, -value),

Jx.set(3, 2, sqrt(3)),
Jx.set(3, 3, -10 * sqrt(3)),

Jx.set(4, 1, sqrt(55)),
Jx.set(4, 4, -2 * sqrt(55)),

return Jx

}
override public int dimensionofDomain() {
return 4
}
override public int dimensionofRange() {
return 1
}
}

val solver : GaussNewtonMinimizer optim1 = new GaussNewtonMinimizer(
1e-8,
10)
val soln: optim1.solve(f, J) //analytical gradient

val xmin: soln.search(new DenseVector(new double [] {1, -1, -1, 1}))
println(String.format("f(%s) = %s", xmin.toString(), f.evaluate(xmin).toString()))



The output is:

				
f([0.000007, -0.000000, -0.000000, 0.000003]) = [0.000000, 0.000000, -0.000000, -0.000000]



The Gauss-Newton method converges much faster than the Newton-Rhapson method. It only uses 10 iterations with much better accuracy compared to the 40000 iterations in the first-order method and 20 iterations used by the Newton-Rhapson Method.