Steepest-descent methods are a family of algorithms that use gradient and Hessian information to search in a direction that gives the biggest reduction in the function value. The gradient is denoted by $\mathbf{g}$ and the Hessian by $\mathbf{H}$. We first discuss the first-order methods, in which only the gradient is used.

Now we consider the optimization problem

$$\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x})$$

The first-order Taylor expansion of $f$ about the current point $\mathbf{x}$ gives

$$f(\mathbf{x} + \boldsymbol{\delta}) \approx f(\mathbf{x}) + \mathbf{g}^T \boldsymbol{\delta}$$

For any vector $\boldsymbol{\delta}$ of fixed length, the increase $\mathbf{g}^T \boldsymbol{\delta}$ is maximum when $\boldsymbol{\delta} \propto \mathbf{g}$, which implies that $\boldsymbol{\delta}$ is in the same direction as $\mathbf{g}$; the decrease $-\mathbf{g}^T \boldsymbol{\delta}$ is maximum when $\boldsymbol{\delta} \propto -\mathbf{g}$, which implies that $\boldsymbol{\delta}$ is in the opposite direction of $\mathbf{g}$, or simply in the direction of $-\mathbf{g}$. These are the steepest-ascent and steepest-descent directions, respectively.
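As a quick numerical illustration of these directions (an example added here for concreteness, not taken from the original text), take $f(\mathbf{x}) = x_1^2 + 4x_2^2$ at $\mathbf{x} = (1, 1)^T$, so that $\mathbf{g} = (2, 8)^T$. For a small step $\boldsymbol{\delta} = \pm\eta\,\mathbf{g}$,

$$
\mathbf{g}^T(+\eta\,\mathbf{g}) = \eta\,\lVert\mathbf{g}\rVert^2 = 68\eta,
\qquad
\mathbf{g}^T(-\eta\,\mathbf{g}) = -68\eta,
$$

so stepping along $\mathbf{g}$ gives the largest first-order increase and stepping along $-\mathbf{g}$ gives the largest first-order decrease.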

The general framework of a steepest-descent algorithm is as follows:

1. Specify the initial point $\mathbf{x}_0$ and the tolerance $\epsilon$.

2. Calculate the gradient $\mathbf{g}_k$ and set the direction vector $\mathbf{d}_k$ (for the first-order method, $\mathbf{d}_k = -\mathbf{g}_k$; for the Newton-type methods below, $\mathbf{d}_k = -\mathbf{H}_k^{-1}\mathbf{g}_k$).

3. Minimize $f(\mathbf{x}_k + \alpha \mathbf{d}_k)$ with respect to $\alpha$; call the minimizer $\alpha_k$ and compute the increment $\boldsymbol{\delta}_k = \alpha_k \mathbf{d}_k$.

4. Set the next point $\mathbf{x}_{k+1} = \mathbf{x}_k + \boldsymbol{\delta}_k$ and compute the next function value $f_{k+1} = f(\mathbf{x}_{k+1})$.

5. If $\lVert \boldsymbol{\delta}_k \rVert < \epsilon$, then:

a. Done. $\mathbf{x}^* = \mathbf{x}_{k+1}$ and $f(\mathbf{x}^*) = f_{k+1}$.

b. Else, repeat from step 2.

In step 3, $\alpha_k$ can be computed by a line search using a univariate optimization algorithm. Alternatively, $\alpha_k$ can be computed analytically. A minimal sketch of the whole framework is shown below.
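To make the five steps concrete, here is a minimal, self-contained sketch of the first-order framework in plain Kotlin. It is illustrative only and not part of the library used below; the function `steepestDescent`, the crude halving line search in step 3, and the test function in `main` are assumptions of this sketch.

```
// minimal first-order steepest-descent sketch (illustrative only)
// f: objective function, grad: its gradient, x0: initial point
fun steepestDescent(
    f: (DoubleArray) -> Double,
    grad: (DoubleArray) -> DoubleArray,
    x0: DoubleArray,
    epsilon: Double = 1e-8,
    maxIterations: Int = 40000
): DoubleArray {
    var x = x0.copyOf()
    repeat(maxIterations) {
        val g = grad(x)                            // step 2: gradient at the current point
        val d = DoubleArray(g.size) { i -> -g[i] } // steepest-descent direction d = -g
        // step 3: crude halving line search for the step size alpha
        val fx = f(x)
        var alpha = 1.0
        while (alpha > 1e-16 && f(DoubleArray(x.size) { i -> x[i] + alpha * d[i] }) >= fx) {
            alpha /= 2.0
        }
        // step 4: the increment and the next point
        val delta = DoubleArray(x.size) { i -> alpha * d[i] }
        x = DoubleArray(x.size) { i -> x[i] + delta[i] }
        // step 5: stop when the increment is smaller than the tolerance
        if (Math.sqrt(delta.fold(0.0) { s, v -> s + v * v }) < epsilon) return x
    }
    return x
}

// usage: minimize f(x) = x1^2 + 4 * x2^2 starting from (1, 1); the minimum is at (0, 0)
fun main() {
    val xmin = steepestDescent(
        { x -> x[0] * x[0] + 4 * x[1] * x[1] },
        { x -> doubleArrayOf(2 * x[0], 8 * x[1]) },
        doubleArrayOf(1.0, 1.0)
    )
    println(xmin.toList()) // close to [0.0, 0.0]
}
```

A production implementation would replace the halving loop with a proper univariate line search or an analytical step size, as noted above.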

The following code minimizes the function $f(\mathbf{x}) = (x_1 - 4x_2)^4 + 12(x_3 - x_4)^4 + 3(x_2 - 10x_3)^2 + 55(x_1 - 2x_4)^2$ using the first-order steepest-descent method.

```
// the objective function
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        var result = Math.pow(x1 - 4 * x2, 4.0)
        result += 12 * Math.pow(x3 - x4, 4.0)
        result += 3 * Math.pow(x2 - 10 * x3, 2.0)
        result += 55 * Math.pow(x1 - 2 * x4, 2.0)
        return result
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 1
}
// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        val result = DoubleArray(4)
        result[0] = 4 * Math.pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * Math.pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * Math.pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * Math.pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(*result)
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 4
}
// construct an optimization problem; only gradient information is supplied
val problem: C2OptimProblem = C2OptimProblemImpl(f, g)
// first-order steepest-descent solver: convergence tolerance 1e-8, at most 40000 iterations
val solver: SteepestDescentMinimizer = FirstOrderMinimizer(1e-8, 40000)
val soln = solver.solve(problem)
val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
```

The output is:

```
f([0.046380, 0.001633, 0.001634, 0.023189]) = 0.000003
```

It takes about 40,000 iterations, which is very slow.

## Newton-Raphson Method

The optimal change in $\mathbf{x}$ is given by

$$\boldsymbol{\delta} = -\alpha \mathbf{H}^{-1} \mathbf{g}$$

where $\mathbf{H}$ is the Hessian of $f$ and $\alpha$ is the value that minimizes $f(\mathbf{x} - \alpha \mathbf{H}^{-1} \mathbf{g})$.
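As a minimal illustration of this update (a plain-Kotlin sketch, not the library implementation), a single Newton-Raphson step with $\alpha = 1$ for a two-variable function can be computed by solving $\mathbf{H}\boldsymbol{\delta} = -\mathbf{g}$ with the closed-form 2-by-2 inverse; the helper `newtonStep` and the quadratic test function are illustrative assumptions.

```
// one Newton-Raphson step with alpha = 1 for a 2-variable function (illustrative sketch):
// solve H * delta = -g using the closed-form inverse of the 2x2 Hessian
fun newtonStep(x: DoubleArray, g: DoubleArray, h: Array<DoubleArray>): DoubleArray {
    val det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    require(det != 0.0) { "the Hessian must be nonsingular" }
    val d0 = -(h[1][1] * g[0] - h[0][1] * g[1]) / det // first component of -H^{-1} g
    val d1 = -(h[0][0] * g[1] - h[1][0] * g[0]) / det // second component of -H^{-1} g
    return doubleArrayOf(x[0] + d0, x[1] + d1)
}

// usage: f(x) = x1^2 + 4 * x2^2 has gradient (2*x1, 8*x2) and constant Hessian diag(2, 8)
fun main() {
    val x = doubleArrayOf(1.0, 1.0)
    val g = doubleArrayOf(2.0, 8.0)
    val h = arrayOf(doubleArrayOf(2.0, 0.0), doubleArrayOf(0.0, 8.0))
    println(newtonStep(x, g, h).toList()) // [0.0, 0.0]
}
```

Because the test function is quadratic, a single Newton step lands exactly on the minimizer. The following code applies the Newton-Raphson minimizer to the same four-variable problem as before; since no analytical Hessian is supplied, the Hessian is computed numerically.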

```
// the objective function
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealScalarFunction = object : RealScalarFunction {
    override fun evaluate(x: Vector): Double {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        var result = Math.pow(x1 - 4 * x2, 4.0)
        result += 12 * Math.pow(x3 - x4, 4.0)
        result += 3 * Math.pow(x2 - 10 * x3, 2.0)
        result += 55 * Math.pow(x1 - 2 * x4, 2.0)
        return result
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 1
}
// the gradient function
val g: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        val result = DoubleArray(4)
        result[0] = 4 * Math.pow(x1 - 4 * x2, 3.0) + 110 * (x1 - 2 * x4)
        result[1] = -16 * Math.pow(x1 - 4 * x2, 3.0) + 6 * (x2 - 10 * x3)
        result[2] = 48 * Math.pow(x3 - x4, 3.0) - 60 * (x2 - 10 * x3)
        result[3] = -48 * Math.pow(x3 - x4, 3.0) - 220 * (x1 - 2 * x4)
        return DenseVector(*result)
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 4
}
// construct an optimization problem; no analytical Hessian, so a numerical Hessian is used
val problem: C2OptimProblem = C2OptimProblemImpl(f, g)
// Newton-Raphson solver: convergence tolerance 1e-8, at most 20 iterations
val solver: SteepestDescentMinimizer = NewtonRaphsonMinimizer(1e-8, 20)
val soln = solver.solve(problem)
val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
val fmin = f.evaluate(xmin)
println(String.format("f(%s) = %f", xmin.toString(), fmin))
```

The output is:

```
f([0.000134, -0.000009, 0.000067]) = 0.000000
```

The Newton-Raphson method converges much faster than the first-order method: it uses only 20 iterations, with much better accuracy, compared to the 40,000 iterations of the first-order method.

## Gauss-Newton Method

The Gauss-Newton method minimizes an objective function built from a set of $m$ functions of the following form,

$$f_j(\mathbf{x}), \quad j = 1, 2, \ldots, m$$

The solution is a point $\mathbf{x}$ such that all $f_j(\mathbf{x})$ are reduced to zero simultaneously. We can construct a real-valued function

$$F(\mathbf{x}) = \sum_{j=1}^{m} f_j^2(\mathbf{x})$$

so that when $F$ is minimized, the functions $f_j$ are minimized in the least-squares sense.

Now differentiating the above equation w.r.t. $x_i$, we have

$$\frac{\partial F}{\partial x_i} = 2 \sum_{j=1}^{m} f_j \frac{\partial f_j}{\partial x_i}$$

Or, in matrix form, it can be written as

$$\nabla F = 2 \mathbf{J}^T \mathbf{f}$$

where $\mathbf{J}$ is the Jacobian of $\mathbf{f} = (f_1, f_2, \ldots, f_m)^T$. Now differentiating again and neglecting the second-order terms, we have

$$\mathbf{H} \approx 2 \mathbf{J}^T \mathbf{J}$$

The Hessian needs to be nonsingular and positive definite and is given as $\mathbf{H} = 2 \mathbf{J}^T \mathbf{J}$.

The next point is given by the recursive relation

$$\mathbf{x}_{k+1} = \mathbf{x}_k - (\mathbf{J}_k^T \mathbf{J}_k)^{-1} \mathbf{J}_k^T \mathbf{f}(\mathbf{x}_k)$$
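Before the library example, here is a small plain-Kotlin sketch of a single Gauss-Newton update for a two-residual, two-variable problem; the helper `gaussNewtonStep` and the linear residuals in `main` are illustrative assumptions and not part of the example that follows.

```
// one Gauss-Newton step x_next = x - (J^T J)^{-1} J^T f for a
// 2-residual, 2-variable problem, solving the 2x2 normal equations in closed form
fun gaussNewtonStep(x: DoubleArray, f: DoubleArray, j: Array<DoubleArray>): DoubleArray {
    // normal-equation matrix A = J^T J and right-hand side b = J^T f
    val a00 = j[0][0] * j[0][0] + j[1][0] * j[1][0]
    val a01 = j[0][0] * j[0][1] + j[1][0] * j[1][1]
    val a11 = j[0][1] * j[0][1] + j[1][1] * j[1][1]
    val b0 = j[0][0] * f[0] + j[1][0] * f[1]
    val b1 = j[0][1] * f[0] + j[1][1] * f[1]
    // delta = -A^{-1} b, using the closed-form inverse of the symmetric 2x2 matrix A
    val det = a00 * a11 - a01 * a01
    require(det != 0.0) { "J^T J must be nonsingular" }
    val d0 = -(a11 * b0 - a01 * b1) / det
    val d1 = -(a00 * b1 - a01 * b0) / det
    return doubleArrayOf(x[0] + d0, x[1] + d1)
}

fun main() {
    // residuals f1 = x1 + 2*x2 - 4 and f2 = 3*x1 - x2 - 5 evaluated at x0 = (0, 0),
    // with constant Jacobian J = [[1, 2], [3, -1]]
    val x0 = doubleArrayOf(0.0, 0.0)
    val f0 = doubleArrayOf(-4.0, -5.0)
    val j = arrayOf(doubleArrayOf(1.0, 2.0), doubleArrayOf(3.0, -1.0))
    println(gaussNewtonStep(x0, f0, j).toList()) // [2.0, 1.0]
}
```

Because the residuals in this toy problem are linear, one Gauss-Newton step reaches the exact least-squares solution; for nonlinear residuals the update is iterated.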

The following code solves the same minimization problem using the Gauss-Newton method.

```
// the vector-valued function f = (f_1, ..., f_4) to be driven to zero in the least-squares sense
// the global minimizer is at x = (0, 0, 0, 0)
val f: RealVectorFunction = object : RealVectorFunction {
    override fun evaluate(x: Vector): Vector {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        val fx = DoubleArray(4)
        fx[0] = Math.pow(x1 - 4 * x2, 2.0)
        fx[1] = Math.sqrt(12.0) * Math.pow(x3 - x4, 2.0)
        fx[2] = Math.sqrt(3.0) * (x2 - 10 * x3)
        fx[3] = Math.sqrt(55.0) * (x1 - 2 * x4)
        return DenseVector(*fx)
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 4
}
// the Jacobian of f
val J: RntoMatrix = object : RntoMatrix {
    override fun evaluate(x: Vector): Matrix {
        val x1 = x.get(1)
        val x2 = x.get(2)
        val x3 = x.get(3)
        val x4 = x.get(4)
        val Jx = DenseMatrix(4, 4)
        var value = 2 * (x1 - 4 * x2)
        Jx.set(1, 1, value)
        value = -8 * (x1 - 4 * x2)
        Jx.set(1, 2, value)
        value = 2 * Math.sqrt(12.0) * (x3 - x4)
        Jx.set(2, 3, value)
        Jx.set(2, 4, -value)
        Jx.set(3, 2, Math.sqrt(3.0))
        Jx.set(3, 3, -10 * Math.sqrt(3.0))
        Jx.set(4, 1, Math.sqrt(55.0))
        Jx.set(4, 4, -2 * Math.sqrt(55.0))
        return Jx
    }
    override fun dimensionOfDomain(): Int = 4
    override fun dimensionOfRange(): Int = 1
}
// Gauss-Newton solver: convergence tolerance 1e-8, at most 10 iterations
val solver: GaussNewtonMinimizer = GaussNewtonMinimizer(1e-8, 10)
val soln = solver.solve(f, J) // analytical Jacobian
val xmin = soln.search(DenseVector(1.0, -1.0, -1.0, 1.0))
println(String.format("f(%s) = %s", xmin.toString(), f.evaluate(xmin).toString()))
```

The output is:

```
f([0.000007, -0.000000, -0.000000, 0.000003]) = [0.000000, 0.000000, -0.000000, -0.000000]
```

The Gauss-Newton method converges even faster than the Newton-Raphson method: it uses only 10 iterations, with better accuracy, compared to the 40,000 iterations of the first-order method and the 20 iterations of the Newton-Raphson method.