There are several minimization methods available. The best choice of algorithm depends on the problem. All of the algorithms use the value of the function and its gradient at each evaluation point, except for the Simplex algorithm which uses function values only.
This is the Fletcher-Reeves conjugate gradient algorithm. The conjugate gradient algorithm proceeds as a succession of line minimizations. The sequence of search directions is used to build up an approximation to the curvature of the function in the neighborhood of the minimum.
An initial search direction p is chosen using the gradient, and line minimization is carried out in that direction. The accuracy of the line minimization is specified by the parameter tol. The minimum along this line occurs when the function gradient g and the search direction p are orthogonal. The line minimization terminates when dot(p,g) < tol |p| |g|. The search direction is updated using the Fletcher-Reeves formula p' = g' - \beta g where \beta=-|g'|^2/|g|^2, and the line minimization is then repeated for the new search direction.
This is the Polak-Ribiere conjugate gradient algorithm. It is similar to the Fletcher-Reeves method, differing only in the choice of the coefficient \beta. Both methods work well when the evaluation point is close enough to the minimum of the objective function that it is well approximated by a quadratic hypersurface.
These methods use the vector Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. This is a quasi-Newton method which builds up an approximation to the second derivatives of the function f using the difference between successive gradient vectors. By combining the first and second derivatives the algorithm is able to take Newton-type steps towards the function minimum, assuming quadratic behavior in that region.
bfgs2version of this minimizer is the most efficient version available, and is a faithful implementation of the line minimization scheme described in Fletcher's Practical Methods of Optimization, Algorithms 2.6.2 and 2.6.4. It supercedes the original
bfgsroutine and requires substantially fewer function and gradient evaluations. The user-supplied tolerance tol corresponds to the parameter \sigma used by Fletcher. A value of 0.1 is recommended for typical use (larger values correspond to less accurate line searches).
The steepest descent algorithm follows the downhill gradient of the function at each step. When a downhill step is successful the step-size is increased by a factor of two. If the downhill step leads to a higher function value then the algorithm backtracks and the step size is decreased using the parameter tol. A suitable value of tol for most applications is 0.1. The steepest descent method is inefficient and is included only for demonstration purposes.
p_0 = (x_0, x_1, ... , x_n) p_1 = (x_0 + step_size_0, x_1, ... , x_n) p_2 = (x_0, x_1 + step_size_1, ... , x_n) ... = ... p_n = (x_0, x_1, ... , x_n+step_size_n)
These vectors form the n+1 vertices of a simplex in n dimensions. On each iteration the algorithm tries to improve the parameter vector p_i corresponding to the highest function value by simple geometrical transformations. These are reflection, reflection followed by expansion, contraction and multiple contraction. Using these transformations the simplex moves through the parameter space towards the minimum, where it contracts itself.
After each iteration, the best vertex is returned. Note, that due to the nature of the algorithm not every step improves the current best parameter vector. Usually several iterations are required.
The routine calculates the minimizer specific characteristic size as the average distance from the geometrical center of the simplex to all its vertices. This size can be used as a stopping criteria, as the simplex contracts itself near the minimum. The size is returned by the function