The gradient of a function f:Rn→R at a point x where f is differentiable, denoted by ∇f(x), is the column vector of first partial derivatives of f with respect to x1,...,xn: ∇f(x) = (∂f(x)/∂x1, …, ∂f(x)/∂xn)ᵀ
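As a quick check of the definition, here is a minimal sketch (assuming NumPy is available; the function f(x) = x1² + 3x2 and the evaluation point are made-up examples, not from the text) that computes the gradient analytically and compares it with a central finite-difference approximation:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[1]

def grad_f(x):
    # analytic gradient: vector of partial derivatives (2*x1, 3)
    return np.array([2.0 * x[0], 3.0])

def numerical_grad(f, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([1.0, -2.0])
print(grad_f(x))              # [2. 3.]
print(numerical_grad(f, x))   # approximately [2. 3.]
```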
The gradient of f can be interpreted in the context of its level and sublevel sets.
The α-level set of a function f:Rn→R is the set {x∈Rn:f(x)=α}
i.e., the contours of f,
and the α-sublevel set is {x∈Rn:f(x)≤α}
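For example, with f(x) = x1² + x2² on R2, the 1-level set {x ∈ R2 : f(x) = 1} is the unit circle and the 1-sublevel set {x ∈ R2 : f(x) ≤ 1} is the closed unit disk.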
Geometrically, the gradient of f at a point x0 is a vector ∇f(x0) perpendicular to the contour line of f at level α=f(x0), pointing from x0 outward, away from the α-sublevel set
That is, the gradient ∇f(x0) represents the direction along which the function has the maximum rate of increase
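Continuing the example above, ∇f(x0) = 2x0 for f(x) = x1² + x2²: at any x0 on the unit circle this vector points radially outward, perpendicular to the circular contour and away from the unit disk, which is the direction of fastest increase of f.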
Let v be a unit vector and ϵ≥0
Consider the point x = x0 + ϵv. We have f(x0 + ϵv) ≈ f(x0) + ϵ∇f(x0)ᵀv as ϵ→0
Equivalently, lim_{ϵ→0} [f(x0 + ϵv) − f(x0)] / ϵ = ∇f(x0)ᵀv
Think of this as the projection of the gradient onto the direction v
Whenever v is such that ∇f(x0)ᵀv > 0, f is increasing along the direction v for sufficiently small ϵ > 0
The inner product ∇f(x0)ᵀv measures the rate of variation of f at x0 along the direction v, and it is usually referred to as the directional derivative of f along v
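The sketch below (NumPy assumed; the function f and the point x0 are made-up examples) illustrates this numerically: the difference quotient (f(x0 + ϵv) − f(x0))/ϵ approaches the directional derivative ∇f(x0)ᵀv as ϵ shrinks:

```python
import numpy as np

def f(x):
    return x[0]**2 + np.sin(x[1])

def grad_f(x):
    return np.array([2.0 * x[0], np.cos(x[1])])

x0 = np.array([1.0, 0.5])
v = np.array([3.0, 4.0]) / 5.0     # a unit vector
exact = grad_f(x0) @ v             # directional derivative grad_f(x0)^T v

for eps in [1e-1, 1e-3, 1e-5]:
    approx = (f(x0 + eps * v) - f(x0)) / eps
    print(eps, approx, exact)      # approx tends to exact as eps -> 0
```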
Minimization via Gradient Descent Method:
To solve an optimization problem of the form min_x f(x)
where f is differentiable
Start from a point x0 and iterate the rule x_{k+1} = x_k − α∇f(x_k), with step size α > 0
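A minimal sketch of this iteration (NumPy assumed; the objective f(x) = (x1 − 3)² + 2(x2 + 1)², the step size α, the stopping tolerance, and the iteration cap are illustrative choices, not prescribed above):

```python
import numpy as np

def grad_f(x):
    # gradient of the example objective f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2
    return np.array([2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)])

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # stop once the gradient is (nearly) zero
            break
        x = x - alpha * g             # step against the gradient
    return x

print(gradient_descent(grad_f, x0=[0.0, 0.0]))   # approximately [ 3. -1.]
```

For this convex quadratic the iterates contract toward the minimizer (3, −1); for a general differentiable f the step size α has to be chosen (or adapted) carefully for the iteration to converge.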