Description:

  • Made up of nodes or units, connected by links
  • Each link has an associated weight; each node has an activation level
  • Each node has an input function (typically a weighted sum of its inputs), an activation function, and an output
  • The mapping from input X to output Y can be non-linear, and training can still use stochastic gradient descent (see the sketch below)
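
A minimal sketch of a single unit, assuming a weighted-sum input function and a sigmoid activation (the function name and the values are illustrative, not from the notes):

    import math

    def unit_output(inputs, weights, bias):
        # Input function: weighted sum of the inputs plus a bias term.
        net = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Activation function: a sigmoid squashing the net input into (0, 1).
        return 1.0 / (1.0 + math.exp(-net))

    # One unit with two inputs.
    print(unit_output(inputs=[0.5, -1.0], weights=[0.8, 0.2], bias=0.1))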

Activation functions:

  • e.g., the binary step activation function
  • There are many others, such as sigmoid, tanh, and ReLU (sketched below)
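
Hedged sketches of the step function and a few other common activations (the extra examples are standard choices, not taken from the notes):

    import numpy as np

    def binary_step(z):
        # 1 once the net input crosses the threshold 0, else 0.
        return np.where(z >= 0, 1.0, 0.0)

    def sigmoid(z):
        # Smooth step with range (0, 1); differentiable, unlike binary_step.
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        # Zero-centered smooth step with range (-1, 1).
        return np.tanh(z)

    def relu(z):
        # Rectified linear unit, a common default in deep networks.
        return np.maximum(0.0, z)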

Multi-layer neural network:

Theorem 9 in CIML:

  • Two-Layer Networks are Universal Function Approximators
    • A two-layer neural network can approximate any continuous function, given enough neurons in the hidden layer
    • The approximation error can be made arbitrarily small (below any ε > 0), though not exactly 0 with finitely many hidden units
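
A paraphrase of the statement in LaTeX (see CIML Theorem 9 for the exact wording and conditions; the notation here is mine):

    % For any continuous function F on a compact domain D and any eps > 0,
    % there is a two-layer network \hat{F} with finitely many hidden units
    % whose approximation error stays below eps everywhere on D:
    \forall x \in D : \; \bigl| F(x) - \hat{F}(x) \bigr| < \varepsilon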

Expressiveness of NN:

  • Deeper layers of a trained network learn increasingly complex functions

Compositionality via mathematics:

  • Given a library of simple functions (for example sin, cos, exp, log)
  • If each node computes one of these functions, a node in the next layer can compute:
    • A linear combination: g(x) = α₁f₁(x) + α₂f₂(x) + …
    • A composition: g(x) = f₁(f₂(… fₙ(x) …))
      • Deep learning uses Hierarchical Compositionality (a toy sketch follows this list):
        • vision: pixels → edge → texton → motif → part → object
        • speech: sample → spectral band → formant → motif → phone → word
        • NLP: character → word → NP/VP/… → clause → sentence → story
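
A toy sketch of both constructions; the library below (sin, cos, exp) is illustrative:

    import math

    # An assumed library of simple functions.
    library = [math.sin, math.cos, math.exp]

    def linear_combination(fns, alphas):
        # Next-layer node: weighted sum a1*f1(x) + a2*f2(x) + ...
        return lambda x: sum(a * f(x) for a, f in zip(alphas, fns))

    def composition(fns):
        # Next-layer node: nested application f1(f2(...fn(x)...)).
        def g(x):
            for f in reversed(fns):
                x = f(x)
            return x
        return g

    node_a = linear_combination(library, [0.5, -1.0, 0.1])
    node_b = composition([math.sin, math.exp])   # computes sin(exp(x))
    print(node_a(0.3), node_b(0.3))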

SGD:

  • If the optimization objective is convex, SGD will reach the global minimum
  • If not, SGD still tends to perform well in practice, though it may settle in a local minimum
  • Either way, we need a differentiable loss function so that gradients can be computed (a minimal loop is sketched below)
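
A minimal SGD loop, assuming a differentiable loss; the 1-D linear model, squared loss, and learning rate are illustrative:

    import random

    def sgd(grad, w, data, lr=0.01, epochs=100):
        # Step against the gradient of the loss on one example at a time.
        for _ in range(epochs):
            random.shuffle(data)
            for x, y in data:
                w = w - lr * grad(w, x, y)
        return w

    # Fit y = w*x with squared loss L = (w*x - y)^2, so dL/dw = 2*(w*x - y)*x.
    data = [(x, 3.0 * x) for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]
    grad = lambda w, x, y: 2.0 * (w * x - y) * x
    print(sgd(grad, w=0.0, data=data))   # approaches the true weight 3.0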

Training a neural network:

  • The backpropagation algorithm = gradient descent + the chain rule
  • It searches for a set of weight values that minimizes the total error of the network over the set of training examples
  • Training repeats the following two passes (see the sketch after this list):
    • forward pass: compute the outputs of all units in the network, and the error of the output layer
    • backward pass: the network error is used to update the weights
      • Starting at the output layer, the error is propagated backwards through the network, layer by layer.
      • This is done by recursively computing the local gradient of each neuron.
  • Gradient of the objective w.r.t. the output-layer weights: apply the chain rule directly, since these weights affect the output in one step
  • Gradient of the objective w.r.t. the hidden-unit weights: apply the chain rule through the output layer, reusing the local gradients already computed there
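
A sketch of both passes for a two-layer network, assuming sigmoid hidden units, a linear output, squared-error loss, and a single training example (all of these choices are mine, not from the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=3), 1.5            # one training example
    W = rng.normal(size=(4, 3))               # hidden-layer weights
    v = rng.normal(size=4)                    # output-layer weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(200):
        # Forward pass: outputs of all units, then the output-layer error.
        h = sigmoid(W @ x)                    # hidden activations
        y_hat = v @ h                         # linear output unit
        err = y_hat - y                       # dL/dy_hat for L = 0.5*(y_hat - y)^2

        # Backward pass: propagate the error back, layer by layer (chain rule).
        grad_v = err * h                      # gradient w.r.t. output-layer weights
        delta_h = err * v * h * (1.0 - h)     # local gradient of each hidden unit
        grad_W = np.outer(delta_h, x)         # gradient w.r.t. hidden-unit weights

        # Gradient-descent update.
        v -= 0.1 * grad_v
        W -= 0.1 * grad_W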