Definition:
- A scalar $y$ and an $n$-vector $x$ are related by a model $y \approx f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$
- $x$ is the independent variable, the feature vector
- $y$ is the outcome variable, something we want to predict
- $x^{(i)}, y^{(i)}$ is the $i$th data pair
- $x^{(i)}_j$ is the $j$th component of the $i$th data point $x^{(i)}$
- We choose a model $\hat{f}: \mathbb{R}^n \to \mathbb{R}$ built from basis functions $f_1, \dots, f_p: \mathbb{R}^n \to \mathbb{R}$ that we choose
- $\theta_i$ are the model parameters we can choose
- $\hat{f}(x) = \theta_1 f_1(x) + \cdots + \theta_p f_p(x)$
- $\hat{y}^{(i)} = \hat{f}(x^{(i)})$ is the model's prediction of $y^{(i)}$
- Define:
- $y^\mathrm{d} = (y^{(1)}, \dots, y^{(N)})$ as the vector of outcomes
- $\hat{y}^\mathrm{d} = (\hat{y}^{(1)}, \dots, \hat{y}^{(N)})$ as the vector of predictions
- $r^\mathrm{d} = (r^{(1)}, \dots, r^{(N)})$ as the vector of residuals
- where $r^{(i)} = y^{(i)} - \hat{y}^{(i)}$
- $\mathrm{rms}(r^\mathrm{d}) = \left( \dfrac{(r^{(1)})^2 + \cdots + (r^{(N)})^2}{N} \right)^{1/2}$ is the RMS prediction error
- Define the matrix $A \in \mathbb{R}^{N \times p}$ with $A_{ij} = f_j(x^{(i)})$, so $\hat{y}^\mathrm{d} = A\theta$
- Then we choose $\theta$ to minimize $\|r^\mathrm{d}\|^2 = \|y^\mathrm{d} - \hat{y}^\mathrm{d}\|^2 = \|y^\mathrm{d} - A\theta\|^2 = \|A\theta - y^\mathrm{d}\|^2$
- This is an ordinary least squares problem: $\hat{\theta} = (A^\mathsf{T} A)^{-1} A^\mathsf{T} y^\mathrm{d}$ (if the columns of $A$ are linearly independent); see the sketch after this list
- $\|A\hat{\theta} - y^\mathrm{d}\|^2 / N$ is the minimum mean-square error
- We take the first basis function to be the constant $f_1(x) = 1$, so $\theta_1$ is an offset and the other coefficients act relative to it
- For example, the affine function $\hat{y}(x) = ax + b$ fits this form: append the offset $b$ as a last entry of the coefficient vector and a constant 1 to each feature vector, giving the matrix form $\hat{y}^\mathrm{d} = A\theta$ with a column of ones in $A$
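A minimal NumPy sketch of this procedure, using made-up toy data and two hypothetical basis functions (the constant $1$ and the identity): it builds $A$ column by column from the basis functions, solves the least-squares problem, and reports the RMS error.

```python
import numpy as np

# Hypothetical basis functions f_1, ..., f_p; f_1 is the constant 1,
# so theta_1 plays the role of the offset.
basis = [lambda x: 1.0, lambda x: x]

def fit_theta(x_data, y_data):
    """Build A with A_ij = f_j(x^(i)) and solve min ||A theta - y||^2."""
    A = np.array([[f(x) for f in basis] for x in x_data])
    theta, *_ = np.linalg.lstsq(A, y_data, rcond=None)
    return theta, A

# Toy data (made up for illustration): y is roughly 1 + 2x plus noise.
x_d = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_d = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

theta_hat, A = fit_theta(x_d, y_d)
residuals = y_d - A @ theta_hat        # r^(i) = y^(i) - yhat^(i)
rms = np.sqrt(np.mean(residuals**2))   # RMS prediction error
print(theta_hat, rms)
```

Note that `np.linalg.lstsq` solves the problem without explicitly forming $(A^\mathsf{T}A)^{-1}$, which is preferable numerically.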
For p=1:
- Then $f_1(x) = 1$, so the model $\hat{f}(x) = \theta_1$ is a constant (derivation below)
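In this case the least-squares solution has a simple closed form: the optimal constant is the average of the outcomes. A one-line derivation, setting the derivative of the squared error to zero:

```latex
% Minimize the squared error of the constant model \hat{f}(x) = \theta_1
\frac{d}{d\theta_1} \sum_{i=1}^{N} \bigl(\theta_1 - y^{(i)}\bigr)^2
  = 2 \sum_{i=1}^{N} \bigl(\theta_1 - y^{(i)}\bigr) = 0
\quad\Longrightarrow\quad
\hat{\theta}_1 = \frac{1}{N} \sum_{i=1}^{N} y^{(i)} = \operatorname{avg}(y^{\mathrm{d}})
```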
For p=2:
- $f_1(x) = 1$, $f_2(x) = x$
- The model has the form $\hat{f}(x) = \theta_1 + \theta_2 x$, a straight-line fit
- The matrix $A$ has the form $A = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(N)} \end{bmatrix} = \begin{bmatrix} \mathbf{1} & x^\mathrm{d} \end{bmatrix}$
- Then $\hat{\theta}_1, \hat{\theta}_2$ give the closed form $\hat{f}(x) = \operatorname{avg}(y^\mathrm{d}) + \rho \, \dfrac{\operatorname{std}(y^\mathrm{d})}{\operatorname{std}(x^\mathrm{d})} \bigl(x - \operatorname{avg}(x^\mathrm{d})\bigr)$, where $\rho$ is the correlation coefficient of $x^\mathrm{d}$ and $y^\mathrm{d}$ (checked numerically below)
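A quick numerical check of this closed form, reusing the toy data from the sketch above: `np.corrcoef` supplies $\rho$, and the slope and intercept recovered from the formula should agree with the least-squares solution.

```python
import numpy as np

x_d = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_d = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Closed-form straight-line fit: slope = rho * std(y)/std(x).
rho = np.corrcoef(x_d, y_d)[0, 1]
slope = rho * np.std(y_d) / np.std(x_d)
intercept = np.mean(y_d) - slope * np.mean(x_d)

# Same fit via least squares on A = [1  x_d].
A = np.column_stack([np.ones_like(x_d), x_d])
theta_hat, *_ = np.linalg.lstsq(A, y_d, rcond=None)

print(intercept, slope)   # should match theta_hat[0], theta_hat[1]
print(theta_hat)
```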
For polynomial fits:
- $f_i(x) = x^{i-1}$, $i = 1, \dots, p$ (with $x^0 = 1$)
- The model is a polynomial of degree $p-1$: $\hat{f}(x) = \theta_1 + \theta_2 x + \cdots + \theta_p x^{p-1}$; here $x^{p-1}$ is the scalar $x$ raised to a power, not a component index
- $A$ is then a Vandermonde matrix, with $A_{ij} = (x^{(i)})^{j-1}$ (see the sketch below)
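A sketch of a polynomial fit on made-up data: `np.vander` with `increasing=True` produces exactly the matrix $A_{ij} = (x^{(i)})^{j-1}$ above, with columns $x^0, x^1, \dots, x^{p-1}$.

```python
import numpy as np

x_d = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y_d = np.array([0.1, 0.4, 1.1, 2.4, 4.2, 6.3])

p = 3  # number of basis functions -> polynomial of degree p-1 = 2

# Vandermonde matrix: columns are x^0, x^1, ..., x^(p-1).
A = np.vander(x_d, N=p, increasing=True)
theta_hat, *_ = np.linalg.lstsq(A, y_d, rcond=None)

def fhat(x):
    """Evaluate fhat(x) = theta_1 + theta_2 x + ... + theta_p x^(p-1)."""
    return np.vander(np.atleast_1d(x), N=p, increasing=True) @ theta_hat

print(theta_hat)
print(fhat(1.25))
```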