Definition:

  • SVD is mainly used to decompose a matrix into dyads (rank-one terms), which lets us approximate the data by a lower-rank matrix.
  • A matrix $A \in \mathbb{R}^{m \times n}$ has rank one if it can be written as $A = pq^\top$, with $p \in \mathbb{R}^m$ and $q \in \mathbb{R}^n$ (a dyad). From the SVD, a matrix of rank $r$ can be expressed as a sum of $r$ dyads: $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$.
  • Geometric interpretation:
    • If a matrix can be written as a single dyad, $A = pq^\top$, this means that every column (and row) is proportional to a single vector: $a_j = A e_j = q_j\,p$ for $j = 1, \dots, n$,
      • where $p \in \mathbb{R}^m$ and $q \in \mathbb{R}^n$,
      • and $e_j$ is the $j$-th unit vector in $\mathbb{R}^n$.
    • If each column represents a data point, all these points are on a line going through $0$ with direction $p$.
  • Sum of dyads:
    • If $A$ can be expressed as a sum of dyads, $A = \sum_{i=1}^{r} p_i q_i^\top$, then each column is a linear combination of the vectors $p_1, \dots, p_r$: $a_j = \sum_{i=1}^{r} (q_i)_j\, p_i$.
    • If each column represents a data point, all these points lie in a subspace, $\operatorname{span}(p_1, \dots, p_r)$. In practice, it is better if the $p_i$'s are independent, as then each dyad brings in "new information".
  • The SVD provides exactly this dyadic expansion of a low-rank matrix, and it is the basis of the low-rank approximations below (see the sketch after this list).
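
A minimal sketch of the dyadic expansion, assuming NumPy; the random test matrix and variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # a 5x4 matrix of rank <= 3

# SVD: A = U diag(s) V^T, singular values s in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of dyads: A = sum_i sigma_i * u_i * v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))  # True, up to floating-point error
```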

Rank-one approximation:

  • The problem $\min_{p,\,q} \|A - pq^\top\|_F$ can be solved via SVD: the optimal rank-one approximation is the leading dyad, $A_1 = \sigma_1 u_1 v_1^\top$ (see the sketch below).
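
A short sketch of the rank-one approximation via the leading singular triple (NumPy assumed; the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.5, 0.2]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])        # sigma_1 * u_1 * v_1^T

# The Frobenius error equals the norm of the discarded singular values
print(np.linalg.norm(A - A1, "fro"))
print(np.sqrt(np.sum(s[1:] ** 2)))             # same value
```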

Low-rank approximation:

  • Approximate $A$ with a matrix $A_k$ of rank $k < r$, thereby creating a new dataset whose information is close enough to that of the old dataset.
  • We use the SVD because it provides the closest rank-$k$ approximation in the Frobenius/Euclidean norm, and it is cheaper to work with thanks to its orthogonal (independent) components.
  • Let $A \in \mathbb{R}^{m \times n}$ be a given matrix, with $r = \operatorname{rank}(A) \le \min(m, n)$.
  • Consider the problem of approximating $A$ with a matrix of lower rank: $\min_{A_k} \|A - A_k\|_F^2$ subject to $\operatorname{rank}(A_k) = k$, where $1 \le k < r$.
  • Let $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$ be the SVD of $A$.
  • Then the optimal solution only takes the sum of the first $k$ terms, which is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$.
  • Define the ratio of retained/total information: $\eta_k = \dfrac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2} = \dfrac{\|A_k\|_F^2}{\|A\|_F^2}$.
  • Then the relative (squared) norm approximation error is $\dfrac{\|A - A_k\|_F^2}{\|A\|_F^2} = \dfrac{\sum_{i=k+1}^{r} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2} = 1 - \eta_k$.
  • Plotting $\eta_k$ (or the error $1 - \eta_k$) as a function of $k$ shows which value of $k$ is good enough (see the sketch below).
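
A minimal sketch of the rank-$k$ truncation and the retained-information ratio $\eta_k$, assuming NumPy; the random matrix is only a stand-in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(k):
    """A_k: sum of the first k dyads sigma_i * u_i * v_i^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

total = np.sum(s ** 2)                               # ||A||_F^2
for k in (1, 2, 5, 10, 20):
    eta_k = np.sum(s[:k] ** 2) / total               # retained information
    rel_err = np.linalg.norm(A - rank_k_approx(k), "fro") ** 2 / total
    print(f"k={k:2d}  eta_k={eta_k:.3f}  rel_err={rel_err:.3f}")   # rel_err = 1 - eta_k
```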

Low-dimensional representation of data:

  • Given a data matrix $A \in \mathbb{R}^{m \times n}$ (columns are data points), and the SVD-based rank-$k$ approximation $A_k$ to $A$, we have $A \approx A_k = U_k S_k V_k^\top$, where $U_k = [u_1, \dots, u_k]$, $V_k = [v_1, \dots, v_k]$, and $S_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k)$.
  • Thus, projecting data points: for any data point $a_j$ (the $j$-th column of $A$), $a_j \approx U_k z_j$, where $z_j = U_k^\top a_j \in \mathbb{R}^k$.
  • The vector $z_j$ is a low-dimensional representation of the data point $a_j$. More compactly: $Z = U_k^\top A \in \mathbb{R}^{k \times n}$.
  • We may also project features (rows of $A$) using the same derivation as before, with $A^\top$ instead of $A$, i.e. with $V_k$ in place of $U_k$ (see the sketch below).
    • More compactly: $W = V_k^\top A^\top \in \mathbb{R}^{k \times m}$.
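
A short sketch of both projections, assuming NumPy; the shapes and the choice of $k$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))   # 100 data points in R^30 (one per column)
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]                       # m x k, basis for the column (data-point) subspace
V_k = Vt[:k, :].T                    # n x k, basis for the row (feature) subspace

Z = U_k.T @ A                        # k x n: low-dimensional representation of each data point
W = V_k.T @ A.T                      # k x m: low-dimensional representation of each feature
print(Z.shape, W.shape)              # (5, 100) (5, 30)
```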

SVD-based auto-encoder:

  • For any data point $a_j$: $a_j \approx U_k z_j$ with $z_j = U_k^\top a_j$, for $j = 1, \dots, n$.
  • Thus, any data point $x \in \mathbb{R}^m$ can be encoded into a lower dimension as $z = U_k^\top x \in \mathbb{R}^k$.
  • Given a low-dimensional representation $z$ of $x$, we can decode, i.e. go back to the approximation $\hat{x} \approx x$, via $\hat{x} = U_k z = U_k U_k^\top x$.
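
A minimal sketch of this linear encoder/decoder pair, assuming NumPy; the data matrix and function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 200))            # training data, one point per column
k = 5

U_k = np.linalg.svd(A, full_matrices=False)[0][:, :k]   # m x k

def encode(x):
    """R^m -> R^k: z = U_k^T x."""
    return U_k.T @ x

def decode(z):
    """R^k -> R^m: x_hat = U_k z (projection onto span(U_k))."""
    return U_k @ z

x = A[:, 0]
x_hat = decode(encode(x))
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))    # relative reconstruction error
```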

Power iteration:

  • Solve the rank-one approximation problem $\min_{p,\,q} \|A - pq^\top\|_F^2$ by alternating over $p$ and $q$.
  • Each step is a simple closed-form formula: with $q$ fixed, the optimal $p$ is $Aq/\|q\|_2^2$; with $p$ fixed, the optimal $q$ is $A^\top p/\|p\|_2^2$.
  • Convergence is improved when formulated as an iteration over normalized vectors, setting:
    • $p \leftarrow \dfrac{Aq}{\|Aq\|_2}$ and $q \leftarrow \dfrac{A^\top p}{\|A^\top p\|_2}$ (the iterates converge to the leading pair $u_1$, $v_1$).
  • Works extremely well for large, sparse $A$, since each step only needs matrix-vector products, which skip the zero entries (see the sketch below).
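
A minimal sketch of the normalized alternating iteration, assuming NumPy; the iteration count, seed, and function name are illustrative choices:

```python
import numpy as np

def power_iteration(A, n_iter=100, seed=0):
    """Approximate the leading singular triple (sigma_1, u_1, v_1) of A."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(A.shape[1])
    q /= np.linalg.norm(q)
    for _ in range(n_iter):
        p = A @ q
        p /= np.linalg.norm(p)        # p <- Aq / ||Aq||_2
        q = A.T @ p
        q /= np.linalg.norm(q)        # q <- A^T p / ||A^T p||_2
    sigma = p @ (A @ q)               # leading singular value estimate
    return sigma, p, q

A = np.random.default_rng(3).standard_normal((40, 25))
sigma, u1, v1 = power_iteration(A)
print(sigma, np.linalg.svd(A, compute_uv=False)[0])   # the two values should nearly agree
```

For a large, sparse $A$ (e.g. a `scipy.sparse` matrix) the same loop applies unchanged, since it only uses the products $Aq$ and $A^\top p$.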