Description:

  • A bag-of-words matrix representation, where each row is one document and each column holds the number of times a given word appears in that document.
  • SVD gives a low-dimensional representation of both the documents and the word vectors. From there we can:
      • See how “close” documents are to each other (typically by cosine similarity), and cluster them, for example with k-means.
      • Compare terms and find relations between them (synonymy and polysemy).
      • Recommend documents based on a query: view the query as a mini document, and compare it to the other documents in the low-dimensional space.
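
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the toy corpus, the choice of k = 2, and the helper names (`bow`, `cosine`, `rank_documents`) are all assumptions made for the example. It builds the count matrix, truncates the SVD, and projects a query into the same space as the documents.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "cat chased the dog",
    "stocks fell on the market",
    "market traders sold stocks",
]

# Vocabulary and word -> column index mapping.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: j for j, w in enumerate(vocab)}

def bow(text):
    """Count vector of `text` over the fixed vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

# Bag-of-words matrix: one row per document, one column per word.
X = np.vstack([bow(d) for d in docs])

# Truncated SVD: keep the k largest singular values (k = 2 is an arbitrary choice).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

doc_vecs = U_k * s_k       # documents in the k-dimensional space, shape (n_docs, k)
term_vecs = Vt_k.T * s_k   # terms in the k-dimensional space, shape (n_words, k)

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_documents(query):
    """Treat the query as a mini document and rank documents by cosine similarity."""
    q = bow(query) @ Vt_k.T  # same projection that maps rows of X to doc_vecs
    sims = [cosine(q, d) for d in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)

ranking = rank_documents("stocks market")
```

Here `ranking` lists document indices from most to least similar to the query. The same `cosine` helper can be applied to rows of `term_vecs` to compare terms, and the rows of `doc_vecs` can be passed to a k-means implementation for clustering.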
