Latent Semantic Analysis

Start from a bag-of-words matrix representation, where each row is one document, each column is one word, and each entry is the number of times that word appears in that document.
A truncated SVD of this matrix gives a low-dimensional representation of both the documents and the words. From there we can:
See how “close” documents are to each other (typically by cosine similarity), and cluster them, for example with k-means.
Compare terms and find relations between them (synonymy and polysemy).
Recommend documents based on a query: view the query as a mini document, and compare it to the other documents in the low-dimensional space.
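
The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not a reference implementation: the toy corpus, the choice of k = 2, and the helper names (count_vector, fold_in, rank) are all made up for the example, and raw counts are used without any weighting such as TF-IDF. Clustering is left out, but k-means could be run directly on doc_vecs.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "cat chased the dog",
    "stocks fell on monday",
    "markets and stocks rose",
]

# Vocabulary and document-term count matrix X
# (rows: documents, columns: words, entries: raw counts).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: j for j, w in enumerate(vocab)}

def count_vector(text):
    """Bag-of-words count vector for a text; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

X = np.vstack([count_vector(d) for d in docs])

# Truncated SVD: X ~= U_k S_k V_k^T, keeping the k largest singular values.
k = 2  # assumed latent dimension, chosen arbitrarily for this toy example
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

doc_vecs = U_k * s_k      # one k-dimensional vector per document (U_k S_k)
term_vecs = Vt_k.T * s_k  # one k-dimensional vector per word (V_k S_k)

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def fold_in(text):
    """Treat the query as a mini document and project it with V_k.

    Since X V_k = U_k S_k, this puts the query in the same coordinates
    as doc_vecs.
    """
    return count_vector(text) @ Vt_k.T

def rank(query):
    """Document indices sorted from most to least similar to the query."""
    q = fold_in(query)
    scores = [cosine(q, d) for d in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    print("doc 0 vs doc 1:", cosine(doc_vecs[0], doc_vecs[1]))
    print("cat vs dog:", cosine(term_vecs[index["cat"]], term_vecs[index["dog"]]))
    print("ranking for 'cat dog':", rank("cat dog"))
```

Projecting the query with V_k is one common way to fold it into the latent space; it has the convenient property that folding in an existing document reproduces that document's row of doc_vecs.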

StrixTheKiet Notes