Description:
- A bag-of-words matrix representation, where each row is one document, each column is one word, and each entry is the number of times that word appears in that document
- A truncated SVD of that matrix gives a low-dimensional representation of both the documents and the words. From there we can:
- See how documents are “close” to each other (typically by cosine similarity), and cluster them, using k-means for example
- Compare terms, find relations between terms (synonymy and polysemy).
- Recommend documents based on a query: view the query as a mini document, and compare it to the other documents in the low-dimensional space.
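The steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the four-document corpus, the choice of k = 2, and the helper names `bow`, `cosine`, `fold_in` and `rank` are all made up for the example. The query is projected by multiplying its count vector by the top-k right singular vectors, which is one common way to place it in the same space as the documents.

```python
import numpy as np

# Toy corpus (hypothetical): two topics that share no words.
docs = [
    "cat mouse pet",
    "cat pet dog",
    "stocks market price",
    "market price trade stocks",
]

# Vocabulary and word -> column index.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: j for j, w in enumerate(vocab)}

def bow(text):
    """Count vector of `text` over the vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

# Bag-of-words matrix: one row per document, one column per word.
X = np.array([bow(d) for d in docs])

# Truncated SVD: keep the k largest singular values, X ~ U_k diag(s_k) Vt_k.
k = 2  # assumed number of latent dimensions
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

doc_vecs = U_k * s_k      # one k-dimensional row per document
term_vecs = Vt_k.T * s_k  # one k-dimensional row per word

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def fold_in(text):
    """Treat a query as a mini document and project it into the k-dim space."""
    return bow(text) @ Vt_k.T

def rank(query):
    """Document indices sorted from most to least similar to the query."""
    q = fold_in(query)
    scores = [cosine(q, d) for d in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

if __name__ == "__main__":
    print("doc 0 vs doc 1:", cosine(doc_vecs[0], doc_vecs[1]))
    print("doc 0 vs doc 2:", cosine(doc_vecs[0], doc_vecs[2]))
    print("mouse vs dog:", cosine(term_vecs[index["mouse"]], term_vecs[index["dog"]]))
    print("ranking for 'stocks price':", rank("stocks price"))
```

Note that "mouse" and "dog" never appear in the same document, yet their term vectors point in the same direction because both co-occur with "cat" and "pet". The separation is this clean only because the two toy topics share no vocabulary; on a real corpus the similarities are noisier, and the document vectors in `doc_vecs` would typically be passed to a clustering step such as k-means.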