Word2Vec Exploration Tool
A simple tool to query vectorized text corpora
For two entered terms, the tool calculates their distance, their similarity, and the top 30 most similar tokens.
This tool can:
- Compute the distance between two tokens.
- Compute the similarity of two tokens.
- Fetch the top 30 most similar tokens.
All queries are performed using pymagnitude.
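
To make the three operations above concrete, here is a minimal sketch in plain Python. It does not use pymagnitude or any of the corpora in this tool: the four 3-dimensional vectors are invented toy data, and the function names are only chosen to describe the operations. It also assumes that "distance" means Euclidean distance and "similarity" means cosine similarity, which is a common convention but is not stated in this description.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real corpora use vectors with hundreds of dimensions.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "pear": [0.15, 0.1, 0.9],
}


def distance(a, b):
    """Euclidean distance between the vectors of tokens a and b (assumed metric)."""
    return math.dist(TOY_VECTORS[a], TOY_VECTORS[b])


def similarity(a, b):
    """Cosine similarity between the vectors of tokens a and b (assumed metric)."""
    va, vb = TOY_VECTORS[a], TOY_VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (math.hypot(*va) * math.hypot(*vb))


def most_similar(token, topn=30):
    """Return up to topn (token, similarity) pairs, most similar first."""
    others = [t for t in TOY_VECTORS if t != token]
    ranked = sorted(others, key=lambda t: similarity(token, t), reverse=True)
    return [(t, similarity(token, t)) for t in ranked[:topn]]


if __name__ == "__main__":
    print(distance("king", "queen"))
    print(similarity("king", "queen"))
    print(most_similar("king", topn=30))
```

With only four toy tokens, asking for the top 30 simply returns the three remaining tokens in ranked order. On a real corpus the same ranking would be taken over the full vocabulary.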
Select from these corpora:
- Common Crawl, 600B tokens (for details, see fastText).
- English Wikipedia 2017, 16B tokens (for details, see fastText).
- Google News, 100B tokens (for details, see Google's word2vec).
- New York Times article snippets 2000-2019, 79M tokens (extracted by me, trained with gensim).