Word2Vec Exploration Tool

A simple tool to query vectorized text corpora

For the two terms entered, the tool calculates their distance, their similarity, and the 30 tokens most similar to them.

This tool can:

  • Compute the distance between two tokens.
  • Compute the similarity of two tokens.
  • Fetch the top 30 most similar tokens.

All queries are performed using pymagnitude.
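
The three queries above can be sketched with pymagnitude roughly as follows. This is a minimal sketch, not the tool's actual source: the helper names `load_vectors` and `explore` are hypothetical, the path passed to `load_vectors` is a placeholder, and fetching the top 30 neighbours for each of the two terms is an assumption about how the tool behaves. Only `Magnitude`, `distance`, `similarity` and `most_similar` come from pymagnitude itself.

```python
def load_vectors(path):
    """Open a .magnitude file (hypothetical helper; `path` is a placeholder)."""
    # Imported lazily so that `explore` can be used without pymagnitude installed.
    from pymagnitude import Magnitude
    return Magnitude(path)


def explore(vectors, first, second, topn=30):
    """Run the three queries for a pair of tokens (hypothetical helper).

    `vectors` is expected to behave like a pymagnitude Magnitude object.
    """
    return {
        # Distance between the two token vectors.
        "distance": vectors.distance(first, second),
        # Similarity score between the two token vectors.
        "similarity": vectors.similarity(first, second),
        # Assumption: the top-N neighbours are fetched for each term separately.
        "most_similar": {
            first: vectors.most_similar(first, topn=topn),
            second: vectors.most_similar(second, topn=topn),
        },
    }
```

A call such as `explore(load_vectors("some_corpus.magnitude"), "cat", "dog")` would then return all three results in one dictionary; the file name here is only an example.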

Select from these corpora:

  • Common Crawl, 600B tokens (see fastText for details).
  • English Wikipedia 2017, 16B tokens (see fastText for details).
  • Google News, 100B tokens (see Google word2vec for details).
  • New York Times article snippets, 2000-2019, 79M tokens (which I extracted myself and trained with gensim).
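
One way to wire the corpus selection to vector files is a simple lookup table. The sketch below is illustrative only: the file paths are hypothetical placeholders (the real file names are not given here), and `CORPORA` and `corpus_path` are names invented for this example.

```python
# Hypothetical mapping from corpus label to a .magnitude file.
# The paths are placeholders, not the files the tool actually ships with.
CORPORA = {
    "Common Crawl": "vectors/common_crawl.magnitude",
    "English Wikipedia 2017": "vectors/wikipedia_2017.magnitude",
    "Google News": "vectors/google_news.magnitude",
    "New York Times": "vectors/nyt_2000_2019.magnitude",
}


def corpus_path(name):
    """Return the placeholder file path for a corpus label, or raise ValueError."""
    try:
        return CORPORA[name]
    except KeyError:
        choices = ", ".join(CORPORA)
        raise ValueError(f"Unknown corpus {name!r}; choose one of: {choices}") from None
```

The resulting path can then be handed to `pymagnitude.Magnitude(...)` to open the selected corpus.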