Word2Vec Example

This is a simple example of training and querying a Word2Vec model in Python using the Gensim library.

Word2Vec Overview

Word2Vec is a popular technique for learning word embeddings, which represent words as dense vectors in a continuous vector space. Word embeddings capture semantic relationships between words, making them useful for a variety of natural language processing (NLP) tasks. Word2Vec models are trained on large text corpora using one of two architectures: Skip-gram, which predicts the context (surrounding words) of a target word, and Continuous Bag of Words (CBOW), which predicts a target word given its context.
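Gensim exposes this architectural choice through the sg parameter of its Word2Vec class (sg=0 selects CBOW, the default; sg=1 selects Skip-gram). Below is a minimal sketch; the two-sentence toy corpus is made up purely for illustration:

from gensim.models import Word2Vec

# Toy corpus of pre-tokenized sentences (illustrative only)
toy_corpus = [
    ["word", "embeddings", "capture", "meaning"],
    ["word2vec", "learns", "embeddings", "from", "context"],
]

# sg=0: CBOW (predict a target word from its context)
cbow_model = Word2Vec(sentences=toy_corpus, sg=0, vector_size=50, window=2, min_count=1)

# sg=1: Skip-gram (predict context words from a target word)
skipgram_model = Word2Vec(sentences=toy_corpus, sg=1, vector_size=50, window=2, min_count=1)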

Key concepts of Word2Vec:

- Word embeddings: dense, low-dimensional vectors that represent words, learned from co-occurrence patterns in text.
- Context window: the span of surrounding words the model considers when learning a word's representation.
- CBOW (Continuous Bag of Words): the architecture that predicts a target word from its surrounding context.
- Skip-gram: the architecture that predicts the surrounding context from a target word.

Word2Vec embeddings are commonly used in NLP applications such as sentiment analysis, named entity recognition, and machine translation; the sketch below illustrates one such use.
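As a concrete illustration of the sentiment-analysis case, a common baseline represents a sentence as the average of its word vectors and feeds that fixed-length vector to a classifier. The following self-contained sketch assumes a made-up toy corpus and a hypothetical helper, sentence_vector; neither is part of the Gensim API:

import numpy as np
from gensim.models import Word2Vec

# Illustrative toy corpus (an assumption for this sketch)
corpus = [
    ["the", "movie", "was", "great"],
    ["the", "movie", "was", "terrible"],
    ["great", "acting", "and", "a", "great", "story"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1)

# Hypothetical helper: average the vectors of in-vocabulary tokens
def sentence_vector(tokens, model):
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    if not vectors:
        return np.zeros(model.vector_size)  # fall back to zeros if no token is known
    return np.mean(vectors, axis=0)

# The resulting fixed-length vector can serve as a feature for a downstream classifier
features = sentence_vector(["the", "acting", "was", "great"], model)
print(features.shape)  # (50,)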

Python Source Code:

# Import necessary libraries
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')  # tokenizer data required by word_tokenize

# Example sentences for training Word2Vec model
sentences = [
    "Word embeddings are dense vector representations of words.",
    "They capture semantic relationships between words.",
    "Word2Vec is a popular technique for learning word embeddings.",
    "Natural language processing tasks often benefit from using word embeddings."
]

# Tokenize the sentences
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

# Train the Word2Vec model
# vector_size: embedding dimensionality; window: context size;
# min_count=1 keeps every token (reasonable only for a toy corpus); workers: training threads
word2vec_model = Word2Vec(sentences=tokenized_sentences, vector_size=100, window=5, min_count=1, workers=4)

# Retrieve the vector for a specific word (the token must be in the model's vocabulary)
vector_for_word = word2vec_model.wv['word']

# Find similar words
similar_words = word2vec_model.wv.most_similar('word', topn=3)

# Print results
print("Vector for 'word':", vector_for_word)
print("\nSimilar words to 'word':", similar_words)

Explanation:

The script first tokenizes each sentence into lowercase words with NLTK's word_tokenize. The tokenized sentences are then passed to Gensim's Word2Vec class, which trains 100-dimensional embeddings using a context window of 5 words. After training, the model's wv attribute exposes the learned vectors: indexing it with a token returns that token's embedding, and most_similar returns the words whose vectors are closest to the query word by cosine similarity.