spaCy

spaCy is an open-source library for advanced natural language processing (NLP) in Python. It is designed specifically for production use and emphasizes simplicity and efficiency. spaCy provides trained pipelines (pre-trained models) for many languages that handle tasks such as tokenization, part-of-speech tagging, dependency parsing, named entity recognition, sentence segmentation, and lemmatization.
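All of these annotations are stored on spaCy's `Doc` object. A trained pipeline normally fills them in, but a `Doc` can also be built by hand, which is a convenient way to explore the API without downloading a model. A minimal sketch, assuming spaCy v3 is installed; the POS tags, dependency heads and labels, and entity tags below are hand-written, hypothetical annotations, not model predictions:

```python
import spacy
from spacy.tokens import Doc

# Assumes spaCy v3. A blank English pipeline supplies the shared vocabulary;
# no trained pipeline download is needed.
nlp = spacy.blank("en")

words = ["spaCy", "is", "a", "powerful", "NLP", "library", "."]
spaces = [True, True, True, True, True, False, False]  # whitespace after each word
# Hand-written, hypothetical annotations standing in for model predictions.
pos = ["PROPN", "AUX", "DET", "ADJ", "PROPN", "NOUN", "PUNCT"]
heads = [1, 1, 5, 5, 5, 1, 1]  # absolute head index; the root ("is") points to itself
deps = ["nsubj", "ROOT", "det", "amod", "compound", "attr", "punct"]
ents = ["B-ORG", "O", "O", "O", "O", "O", "O"]  # IOB entity tags

doc = Doc(nlp.vocab, words=words, spaces=spaces, pos=pos,
          heads=heads, deps=deps, ents=ents)

# The same token attributes a trained pipeline would fill in.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Entities come from the IOB tags; sentence boundaries are derived from the parse.
print([(ent.text, ent.label_) for ent in doc.ents])
print([sent.text for sent in doc.sents])
print([child.text for child in doc[1].children])  # direct dependents of the root
```

Because the heads form a single tree rooted at "is", spaCy treats the whole `Doc` as one sentence.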

Key Features and Components of spaCy:


import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

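# Tokenization: calling nlp() splits the text into tokens, then runs
# the remaining pipeline components (tagger, parser, NER, ...)
text = "spaCy is a powerful NLP library."
doc = nlp(text)
print([token.text for token in doc])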

# Part-of-Speech Tagging
for token in doc:
    print(token.text, token.pos_)

# Named Entity Recognition (NER)
for ent in doc.ents:
    print(ent.text, ent.label_)

# Dependency Parsing
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_)

# Sentence Segmentation
for sentence in doc.sents:
    print(sentence.text)

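# Word Embeddings
# en_core_web_sm ships no static word vectors (only context-sensitive tensors);
# load en_core_web_md or en_core_web_lg for meaningful word vectors
token = nlp("apple")[0]
print(token.vector)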
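The word-embeddings step above prints a raw vector. spaCy's `similarity` methods (for example `Token.similarity` and `Doc.similarity`) compare two such vectors using cosine similarity by default. A minimal plain-Python sketch of that computation; the function name and the three-dimensional toy vectors are hypothetical, whereas real pipeline vectors have hundreds of dimensions:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector (e.g. a word with no vector) has no direction; score it 0.0.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for illustration only.
apple = [0.9, 0.1, 0.3]
pear = [0.8, 0.2, 0.35]
car = [0.1, 0.9, 0.0]

print(f"apple vs pear: {cosine_similarity(apple, pear):.3f}")
print(f"apple vs car:  {cosine_similarity(apple, car):.3f}")
```

Cosine similarity ignores vector length and compares only direction, so the toy vectors for the two fruits, which point in similar directions, score higher than the apple/car pair.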

To use spaCy, you'll need to install it first using:


pip install spacy

After installation, you'll also need to download a trained pipeline for the language you want to use. For example, to download the small English pipeline:


python -m spacy download en_core_web_sm

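This command downloads the small English pipeline (en_core_web_sm). Depending on your use case, you might need a larger one: en_core_web_md and en_core_web_lg add static word vectors, and en_core_web_trf is a transformer-based pipeline aimed at accuracy. See the spaCy documentation for the full list of available pipelines.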
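If the requested pipeline has not been downloaded, `spacy.load` raises an `OSError`. A minimal sketch, assuming spaCy v3, that catches this error and falls back to a blank English pipeline with the rule-based `sentencizer` component; the helper name `load_pipeline` is hypothetical, not part of spaCy:

```python
import spacy
from spacy.language import Language


def load_pipeline(name: str = "en_core_web_sm") -> Language:
    """Load a trained pipeline, or fall back to a rule-based blank one.

    spacy.load raises OSError when no installed package or data
    directory matches `name`.
    """
    try:
        return spacy.load(name)
    except OSError:
        print(f"{name!r} not found; install it with: python -m spacy download {name}")
        # The blank pipeline only tokenizes; the rule-based sentencizer adds
        # sentence boundaries, but there is no tagger, parser, or NER.
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")
        return nlp


nlp = load_pipeline()
print(nlp.pipe_names)
```

The fallback keeps tokenization and sentence splitting working, so code that only needs those features still runs; anything relying on `token.pos_`, `token.dep_`, or `doc.ents` from a trained model will produce empty or missing annotations.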