NLTK is a powerful Python library for working with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources, such as WordNet. NLTK also includes a variety of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, among other natural language processing (NLP) tasks.
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk import pos_tag
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk import ne_chunk
from nltk import FreqDist
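# NOTE: running this walkthrough requires several NLTK data packages;
# see the nltk.download() example at the end of this section.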
# Tokenization: split raw text into word and sentence tokens
text = "NLTK is a powerful library for natural language processing."
words = word_tokenize(text)
sentences = sent_tokenize(text)
# Part-of-Speech Tagging: label each word token with its grammatical category
tagged_words = pos_tag(words)
# Stemming and Lemmatization: reduce words to crude stems or to dictionary forms (lemmas)
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stemmed_words = [stemmer.stem(word) for word in words]
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
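# Note: lemmatize() assumes the noun part of speech by default; pass a POS
# hint for other word classes, e.g. lemmatizer.lemmatize("running", pos="v")
# returns 'run', while the default noun reading leaves 'running' unchanged.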
# Named Entity Recognition: chunk POS-tagged tokens into labeled entities (PERSON, GPE, ...)
text_ner = "Barack Obama was born in Hawaii."
words_ner = word_tokenize(text_ner)
tagged_words_ner = pos_tag(words_ner)
named_entities = ne_chunk(tagged_words_ner)
# Frequency Distribution: count how often each token occurs
freq_dist = FreqDist(words)
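# FreqDist is a Counter subclass: freq_dist["NLTK"] gives a single token's
# count and freq_dist.N() gives the total number of tokens counted.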
print("Tokenization:")
print("Words:", words)
print("Sentences:", sentences)
print("Part-of-Speech Tagging:")
print("Tagged Words:", tagged_words)
print("Stemming and Lemmatization:")
print("Stemmed Words:", stemmed_words)
print("Lemmatized Words:", lemmatized_words)
print("Named Entity Recognition (NER):")
print("Named Entities:", named_entities)
print("Frequency Distribution:")
print("Most Common Words:", freq_dist.most_common(5))
To use NLTK, you'll need to install it first using:
pip install nltk
After installation, you may need to download additional resources such as corpora and models. NLTK provides a convenient way to fetch these with the nltk.download() function, either interactively (calling it with no arguments opens a resource browser) or by passing a resource name.
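As a minimal sketch, the downloads needed by the walkthrough above look like this (these resource names apply to classic NLTK releases; newer versions split some of them into variants such as punkt_tab):

import nltk

nltk.download("punkt")                       # tokenizer models for word_tokenize / sent_tokenize
nltk.download("averaged_perceptron_tagger")  # tagger model used by pos_tag
nltk.download("wordnet")                     # lexical database for WordNetLemmatizer
nltk.download("maxent_ne_chunker")           # chunker model for ne_chunk
nltk.download("words")                       # word list the chunker depends on

Each call is a no-op if the resource is already present, so it is safe to keep these lines at the top of a script.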