V Rahul
By V Rahul

Useful Python libraries For Natural Processing

Useful Python libraries For Natural Processing

Useful Python libraries For Natural Processing

1. CoreNLP

The CoreNLP library — a product of Stanford University — was built to be a production-ready natural language processing solution, capable of delivering NLP predictions and analyses at scale. CoreNLP is written in Java, but multiple Python packages and APIs are available for it, including a native Python NLP library called StanfordNLP.

CoreNLP includes a broad range of language tools—grammar tagging, named entity recognition, parsing, sentiment analysis, and plenty more. It was designed to be human language agnostic, and currently supports Arabic, Chinese, French, German, and Spanish in addition to English (with Russian, Swedish, and Danish support available from third parties). CoreNLP also includes a web API server, a convenient way to serve predictions without too much additional work.

2. Genism

Gensim does just two things, but does them exceedingly well. Its focus is statistical semantics—analyzing documents for their structure, then scoring other documents based on their similarity.

Gensim can work with very large bodies of text by streaming documents to its analysis engine and performing unsupervised learning on them incrementally. It can create multiple types of models, each suited to different scenarios: Word2Vec, Doc2Vec, FastText, and Latent Dirichlet Allocation.


The Natural Language Toolkit, or NLTK for short, is among the best-known and most powerful of the Python natural language processing libraries.Many corpora (data sets) and trained models are available to use with NLTK out of the box, so you can start experimenting with NLTK right away.

As the documentation states, NLTK provides a wide variety of tools for working with text: “classification, tokenization, stemming, tagging, parsing, and semantic reasoning.” It can also work with some third-party tools to enhance its functionality.

4. Pattern

If all you need to do is scrape a popular website and analyze what you find, reach for Pattern. This natural language processing library is far smaller and narrower than other libraries covered here, but that also means it’s focused on doing one common job really well.

Pattern comes with built-ins for scraping a number of popular web services and sources (Google, Wikipedia, Twitter, Facebook, generic RSS, etc.), all of which are available as Python modules (e.g., from pattern.web import Twitter). You don’t have to reinvent the wheels for getting data from those sites, with all of their individual quirks. You can then perform a variety of common NLP operations on the data, such as sentiment analysis.


PyNLPI (pronounced “pineapple”) has only a basic roster of natural language processing functions, but it has some truly useful data-conversion and data-processsing features for NLP data formats.

Most of the NLP functions in PyNLPI are for basic jobs like tokenization or n-gram extraction, along with some statistical functions useful in NLP like Levenshtein distance between strings or Markov chains. Those functions are implemented in pure Python for convenience, so they’re unlikely to have production-level performance.

Thanks for reading, cheers!