Unlock the Power of Words: Beginner’s Guide to Natural Language Processing using Python
Natural Language Processing sits at the intersection of computer science, linguistics, and artificial intelligence. It involves understanding, interpreting, and generating human language through algorithms and computational methods.
Creating a Natural Language Processing proof of concept can be an exciting project! To get started with NLP in Python, you don’t need a degree in Mathematics or Data Science. The availability of pre-trained models and frameworks like Hugging Face’s Transformers library allows even those with a modest background in the field to implement state-of-the-art Natural Language Processing solutions without starting from scratch.
In this walkthrough, I’ll use the Transformers library by Hugging Face, which provides an easy-to-use interface to these models. I will show you how to do Natural Language Processing in at most 10 lines of code!
What’s really cool is that you don’t need to be a language expert or a seasoned AI researcher to use these tools. If you know some Python — even just the basics — you can start experimenting with these powerful models. It’s like having access to language superpowers with just a few lines of code!
The processing tasks we will see are Named Entity Recognition, sentiment analysis, and summarization.
What are Hugging Face and the Transformers Library?
Imagine you’re in a vast library filled with every book ever written, and you’re tasked with summarizing, translating, or even continuing the stories where they left off. Now, picture Hugging Face as your ultra-intelligent librarian who can help you with these tasks using just a few whispers (or lines of code).
Hugging Face is a company that’s created something called the Transformers library, which is like a treasure trove of magical tools for Natural Language Processing (NLP). More information is available at https://huggingface.co/.
The Transformers library, despite its name, has nothing to do with Optimus Prime and his crew. Instead, it’s packed with pre-trained models — think of them as well-read robots that have already digested a vast amount of text and learned from it. These models can perform a wide range of language tasks, from summarizing articles to generating entirely new content that sounds convincingly human.
For each scenario, the Python code will be only 6 to 8 lines long! I will guide you through three different NLP examples using Python.
Prerequisite: Python 3.8 or later
Step 1: Install the Transformers Library
Install the transformers package by running the following command:
pip install transformers
More about the Transformers library: Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer. The pipeline function abstracts away much of the complexity involved in loading models, processing input, and generating output.
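As a quick illustration, here is a minimal sketch of a pipeline call; it relies on whatever default model the library picks for the task, which is downloaded on first use.
# Minimal pipeline sketch - the default model for the task is downloaded on first run
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face pipelines make NLP approachable"))
# Prints a list with one dictionary containing a label and a confidence score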
Step 2: Import the library and write code in an editor of your choice
You can use any text editor or IDE of your choice. Create a new Python file (e.g., ner_demo.py) and open it in your editor.
Here, by creating a pipeline with ’ner’ as the task and ’simple’ as the aggregation strategy, we are asking a pretrained model to find the people, organizations, and locations in free-form text.
When you input a sentence or paragraph into an NER pipeline, the model processes the text and returns a list of identified entities along with their corresponding labels (e.g., person, organization, location), their positions in the text, and a matching score. A higher score indicates a better match.
Named Entity Recognition (NER) Example:
# Named Entity Recognition Demo - ner_demo.py
from transformers import pipeline
# Initialize the Named Entity Recognition pipeline
generator = pipeline('ner', aggregation_strategy='simple')
# Your text prompt
prompt = "My name is Nilesh and I worked at UST Global in Richmond"
# Run the pipeline on the prompt
output = generator(prompt)
# output is a list of dictionaries, one per recognized entity
# Print the recognized entities
for entity in output:
    print(f"Entity: {entity['entity_group']}, Word: {entity['word'].ljust(25)}, "
          f"Score: {str(round(entity['score'] * 100, 2)).ljust(5)}%, "
          f"Start: {entity['start']}, End: {entity['end']}")
Output of Named Entity Recognition program
Named entity recognition is valuable for various applications, such as content classification, information retrieval, and enhancing search algorithms, because it provides a structured understanding of the content within unstructured text data.
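As a small illustration of that structured understanding, here is a hedged sketch, reusing the same pipeline and prompt as ner_demo.py, that groups the detected entities by type. This is a common first step before feeding NER results into search or classification.
# Group recognized entities by type (e.g. PER / ORG / LOC), building on ner_demo.py
from collections import defaultdict
from transformers import pipeline
ner = pipeline('ner', aggregation_strategy='simple')
entities = ner("My name is Nilesh and I worked at UST Global in Richmond")
by_type = defaultdict(list)
for entity in entities:
    by_type[entity['entity_group']].append(entity['word'])
print(dict(by_type))  # expected shape: {'PER': [...], 'ORG': [...], 'LOC': [...]}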
Real-life usages of Named Entity Recognition (NER):
Sentiment Analysis: Similarly, use the sentiment-analysis pipeline to analyze the sentiment of the input text.
Sentiment Analysis Example
# sentimental_demo.py
from transformers import pipeline
# Load the sentiment-analysis pipeline with an explicit model
sentiment_analysis = pipeline("sentiment-analysis",
                              "distilbert/distilbert-base-uncased-finetuned-sst-2-english")
prompt = ["I am very disappointed by your behavior", "I like medium.com articles"]
# Perform sentiment analysis on both texts
result = sentiment_analysis(prompt)
# Print each text with its predicted tone and confidence score
for index, sentiment in enumerate(result):
    print(f"Text is '{prompt[index]}', Tone is {sentiment['label']}, "
          f"Score is: {round(sentiment['score'] * 100, 2)}%")
Output of Sentiment Analysis program
Practical Uses of Sentiment Analysis:
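For instance, here is a minimal sketch, using the same model as in the example above with made-up review texts, that tallies sentiment across a small batch of customer reviews:
# Tally positive vs. negative labels over a batch of texts
from collections import Counter
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis",
                              "distilbert/distilbert-base-uncased-finetuned-sst-2-english")
reviews = ["The delivery was fast and the packaging was great",
           "The product stopped working after two days",
           "Customer support was friendly and solved my issue"]
print(Counter(r['label'] for r in sentiment_analysis(reviews)))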
Summarization
The summarization feature is like a literary chef, expertly condensing lengthy texts into bite-sized, flavorful summaries. With just a few lines of Python, it serves up the essence of articles, papers, or any long document, saving you hours of reading. It’s your go-to for quick insights and key points, all thanks to the magic of AI. Review the example below.
# Summarization Example
from transformers import pipeline
# Initialize the summarization pipeline
generator = pipeline('summarization')
# Your text prompt
prompt = '''The Republican National Committee began laying off dozens of staffers on Monday, days after Donald Trump’s handpicked team took the reins of the organization, according to two Republican operatives with knowledge of the dismissals.
The layoffs affect staffers across multiple departments, the sources said. The cuts also go beyond senior staff to vendors and mid-level employees, one of the Republican operatives said. Vendor contracts will likely be cut as well.
Some staff who were asked to resign could reapply for jobs at the organization.
It’s not unusual for there to be staff turnover at a national committee after that committee’s party has a de facto or official presidential nominee, but the depths of these cuts go beyond the norm and underscore the lackluster fundraising the committee has experienced lately.
Politico first reported on the layoffs.
Top officials in communications, the political department and the data team were laid off, according to multiple Republicans with knowledge of those layoffs.
'''
# Generate the summary
output = generator(prompt)
# Print the summary
for element in output:
    print(f"Output is {element['summary_text']}")
Output of the above program
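By default, the pipeline decides how long the summary should be. If you want tighter control, the summarization pipeline forwards generation arguments such as max_length and min_length (measured in tokens); here is a hedged sketch with arbitrary values:
# Bound the summary length with max_length / min_length (values here are arbitrary)
from transformers import pipeline
generator = pipeline('summarization')
text = ("Natural Language Processing sits at the intersection of computer science, "
        "linguistics, and artificial intelligence. Pre-trained models and libraries such as "
        "Hugging Face Transformers let developers summarize, translate, and classify text "
        "with only a few lines of Python, without training models from scratch.")
output = generator(text, max_length=40, min_length=10, do_sample=False)
print(output[0]['summary_text'])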
Usage of Summarization
Note: When you run the above Python programs, they will download the required models and data. They also run with defaults, so you will see a few messages about the defaults being used. One of the larger downloads will be the model weights stored in the safetensors format.
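If you want to avoid the messages about defaults and make the download predictable, you can name a model explicitly. The sketch below uses facebook/bart-large-cnn, one widely used summarization checkpoint on the Hugging Face Hub; any other suitable checkpoint works the same way.
# Pin an explicit model instead of relying on the task's default
from transformers import pipeline
generator = pipeline('summarization', model='facebook/bart-large-cnn')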
What is SafeTensors?
SafeTensors is a modern format for securely storing and quickly accessing the numerical data (tensors) used in AI and ML models. Think of it as a digital photo frame for your data. Unlike traditional methods such as Python’s ‘pickle’, which can be slow and risky, SafeTensors allows quick, direct access to your data without compromising its integrity, thanks to its “zero-copy” approach. It’s a smarter, safer way to handle important AI data, making it as convenient as flipping through photos on a digital frame.
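For a feel of how it is used directly, here is a minimal sketch that assumes the safetensors and torch packages are installed; when you use the pipelines above, the Transformers library handles all of this for you automatically.
# Save and load a dictionary of tensors with safetensors instead of pickle
import torch
from safetensors.torch import save_file, load_file
tensors = {"weights": torch.randn(3, 3), "bias": torch.zeros(3)}
save_file(tensors, "example.safetensors")   # write the tensors to disk
loaded = load_file("example.safetensors")   # read them back without unpickling anything
print(loaded["weights"].shape)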
Summary
In wrapping up this tutorial on Natural Language Processing using the Hugging Face Transformers library and Python, it’s worth emphasizing the array of pipelines at your disposal. We have barely scratched the surface with Named Entity Recognition, sentiment analysis, and summarization; there are many more pipelines out there, and the power and efficiency of these tools may surprise you, enabling significant achievements with minimal coding effort. Please note that the scope of your projects will inherently be constrained by smaller datasets and limited computational resources. While crafting the next ChatGPT might be beyond reach without access to vast datasets and substantial computing infrastructure, embarking on this journey with the knowledge and skills you’ve gained sets a solid foundation for your path forward in the world of NLP.