Unlock the Power of Words: Beginner’s Guide to Natural Language Processing using Python

Natural Language Processing sits at the intersection of computer science, linguistics, and artificial intelligence. It involves understanding, interpreting, and generating human language through algorithms and computational methods.

Creating a Natural Language Processing proof of concept can be an exciting project! To get started with NLP in Python, you don’t need a degree in Mathematics or Data Science. The availability of pre-trained models and frameworks like Hugging Face’s Transformers library allows even those with a modest background in the field to implement state-of-the-art Natural Language Processing solutions without starting from scratch.

In this walkthrough, I’ll use the Transformers library by Hugging Face, which provides an easy-to-use interface to such models. I will show you how to do Natural Language Processing in at most 10 lines of code!

What’s really cool is that you don’t need to be a language expert or a seasoned AI researcher to use these tools. If you know some Python — even just the basics — you can start experimenting with these powerful models. It’s like having access to language superpowers with just a few lines of code!

The processing methods we will cover are Named Entity Recognition, sentiment analysis, and summarization.

What are Hugging Face and the Transformers Library?

Imagine you’re in a vast library filled with every book ever written, and you’re tasked with summarizing, translating, or even continuing the stories where they left off. Now, picture Hugging Face as your ultra-intelligent librarian who can help you with these tasks using just a few whispers (or lines of code).

Hugging Face is a company that created the Transformers library, which is like a treasure trove of magical tools for Natural Language Processing (NLP). More information is available at https://huggingface.co/

The Transformers library, despite its name, has nothing to do with Optimus Prime and his crew. Instead, it’s packed with pre-trained models — think of them as well-read robots that have already digested a vast amount of text and learned from it. These models can perform a wide range of language tasks, from summarizing articles to generating entirely new content that sounds convincingly human.

For each scenario, the Python code will be at most 6 to 8 lines long! I will guide you through three different NLP examples using Python.

Prerequisite: Python 3.8 or later (recent Transformers releases may require a newer version)

Step 1 — Install Transformers Library

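Install transformers, together with a deep learning backend such as PyTorch that the pipelines need in order to run models, by running the following command:

pip install transformers torch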

More about the Transformers library: Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer. The pipeline function abstracts away much of the complexity involved in loading models, processing input, and generating output.
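To make those three stages concrete, here is a toy sketch of the preprocess, model, postprocess flow that pipeline() wraps. It is not how Transformers is implemented: the word weights and the function names (toy_preprocess, toy_model, toy_postprocess, toy_pipeline) are invented for illustration, and the "model" is only a lookup table. The one detail borrowed from real classification models is that raw scores are turned into probabilities with softmax.

```python
import math

# Hypothetical word weights standing in for a trained model.
TOY_WEIGHTS = {"love": 2.0, "great": 1.5, "like": 1.0,
               "bad": -1.5, "hate": -2.0, "disappointed": -2.0}
LABELS = ["NEGATIVE", "POSITIVE"]


def toy_preprocess(text):
    """Preprocessing: turn raw text into lowercase tokens."""
    return [word.strip(".,!?").lower() for word in text.split()]


def toy_model(tokens):
    """'Model': produce one raw score (logit) per label."""
    total = sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)
    return [-total, total]  # [NEGATIVE logit, POSITIVE logit]


def toy_postprocess(logits):
    """Postprocessing: softmax the logits and report the best label."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, in the spirit of pipeline()."""
    return toy_postprocess(toy_model(toy_preprocess(text)))


result = toy_pipeline("I love this library!")
print(result["label"], round(result["score"], 3))
```

The returned dictionary has the same label and score keys that the real sentiment pipeline uses, which is why the later examples can treat results as ordinary Python data.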

Step 2: Import the library and write the code in an editor of your choice

You can use any text editor or IDE of your choice. Create a new Python file (e.g., ner_demo.py) and open it in your editor.

Here, by passing ’ner’ as the task to the pipeline function and setting the aggregation strategy to simple, we ask a pretrained model to find the people, organizations, and locations in free-form text.

When you input a sentence or paragraph into an NER pipeline, the model processes the text and returns a list of identified entities along with their corresponding labels (e.g., person, organization, location), their positions in the text, and a confidence score. A higher score indicates that the model is more confident in the match.

Named Entity Recognition (NER) Example:

# Named Entity Recognition Demo - ner_demo.py
from transformers import pipeline

# Initialize the Named Entity Recognition pipeline
ner = pipeline('ner', aggregation_strategy='simple')

# Your input text
prompt = "My name is Nilesh and I worked at UST Global in Richmond"

# Run entity recognition on the text
output = ner(prompt)

# The output is a list of dictionaries, one per detected entity

# Print each entity with its label, confidence score, and position
for entity in output:
    print(f"Entity: {entity['entity_group']}, Word: {entity['word']:<25}, "
          f"Score: {float(entity['score']) * 100:5.2f}%, "
          f"Start: {entity['start']}, End: {entity['end']}")

[Figure: output of the Named Entity Recognition program]

Named entity recognition capability is valuable for various applications, such as content classification, information retrieval, and enhancing search algorithms, by providing a structured understanding of the content within unstructured text data.
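Because each entity comes back as a plain dictionary, ordinary Python is enough to turn the result into a structured record. The sketch below groups entities by label and drops low-confidence matches. The sample list is hand-written to mirror the keys printed in the example (entity_group, word, score, start, end); the scores and the 0.80 cutoff are illustrative assumptions rather than real model output, label names such as PER, ORG, and LOC depend on the model, and group_entities is a hypothetical helper.

```python
from collections import defaultdict

text = "My name is Nilesh and I worked at UST Global in Richmond"

# Hand-written sample in the shape the NER pipeline returns.
# Scores are invented for illustration.
sample_output = [
    {"entity_group": "PER", "word": "Nilesh", "score": 0.99, "start": 11, "end": 17},
    {"entity_group": "ORG", "word": "UST Global", "score": 0.97, "start": 34, "end": 44},
    {"entity_group": "LOC", "word": "Richmond", "score": 0.62, "start": 48, "end": 56},
]


def group_entities(entities, min_score=0.80):
    """Group entity words by label, keeping only confident matches."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


grouped = group_entities(sample_output)
print(grouped)
```

The start and end values are character offsets into the input, so text[start:end] recovers the matched span, which is handy for highlighting or redacting entities.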

Real-life uses of Named Entity Recognition (NER):

  • Information Extraction: NER automates the extraction of key entities like names and locations from large texts, streamlining data gathering.
  • Content Organization: It classifies text by identifying entities, aiding in content tagging and organization.
  • Search Enhancement: NER refines search capabilities, allowing for entity-based queries beyond simple keywords.
  • Customer Support: In customer interactions, NER identifies critical information, facilitating automated responses or routing.
  • Compliance: NER scans for sensitive information in documents to help maintain legal and privacy standards.
  • Healthcare: NER extracts medical terms and names from data, supporting patient care and research.
  • Language Applications: It assists in recognizing and correctly handling proper nouns in language learning and translation tools.

Sentiment Analysis: Similarly, use the sentiment-analysis pipeline to determine whether the input text expresses a positive or negative sentiment.

Sentiment Analysis Example

# sentiment_demo.py
from transformers import pipeline

# Load the sentiment-analysis pipeline with an explicitly named model
sentiment_analysis = pipeline("sentiment-analysis",
                              model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")

prompt = ["I am very disappointed by your behavior", "I like medium.com articles"]

# Perform sentiment analysis; one result is returned per input text
result = sentiment_analysis(prompt)

# Pair each input text with its result
for text, sentiment in zip(prompt, result):
    print(f"Text is '{text}', Tone is {sentiment['label']}, "
          f"Score is: {round(sentiment['score'] * 100, 2)}%")

[Figure: output of the sentiment analysis program]

Practical Uses of Sentiment Analysis:

  • By analyzing sentiments expressed in news articles, social media, and financial reports, investors and traders can make more informed decisions, predict market movements, and assess market sentiment trends.
  • By automatically analyzing the sentiment of customer messages, companies can identify urgent issues, address negative feedback promptly, and improve overall customer satisfaction.
  • By analyzing sentiments in customer reviews, surveys, or social media discussions, businesses can gain valuable insights into customer satisfaction, preferences, and sentiment towards their products or services.
  • By analyzing sentiments expressed in online discussions, companies can identify areas for improvement, address customer concerns, and manage their brand reputation more effectively.
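As a sketch of the customer-support use case, the snippet below flags messages that were labelled negative with high confidence. The results list is hand-written in the shape the sentiment-analysis pipeline returns (label and score); the messages, the scores, the 0.90 threshold, and the flag_urgent helper are assumptions made for illustration.

```python
messages = [
    "My order arrived broken and nobody answers my emails",
    "Thanks, the replacement came quickly",
    "The app is okay, I guess",
]

# Hand-written sample results; the scores are invented, not model output.
results = [
    {"label": "NEGATIVE", "score": 0.998},
    {"label": "POSITIVE", "score": 0.995},
    {"label": "NEGATIVE", "score": 0.610},
]


def flag_urgent(texts, sentiments, threshold=0.90):
    """Return texts labelled NEGATIVE with a score at or above threshold."""
    return [
        text
        for text, sentiment in zip(texts, sentiments)
        if sentiment["label"] == "NEGATIVE" and sentiment["score"] >= threshold
    ]


urgent = flag_urgent(messages, results)
negative_share = sum(r["label"] == "NEGATIVE" for r in results) / len(results)
print(f"{len(urgent)} urgent message(s); {negative_share:.0%} negative overall")
```

Using a threshold keeps borderline cases, like the third message, out of the urgent queue while still counting them in the overall share.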

Summarization

The summarization feature is like a literary chef, expertly condensing lengthy texts into bite-sized, flavorful summaries. With just a few lines of Python, it serves up the essence of articles, papers, or any long documents, saving you hours of reading. It’s your go-to for quick insights and key points, all thanks to the magic of AI. Review the example below.

# Summarization Example
from transformers import pipeline

# Initialize the summarization pipeline
generator = pipeline('summarization')

# Your text prompt
prompt = '''The Republican National Committee began laying off dozens of staffers on Monday, days after Donald Trump’s handpicked team took the reins of the organization, according to two Republican operatives with knowledge of the dismissals.

The layoffs affect staffers across multiple departments, the sources said. The cuts also go beyond senior staff to vendors and mid-level employees, one of the Republican operatives said. Vendor contracts will likely be cut as well.

Some staff who were asked to resign could reapply for jobs at the organization.

It’s not unusual for there to be staff turnover at a national committee after that committee’s party has a de facto or official presidential nominee, but the depths of these cuts go beyond the norm and underscore the lackluster fundraising the committee has experienced lately.

Politico first reported on the layoffs.

Top officials in communications, the political department and the data team were laid off, according to multiple Republicans with knowledge of those layoffs.
'''


# Generate the summary
output = generator(prompt)

# Print the summary text from each result
for element in output:
    print(f"Output is {element['summary_text']}")

[Figure: output of the summarization program]
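Summarization models accept only a limited amount of input, so very long documents generally need to be split before they are summarized. The sketch below shows one simple way to do that: pack paragraphs into chunks under a word budget and summarize each chunk. The helper names (chunk_paragraphs, summarize_long), the default 300-word budget, and counting words rather than model tokens are all simplifying assumptions; the summarizer is passed in as a function so the logic can be tried without downloading a model.

```python
def chunk_paragraphs(text, max_words=300):
    """Pack blank-line-separated paragraphs into chunks of at most max_words.

    A single paragraph longer than max_words is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = len(paragraph.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def summarize_long(text, summarize_fn, max_words=300):
    """Summarize each chunk with summarize_fn and join the results."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_paragraphs(text, max_words))


# Stand-in summarizer for demonstration: keeps the first sentence of a chunk.
def first_sentence(chunk):
    return chunk.split(". ")[0].rstrip(".") + "."


document = "One two three. Four five.\n\nSix seven eight nine. Ten.\n\nEleven twelve."
print(chunk_paragraphs(document, max_words=7))
print(summarize_long(document, first_sentence, max_words=7))
```

With the real pipeline, summarize_fn could be something like lambda chunk: generator(chunk)[0]['summary_text'], reusing the generator object from the summarization example.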

Uses of Summarization

  • Content Creation and Management: Summarization enables media professionals to create quick abstracts of their content, making key points accessible without reading everything.
  • Information Retrieval and Research: Summarization aids researchers and students by extracting crucial insights from vast academic materials, supporting studies and literature reviews.
  • Business Intelligence: Summarization condenses critical business insights from reports and analyses, helping executives grasp important information swiftly.
  • Customer Support and Email Management: Summarization streamlines customer service by providing concise summaries of customer communications, enhancing response efficiency.
  • Legal Document Analysis: Summarization offers lawyers fast access to the essentials of legal texts, facilitating efficient legal review and analysis.
  • Educational Tools: Summarization condenses educational content, helping students with quicker revision and better understanding.
  • News Aggregation: Summarization allows news services to offer brief summaries of articles, enabling readers to quickly catch up on important news.
  • Meeting and Conference Summaries: Summarization creates concise records of meetings and conferences, capturing essential points for easy reference.
  • Healthcare Documentation: Summarization simplifies the review of healthcare records and research, ensuring quick access to crucial medical information for professionals.

Note: The first time you run these Python programs, the library downloads the required model files. Because the NER and summarization examples do not name a model, they run with default models, so you will also see a few messages about defaults being used. One of the largest downloads is the model weights file, which is typically stored in the safetensors format.

What is SafeTensors?

SafeTensors is a file format for securely storing and quickly loading tensors, the numerical data (such as model weights) used in AI and ML models. Think of it as a digital photo frame for your data. Unlike Python’s traditional ‘pickle’ format, which can be slow and can execute arbitrary code when a file is loaded, SafeTensors stores only the tensor data and allows quick, direct access to it thanks to its “zero-copy” approach. It’s a smarter, safer way to handle important AI data, making it as convenient as flipping through photos on a digital frame.
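The risk with pickle is that loading a file can run code chosen by whoever created it. The harmless sketch below shows the mechanism using only the standard library: the Surprise class (a made-up name) tells pickle to call print when the data is loaded, so pickle.loads prints a message instead of rebuilding an object. A format that stores only tensor data and metadata, which is the design goal of safetensors, has no comparable way to run code on load.

```python
import pickle


class Surprise:
    """Made-up class whose pickled form calls print() when loaded."""

    def __reduce__(self):
        # pickle stores "call print(...) with this argument" and
        # executes that call during loading.
        return (print, ("This line ran inside pickle.loads()",))


payload = pickle.dumps(Surprise())

# Loading runs the stored call; the result is print()'s return value (None),
# not a Surprise instance.
loaded = pickle.loads(payload)
print(type(loaded).__name__)
```

Here the stored call is only print, but the same hook could name any importable function, which is why pickled model files from untrusted sources are best avoided.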

Summary

In wrapping up this tutorial on Natural Language Processing using the Hugging Face Transformers library and Python, it’s worth emphasizing the array of pipelines at your disposal. We have barely scratched the surface with Named Entity Recognition, sentiment analysis, and summarization; many more pipelines are available, and the power and efficiency of these tools may surprise you, enabling significant achievements with minimal coding effort. Please note that the scope of your projects will inherently be constrained by smaller datasets and limited computational resources. While crafting the next ChatGPT might be beyond reach without access to vast datasets and substantial computing infrastructure, the knowledge and skills you’ve gained give you a solid foundation for your path forward in the world of NLP.
