How To Use Cloud Speech-To-Text For Speech Recognition On GCP?
Last Updated: 15 Mar, 2024
Google Cloud Platform (GCP) is one of the most popular cloud service providers. Beyond deployment and storage, GCP also offers speech recognition through a powerful, easy-to-use service called Cloud Speech-to-Text. It lets developers convert spoken language into text with high accuracy. Speech-to-Text can be integrated into applications to provide transcriptions, and businesses can use it to make their products more accessible. In this article, we will learn about Cloud Speech-to-Text and how to use it to transcribe speech.
Key Terminologies
- Google Cloud Platform (GCP): Google Cloud Platform is a suite of cloud computing services provided by Google, covering computing, storage, machine learning, and more.
- Cloud Speech-to-Text: Cloud Speech-to-Text is a GCP service that lets developers convert audio input to text using Google's speech recognition technology. Other applications can call it through its API, which helps make their content more accessible.
Steps To Use Cloud Speech-To-Text For Speech Recognition On GCP
Step 1: Open GCP Cloud Console
- Log in to Google Cloud Platform: in your web browser, open the GCP Cloud Console and sign in with your credentials.
- The services used in this article require an active subscription or free-trial plan, so make sure your account has one.
Step 2: Enable Cloud Speech-To-Text API
- Once you are logged in to the GCP Console, navigate to the "APIs & Services" section.
- Click on "Enable APIs and Services".

- This opens a search bar for finding APIs. Search for "Cloud Speech-to-Text API".

- Click it to see the API's details, then click Enable to enable the API for your project.

Step 3: Create A Service Account
- Next, create a service account and generate a key that will authenticate our requests.
- A service account is a special type of account used by applications and virtual machines to authenticate and interact with other GCP services and APIs.
- Navigate to "APIs & Services" and click "Credentials".

- Click on "Create Credentials" and select "Service Account".

- Now give a name to this service account and click on "Create and continue".

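- For Role, select Owner and click Continue. Owner grants full access to the project, so for production workloads consider a more restrictive role.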

- Leave the remaining settings at their defaults and click Done.
Step 4: Create JSON Key
- A JSON key is also known as a service account key or credentials file.
- It is a file in JSON (JavaScript Object Notation) format that contains the authentication information for a service account in Google Cloud Platform.
- To generate a JSON key for the service account, click the newly created service account.

- Now go to the Keys section and select Create new key.

- For Key type, select JSON and click Create. The key is created and its JSON file is downloaded automatically. Store this file securely, since anyone who has it can act as the service account.
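The key file described above is ordinary JSON, so it can be sanity-checked with the standard library before it is used. The sketch below is a minimal example, assuming the usual service account key fields (`type`, `project_id`, `private_key`, `client_email`); the function name `inspect_service_account_key` is hypothetical and not part of any Google library.

```python
import json
from pathlib import Path

# Fields a Google Cloud service account key file normally contains (assumed subset).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def inspect_service_account_key(path):
    """Return (project_id, client_email) from a service account key file.

    Raises ValueError if the file does not look like a service account key.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in data]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"unexpected key type: {data['type']!r}")
    # Never print data["private_key"]: it is the secret that authenticates you.
    return data["project_id"], data["client_email"]
```

Printing the returned e-mail address is a quick way to confirm you downloaded the key for the service account you just created.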

Step 5: Install Required Packages
- Here we implement this Cloud feature using Python.
- Open any Python IDE on your system to proceed. You can also use Google Colab (in a Colab cell, prefix the pip command with !).
- First, install or upgrade the google-cloud-speech package. The --upgrade flag installs the package if it is missing and updates it to the latest version otherwise.
pip install --upgrade google-cloud-speech
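To confirm the installation worked, the installed version can be read from the package metadata. This is a minimal sketch using Python's standard `importlib.metadata` module; `installed_speech_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_speech_version():
    """Return the installed google-cloud-speech version, or None if it is missing."""
    try:
        return metadata.version("google-cloud-speech")
    except metadata.PackageNotFoundError:
        return None


version = installed_speech_version()
if version is None:
    print("google-cloud-speech is not installed; run the pip command above.")
else:
    print(f"google-cloud-speech {version} is installed.")
```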
Step 6: Import Library
- Let's import the required library for our Cloud Speech-to-Text implementation.
from google.cloud import speech
- This module is part of the Google Cloud client library for Python, which provides convenient access to Google Cloud services, including the Cloud Speech-to-Text API.
Step 7: Connect With GCP
- Now connect the Python environment to the Google Cloud service account using the JSON key generated earlier. First, place the JSON file in the working directory, then run the following line, replacing [file_name] with the name of your key file.
client = speech.SpeechClient.from_service_account_file('[file_name].json')
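As an alternative to `from_service_account_file`, the Google Cloud client libraries can pick up a key through Application Default Credentials, which read the key path from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The sketch below assumes that approach; `make_speech_client` is a hypothetical helper, and it checks that the key file exists before importing the library.

```python
import os


def make_speech_client(key_path):
    """Create a SpeechClient authenticated through Application Default Credentials.

    The key path is exported as GOOGLE_APPLICATION_CREDENTIALS, which the
    Google Cloud client libraries read when no credentials are passed explicitly.
    """
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"service account key not found: {key_path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)
    # Imported here so the file check above works even before the package is installed.
    from google.cloud import speech
    return speech.SpeechClient()
```

This keeps the key location in one environment variable, so other Google Cloud clients created in the same process can reuse it.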

Step 8: Select Speech File
- Get an audio file that contains speech and place it in the current directory.
- Then specify the path to the audio file, open it in binary mode, and store its contents in a variable (mp3_data in this article).
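The file-reading step can be sketched as a small helper that returns the raw bytes that later go into `mp3_data`. Synchronous recognition is intended for short clips (roughly one minute or less); the function name `load_audio` is hypothetical, and the 10 MB cap is an assumed guard for inline audio rather than a value taken from this article.

```python
from pathlib import Path

# Assumed upper bound for audio sent inline in a synchronous request.
MAX_INLINE_BYTES = 10 * 1024 * 1024


def load_audio(path):
    """Read an audio file as raw bytes for speech.RecognitionAudio(content=...)."""
    data = Path(path).read_bytes()  # binary mode: the API expects raw bytes
    if not data:
        raise ValueError(f"audio file is empty: {path}")
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError(f"{path} is {len(data)} bytes; too large to send inline")
    return data
```

For example, `mp3_data = load_audio("speech.mp3")`, where `speech.mp3` is a placeholder for your own file.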

Step 9: Perform Speech-to-Text Operation
- First, pass the binary audio data stored in the 'mp3_data' variable to a RecognitionAudio object, which carries the audio to the Cloud Speech-to-Text API for transcription.
audio_file = speech.RecognitionAudio(content=mp3_data)
- Next, create a configuration object for the speech recognition request.
- Set the sample rate of the audio file, i.e. the number of audio samples carried per second, in hertz. Also enable automatic punctuation so the transcript includes commas, question marks, and other punctuation.
- Finally, set the language code, which is American English (en-US) in this case.
config = speech.RecognitionConfig(
    sample_rate_hertz=44100,
    enable_automatic_punctuation=True,
    language_code='en-US'
)
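`sample_rate_hertz` should match the rate at which the audio was actually recorded; 44100 is only right for 44.1 kHz audio. For WAV and FLAC files the rate and encoding are stored in the file header (and the API can read them from it), while other formats may also need the `encoding` field set in `RecognitionConfig`. For WAV files, Python's standard `wave` module can read the rate, as in this minimal sketch; `wav_sample_rate` is a hypothetical helper, and the standard library cannot read MP3 headers, so check MP3 files with an audio tool instead.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

The result can then be passed as `sample_rate_hertz=wav_sample_rate("speech.wav")` when building the config, with `speech.wav` standing in for your own file.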
- Store the transcription results returned by the Cloud Speech-to-Text API in a response variable.
- To do this, call the client's recognize() method with the configuration and audio data defined above.
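With the Python client library, the call described above is typically written as `response = client.recognize(config=config, audio=audio_file)`. The sketch below wraps the audio, configuration, and call from this step into one reusable function; `transcribe` is a hypothetical name, and its defaults simply mirror the values used in this article.

```python
def transcribe(client, audio_bytes, sample_rate_hertz=44100, language_code="en-US"):
    """Send audio bytes to Cloud Speech-to-Text and return the response.

    Mirrors the configuration used in this article; adjust the sample rate
    to match your audio file.
    """
    from google.cloud import speech  # requires: pip install google-cloud-speech

    audio = speech.RecognitionAudio(content=audio_bytes)
    config = speech.RecognitionConfig(
        sample_rate_hertz=sample_rate_hertz,
        enable_automatic_punctuation=True,
        language_code=language_code,
    )
    # recognize() is the synchronous call: it blocks until the transcript is ready.
    return client.recognize(config=config, audio=audio)
```

With the client and audio from the earlier steps, `response = transcribe(client, mp3_data)` should produce the same kind of response as the step-by-step version.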
Step 10: Check Result
- Let's print the response and see what it contains.
print(response)
The response includes the following details:
- Transcript (transcript): the text generated by the speech recognition process.
- Confidence (confidence): how likely it is that the transcribed text accurately represents the spoken words.
- Result end time (result_end_time): the end time of the audio segment covered by the result.
- Language Code (language_code): the language of the transcribed text.
- Total Billed Time (total_billed_time): the time billed for the transcription, measured in seconds.
- Request Id (request_id): the unique identifier that the Cloud Speech-to-Text API assigns to the recognition request.
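Rather than reading these fields from the printed output, they can be collected into plain Python values. This is a minimal sketch, assuming a v1 `RecognizeResponse` whose duration fields (`result_end_time`, `total_billed_time`) are exposed as `datetime.timedelta` objects, an assumption worth checking against your installed client version; `summarize_response` is a hypothetical helper.

```python
def summarize_response(response):
    """Collect the fields listed above into plain Python values.

    Assumes result_end_time and total_billed_time behave like datetime.timedelta.
    """
    rows = []
    for result in response.results:
        if not result.alternatives:
            continue  # skip results that carry no recognition hypotheses
        best = result.alternatives[0]  # alternatives are ordered most probable first
        rows.append({
            "transcript": best.transcript,
            "confidence": best.confidence,
            "end_seconds": result.result_end_time.total_seconds(),
            "language_code": result.language_code,
        })
    return {
        "results": rows,
        "billed_seconds": response.total_billed_time.total_seconds(),
        "request_id": response.request_id,
    }
```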
Since we only need the transcription, let's print just the transcript of each result.
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
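Longer recordings can come back as several results, each covering a consecutive portion of the audio, so the loop above prints one line per result. To get a single string instead, the top alternative of each result can be joined, as in this minimal sketch (`full_transcript` is a hypothetical helper name).

```python
def full_transcript(response):
    """Join the top alternative of every result into one transcript string."""
    parts = []
    for result in response.results:
        # Each result covers the next portion of the audio; keep its best guess.
        if result.alternatives:
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(part for part in parts if part)
```

Calling `print(full_transcript(response))` then prints the whole transcription on one line.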
Conclusion
Google Cloud Speech-to-Text API offers a powerful, reliable way to convert audio into text with high accuracy. With it, developers can easily add speech recognition to their applications for use cases such as transcription, voice-controlled interfaces, and feeding transcripts into sentiment analysis. The API provides the tools needed to build accurate, efficient speech recognition tailored to an application's requirements.