Connecting the Dots (Part 2): Deep Learning Basics for Generative AI
Generative AI exists because of the power of deep learning. This revolutionary field has given us the ability to learn from massive datasets and create entirely new forms of text, images, and even code. This article covers the basics of deep learning, sticking to high-level details as a quick introduction.
Introduction to Deep Learning:
Deep learning is a subfield of machine learning loosely inspired by the structure and function of the human brain. It leverages artificial neural networks with multiple layers to learn complex patterns and classifications from large amounts of data. This capability makes deep learning valuable for a wide range of applications, including image recognition, natural language processing (NLP), speech recognition, recommendation systems, time series forecasting, and medical diagnosis. Deep learning models can make accurate predictions on unseen data and outperform traditional machine learning models on many of these tasks.
High-Level Overview of Deep Learning Architecture
The human brain is very powerful and complex, and its neural networks have undoubtedly influenced the development of deep learning. In this section, we'll explore how this biological inspiration translates into the artificial world of deep learning models.
Think of the human brain as a giant network of interconnected cells called neurons. These neurons have branches called dendrites that receive signals from other neurons. A long fiber called an axon carries a signal away from the neuron. The axon branches out and connects to other neurons at special junctions called synapses. When a neuron receives a sufficient electrical signal, it fires its own signal and releases chemicals called neurotransmitters. These chemicals can excite or inhibit nearby neurons, influencing their activity. Billions of neurons connect in complex ways, allowing your brain to learn, remember, and process information.
An artificial neuron mimics this process in a simplified way: it receives numerical inputs, multiplies each by a weight, sums the results, and passes the sum through an activation function. The activation function is crucial because it introduces non-linearity into the network, allowing it to learn complex patterns in data that go beyond simple linear relationships.
Inputs – X1, X2, X3, X4
Weights – W1, W2, W3, W4
Z = W1*X1 + W2*X2 + W3*X3 + W4*X4
The weighted sum (Z) is then passed through a mathematical function called the activation function. This is where the magic of non-linearity happens. Note that all input values to an ANN must be numerical, and the weights are initially assigned random values. Once the neuron determines its output via the activation function, we can calculate the error: the difference between the actual value (what the network should have predicted) and the predicted value (the output from the activation function). Here's where the magic of backpropagation comes in! This process uses the error to adjust the neuron's weights in a way that reduces the error for that specific training example. By iteratively feeding data, making predictions, calculating errors, and adjusting weights, the network gradually learns the patterns within the data and improves its accuracy over time.
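The single-neuron mechanics just described (weighted sum, activation, error, weight update) can be sketched in a few lines of plain Python. The inputs, initial weights, and learning rate below are illustrative values, and the update rule is the standard gradient-descent step for a sigmoid neuron, not code from any particular library:

```python
import math

def sigmoid(z):
    """Squash the weighted sum into (0, 1), introducing non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights):
    """Weighted sum Z = W1*X1 + W2*X2 + ..., passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

def update_weights(inputs, weights, target, lr=0.1):
    """One backpropagation-style step for a single neuron:
    compute the error (target - prediction) and nudge each weight
    in the direction that reduces it."""
    pred = neuron_output(inputs, weights)
    error = target - pred
    # The sigmoid's derivative is pred * (1 - pred); scale by each input.
    grad = error * pred * (1 - pred)
    return [w + lr * grad * x for w, x in zip(weights, inputs)]

inputs = [0.5, 0.3, 0.2, 0.8]       # X1..X4 (numerical, as noted above)
weights = [0.1, -0.2, 0.4, 0.05]    # random-ish initial weights W1..W4
before = neuron_output(inputs, weights)
weights = update_weights(inputs, weights, target=1.0)
after = neuron_output(inputs, weights)
# After one update, the prediction has moved toward the target of 1.0.
```

Repeating this loop (predict, measure error, adjust weights) over many examples is, in essence, what training a neural network does.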
There are three main types of layers:
Input Layer: Receives the raw, pre-processed data as numerical values. The size of this layer depends on the number of features in your data (e.g., number of pixels in an image, number of words in a sentence).
Hidden Layers: These are the layers where the actual computation and learning take place. Deep learning architectures typically have multiple hidden layers, allowing them to learn complex patterns. The number of hidden layers and the number of neurons within each layer are hyperparameters that can be tuned for optimal performance. It's important to note that hidden layers with non-linear activation functions are crucial for learning these complex relationships between the input data and the desired output (as discussed earlier in the explanation of artificial neurons).
Output Layer: Produces the final classification or prediction based on the processed information from the hidden layers. The size of this layer depends on the complexity of the task.
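As a toy illustration of these three layer types, here is a forward pass through a tiny network in plain Python. The layer sizes and weight values are made up for demonstration; in a real network the weights would be learned during training:

```python
import math

def relu(z):
    """A common hidden-layer activation: zero for negatives, identity otherwise."""
    return max(0.0, z)

def layer_forward(inputs, weight_rows, activation):
    """Each neuron in the layer takes its own weighted sum of ALL inputs
    from the previous layer, then applies the activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

# Input layer: 3 features (size fixed by the data, e.g. 3 measurements).
x = [0.2, 0.7, 0.1]

# One hidden layer with 2 neurons (the count is a tunable hyperparameter),
# using a non-linear activation as discussed above.
hidden_w = [[0.5, -0.3, 0.8],
            [0.1, 0.9, -0.4]]

# Output layer: a single neuron producing the final prediction.
output_w = [[1.2, -0.7]]

hidden = layer_forward(x, hidden_w, relu)
output = layer_forward(hidden, output_w,
                       lambda z: 1.0 / (1.0 + math.exp(-z)))
# `output` holds one value in (0, 1), e.g. a probability-like score.
```

Stacking more hidden layers, or widening them, follows the same pattern: each layer's output becomes the next layer's input.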
Beyond the basic feed-forward layout, several specialized architectures exist:
o Convolutional Neural Networks (CNNs) – specialized for image and video recognition and pattern-identification use cases.
o Recurrent Neural Networks (RNNs) – particularly well-suited for NLP tasks due to their ability to handle sequential data like sentences. This capability also makes them valuable for video analysis, time series forecasting, and music generation. Interestingly, RNNs paved the way for advancements like the Transformer model, which played a key role in the recent generative AI revolution. Unlike the sequential processing of RNNs, the Transformer uses a self-attention mechanism, which analyzes relationships between all positions in the sequence simultaneously, offering advantages for certain tasks, particularly in generative AI applications.
o Autoencoders – used for data compression by learning efficient representations of the input data.
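To make the self-attention idea mentioned under RNNs more concrete, here is a minimal dot-product self-attention sketch in plain Python. Real Transformers add learned query/key/value projections, scaling, and multiple attention heads; this stripped-down version only shows the core property that every position attends to every other position at once, rather than processing the sequence step by step:

```python
import math

def softmax(xs):
    """Turn raw similarity scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """For each position, compute its similarity to ALL positions,
    then output a weighted mix of the whole sequence."""
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(len(q))])
    return out

# A toy 3-token sequence of 2-dimensional vectors (made-up values).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(seq)
# Every output vector blends information from all three positions at once.
```

Because each output is a weighted average over the entire sequence, no position has to wait for earlier positions to be processed, which is what makes the mechanism so parallelizable compared to RNNs.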
Training Deep Learning Models:
Normalization: Squishes the data points in each feature (column) to a specific range, typically between 0 and 1 (min-max scaling) or -1 and 1. Use normalization if the data distribution is unknown or you specifically need the data to fall within a certain range.
Standardization: Transforms the data points in each feature (column) to have a standard normal distribution (bell-shaped curve) with a mean of 0 and a standard deviation of 1. Use standardization if you want to preserve the original distribution shape of the data (except for centering and scaling) and you're confident the data doesn't contain many outliers.
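Both scaling schemes are easy to see on a single feature column. This sketch uses plain Python with made-up data; in practice you would reach for a library routine such as scikit-learn's MinMaxScaler or StandardScaler:

```python
import math

data = [10.0, 20.0, 30.0, 40.0, 50.0]  # one feature column (illustrative)

# Min-max normalization: squish every value into the [0, 1] range.
lo, hi = min(data), max(data)
normalized = [(x - lo) / (hi - lo) for x in data]
# -> [0.0, 0.25, 0.5, 0.75, 1.0]

# Standardization: shift to mean 0, scale to standard deviation 1.
mean = sum(data) / len(data)
std = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
standardized = [(x - mean) / std for x in data]
# -> values centered on 0, roughly between -1.4 and 1.4 here
```

Note that normalization is sensitive to outliers (a single extreme value stretches the range for everything else), which is one reason standardization is often the safer default.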
o Scikit-learn: A library that offers functionalities for data manipulation, model building (including basic neural networks), and evaluation across various machine learning tasks.
o Keras: A high-level neural network API that sits on top of frameworks like TensorFlow. It provides a user-friendly interface for building and training deep learning models.
o TensorFlow: An open-source framework for numerical computations, especially popular for deep learning. It handles the complex mathematical operations behind the scenes for training and running deep learning models.