
The evolution of learning in neural networks

A perceptron is the most basic type of artificial neural network, consisting of inputs (the feature values), weights (the modifiable parameters), and a bias, an additional parameter that shifts the decision boundary.

Normally, the activation function is a step function or a sigmoid, which squashes the output between 0 and 1; the result is the perceptron's prediction.

You can see it as a simple decision-making unit: it takes multiple inputs, multiplies them by corresponding weights, sums them up, and applies an activation function to produce an output.

y = (x1 * w1) + (x2 * w2) + ... + (xn * wn) + bias
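As a minimal sketch (the function and variable names are mine, not from the original code), the forward pass can be written in Python like this:

# Forward pass of a single perceptron (illustrative sketch)
def perceptron_output(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum >= 0 else 0      # step activation

print(perceptron_output([1, 1], [0.5, 0.5], -0.7))  # 1
print(perceptron_output([1, 0], [0.5, 0.5], -0.7))  # 0

With these particular weights and bias, the unit already behaves like an AND gate, which is exactly what we build next.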

So let's create a simple single-neuron network for the AND function.



[Image: AND gate]
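For reference, here is the AND truth table written as training data for the sketches that follow (the variable names are mine):

# AND gate: the output is 1 only when both inputs are 1
training_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]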

The perceptron learning algorithm iteratively adjusts the weights and bias to minimize the error between the predicted output and the target output.

The learning rate controls the step size of the weight adjustments.

This process is repeated for a specified number of epochs.

The code in the example uses a step function as its activation:

output = 1 if weighted_sum >= 0 else 0.

This means that if the weighted sum is greater than or equal to 0, the output is 1; otherwise, it's 0.

It's a simple threshold activation.


[Image: the perceptron's algorithm]
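A minimal sketch of such a training loop, assuming the AND data above and a step activation (this is a reconstruction for illustration, not the pictured code):

# Perceptron learning rule for the AND gate (illustrative reconstruction)
training_inputs, targets = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]
weights, bias, learning_rate = [0.0, 0.0], 0.0, 0.1

for epoch in range(20):                                    # repeat for a fixed number of epochs
    for (x1, x2), target in zip(training_inputs, targets):
        weighted_sum = x1 * weights[0] + x2 * weights[1] + bias
        output = 1 if weighted_sum >= 0 else 0             # step activation
        error = target - output
        weights[0] += learning_rate * error * x1           # perceptron learning rule:
        weights[1] += learning_rate * error * x2           # move each weight by lr * error * input
        bias += learning_rate * error

print(weights, bias)   # parameters that separate the AND outputs after a few epochs

Because the AND function is linearly separable, the loop settles on a working set of weights within a handful of epochs.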

Instead, we could use the ReLU activation. ReLU is defined as:

f(x) = max(0, x)

it outputs the input if it is positive, and 0 otherwise.

So the first difference would be that the step function produces a binary output (0 or 1), while ReLU produces a continuous output, i.e. any non-negative number.

The step function's derivative is 0 everywhere it is defined (and undefined at 0), so it gives gradient-based learning nothing to work with, while the ReLU function is differentiable almost everywhere with a useful, non-zero gradient for positive inputs.

And this differentiability is very important for backpropagation in deeper networks.


[Image: ReLU]
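As a small illustration of the difference (the names are mine, not from the article's code):

# ReLU vs. the step function (illustrative sketch)
def relu(x):
    return max(0.0, x)             # continuous, non-negative output

def step(x):
    return 1 if x >= 0 else 0      # binary output

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0   # defined almost everywhere; this is what backpropagation uses

print(relu(0.3), step(0.3))    # 0.3 1
print(relu(-2.0), step(-2.0))  # 0.0 0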

So the perceptron learning rule is a simpler learning algorithm that works only for single-layer perceptrons.

It directly updates the weights based on the error between the predicted and actual outputs.

While the code does incorporate the ReLU activation function, it is still using the perceptron learning rule for a single-layer perceptron.

Backpropagation is a more general algorithm used to train multi-layer neural networks.

It involves calculating the gradient of the error with respect to the weights and biases throughout the network, and then using this gradient to update the parameters.

ReLU is a computationally efficient activation function often used in conjunction with backpropagation in multi-layer networks.

But swapping in ReLU on its own is not enough.


[Image: backpropagation]

The last code uses backpropagation to adjust the weights and biases based on the error between its predictions and the actual values. It can use either a ReLU or a sigmoid activation function.

If the problem is a binary classification problem like the AND function, then a sigmoid is often a good choice for the final layer's activation, since it provides a probability between 0 and 1.

However, ReLU may also be used: when the output is a continuous value, or when the problem is a multi-class classification problem, ReLU might be a more suitable choice.
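Here is a minimal sketch of such a network: two inputs, a tiny ReLU hidden layer, and a sigmoid output, trained on the AND data with backpropagation. This is my reconstruction under those assumptions, not the article's exact code:

import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# AND data, 2 inputs -> 2 hidden ReLU units -> 1 sigmoid output (illustrative sketch)
training_inputs, targets = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]
w_hid = [[random.uniform(0.1, 0.5), random.uniform(0.1, 0.5)] for _ in range(2)]
b_hid = [0.1, 0.1]
w_out = [random.uniform(0.1, 0.5), random.uniform(0.1, 0.5)]
b_out = 0.0
lr = 0.5

for epoch in range(5000):
    for (x1, x2), target in zip(training_inputs, targets):
        # Forward pass
        pre = [w_hid[j][0] * x1 + w_hid[j][1] * x2 + b_hid[j] for j in range(2)]
        h = [max(0.0, p) for p in pre]                           # ReLU hidden layer
        y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + b_out)   # sigmoid output

        # Backward pass: gradients of the squared error 0.5 * (y - target)^2
        d_out = (y - target) * y * (1.0 - y)                     # sigmoid derivative is y * (1 - y)
        d_hid = [d_out * w_out[j] * (1.0 if pre[j] > 0 else 0.0) for j in range(2)]

        # Gradient-descent updates
        for j in range(2):
            w_out[j] -= lr * d_out * h[j]
            w_hid[j][0] -= lr * d_hid[j] * x1
            w_hid[j][1] -= lr * d_hid[j] * x2
            b_hid[j] -= lr * d_hid[j]
        b_out -= lr * d_out

for (x1, x2) in training_inputs:
    h = [max(0.0, w_hid[j][0] * x1 + w_hid[j][1] * x2 + b_hid[j]) for j in range(2)]
    print((x1, x2), sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + b_out))

After training, the predictions for (0, 0), (0, 1), and (1, 0) should approach 0, and the prediction for (1, 1) should approach 1; swapping the hidden ReLU for a sigmoid is the other variant mentioned above.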


Most of the time you need to test different options to see what works best for your specific problem, and trying multiple activation functions and observing their behavior helps you understand the problem better.

The right choice depends on the specific characteristics of the task and dataset, and it is best found through empirical evaluation.


 

