Feed-Forward Neural Networks
Content
• Introduction
• Single-Layer Perceptron Networks
• Learning Rules for Single-Layer Perceptron Networks
  – Perceptron Learning Rule
  – Adaline Learning Rule
  – δ-Learning Rule
• Multilayer Perceptron
• Back Propagation Learning Algorithm
Feed-Forward Neural Networks
Introduction
Historical Background
• 1943: McCulloch and Pitts proposed the first computational model of the neuron.
• 1949: Hebb proposed the first learning rule.
• 1958: Rosenblatt's work on perceptrons.
• 1969: Minsky and Papert exposed the limitations of the theory.
• 1970s: Decade of dormancy for neural networks.
• 1980–90s: Neural networks return (self-organization, back-propagation algorithms, etc.).
Nervous Systems
• The human brain contains ~10^11 neurons.
• Each neuron is connected to ~10^4 others.
• Some scientists have compared the brain to a "complex, nonlinear, parallel computer".
• The largest modern neural networks achieve complexity comparable to the nervous system of a fly.
Neurons
• The main purpose of neurons is to receive, analyze, and transmit information in the form of signals (electric pulses).
• When a neuron sends information, we say that the neuron "fires".
Neurons
This animation demonstrates the firing of a synapse between the pre-synaptic terminal of one neuron and the soma (cell body) of another neuron.
Acting through specialized projections known as dendrites and axons, neurons carry information throughout the neural network.
A Model of Artificial Neuron
Inputs x1, x2, …, xm = −1 (the bias input) enter with weights wi1, wi2, …, wim = θi; a summation f(.) followed by an activation a(.) produces the output yi:
f_i = Σ_{j=1}^{m} w_{ij} x_j
y_i(t+1) = a( f_i(t) ),   where a(f) = 1 if f ≥ 0, and a(f) = 0 otherwise.
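To make the model concrete, here is a minimal sketch in Python (an illustration, not part of the original slides; the function name ltu_forward and the sample numbers are my own):

```python
import numpy as np

def ltu_forward(x, w):
    """Linear threshold unit: f = w.x (last input is the -1 bias), a(f) = 1 if f >= 0 else 0."""
    f = np.dot(w, x)           # weighted sum, f_i = sum_j w_ij * x_j
    return 1 if f >= 0 else 0  # hard-limiting activation a(.)

# Example: two real inputs plus the bias input x_m = -1
x = np.array([0.5, 1.0, -1.0])   # [x1, x2, xm = -1]
w = np.array([0.4, 0.6, 0.3])    # [wi1, wi2, wim = theta_i]
print(ltu_forward(x, w))         # -> 1, since 0.2 + 0.6 - 0.3 = 0.5 >= 0
```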
Feed-Forward Neural Networks
• Graph representation:
  – nodes: neurons
  – arrows: signal flow directions
• A neural network that does not contain cycles (feedback loops) is called a feed-forward network (or perceptron).
. . .
. . .
. . .
. . .
x1 x2 xm
y1 y2 yn
Hidden Layer(s)
Input Layer
Output Layer
Layered Structure
. . .
. . .
. . .
. . .
x1 x2 xm
y1 y2 yn
Knowledge and Memory
• The output behavior of a network is determined by the weights.
• Weights are the memory of an NN.
• Knowledge is distributed across the network.
• A large number of nodes
  – increases the storage "capacity";
  – ensures that the knowledge is robust;
  – provides fault tolerance.
• New information is stored by changing the weights.
Pattern Classification
• Function: x → y
• The NN's output is used to distinguish between and recognize different input patterns.
• Different output patterns correspond to particular classes of input patterns.
• Networks with hidden layers can be used to solve more complex problems than just linear pattern classification.
input pattern x → output pattern y
Training
The network maps inputs x_{i1}, …, x_{im} to outputs y_{i1}, …, y_{in}; the desired outputs are d_{i1}, …, d_{in}.
Training Set: T = { (x^(1), d^(1)), (x^(2), d^(2)), …, (x^(k), d^(k)), … }
where x^(i) = ( x_{i1}, x_{i2}, …, x_{im} ) and d^(i) = ( d_{i1}, d_{i2}, …, d_{in} ).
Goal: Minimize E = Σ_i error( y^(i), d^(i) ) = Σ_i || y^(i) − d^(i) ||².
Generalization
• A properly trained neural network may produce reasonable answers for input patterns not seen during training (generalization).
• Generalization is particularly useful for the analysis of noisy data (e.g., time series).
[Plot: a fitted curve over sample points, without noise vs. with noise]
Applications
• Pattern classification
• Object recognition
• Function approximation
• Data compression
• Time series analysis and forecast
. . .
Feed-Forward Neural Networks
Single-Layer
Perceptron Networks
The Single-Layered Perceptron
. . .
x1 x2 xm= 1
y1 y2 yn
xm-1
. . .
. . .
w11
w12
w1m
w21
w22
w2m wn1
wnm
wn2
Training a Single-Layered Perceptron
. . .
x1 x2 xm= 1
y1 y2 yn
xm-1
. . .
. . .
w11
w12
w1m
w21
w22
w2m wn1
wnm
wn2
targets d1, d2, …, dn
Training Set: T = { (x^(1), d^(1)), (x^(2), d^(2)), …, (x^(p), d^(p)) }
Goal: y_i^(k) = a( Σ_{l=1}^{m} w_{il} x_l^(k) ) = a( w_i^T x^(k) ) = d_i^(k),   i = 1, 2, …, n;  k = 1, 2, …, p.
Learning Rules
. . .
x1 x2 xm= 1
y1 y2 yn
xm-1
. . .
. . .
w11
w12
w1m
w21
w22
w2m wn1
wnm
wn2
targets d1, d2, …, dn
Training Set: T = { (x^(1), d^(1)), (x^(2), d^(2)), …, (x^(p), d^(p)) }
Goal: y_i^(k) = a( Σ_{l=1}^{m} w_{il} x_l^(k) ) = a( w_i^T x^(k) ) = d_i^(k),   i = 1, 2, …, n;  k = 1, 2, …, p.
• Linear Threshold Units (LTUs): Perceptron Learning Rule
• Linearly Graded Units (LGUs): Widrow-Hoff Learning Rule
Feed-Forward Neural Networks
Learning Rules for Single-Layered Perceptron Networks
• Perceptron Learning Rule
• Adaline Learning Rule
• δ-Learning Rule
Perceptron
Linear Threshold Unit
The unit computes net = w_i^T x^(k) and applies the hard limiter sgn(·):
y_i^(k) = sgn( w_i^T x^(k) )
Inputs x1, x2, …, xm = −1 with weights wi1, wi2, …, wim = θi; the output is +1 or −1.
Perceptron
Linear Threshold Unit
y_i^(k) = sgn( w_i^T x^(k) )
Goal: y_i^(k) = sgn( w_i^T x^(k) ) = d_i^(k) ∈ {−1, +1},   i = 1, 2, …, n;  k = 1, 2, …, p.
Example
x1 x2 x3= 1
2 1 2
y
 
T
T
T
]
2
,
1
[
,
]
1
,
5
.
1
[
,
]
0
,
1
[ 




Class 1 (+1)
 
T
T
T
]
2
,
1
[
,
]
1
,
5
.
2
[
,
]
0
,
2
[ 

Class 2 (1)
Class 1
Class 2
x1
x2
g
(
x
)
=

2
x
1
+
x
2
+
2
=
0
) (
( ) ( )
sgn( ) { , }
1,2, ,
1,2, ,
1 1
k
i
k T k
i i
y
i n
k p
d
  



w x


Goal:
Augmented input vector
x1 x2 x3= 1
2 1 2
y
 
T
T
T
]
2
,
1
[
,
]
1
,
5
.
1
[
,
]
0
,
1
[ 




Class 1 (+1)
 
T
T
T
]
2
,
1
[
,
]
1
,
5
.
2
[
,
]
0
,
2
[ 

Class 2 (1)
(4) (5) (6)
(4) (5) (6)
2 2.5 1
0 , 1 , 2
1, 1, 1
1 1
1
x x x
d d d
  
     
     
    
     
     
     
  
(1) (2) (3)
(1) (2) (3)
1 1.5 1
0 , 1 , 2
1, 1,
1 1 1
1
x x x
d d d
  
     
     
    
     
     
     

 

 
Class 1 (+1)
Class 2 (1)
( ) ( ) ( )
1 2 3
sgn( )
( , , )
k T k k
T
y d
w w w
 

w x
w
Goal:
Augmented input vector
x1 x2 x3= 1
2 1 2
y
Class 1
(1, 2, 1)
(1.5, 1, 1)
(1,0, 1)
Class 2
(1, 2, 1)
(2.5, 1, 1)
(2,0, 1)
x1
x2
x3
(0,0,0)
1 2 3
( ) 2 2 0
g x x x x
   
0

x
wT
( ) ( ) ( )
1 2 3
sgn( )
( , , )
k T k k
T
y d
w w w
 

w x
w
Goal:
Augmented input vector
x1 x2 x3= 1
2 1 2
y
Class 1
(1, 2, 1)
(1.5, 1, 1)
(1,0, 1)
Class 2
(1, 2, 1)
(2.5, 1, 1)
(2,0, 1)
x1
x2
x3
(0,0,0)
1 2 3
( ) 2 2 0
g x x x x
   
0

x
wT
( ) ( ) ( )
1 2 3
sgn( )
( , , )
k T k k
T
y d
w w w
 

w x
w
Goal:
A plane passes through the origin
in the augmented input space.
Linearly Separable vs.
Linearly Non-Separable
[Plots of AND, OR, and XOR over the unit square]
AND OR XOR
Linearly Separable Linearly Separable Linearly Non-Separable
Goal
• Given training sets T1 ⊂ C1 and T2 ⊂ C2 with elements of the form x = (x1, x2, …, x_{m−1}, x_m)^T, where x1, x2, …, x_{m−1} ∈ R and x_m = −1.
• Assume T1 and T2 are linearly separable.
• Find w = (w1, w2, …, w_m)^T such that
  sgn(w^T x) = +1 if x ∈ T1, and sgn(w^T x) = −1 if x ∈ T2.
Goal
With this augmented representation, w^T x = 0 is a hyperplane that passes through the origin of the augmented input space.
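To make the augmented-vector convention concrete, a small sketch (not from the slides; the helper names augment and classify are mine, and the weight vector is taken from the earlier example) showing how appending x_m = −1 folds the threshold into the weight vector:

```python
import numpy as np

def augment(x):
    """Append the constant input x_m = -1 so the threshold becomes an ordinary weight."""
    return np.append(x, -1.0)

def classify(w, x):
    """Return sgn(w^T x) in {-1, +1} for an augmented input x."""
    return 1 if np.dot(w, augment(x)) >= 0 else -1

# The decision surface w^T x = 0 passes through the origin of the augmented space.
w = np.array([-2.0, 1.0, -2.0])
print(classify(w, np.array([-1.0, 0.0])))  # +1 (Class 1 side)
print(classify(w, np.array([2.0, 0.0])))   # -1 (Class 2 side)
```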
Observation
Consider a two-dimensional training set with positive examples (d = +1) and negative examples (d = −1).
• Given a sample x, which weight vectors w correctly classify it? What trick can be used?
• Suppose the current boundary is w1 x1 + w2 x2 = 0 and a positive example x gives w^T x < 0. Is this w ok? No, x is misclassified. How should w be adjusted?
• Try Δw = η x. Then (w + Δw)^T x = w^T x + η x^T x; since w^T x < 0 and η x^T x > 0, the update pushes w^T x toward the positive side, so the adjustment is reasonable.
• For a misclassified negative example (d = −1, w^T x > 0), should Δw be +η x or −η x? (It should be −η x.)
Perceptron Learning Rule
Upon misclassification:
• on a positive example (d = +1): Δw = +η x
• on a negative example (d = −1): Δw = −η x
Define the error r = d − y:
• r = +2 when d = +1 and y = −1
• r = −2 when d = −1 and y = +1
• r = 0 when there is no error
Both cases are captured by the single rule Δw = η r x, where η is the learning rate, r = (d − y) is the error, and x is the input.
Summary 
Perceptron Learning Rule
Based on the general weight learning rule Δw_i(t) = η r x_i(t) with r = d_i − y_i:
Δw_i(t) = η ( d_i − y_i ) x_i(t)
d_i − y_i = +2   if d_i = +1 and y_i = −1   (incorrect)
d_i − y_i = −2   if d_i = −1 and y_i = +1   (incorrect)
d_i − y_i = 0    if d_i = y_i               (correct)
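A minimal sketch of this rule as a training loop (illustrative only; the function name train_perceptron, the stopping test, and the AND data are my own choices):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, max_epochs=100):
    """Perceptron learning rule: delta_w = eta * (d - y) * x on each augmented sample.

    X: array of shape (p, m) of augmented inputs (last component = -1).
    d: array of shape (p,) of targets in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) >= 0 else -1   # y = sgn(w^T x)
            if y != target:                       # update only on misclassification
                w += eta * (target - y) * x       # r = d - y is +2 or -2
                errors += 1
        if errors == 0:                           # all samples classified correctly
            break
    return w

# Example: the linearly separable AND function with augmented inputs
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([-1, -1, -1, +1])
w = train_perceptron(X, d)
print(w, [1 if np.dot(w, x) >= 0 else -1 for x in X])
```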
Summary 
Perceptron Learning Rule
The neuron computes y = sgn(w^T x) from input x; the teacher supplies the target d; the update Δw = η(d − y)x is applied to the weights. Does this process converge?
Perceptron Convergence Theorem
If the given training set is linearly separable, the learning process will converge in a finite number of steps.
• Exercise: consult papers or textbooks to prove the theorem.
The Learning Scenario
x1
x2
+ x(1)
+
x(2)

x(3)  x(4)
Linearly Separable.
The Learning Scenario
x1
x2
w0
+ x(1)
+
x(2)

x(3)  x(4)
The Learning Scenario
x1
x2
w0
+ x(1)
+
x(2)

x(3)  x(4)
w1
w0
The Learning Scenario
x1
x2
w0
+ x(1)
+
x(2)

x(3)  x(4)
w1
w0
w2
w1
The Learning Scenario
x1
x2
w0
+ x(1)
+
x(2)

x(3)  x(4)
w1
w0
w1
w2
w2
w3
The Learning Scenario
x1
x2
w0
+ x(1)
+
x(2)

x(3)  x(4)
w1
w0
w1
w2
w2
w3
w4 = w3
The Learning Scenario
x1
x2
+ x(1)
+
x(2)

x(3)  x(4)
w
The Learning Scenario
x1
x2
+ x(1)
+
x(2)

x(3)  x(4)
w
The demonstration is in
augmented space.
Conceptually, in augmented space, we
adjust the weight vector to fit the data.
Weight Space
w1
w2
+
x
A weight in the shaded area will give correct
classification for the positive example.
w
Weight Space
w1
w2
+
x
A weight in the shaded area will give correct
classification for the positive example.
w
w = x
Weight Space
w1
w2

x
A weight not in the shaded area will give correct
classification for the negative example.
w
Weight Space
w1
w2

x
A weight not in the shaded area will give correct
classification for the negative example.
w
w = x
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
w7
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
w7
w8
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
w7
w8
w9
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
w7
w8
w9
w10
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w1
w2=w3
w4
w5
w6
w7
w8
w9
w10 w11
The Learning Scenario in Weight Space
w1
w2
+ x(1)
+
x(2)

x(3)  x(4)
To correctly classify the training set, the
weight must move into the shaded area.
w0
w11
Conceptually, in weight space, we move
the weight into the feasible region.
Feed-Forward Neural Networks
Learning Rules for Single-Layered Perceptron Networks
• Perceptron Learning Rule
• Adaline Learning Rule
• δ-Learning Rule
Adaline (Adaptive Linear Element)
Widrow [1962]
The unit computes a weighted sum with a linear (identity) activation:
y_i^(k) = w_i^T x^(k)
Inputs x1, x2, …, xm with weights wi1, wi2, …, wim.
Adaline (Adaptive Linear Element)
Widrow [1962]
y_i^(k) = w_i^T x^(k)
Goal: y_i^(k) = w_i^T x^(k) = d_i^(k),   i = 1, 2, …, n;  k = 1, 2, …, p.
Under what condition is the goal reachable?
LMS (Least Mean Square)
Minimize the cost function (error function):
E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − y^(k) )²
     = (1/2) Σ_{k=1}^{p} ( d^(k) − w^T x^(k) )²
     = (1/2) Σ_{k=1}^{p} ( d^(k) − Σ_{l=1}^{m} w_l x_l^(k) )²
Gradient Descent Algorithm
The error surface E(w) over the weights (w1, w2): our goal is to go downhill. The contour map of E(w) shows the path of w during the descent.
How do we find the steepest descent direction?
Gradient Operator
Let f(w) = f(w1, w2, …, wm) be a function over R^m, and let Δw = (dw1, dw2, …, dwm)^T. Then
df = (∂f/∂w1) dw1 + (∂f/∂w2) dw2 + … + (∂f/∂wm) dwm = ∇f · Δw,
where ∇f = ( ∂f/∂w1, ∂f/∂w2, …, ∂f/∂wm )^T.
The Steepest Descent Direction
With df = ∇f · Δw:
• df > 0: Δw points uphill.
• df = 0: Δw stays on the level set (the plain).
• df < 0: Δw points downhill.
To minimize f, we choose Δw = −η ∇f.
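As an illustration of choosing Δw = −η∇f (a toy example of my own, not from the slides):

```python
import numpy as np

def gradient_descent(grad_f, w0, eta=0.1, steps=100):
    """Repeatedly apply delta_w = -eta * grad f(w)."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= eta * grad_f(w)
    return w

# Toy example: f(w) = (w1 - 3)^2 + (w2 + 1)^2, so grad f = (2(w1 - 3), 2(w2 + 1))
grad_f = lambda w: np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])
print(gradient_descent(grad_f, [0.0, 0.0]))  # approaches the minimizer (3, -1)
```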
LMS (Least Mean Square)
Minimize the cost function E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − Σ_{l} w_l x_l^(k) )².
∂E(w)/∂w_j = −Σ_{k=1}^{p} ( d^(k) − Σ_{l} w_l x_l^(k) ) x_j^(k)
           = −Σ_{k=1}^{p} ( d^(k) − w^T x^(k) ) x_j^(k)
           = −Σ_{k=1}^{p} ( d^(k) − y^(k) ) x_j^(k)
           = −Σ_{k=1}^{p} δ^(k) x_j^(k),   where δ^(k) = d^(k) − y^(k).
Adaline Learning Rule
Minimize E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − Σ_{l} w_l x_l^(k) )², with gradient ∇_w E(w) = ( ∂E/∂w1, ∂E/∂w2, …, ∂E/∂wm )^T and ∂E(w)/∂w_j = −Σ_k δ^(k) x_j^(k), δ^(k) = d^(k) − y^(k).
Weight Modification Rule: Δw = −η ∇_w E(w),  i.e.  Δw_j = η Σ_{k=1}^{p} δ^(k) x_j^(k).
Learning Modes
• Batch Learning Mode: Δw_j = η Σ_{k=1}^{p} δ^(k) x_j^(k)
• Incremental Learning Mode: Δw_j = η δ^(k) x_j^(k)
where δ^(k) = d^(k) − y^(k) and ∂E(w)/∂w_j = −Σ_k δ^(k) x_j^(k).
Summary – Adaline Learning Rule
The neuron computes y = w^T x; the error δ = d − y drives the update Δw = η δ x.
Also known as the δ-Learning Rule, the LMS Algorithm, or the Widrow-Hoff Learning Rule. Does it converge?
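A hedged sketch of the two Adaline (LMS) learning modes described above; the function names adaline_batch and adaline_incremental and the toy data are assumptions of mine:

```python
import numpy as np

def adaline_batch(X, d, eta=0.01, epochs=500):
    """Batch mode: delta_w_j = eta * sum_k (d_k - w.x_k) * x_jk, applied once per epoch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        delta = d - X @ w              # delta^(k) = d^(k) - y^(k), with y = w^T x
        w += eta * X.T @ delta         # w_j += eta * sum_k delta^(k) * x_j^(k)
    return w

def adaline_incremental(X, d, eta=0.01, epochs=500):
    """Incremental mode: update after every sample, delta_w = eta * (d_k - w.x_k) * x_k."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            delta = target - np.dot(w, x)
            w += eta * delta * x
    return w

# Example: fit the linear target y = 2*x1 - x2 from a few samples
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
d = np.array([2.0, -1.0, 1.0, 3.0])
print(adaline_batch(X, d), adaline_incremental(X, d))  # both approach (2, -1)
```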
LMS Convergence
Based on the independence theory (Widrow, 1976).
1. The successive input vectors are statistically independent.
2. At time t, the input vector x(t) is statistically independent of all previous samples of the desired response, namely d(1), d(2), …, d(t−1).
3. At time t, the desired response d(t) is dependent on x(t), but
statistically independent of all previous values of the
desired response.
4. The input vector x(t) and desired response d(t) are drawn
from Gaussian distributed populations.
LMS Convergence
It can be shown that LMS converges if
0 < η < 2 / λ_max,
where λ_max is the largest eigenvalue of the correlation matrix R_x of the inputs,
R_x = lim_{n→∞} (1/n) Σ_{i=1}^{n} x_i x_i^T.
Since λ_max is hardly available, we commonly use
0 < η < 2 / tr(R_x).
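As a hedged illustration of the practical bound, a small sketch that estimates R_x from a batch of inputs and returns 2/tr(R_x); the function name lms_eta_bound is my own:

```python
import numpy as np

def lms_eta_bound(X):
    """Upper bound 2 / tr(R_x), with R_x estimated as the average of x x^T over the samples."""
    R = (X.T @ X) / X.shape[0]      # R_x ~ (1/n) * sum_i x_i x_i^T
    return 2.0 / np.trace(R)        # tr(R_x) >= lambda_max, so this eta is always safe

X = np.random.randn(1000, 3)        # 1000 random 3-dimensional inputs
print("use 0 < eta <", lms_eta_bound(X))
```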
Comparisons
                Perceptron Learning Rule   Adaline Learning Rule (Widrow-Hoff)
Fundamental     Hebbian assumption         Gradient descent
Convergence     In finite steps            Converges asymptotically
Constraint      Linearly separable         Linear independence
Feed-Forward Neural Networks
Learning Rules for Single-Layered Perceptron Networks
• Perceptron Learning Rule
• Adaline Learning Rule
• δ-Learning Rule
Adaline
The Adaline computes y_i^(k) = w_i^T x^(k) (linear activation).
Unipolar Sigmoid
The unit computes net_i = w_i^T x^(k) and applies the unipolar sigmoid
a(net_i) = 1 / (1 + e^{−λ net_i}),
so y_i^(k) = a( w_i^T x^(k) ).
Bipolar Sigmoid
The unit computes net_i = w_i^T x^(k) and applies the bipolar sigmoid
a(net_i) = 2 / (1 + e^{−λ net_i}) − 1,
so y_i^(k) = a( w_i^T x^(k) ).
Goal
Minimize
E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − y^(k) )² = (1/2) Σ_{k=1}^{p} ( d^(k) − a( w^T x^(k) ) )².
Gradient Descent Algorithm
Minimize E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − y^(k) )² = (1/2) Σ_{k=1}^{p} ( d^(k) − a( w^T x^(k) ) )².
Weight update: Δw = −η ∇_w E(w), with ∇_w E(w) = ( ∂E/∂w1, ∂E/∂w2, …, ∂E/∂wm )^T.
The Gradient
With y^(k) = a( w^T x^(k) ) and net^(k) = w^T x^(k) = Σ_i w_i x_i^(k):
∂E(w)/∂w_j = −Σ_{k=1}^{p} ( d^(k) − y^(k) ) ∂y^(k)/∂w_j
           = −Σ_{k=1}^{p} ( d^(k) − y^(k) ) · ( ∂a(net^(k))/∂net^(k) ) · ( ∂net^(k)/∂w_j ),
where ∂net^(k)/∂w_j = x_j^(k) and ∂a(net^(k))/∂net^(k) depends on the activation function used.
Weight Modification Rule
∂E(w)/∂w_j = −Σ_{k=1}^{p} ( d^(k) − y^(k) ) · ( ∂a(net^(k))/∂net^(k) ) · x_j^(k),  with y^(k) = a(net^(k)) and δ^(k) = d^(k) − y^(k).
Learning rules:
• Batch: Δw_j = η Σ_{k=1}^{p} δ^(k) ( ∂a(net^(k))/∂net^(k) ) x_j^(k)
• Incremental: Δw_j = η δ^(k) ( ∂a(net^(k))/∂net^(k) ) x_j^(k)
The Learning Efficacy
For E(w) = (1/2) Σ_k ( d^(k) − y^(k) )², the factor ∂a(net)/∂net depends on the activation function:
• Adaline: a(net) = net, so ∂a(net)/∂net = 1.
• Unipolar sigmoid: a(net) = 1/(1 + e^{−λ net}), so ∂a(net)/∂net = λ y^(k) ( 1 − y^(k) ).
• Bipolar sigmoid: a(net) = 2/(1 + e^{−λ net}) − 1 (derivative left as an exercise).
Exercise
With y^(k) = a(net^(k)) and the bipolar sigmoid, derive ∂a(net^(k))/∂net^(k) and substitute it into
∂E(w)/∂w_j = −Σ_{k=1}^{p} ( d^(k) − y^(k) ) ( ∂a(net^(k))/∂net^(k) ) x_j^(k).
Learning Rule – Unipolar Sigmoid
Minimize E(w) = (1/2) Σ_{k=1}^{p} ( d^(k) − y^(k) )². With the unipolar sigmoid,
∂E(w)/∂w_j = −λ Σ_{k=1}^{p} ( d^(k) − y^(k) ) y^(k) ( 1 − y^(k) ) x_j^(k) = −λ Σ_{k=1}^{p} δ^(k) y^(k) ( 1 − y^(k) ) x_j^(k),   δ^(k) = d^(k) − y^(k).
Weight Modification Rule: Δw_j = η λ Σ_{k=1}^{p} δ^(k) y^(k) ( 1 − y^(k) ) x_j^(k).
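An illustrative sketch of the incremental form of this rule for a single unipolar-sigmoid unit, assuming λ = 1; the function names and the OR data are my own choices and the hyperparameters may need tuning:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_sigmoid_unit(X, d, eta=1.0, epochs=5000):
    """Incremental delta rule: delta_w = eta * (d - y) * y * (1 - y) * x, with y = sigmoid(w.x)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = sigmoid(np.dot(w, x))
            w += eta * (target - y) * y * (1.0 - y) * x   # negative gradient of (1/2)(d - y)^2
    return w

# Example: learn the OR function (augmented input, last component = -1)
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 1.0])
w = train_sigmoid_unit(X, d)
print(np.round(sigmoid(X @ w)))   # should round to [0, 1, 1, 1] after training
```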
Comparisons
Adaline:
  Batch: Δw_j = η Σ_{k=1}^{p} δ^(k) x_j^(k)
  Incremental: Δw_j = η δ^(k) x_j^(k)
Sigmoid:
  Batch: Δw_j = η λ Σ_{k=1}^{p} δ^(k) y^(k) ( 1 − y^(k) ) x_j^(k)
  Incremental: Δw_j = η λ δ^(k) y^(k) ( 1 − y^(k) ) x_j^(k)
The Learning Efficacy
• Adaline: y = a(net) = net, so a′(net) = 1, a constant. The learning efficacy of the Adaline is constant, meaning the Adaline never gets saturated.
• Sigmoid: a′(net) = λ y ( 1 − y ) depends on the output; it peaks at y = 0.5 and approaches 0 near the two extremes. The sigmoid gets saturated if its output value nears the two extremes.
Initialization for Sigmoid Neurons
The neuron computes y_i^(k) = a( w_i^T x^(k) ) with a(net_i) = 1 / (1 + e^{−λ net_i}).
Before training, its weights must be sufficiently small. Why?
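One plausible reading (my interpretation, not stated on the slide): small initial weights keep |net| small, so the sigmoid operates near y = 0.5 where a′(net) = λ y (1 − y) is largest and learning is not saturated. A minimal sketch:

```python
import numpy as np

def init_weights(m, scale=0.1, seed=0):
    """Small random weights keep |net| small, so the sigmoid starts near y = 0.5,
    where the derivative y * (1 - y) is largest and the unit is far from saturation."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=m)

print(init_weights(5))   # values in (-0.1, 0.1)
```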
Feed-Forward Neural Networks
Multilayer
Perceptron
Multilayer Perceptron
. . .
. . .
. . .
. . .
x1 x2 xm
y1 y2 yn
Hidden Layer
Input Layer
Output Layer
Multilayer Perceptron
Input
Analysis
Classification
Output
Learning
Where does the knowledge come from?
How an MLP Works?
Example: XOR over inputs x1, x2
• Not linearly separable.
• Is a single-layer perceptron workable?
How an MLP Works?
0 1
1
XOR
x1
x2
Example:
L1
L2
00
01
11
L2
L1
x1 x2 x3= 1
y1 y2
How an MLP Works?
0 1
1
XOR
x1
x2
Example:
L1
L2
00
01
11
0 1
1
y1
y2
L3
How an MLP Works?
0 1
1
XOR
x1
x2
Example:
L1
L2
00
01
11
0 1
1
y1
y2
L3
How an MLP Works?
0 1
1
y1
y2
L3
L2
L1
x1 x2 x3= 1
L3
y1 y2
y3= 1
z
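A hedged sketch of the construction just described: two threshold units realize the lines L1 and L2, and a third unit (L3) combines their outputs to compute XOR. The particular weights below are my own choice of separating lines, not values taken from the slides:

```python
import numpy as np

def step(net):
    return (net >= 0).astype(int)

def xor_mlp(x1, x2):
    """Two hidden threshold units (lines L1, L2) plus one output unit (L3)."""
    x = np.array([x1, x2, -1.0])              # augmented input, x3 = -1
    W_hidden = np.array([[1.0, 1.0, 0.5],     # L1: x1 + x2 - 0.5 >= 0
                         [1.0, 1.0, 1.5]])    # L2: x1 + x2 - 1.5 >= 0
    y = step(W_hidden @ x)                    # region code (y1, y2)
    y_aug = np.append(y, -1.0)                # augmented hidden output, y3 = -1
    w_out = np.array([1.0, -2.0, 0.5])        # L3: y1 - 2*y2 - 0.5 >= 0
    return 1 if np.dot(w_out, y_aug) >= 0 else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))            # prints 0, 1, 1, 0
```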
Example:
Parity Problem
The 3-bit parity function over x1 x2 x3 (output 1 when the number of 1s is odd):
000 → 0, 001 → 1, 010 → 1, 011 → 0, 100 → 1, 101 → 0, 110 → 0, 111 → 1
Is the problem linearly separable?
Parity Problem
Three parallel hyperplanes P1, P2, P3 slice the input cube {0,1}³ according to the number of 1s in (x1, x2, x3); the resulting regions are encoded 000, 001, 011, 111.
A hidden layer of three units computes y1, y2, y3 from P1, P2, P3; a single output unit P4 then combines y1, y2, y3 to produce the parity output z.
General Problem
Hyperspace Partition and Region Encoding: the hyperplanes L1, L2, L3 partition the input space into regions, each labeled by a binary code (101, 001, 000, 010, 110, 111, 100). The first hidden layer (units L1, L2, L3) computes this region code from the input x1 x2 x3.
Region Identification Layer
Each unit in the second hidden layer responds to exactly one region code produced by the encoding layer (101, 001, 000, 110, 010, 100, 111), thereby identifying which region the input falls into.
Classification
The output layer combines the region-identification outputs: each region (101, 001, 000, 110, 010, 100, 111) is mapped to its class label (0 or 1), giving the final classification.
Feed-Forward Neural Networks
Back Propagation
Learning algorithm
Activation Function – Sigmoid
y = a(net) = 1 / (1 + e^{−net})
a′(net) = e^{−net} / (1 + e^{−net})² = y ( 1 − y )
Remember this: a′(net) = y ( 1 − y ).
Supervised Learning
The network maps inputs x1, …, xm to outputs o1, …, on; the desired outputs are d1, …, dn.
Training Set: T = { (x^(1), d^(1)), (x^(2), d^(2)), …, (x^(p), d^(p)) }
Supervised Learning
E = Σ_{l=1}^{p} E^(l),   E^(l) = (1/2) Σ_{j=1}^{n} ( d_j^(l) − o_j^(l) )²   (Sum of Squared Errors)
Goal: Minimize E over the training set T = { (x^(1), d^(1)), (x^(2), d^(2)), …, (x^(p), d^(p)) }.
Back Propagation Learning Algorithm
E = Σ_{l=1}^{p} E^(l),   E^(l) = (1/2) Σ_{j=1}^{n} ( d_j^(l) − o_j^(l) )²
• Learning on Output Neurons
• Learning on Hidden Neurons
Learning on Output Neurons
E = Σ_{l=1}^{p} E^(l),   E^(l) = (1/2) Σ_{j=1}^{n} ( d_j^(l) − o_j^(l) )²
∂E/∂w_ji = Σ_{l=1}^{p} ∂E^(l)/∂w_ji,   ∂E^(l)/∂w_ji = ( ∂E^(l)/∂net_j^(l) ) · ( ∂net_j^(l)/∂w_ji )
Since o_j^(l) = a( net_j^(l) ) and net_j^(l) = Σ_i w_ji o_i^(l), we have ∂net_j^(l)/∂w_ji = o_i^(l).
∂E^(l)/∂net_j^(l) = ( ∂E^(l)/∂o_j^(l) ) · ( ∂o_j^(l)/∂net_j^(l) )
∂E^(l)/∂o_j^(l) = −( d_j^(l) − o_j^(l) );  ∂o_j^(l)/∂net_j^(l) depends on the activation function; using the sigmoid, ∂o_j^(l)/∂net_j^(l) = o_j^(l) ( 1 − o_j^(l) ).
Define δ_j^(l) = −∂E^(l)/∂net_j^(l) = ( d_j^(l) − o_j^(l) ) o_j^(l) ( 1 − o_j^(l) ).
Then ∂E^(l)/∂w_ji = −δ_j^(l) o_i^(l) and ∂E/∂w_ji = −Σ_{l=1}^{p} δ_j^(l) o_i^(l),
so the weights connecting to output neurons are trained with
Δw_ji = η Σ_{l=1}^{p} δ_j^(l) o_i^(l).
Learning on Hidden Neurons
∂E/∂w_ik = Σ_{l=1}^{p} ∂E^(l)/∂w_ik,   ∂E^(l)/∂w_ik = ( ∂E^(l)/∂net_i^(l) ) · ( ∂net_i^(l)/∂w_ik ),  with ∂net_i^(l)/∂w_ik = o_k^(l).
Define δ_i^(l) = −∂E^(l)/∂net_i^(l) = −( ∂E^(l)/∂o_i^(l) ) · ( ∂o_i^(l)/∂net_i^(l) ),  where ∂o_i^(l)/∂net_i^(l) = o_i^(l) ( 1 − o_i^(l) ).
The hidden output o_i^(l) influences E^(l) through every downstream neuron j:
∂E^(l)/∂o_i^(l) = Σ_j ( ∂E^(l)/∂net_j^(l) ) · ( ∂net_j^(l)/∂o_i^(l) ) = −Σ_j δ_j^(l) w_ji.
Therefore δ_i^(l) = o_i^(l) ( 1 − o_i^(l) ) Σ_j δ_j^(l) w_ji,
∂E/∂w_ik = −Σ_{l=1}^{p} δ_i^(l) o_k^(l),  and  Δw_ik = η Σ_{l=1}^{p} δ_i^(l) o_k^(l).
Back Propagation
Output layer:  δ_j^(l) = −∂E^(l)/∂net_j^(l) = ( d_j^(l) − o_j^(l) ) o_j^(l) ( 1 − o_j^(l) ),   Δw_ji = η Σ_{l=1}^{p} δ_j^(l) o_i^(l)
Hidden layer:  δ_i^(l) = −∂E^(l)/∂net_i^(l) = o_i^(l) ( 1 − o_i^(l) ) Σ_j δ_j^(l) w_ji,   Δw_ik = η Σ_{l=1}^{p} δ_i^(l) o_k^(l)
The δ's are computed at the output layer and propagated backward, layer by layer, toward the inputs x1, …, xm.
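A compact, hedged sketch of these two update rules for a network with one hidden layer of sigmoid units, trained incrementally. The architecture, function names, hyperparameters, and the XOR data below are my own illustrative choices, and the hyperparameters may need tuning:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_backprop(X, D, n_hidden=3, eta=0.5, epochs=10000, seed=1):
    """Incremental back-propagation for one hidden layer of sigmoid units."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], D.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # hidden weights w_ik
    W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))  # output weights w_ji
    for _ in range(epochs):
        for x, d in zip(X, D):
            o_hidden = sigmoid(W1 @ x)                                  # o_i = a(net_i)
            o_out = sigmoid(W2 @ o_hidden)                              # o_j = a(net_j)
            delta_out = (d - o_out) * o_out * (1 - o_out)               # output delta_j
            delta_hid = o_hidden * (1 - o_hidden) * (W2.T @ delta_out)  # hidden delta_i
            W2 += eta * np.outer(delta_out, o_hidden)   # delta_w_ji = eta * delta_j * o_i
            W1 += eta * np.outer(delta_hid, x)          # delta_w_ik = eta * delta_i * x_k
    return W1, W2

# XOR, with a constant -1 appended to play the role of the bias input
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])
W1, W2 = train_backprop(X, D)
print(np.round(sigmoid(W2 @ sigmoid(W1 @ X.T)), 2))  # outputs for the four patterns, expected close to 0, 1, 1, 0
```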
Learning Factors
• Initial Weights
• Learning Constant (η)
• Cost Functions
• Momentum (see the sketch after this list)
• Update Rules
• Training Data and Generalization
• Number of Layers
• Number of Hidden Nodes
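For the momentum factor listed above, one common form (a standard heuristic sketched here, not something the slides define) adds a fraction of the previous weight change to the current one:

```python
# Gradient-descent step with momentum (alpha is the momentum constant, eta the learning rate):
#   delta_w(t) = -eta * grad_E(w) + alpha * delta_w(t-1)
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta   # return the new weights and the update to store for next time
```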
Reading Assignments
 Shi Zhong and Vladimir Cherkassky, “Factors Controlling Generalization Ability of
MLP Networks.” In Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 1, pp. 625-
630, Washington DC. July 1999. (http://www.cse.fau.edu/~zhong/pubs.htm)
 Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). "Learning Internal
Representations by Error Propagation," in Parallel Distributed Processing: Explorations
in the Microstructure of Cognition, vol. I, D. E. Rumelhart, J. L. McClelland, and the
PDP Research Group. MIT Press, Cambridge (1986).
(http://www.cnbc.cmu.edu/~plaut/85-419/papers/RumelhartETAL86.backprop.pdf).
  • 2. Content  Introduction  Single-Layer Perceptron Networks  Learning Rules for Single-Layer Perceptron Networks – Perceptron Learning Rule – Adaline Leaning Rule  -Leaning Rule  Multilayer Perceptron  Back Propagation Learning algorithm
  • 4. Historical Background  1943 McCulloch and Pitts proposed the first computational models of neuron.  1949 Hebb proposed the first learning rule.  1958 Rosenblatt’s work in perceptrons.  1969 Minsky and Papert’s exposed limitation of the theory.  1970s Decade of dormancy for neural networks.  1980-90s Neural network return (self- organization, back-propagation algorithms, etc)
  • 5. Nervous Systems  Human brain contains ~ 1011 neurons.  Each neuron is connected ~ 104 others.  Some scientists compared the brain with a “complex, nonlinear, parallel computer”.  The largest modern neural networks achieve the complexity comparable to a nervous system of a fly.
  • 6. Neurons  The main purpose of neurons is to receive, analyze and transmit further the information in a form of signals (electric pulses).  When a neuron sends the information we say that a neuron “fires”.
  • 7. Neurons This animation demonstrates the firing of a synapse between the pre-synaptic terminal of one neuron to the soma (cell body) of another neuron. Acting through specialized projections known as dendrites and axons, neurons carry information throughout the neural network.
  • 8. bias x1 x2 xm= 1 wi1 wi2 wim =i . . . A Model of Artificial Neuron yi f (.) a (.) 
  • 9. bias x1 x2 xm= 1 wi1 wi2 wim =i . . . A Model of Artificial Neuron yi f (.) a (.)  1 ( ) m i ij j j f w x    ) ( ) 1 ( f a t yi        otherwise f f a 0 0 1 ) (
  • 10. Feed-Forward Neural Networks  Graph representation: – nodes: neurons – arrows: signal flow directions  A neural network that does not contain cycles (feedback loops) is called a feed–forward network (or perceptron). . . . . . . . . . . . . x1 x2 xm y1 y2 yn
  • 11. Hidden Layer(s) Input Layer Output Layer Layered Structure . . . . . . . . . . . . x1 x2 xm y1 y2 yn
  • 12. Knowledge and Memory . . . . . . . . . . . . x1 x2 xm y1 y2 yn  The output behavior of a network is determined by the weights.  Weights  the memory of an NN.  Knowledge  distributed across the network.  Large number of nodes – increases the storage “capacity”; – ensures that the knowledge is robust; – fault tolerance.  Store new information by changing weights.
  • 13. Pattern Classification . . . . . . . . . . . . x1 x2 xm y1 y2 yn  Function: x  y  The NN’s output is used to distinguish between and recognize different input patterns.  Different output patterns correspond to particular classes of input patterns.  Networks with hidden layers can be used for solving more complex problems then just a linear pattern classification. input pattern x output pattern y
  • 14. Training . . . . . . . . . . . .   (1) (2) (1) (2 ) ) ( ) ( ( , ),( , ), ,( , ), k k  d d d x x T x   ( ) 1 2 ( , , , ) i i i im x x x  x  ( ) 1 2 ( , , , ) i i i in d d d d   xi1 xi2 xim yi1 yi2 yin Training Set di1 di2 din Goal: ( ( ) ) ( M ) in i i i E error    d y ( ) 2 ( ) i i i    d y
  • 15. Generalization . . . . . . . . . . . . x1 x2 xm y1 y2 yn  By properly training a neural network may produce reasonable answers for input patterns not seen during training (generalization).  Generalization is particularly useful for the analysis of a “noisy” data (e.g. time–series).
  • 16. Generalization . . . . . . . . . . . . x1 x2 xm y1 y2 yn  By properly training a neural network may produce reasonable answers for input patterns not seen during training (generalization).  Generalization is particularly useful for the analysis of a “noisy” data (e.g. time–series). -1.5 -1 -0.5 0 0.5 1 1.5 -1.5 -1 -0.5 0 0.5 1 1.5 without noise with noise
  • 17. Applications Pattern classification Object recognition Function approximation Data compression Time series analysis and forecast . . .
  • 19. The Single-Layered Perceptron . . . x1 x2 xm= 1 y1 y2 yn xm-1 . . . . . . w11 w12 w1m w21 w22 w2m wn1 wnm wn2
  • 20. Training a Single-Layered Perceptron . . . x1 x2 xm= 1 y1 y2 yn xm-1 . . . . . . w11 w12 w1m w21 w22 w2m wn1 wnm wn2 d1 d2 dn   (1) ( (1) (2) ) ) 2) ( ( ( , ),( , ), ,( , ) p p  x x d d d x T  Training Set Goal: ( ) k i y  ( ) 1 m k l l il x w a           1,2, , 1,2, , i n k p      ( ) k i d ( ) ( ) T i k a x w
  • 21. Learning Rules . . . x1 x2 xm= 1 y1 y2 yn xm-1 . . . . . . w11 w12 w1m w21 w22 w2m wn1 wnm wn2 d1 d2 dn   (1) ( (1) (2) ) ) 2) ( ( ( , ),( , ), ,( , ) p p  x x d d d x T  Training Set Goal: ( ) k i y  ( ) 1 m k l l il x w a           1,2, , 1,2, , i n k p      ( ) k i d ( ) ( ) T i k a x w  Linear Threshold Units (LTUs) : Perceptron Learning Rule  Linearly Graded Units (LGUs) : Widrow-Hoff learning Rule
  • 22. Feed-Forward Neural Networks Learning Rules for Single-Layered Perceptron Networks  Perceptron Learning Rule  Adline Leaning Rule  -Learning Rule
  • 23. Perceptron Linear Threshold Unit ( ) T k i w x sgn ( ) ( ) sgn( ) k T k i i y  w x x1 x2 xm= 1 wi1 wi2 wim =i . . .  +1 1
  • 24. Perceptron Linear Threshold Unit ( ) T k i w x sgn ( ) ( ) sgn( ) k T k i i y  w x ) ( ( ) ( ) sgn( ) { , } 1,2, , 1,2, , 1 1 k i k T k i i y i n k p d       w x   Goal: x1 x2 xm= 1 wi1 wi2 wim =i . . .  +1 1
  • 25. Example x1 x2 x3= 1 2 1 2 y   T T T ] 2 , 1 [ , ] 1 , 5 . 1 [ , ] 0 , 1 [      Class 1 (+1)   T T T ] 2 , 1 [ , ] 1 , 5 . 2 [ , ] 0 , 2 [   Class 2 (1) Class 1 Class 2 x1 x2 g ( x ) =  2 x 1 + x 2 + 2 = 0 ) ( ( ) ( ) sgn( ) { , } 1,2, , 1,2, , 1 1 k i k T k i i y i n k p d       w x   Goal:
  • 26. Augmented input vector x1 x2 x3= 1 2 1 2 y   T T T ] 2 , 1 [ , ] 1 , 5 . 1 [ , ] 0 , 1 [      Class 1 (+1)   T T T ] 2 , 1 [ , ] 1 , 5 . 2 [ , ] 0 , 2 [   Class 2 (1) (4) (5) (6) (4) (5) (6) 2 2.5 1 0 , 1 , 2 1, 1, 1 1 1 1 x x x d d d                                          (1) (2) (3) (1) (2) (3) 1 1.5 1 0 , 1 , 2 1, 1, 1 1 1 1 x x x d d d                                             Class 1 (+1) Class 2 (1) ( ) ( ) ( ) 1 2 3 sgn( ) ( , , ) k T k k T y d w w w    w x w Goal:
  • 27. Augmented input vector x1 x2 x3= 1 2 1 2 y Class 1 (1, 2, 1) (1.5, 1, 1) (1,0, 1) Class 2 (1, 2, 1) (2.5, 1, 1) (2,0, 1) x1 x2 x3 (0,0,0) 1 2 3 ( ) 2 2 0 g x x x x     0  x wT ( ) ( ) ( ) 1 2 3 sgn( ) ( , , ) k T k k T y d w w w    w x w Goal:
  • 28. Augmented input vector x1 x2 x3= 1 2 1 2 y Class 1 (1, 2, 1) (1.5, 1, 1) (1,0, 1) Class 2 (1, 2, 1) (2.5, 1, 1) (2,0, 1) x1 x2 x3 (0,0,0) 1 2 3 ( ) 2 2 0 g x x x x     0  x wT ( ) ( ) ( ) 1 2 3 sgn( ) ( , , ) k T k k T y d w w w    w x w Goal: A plane passes through the origin in the augmented input space.
  • 29. Linearly Separable vs. Linearly Non-Separable 0 1 1 0 1 1 0 1 1 AND OR XOR Linearly Separable Linearly Separable Linearly Non-Separable
  • 30. Goal  Given training sets T1C1 and T2  C2 with elements in form of x=(x1, x2 , ..., xm-1 , xm)T , where x1, x2 , ..., xm-1 R and xm= 1.  Assume T1 and T2 are linearly separable.  Find w=(w1, w2 , ..., wm)T such that        2 1 1 1 ) sgn( T T T x x x w
  • 31. Goal  Given training sets T1C1 and T2  C2 with elements in form of x=(x1, x2 , ..., xm-1 , xm)T , where x1, x2 , ..., xm-1 R and xm= 1.  Assume T1 and T2 are linearly separable.  Find w=(w1, w2 , ..., wm)T such that        2 1 1 1 ) sgn( T T T x x x w wT x = 0 is a hyperplain passes through the origin of augmented input space.
  • 32. Observation x1 x2 +  d = +1 d = 1 + w1 w2 w3 w4 w5 w6 x Which w’s correctly classify x? What trick can be used?
  • 33. Observation x1 x2 +  d = +1 d = 1 + w1 x1 +w2 x2 = 0 w x Is this w ok? 0 T  w x
  • 34. Observation x1 x2 +  d = +1 d = 1 + w 1 x 1 + w 2 x 2 = 0 w x Is this w ok? 0 T  w x
  • 35. Observation x1 x2 +  d = +1 d = 1 + w 1 x 1 + w 2 x 2 = 0 w x Is this w ok? How to adjust w? 0 T  w x w = ? ? ?
  • 36. Observation x1 x2 +  d = +1 d = 1 + w x Is this w ok? How to adjust w? 0 T  w x w = x ( )T T T     w x x w x w x reasonable? <0 >0
  • 37. Observation x1 x2 +  d = +1 d = 1 + w x Is this w ok? How to adjust w? 0 T  w x w = x ( )T T T     w x x w x w x reasonable? <0 >0
  • 38. Observation x1 x2 +  d = +1 d = 1  w x Is this w ok? 0 T  w x w = ? +x or x
  • 39. Perceptron Learning Rule + d = +1  d = 1 Upon misclassification on    w x    w x Define error r d y   2 2 0         +   + No error 0  
  • 40. Perceptron Learning Rule r    w x Define error r d y   2 2 0         +   + No error
  • 41. Perceptron Learning Rule r    w x Learning Rate Error (d  y) Input
  • 42. Summary  Perceptron Learning Rule Based on the general weight learning rule. ( ) ( ) i i i x t w t r    i i i r d y   ( ( ) ( ) ) i i i i w t y t d x     0 2 1, 1 2 1, 1 i i i i i i d y d y d y             incorrect correct
  • 43. Summary  Perceptron Learning Rule x y  ( )     d y w x  . . . . . . Converge? d +
  • 44. x y  ( )     d y w x  . . . . . . d + Perceptron Convergence Theorem  Exercise: Reference some papers or textbooks to prove the theorem. If the given training set is linearly separable, the learning process will converge in a finite number of steps.
  • 45.–51. The Learning Scenario: four linearly separable training points x(1), x(2) (class +, d = +1) and x(3), x(4) (class −, d = −1) in the (x1, x2) plane. Starting from w0, each misclassification adjusts the weight vector: w0 → w1 → w2 → w3, and w4 = w3 once every point is classified correctly, leaving the final weight w.
  • 52. The Learning Scenario: the demonstration is in augmented space. Conceptually, in augmented space, we adjust the weight vector to fit the data.
  • 53.–54. Weight Space: for a positive example x, any weight vector w in the shaded half-space (wᵀx > 0) gives the correct classification; the update Δw = ηx moves w toward that half-space.
  • 55.–56. Weight Space: for a negative example x, any weight vector w not in the shaded half-space (i.e. with wᵀx < 0) gives the correct classification; the update Δw = −ηx moves w away from x.
  • 57.–70. The Learning Scenario in Weight Space: the examples x(1), x(2) (class +) and x(3), x(4) (class −) each constrain w to a half-space, and to correctly classify the training set the weight must move into the shaded feasible region (their intersection). The successive updates trace w0 → w1 → w2 = w3 → w4 → ... → w11, some steps producing no change, until the weight ends inside the feasible region. Conceptually, in weight space, we move the weight into the feasible region.
  • 71. Feed-Forward Neural Networks — Learning Rules for Single-Layered Perceptron Networks: Perceptron Learning Rule, Adaline Learning Rule, δ-Learning Rule.
  • 72. Adaline (Adaptive Linear Element), Widrow [1962]: a linear neuron computing net_i^(k) = w_iᵀ x^(k) and output y_i^(k) = w_iᵀ x^(k).
  • 73. Adaline (Adaptive Linear Element), Widrow [1962]: goal — y_i^(k) = w_iᵀ x^(k) = d_i^(k) for i = 1, 2, ..., n and k = 1, 2, ..., p. Under what conditions is this goal reachable?
  • 74. LMS (Least Mean Square): minimize the cost (error) function E(w) = ½ Σ_{k=1..p} (d^(k) − y^(k))² = ½ Σ_k (d^(k) − wᵀx^(k))² = ½ Σ_k (d^(k) − Σ_{l=1..m} w_l x_l^(k))².
  • 75. Gradient Descent Algorithm: E(w) is a surface over the weights (w1, w2); our goal is to go downhill. The contour map of E shows the current weight w.
  • 76. Gradient Descent Algorithm: our goal is to go downhill — how do we find the steepest-descent direction?
  • 77. Gradient Operator: let f(w) = f(w1, w2, ..., wm) be a function over R^m. Then df = (∂f/∂w1) dw1 + (∂f/∂w2) dw2 + ... + (∂f/∂wm) dwm. Define Δw = (dw1, dw2, ..., dwm)ᵀ and ∇f = (∂f/∂w1, ∂f/∂w2, ..., ∂f/∂wm)ᵀ; then df = ⟨∇f, Δw⟩.
  • 78. Gradient Operator: df = ⟨∇f, Δw⟩. If df is positive, the step Δw goes uphill; if df is zero, it stays on the plain; if df is negative, it goes downhill.
  • 79. The Steepest-Descent Direction: since df = ⟨∇f, Δw⟩, to minimize f we choose Δw = −η∇f, which makes df as negative as possible for a step of given length.
  • 80. LMS (Least Mean Square): minimize E(w) = ½ Σ_k (d^(k) − Σ_l w_l x_l^(k))². The gradient component is ∂E/∂w_j = −Σ_k (d^(k) − Σ_l w_l x_l^(k)) x_j^(k) = −Σ_k (d^(k) − wᵀx^(k)) x_j^(k) = −Σ_k (d^(k) − y^(k)) x_j^(k) = −Σ_k δ^(k) x_j^(k), where δ^(k) = d^(k) − y^(k).
  • 81. Adaline Learning Rule: minimize E(w) as above, with gradient ∇_w E = (∂E/∂w1, ∂E/∂w2, ..., ∂E/∂wm)ᵀ and ∂E/∂w_j = −Σ_k δ^(k) x_j^(k), δ^(k) = d^(k) − y^(k). Weight modification rule: Δw = −η∇E(w).
  • 82. Learning Modes: since ∂E/∂w_j = −Σ_k δ^(k) x_j^(k) with δ^(k) = d^(k) − y^(k), the batch learning mode updates Δw_j = η Σ_{k=1..p} δ^(k) x_j^(k) once per pass over the data, while the incremental learning mode updates Δw_j = η δ^(k) x_j^(k) after each example. A sketch of both modes follows.
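A minimal sketch (assuming NumPy; the function names and default hyperparameters are ours) contrasting the two modes for the linear Adaline unit:

```python
import numpy as np

def adaline_batch(X, d, eta=0.01, epochs=100):
    """Batch mode: accumulate the correction over all p examples, then update once."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        delta = d - X @ w                 # delta^(k) = d^(k) - y^(k), with y = w^T x
        w += eta * X.T @ delta            # delta_w_j = eta * sum_k delta^(k) * x_j^(k)
    return w

def adaline_incremental(X, d, eta=0.01, epochs=100):
    """Incremental (online) mode: update after every single example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            delta = target - w @ x
            w += eta * delta * x          # delta_w_j = eta * delta^(k) * x_j^(k)
    return w
```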
  • 83. Summary — Adaline Learning Rule: y = wᵀx and Δw = ηδx with δ = d − y. Also known as the δ-Learning Rule, the LMS Algorithm, and the Widrow–Hoff Learning Rule. Does it converge?
  • 84. LMS Convergence: based on the independence theory (Widrow, 1976). 1. The successive input vectors are statistically independent. 2. At time t, the input vector x(t) is statistically independent of all previous samples of the desired response, namely d(1), d(2), ..., d(t−1). 3. At time t, the desired response d(t) is dependent on x(t), but statistically independent of all previous values of the desired response. 4. The input vector x(t) and desired response d(t) are drawn from Gaussian-distributed populations.
  • 85. LMS Convergence: it can be shown that LMS converges if 0 < η < 2/λ_max, where λ_max is the largest eigenvalue of the correlation matrix R_x = lim_{n→∞} (1/n) Σ_{i=1..n} x_i x_iᵀ of the inputs.
  • 86. LMS Convergence: since λ_max is hardly available in practice, we commonly use the more conservative bound 0 < η < 2/tr(R_x), because tr(R_x) ≥ λ_max. A sketch of this bound follows.
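A small sketch of the practical step-size bound (assumption: R_x is estimated from a finite sample of p input vectors rather than the limit on the slide):

```python
import numpy as np

def lms_learning_rate_bound(X):
    """Practical upper bound 2 / tr(R_x) on the LMS step size.

    X is a (p, m) matrix of input vectors; R_x is estimated by the sample
    average of x x^T, whose trace is the mean squared input norm.
    """
    R = X.T @ X / X.shape[0]          # sample correlation matrix
    return 2.0 / np.trace(R)

# Any eta below this bound also satisfies 0 < eta < 2 / lambda_max,
# since tr(R_x) >= lambda_max.
```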
  • 87. Comparisons — Perceptron Learning Rule vs. Adaline Learning Rule (Widrow–Hoff): Fundamental idea — Hebbian assumption vs. gradient descent; Convergence — in a finite number of steps vs. asymptotic; Constraint — linearly separable data vs. linear independence.
  • 88. Feed-Forward Neural Networks — Learning Rules for Single-Layered Perceptron Networks: Perceptron Learning Rule, Adaline Learning Rule, δ-Learning Rule.
  • 89. Adaline: net_i^(k) = w_iᵀ x^(k), output y_i^(k) = w_iᵀ x^(k) (linear activation).
  • 90. Unipolar Sigmoid: a(net_i) = 1 / (1 + e^(−λ net_i)), output y_i^(k) = a(w_iᵀ x^(k)) ∈ (0, 1).
  • 91. Bipolar Sigmoid: a(net_i) = 2 / (1 + e^(−λ net_i)) − 1, output y_i^(k) = a(w_iᵀ x^(k)) ∈ (−1, 1).
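A minimal sketch of the two activation functions (assuming NumPy; `lam` stands for the slope parameter λ). The derivatives expressed through the output y are the forms used on the following slides; the bipolar one, λ(1 − y²)/2, is the answer to the exercise posed there.

```python
import numpy as np

def unipolar_sigmoid(net, lam=1.0):
    """a(net) = 1 / (1 + exp(-lambda * net)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * net))

def bipolar_sigmoid(net, lam=1.0):
    """a(net) = 2 / (1 + exp(-lambda * net)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

# Derivatives written in terms of the output y = a(net):
def unipolar_sigmoid_prime(y, lam=1.0):
    return lam * y * (1.0 - y)

def bipolar_sigmoid_prime(y, lam=1.0):
    return lam * (1.0 - y * y) / 2.0
```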
  • 92. Goal: minimize E(w) = ½ Σ_{k=1..p} (d^(k) − y^(k))² = ½ Σ_k (d^(k) − a(wᵀ x^(k)))².
  • 93. Gradient Descent Algorithm: minimize E(w) = ½ Σ_k (d^(k) − a(wᵀ x^(k)))² using ∇_w E = (∂E/∂w1, ∂E/∂w2, ..., ∂E/∂wm)ᵀ and the update Δw = −η∇E(w).
  • 94. The Gradient: with y^(k) = a(net^(k)) and net^(k) = wᵀx^(k) = Σ_i w_i x_i^(k), ∂E/∂w_j = −Σ_k (d^(k) − y^(k)) ∂y^(k)/∂w_j = −Σ_k (d^(k) − y^(k)) a′(net^(k)) ∂net^(k)/∂w_j, and ∂net^(k)/∂w_j = x_j^(k). The factor a′(net^(k)) depends on the activation function used.
  • 95. Weight Modification Rule: ∂E/∂w_j = −Σ_k (d^(k) − y^(k)) a′(net^(k)) x_j^(k) = −Σ_k δ^(k) a′(net^(k)) x_j^(k), with δ^(k) = d^(k) − y^(k). Learning rule — batch: Δw_j = η Σ_k δ^(k) a′(net^(k)) x_j^(k); incremental: Δw_j = η δ^(k) a′(net^(k)) x_j^(k). A sketch of this general δ-rule follows.
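A minimal sketch (assuming NumPy; function names are ours) of one incremental epoch of the general δ-rule, parameterized by the activation and its derivative so the same code covers Adaline and both sigmoids:

```python
import numpy as np

def delta_rule_epoch(w, X, d, activation, activation_prime, eta=0.1):
    """One incremental epoch of the delta-learning rule.

    activation maps net -> y; activation_prime maps (net, y) -> a'(net).
    """
    for x, target in zip(X, d):
        net = w @ x
        y = activation(net)
        delta = target - y
        w += eta * delta * activation_prime(net, y) * x   # eta * delta * a'(net) * x
    return w

# Adaline:            delta_rule_epoch(w, X, d, lambda n: n, lambda n, y: 1.0)
# Unipolar sigmoid (lambda = 1): activation_prime = lambda n, y: y * (1 - y)
```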
  • 96. The Learning Efficacy: the rules differ only in a′(net). Adaline: a(net) = net, a′(net) = 1. Unipolar sigmoid: a(net) = 1/(1 + e^(−λ net)), a′(net) = λ y^(k)(1 − y^(k)). Bipolar sigmoid: a(net) = 2/(1 + e^(−λ net)) − 1, a′(net) = ? (exercise).
  • 97. Learning Rule — Unipolar Sigmoid: with λ absorbed into η, ∂E/∂w_j = −Σ_k (d^(k) − y^(k)) y^(k)(1 − y^(k)) x_j^(k) = −Σ_k δ^(k) y^(k)(1 − y^(k)) x_j^(k), δ^(k) = d^(k) − y^(k). Weight modification rule: Δw_j = η Σ_k δ^(k) y^(k)(1 − y^(k)) x_j^(k).
  • 98. Comparisons: Adaline — batch Δw_j = η Σ_k δ^(k) x_j^(k), incremental Δw_j = η δ^(k) x_j^(k). Sigmoid — batch Δw_j = η Σ_k δ^(k) y^(k)(1 − y^(k)) x_j^(k), incremental Δw_j = η δ^(k) y^(k)(1 − y^(k)) x_j^(k).
  • 99.–100. The Learning Efficacy: for Adaline, y = a(net) = net and a′(net) = 1 — a constant — so the learning efficacy does not depend on where the neuron operates and the Adaline never gets saturated. For the sigmoid, a′(net) = y(1 − y) depends on the output.
  • 101. The Learning Efficacy: the sigmoid gets saturated when its output nears either extreme, since y(1 − y) → 0 there and the weight updates become very small.
  • 102. Initialization for Sigmoid Neurons: y_i^(k) = a(w_iᵀ x^(k)) with a(net) = 1/(1 + e^(−λ net)). Before training, the weights must be sufficiently small. Why? Large weights drive |net| into the saturated region of the sigmoid, where a′(net) = λ y(1 − y) ≈ 0 and learning stalls; see the sketch below.
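A tiny numeric illustration of this point (assuming NumPy; λ = 1): the local slope y(1 − y) collapses once |net| is large, which is exactly what large initial weights produce.

```python
import numpy as np

# Slope factor y*(1 - y) of the unipolar sigmoid at several net values.
net = np.array([0.1, 1.0, 10.0, 50.0])
y = 1.0 / (1.0 + np.exp(-net))
print(y * (1.0 - y))   # ~0.25, ~0.20, ~4.5e-5, ~0 -- updates vanish for large |net|
```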
  • 104. Multilayer Perceptron: input layer x1, ..., xm; one or more hidden layers; output layer y1, ..., yn.
  • 106. How an MLP Works? Example: XOR on the unit square is not linearly separable — is a single-layer perceptron workable?
  • 107. How an MLP Works? Two first-layer neurons implement two lines L1 and L2 in the (x1, x2) plane (with bias input x3); their outputs (y1, y2) encode the region each input falls in (codes 00, 01, 11).
  • 108.–109. How an MLP Works? In the (y1, y2) space the two XOR classes become linearly separable, so a single line L3 can separate them.
  • 110. How an MLP Works? The complete network: the first layer computes y1, y2 from x1, x2 and the bias x3; the second-layer neuron L3 takes y1, y2 and a bias y3 and outputs z, realizing XOR. A hand-wired sketch follows.
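A hand-wired sketch of the idea (the particular thresholds chosen for L1, L2, L3 are one convenient choice, not necessarily the weights drawn on the slides):

```python
def step(net):
    # Hard-limiting threshold unit.
    return 1 if net >= 0 else 0

def xor_mlp(x1, x2):
    """Two hidden threshold units (lines L1, L2) feed one output unit (L3)."""
    y1 = step(x1 + x2 - 0.5)      # L1: fires when at least one input is 1
    y2 = step(x1 + x2 - 1.5)      # L2: fires only for (1, 1)
    return step(y1 - y2 - 0.5)    # L3: the region between L1 and L2 -> XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))   # prints the XOR truth table 0, 1, 1, 0
```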
  • 113. Parity Problem: inputs x1x2x3 sit at the corners of the unit cube (000, 001, ..., 111), and the output is 1 when the number of 1s is odd. Three parallel hyperplanes P1, P2, P3 slice the cube by how many 1s a corner contains (codes 000, 001, 011, 111).
  • 116. Parity Problem: the first layer (P1, P2, P3) produces y1, y2, y3; a second-layer neuron P4 combines them into the parity output z.
  • 121. Hyperspace Partition & Region Encoding Layer: three hyperplanes L1, L2, L3 partition the input space (x1, x2, x3); the first layer encodes each input by a 3-bit region code (000, 001, 010, 100, 101, 110, 111).
  • 122.–128. Region Identification Layer: one second-layer neuron per region code — 101, 001, 000, 110, 010, 100, 111 — fires exactly when the input falls in its region.
  • 129. Classification: the output layer combines the region-identification neurons, assigning each region (001, 101, 000, 110, 010, 100, 111, ...) a class label 0 or 1, so arbitrary unions of regions can be classified.
  • 130. Feed-Forward Neural Networks Back Propagation Learning algorithm
  • 131. Activation Function — Sigmoid: y = a(net) = 1/(1 + e^(−net)); a′(net) = d/dnet [1/(1 + e^(−net))] = e^(−net)/(1 + e^(−net))² = y(1 − y). The curve rises from 0 through 0.5 at net = 0 toward 1. Remember this: a′(net) = y(1 − y).
  • 132. Supervised Learning: a network with inputs x1, ..., xm, hidden layer(s), and outputs o1, ..., on is trained on the set T = {(x(1), d(1)), (x(2), d(2)), ..., (x(p), d(p))}.
  • 133. Supervised Learning: sum-of-squared-errors cost E = Σ_{l=1..p} E^(l), with E^(l) = ½ Σ_{j=1..n} (d_j^(l) − o_j^(l))². Goal: minimize E.
  • 134. Back Propagation Learning Algorithm: minimize E = Σ_l E^(l), E^(l) = ½ Σ_j (d_j^(l) − o_j^(l))², in two parts — learning on the output neurons and learning on the hidden neurons.
  • 135.–138. Learning on Output Neurons: for the weight w_ji from hidden neuron i to output neuron j, ∂E/∂w_ji = Σ_l ∂E^(l)/∂w_ji and ∂E^(l)/∂w_ji = (∂E^(l)/∂net_j^(l)) (∂net_j^(l)/∂w_ji), where o_j^(l) = a(net_j^(l)) and net_j^(l) = Σ_i w_ji o_i^(l). The first factor splits as ∂E^(l)/∂net_j^(l) = (∂E^(l)/∂o_j^(l)) (∂o_j^(l)/∂net_j^(l)) = −(d_j^(l) − o_j^(l)) · o_j^(l)(1 − o_j^(l)), using the sigmoid. Define the output error signal δ_j^(l) = −∂E^(l)/∂net_j^(l) = (d_j^(l) − o_j^(l)) o_j^(l)(1 − o_j^(l)).
  • 139.–140. Learning on Output Neurons: the second factor is ∂net_j^(l)/∂w_ji = o_i^(l), so ∂E^(l)/∂w_ji = −δ_j^(l) o_i^(l) and ∂E/∂w_ji = −Σ_l δ_j^(l) o_i^(l). Hence the weights feeding the output neurons are trained by Δw_ji = η Σ_l δ_j^(l) o_i^(l).
  • 141.–142. Learning on Hidden Neurons: for the weight w_ik from neuron k to hidden neuron i, ∂E/∂w_ik = Σ_l ∂E^(l)/∂w_ik and ∂E^(l)/∂w_ik = (∂E^(l)/∂net_i^(l)) (∂net_i^(l)/∂w_ik), with ∂net_i^(l)/∂w_ik = o_k^(l); define δ_i^(l) = −∂E^(l)/∂net_i^(l).
  • 143.–144. Learning on Hidden Neurons: ∂E^(l)/∂net_i^(l) = (∂E^(l)/∂o_i^(l)) (∂o_i^(l)/∂net_i^(l)); the sigmoid gives ∂o_i^(l)/∂net_i^(l) = o_i^(l)(1 − o_i^(l)), and the error propagates back from the next layer: ∂E^(l)/∂o_i^(l) = Σ_j (∂E^(l)/∂net_j^(l)) (∂net_j^(l)/∂o_i^(l)) = −Σ_j δ_j^(l) w_ji. Therefore δ_i^(l) = o_i^(l)(1 − o_i^(l)) Σ_j w_ji δ_j^(l).
  • 145. Learning on Hidden Neurons: ∂E/∂w_ik = −Σ_l δ_i^(l) o_k^(l), so Δw_ik = η Σ_l δ_i^(l) o_k^(l).
  • 146. Back Propagation: the error signals δ are computed at the output layer and propagated backwards through the network, layer by layer.
  • 147. Back Propagation — output layer: δ_j^(l) = −∂E^(l)/∂net_j^(l) = (d_j^(l) − o_j^(l)) o_j^(l)(1 − o_j^(l)), with Δw_ji = η Σ_l δ_j^(l) o_i^(l).
  • 148. Back Propagation — hidden layer: δ_i^(l) = −∂E^(l)/∂net_i^(l) = o_i^(l)(1 − o_i^(l)) Σ_j w_ji δ_j^(l), with Δw_ik = η Σ_l δ_i^(l) o_k^(l). A compact implementation sketch follows.
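A compact implementation sketch of the two δ formulas above for a one-hidden-layer sigmoid network (assuming NumPy; the array shapes, the incremental update order, and the toy XOR demo are our choices, not taken from the slides):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_epoch(W_hid, W_out, X, D, eta=0.5):
    """One incremental epoch for a 1-hidden-layer sigmoid MLP.

    W_hid: (n_hidden, m) weights into the hidden layer
    W_out: (n_out, n_hidden) weights into the output layer
    X: (p, m) inputs (augmented with a bias component), D: (p, n_out) targets
    """
    for x, d in zip(X, D):
        o_hid = sigmoid(W_hid @ x)                      # hidden outputs o_i
        o_out = sigmoid(W_out @ o_hid)                  # network outputs o_j
        delta_out = (d - o_out) * o_out * (1 - o_out)   # delta_j = (d_j - o_j) o_j (1 - o_j)
        delta_hid = o_hid * (1 - o_hid) * (W_out.T @ delta_out)  # delta_i = o_i(1-o_i) sum_j w_ji delta_j
        W_out += eta * np.outer(delta_out, o_hid)       # delta_w_ji = eta * delta_j * o_i
        W_hid += eta * np.outer(delta_hid, x)           # delta_w_ik = eta * delta_i * o_k
    return W_hid, W_out

# Toy run on XOR (inputs augmented with a constant 1), small random initial weights:
rng = np.random.default_rng(1)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])
W_hid = rng.normal(0.0, 0.5, (3, 3))     # 3 hidden units, 3 inputs (incl. bias)
W_out = rng.normal(0.0, 0.5, (1, 3))
for _ in range(10000):
    W_hid, W_out = backprop_epoch(W_hid, W_out, X, D)
print(sigmoid(W_out @ sigmoid(W_hid @ X.T)))   # expected to drift toward 0, 1, 1, 0 (run-dependent)
```

In practice one also gives every layer its own bias weight and may use the batch form (summing the δ·o products over l before updating), exactly as written on the slides.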
  • 149. Learning Factors • Initial Weights • Learning Constant (η) • Cost Functions • Momentum (sketch below) • Update Rules • Training Data and Generalization • Number of Layers • Number of Hidden Nodes
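For the momentum factor, a commonly used form (our assumption; the slide only names it) adds a fraction of the previous step to the current one:

```python
import numpy as np

def momentum_step(w, grad_E, prev_delta, eta=0.1, alpha=0.9):
    # delta_w(t) = -eta * grad E(w(t)) + alpha * delta_w(t-1)
    delta = -eta * grad_E + alpha * prev_delta
    return w + delta, delta

# Usage: keep the returned delta and pass it back in on the next iteration;
# alpha in [0, 1) smooths the trajectory across ravines of E(w).
```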
  • 150. Reading Assignments  Shi Zhong and Vladimir Cherkassky, “Factors Controlling Generalization Ability of MLP Networks.” In Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 1, pp. 625- 630, Washington DC. July 1999. (http://www.cse.fau.edu/~zhong/pubs.htm)  Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. I, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. MIT Press, Cambridge (1986). (http://www.cnbc.cmu.edu/~plaut/85-419/papers/RumelhartETAL86.backprop.pdf).