
Mapping Data to Processors
2
 Processor arrays and multi-computers are characterized by a non-uniform memory structure:
 each processor is able to get data from local memory much faster
than from nonlocal memory.
 When designing algorithms for these machines, it makes sense to
have processors manipulate local data as much as possible.
 For this reason, the distribution of parallel data structures often
dictates which processor is responsible for performing a particular
operation.

 An algorithm's data manipulation patterns can be represented as a
graph:
 each vertex represents a data subset allocated to the same local memory,
 and each edge represents a computation involving data from two data
sets.
 These graphs are often regular.
An important goal of a parallel algorithm designer is to map
the algorithm graph into the corresponding graph of the target
machine's processor organization (Bokhari 1981).
Mapping Data to Processors
3
Performance may suffer if the algorithm graph is not
a sub-graph of the parallel architecture's processor
organization.
 Suppose that on a multicomputer two connected vertices in the
algorithm graph map to a pair of vertices distance two apart on
the machine.
 Passing a message from one processor to the other then requires
roughly twice the time it takes to pass a message between
adjacent processors.
Mapping Data to Processors
4
 Suppose a parallel algorithm is implemented on a
multicomputer with circuit-switched routing, and two
different edges in the algorithm graph map to the same
link on the machine.
 If simultaneous communications occur between both pairs
of nodes, the speed of each communication is reduced by
the shared use of the same link.
Mapping Data to Processors
5
 On some systems one message would be blocked until the other message had finished.
 On other systems the physical link would be multiplexed between the two virtual links,
cutting the bandwidth of each link in half.
 In either case, performance is lower because each message does not have exclusive use of a
communication path.

Mapping Data to Processors
6
Definition 5.1:
• An embedding of a graph G = (V, E) into a graph G' = (V', E') is a
function φ from V to V'.
Definition 5.2:
• Let φ be the function that embeds graph G = (V, E) into graph
G' = (V', E'). The dilation of the embedding is defined as follows:
dil(φ) = max{dist(φ(u), φ(v)) : (u, v) ∈ E}, where dist(a, b) is the distance between
vertices a and b in G'.
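The two definitions above can be checked mechanically. Below is a minimal sketch (the function names `dist` and `dilation` are ours, not from the text) that computes dil(φ) for a small embedding, measuring distances in G' by breadth-first search:

```python
from collections import deque

def dist(adj, a, b):
    """Breadth-first-search distance between vertices a and b in G'."""
    seen, frontier, d = {a}, deque([a]), {a: 0}
    while frontier:
        v = frontier.popleft()
        if v == b:
            return d[v]
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                d[w] = d[v] + 1
                frontier.append(w)
    raise ValueError("vertices are not connected in G'")

def dilation(g_edges, gp_adj, phi):
    """dil(phi): maximum over edges (u, v) of G of dist(phi(u), phi(v)) in G'."""
    return max(dist(gp_adj, phi[u], phi[v]) for u, v in g_edges)
```

For example, embedding a 4-node ring into a 4-node path with the identity function forces the ring edge (3, 0) to stretch across the whole path, giving dilation 3.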

Mapping Data to Processors
7
There exists a dilation-1 embedding of G into G' if G is a subgraph of G'.
dilation-3 embedding of G into G'    dilation-1 embedding of G into G'

Definition 5.3:
• The load of an embedding φ : G → G' is
the maximum number of vertices of G
that are mapped to a single vertex of G'.
8
Mapping Data to Processors

9

 The ring and the mesh have the same number of vertices.
 A dilation-1 embedding exists if the mesh has an even number of rows
and/or columns.
 If the mesh has an odd number of rows and an odd number of
columns, there is no way to embed the ring without increasing the dilation.
10
Embedding a Ring into 2D-Mesh
[Figure: a dilation-1 embedding of a 20-node ring, positions 0–19, snaking through a 2-D mesh.]

Embedding a 2D-Mesh into 2D-Mesh
11
Ar : the number of rows in the algorithm graph
Ac : the number of columns in the algorithm graph
Mr : the number of rows in the machine graph
Mc : the number of columns in the machine graph
The algorithm graph can be embedded with dilation 1 in the
machine graph if and only if
Ar <= Mr and Ac <= Mc, or
Ac <= Mr and Ar <= Mc
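The condition reduces to a one-line predicate. In this sketch `mesh_fits` is a hypothetical helper name; the two disjuncts correspond to placing the algorithm mesh directly or rotated 90 degrees:

```python
def mesh_fits(a_r, a_c, m_r, m_c):
    """True when an a_r x a_c algorithm mesh has a dilation-1 embedding
    in an m_r x m_c machine mesh: it fits directly or after rotation."""
    return (a_r <= m_r and a_c <= m_c) or (a_c <= m_r and a_r <= m_c)
```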

Embedding a 2D-Mesh into 2D-Mesh
12
Ar <= Mr and Ac <= Mc, or
Ac <= Mr and Ar <= Mc

Complete Binary Tree into 2D-Mesh
13
dilation-1 embedding of a complete binary tree of height 3 into a 2-D mesh.
[Figure: the 15 tree nodes, numbered 1–15, placed on a 4 × 4 mesh.]

Complete Binary Tree into 2D-Mesh
14
Theorem 5.1:
• A complete binary tree of height greater than 4 cannot be embedded in
a 2-D mesh without increasing the dilation beyond 1.
Proof:
• The total number of mesh points k or fewer hops away from an arbitrary
point in a 2-D mesh is 2k^2 + 2k + 1.
• The total number of nodes in a complete binary tree of height k is 2^(k+1) - 1.
• 2^(k+1) - 1 > 2k^2 + 2k + 1 for k > 4, so for height greater than 4 the tree
has more nodes than there are mesh points within distance k of any point.
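The proof's inequality can be checked numerically (the helper names are ours):

```python
def mesh_points(k):
    """Mesh points within k hops of a point in a 2-D mesh: 2k^2 + 2k + 1."""
    return 2 * k * k + 2 * k + 1

def tree_nodes(k):
    """Nodes in a complete binary tree of height k: 2^(k+1) - 1."""
    return 2 ** (k + 1) - 1
```

At k = 4 the mesh still has room (31 tree nodes vs. 41 mesh points), but from k = 5 onward the exponential term wins (63 vs. 61) and keeps winning.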

Complete Binary Tree into 2D-Mesh
 The H-tree is a common way of embedding a complete binary tree into
a 2-D mesh. The name H-tree arises from the recursive "H" pattern
used to construct a large H-tree out of four small H-trees.
15

Complete Binary Tree into 2D-Mesh
16
Theorem 5.2:
• A complete binary tree of height n has a dilation-⌈n/2⌉ embedding in a
2-D mesh.
Proof:
• We use H-trees to map binary trees to the nodes of a 2-D mesh. The
longest edges in an H-tree are the edges from the root to its two
children, and these edges have the same length. The length of the root
edges in an H-tree of height n is ⌈n/2⌉.

Binomial Tree into 2D-Mesh
Construction of a binomial tree of height k from two binomial trees of height k − 1.

18
Binomial Tree into 2D-Mesh
[Figure: the 16 nodes of the binomial tree, numbered 1–16, laid out on a 4 × 4 mesh.]
dilation-1 embedding of a binomial tree of height 4 into a 2-D mesh.

Theorem 5.3:
• A binomial tree of height greater than 4 cannot be embedded in a 2-D mesh
without increasing the dilation beyond 1.
Proof:
• The root node of a binomial tree of height d is connected to d other
nodes. No node in a 2-D mesh has more than 4 neighbours. Hence a
binomial tree of height greater than 4 cannot be embedded in a 2-D mesh
without increasing the dilation beyond 1.
19
Binomial Tree into 2D-Mesh

Theorem 5.4:
• A binomial tree of height n has a dilation-⌈n/2⌉
embedding in a 2-D mesh.
Proof:
• The construction is illustrated in the figure.
20
Binomial Tree into 2D-Mesh

21
Binomial Tree into 2D-Mesh

22

Definition 5.4:
• A graph G is cubical if there is a dilation-1 embedding of G
into a hypercube.
Theorem 5.5:
• The problem of determining whether an arbitrary graph G is
cubical is NP-complete (Afrati et al. 1985; Cybenko et al. 1986).
23
Embedding Graphs Into Hypercubes

Theorem 5.6: A dilation-1 embedding of a connected graph G into a
hypercube of dimension n exists if and only if it is possible to label the
edges of G with the integers {1, ..., n} such that:
1. Edges incident with a common vertex have different labels.
2. In every path of G at least one label appears an odd number of times.
3. In every cycle of G no label appears an odd number of times.
24
Embedding Graphs Into Hypercubes

25
Embedding Graphs Into Hypercubes

Theorem 5.7:
• A dilation-1 embedding of a complete binary tree of height n into a
hypercube of dimension n + 1 does not exist if n > 1.
Proof:
• A complete binary tree of height n has 2^(n+1) - 1 nodes.
• A hypercube of dimension n + 1 has 2^(n+1) nodes.
• Let the root of the tree be mapped to node X of the cube, and let n be the height of the tree.
26
Complete Binary Tree Into Hypercube

Proof (continued):
• A hypercube is a bipartite graph.
• Half the nodes can be reached only by following an even number of edges from X,
and half only by following an odd number of edges from X.
• The deepest level of the tree alone contains 2^n nodes, which is more than half of the
tree's 2^(n+1) - 1 nodes, and all of these nodes lie at the same distance parity from the root.
• So whether n is odd or even, more than half of the tree's nodes must map to hypercube
nodes of a single parity, but only exactly half the hypercube's nodes have that parity.
Hence there is no way to embed the binary tree into the hypercube and keep the dilation at 1.
27
Complete Binary Tree Into Hypercube

Theorem 5.8:
• A balanced binary tree of height n has a dilation-1
embedding into a hypercube of dimension n + 2 (see Nebesky 1974).
Theorem 5.9:
• A complete binary tree of height n has a dilation-2 embedding in a
hypercube of dimension n + 1, for all n > 1 (see Leighton 1992).
28
Complete Binary Tree Into Hypercube

Theorem 5.10:
• A binomial tree of height n can be embedded in a hypercube of dimension n such that the
dilation is 1.
Proof:
• Organize the sub-trees so that nodes that are roots of larger sub-trees appear to the left of
nodes that are roots of smaller sub-trees.
• Give the edge to the leftmost child of the root node the label 1, the edge to the second child
the label 2, and so on, up to the edge to the last child, which gets label n.
• For all remaining interior nodes of the tree, if the edge above a node has label i, then the
edges to the k children of that node are given labels i + 1, i + 2, ..., i + k, from left to right.
29
Binomial Tree Into Hypercube
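One concrete dilation-1 labelling can be verified directly: if the 2^n tree nodes are given hypercube addresses such that each node i > 0 hangs off i with its lowest set bit cleared, every tree edge joins addresses differing in exactly one bit. This labelling is an assumption consistent with the proof's construction, not taken verbatim from it:

```python
def binomial_tree_edges(n):
    """Edges of a binomial tree of height n on hypercube labels 0..2^n - 1.
    Node i > 0 is attached to i with its lowest set bit cleared."""
    return [(i & (i - 1), i) for i in range(1, 2 ** n)]

def is_hypercube_edge(a, b):
    """True when a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and x & (x - 1) == 0
```

Every edge of the resulting tree is a hypercube edge, and the root (node 0) has exactly n children, the n single-bit addresses.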

30
Binomial Tree Into Hypercube

31
Binomial Tree Into Hypercube
[Figure: dilation-1 embedding of a binomial tree of height 3, nodes 0–7, into a 3-dimensional hypercube.]

32
Ring Into Hypercube
[Figure: an 8-node ring, positions 0–7, and a 3-dimensional hypercube with nodes labeled 0–7.]

 Assume the hypercube contains p = 2^d processors.
 Let G(i) be the number of the processor assigned to ring
position i, where 0 <= i < p.
 The naive assignment G(0) = 0, G(1) = 1, G(2) = 2, ..., G(p - 1) = p - 1
will not work, because consecutive integers can differ in more than one bit position.
 The encoding must have the following properties:
 1. The values must be unique; in other words, G(i) ≠ G(j) for i ≠ j.
 2. G(i) and G(i + 1) must differ in exactly one bit position, for
all i, where 0 <= i < p - 1.
 3. G(p - 1) and G(0) must differ in exactly one bit position.
33
Ring Into Hypercube

Gray Code
• A Gray code is an ordering of binary numerals in which two successive values differ in only
one bit position.
• There are many possible n-bit Gray codes.
Gray Code Generation
• Longer Gray codes can be constructed from shorter Gray codes.
• Given a d-bit Gray code, a (d + 1)-bit Gray code can be constructed by listing the
d-bit Gray code with the prefix 0, followed by the d-bit Gray code in reverse
order with the prefix 1.
34
Ring Into Hypercube
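The reflect-and-prefix construction described above can be sketched as follows (the function name `gray_code` is ours):

```python
def gray_code(d):
    """Binary-reflected Gray code on d bits: the (d-1)-bit code with
    prefix 0, then the reversed (d-1)-bit code with prefix 1."""
    if d == 0:
        return [0]
    prev = gray_code(d - 1)
    return prev + [x | (1 << (d - 1)) for x in reversed(prev)]
```

For d = 3 this reproduces the sequence used on the following slides: 0, 1, 3, 2, 6, 7, 5, 4, and consecutive entries (including the wraparound) differ in one bit.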

 A one bit Gray code is {0,1}
 G(0) = 0 and
 G(1) = 1
35
Gray Code Generation

36
Gray Code Generation
 A two-bit Gray code can be generated from two one-bit Gray codes: {00, 01, 11, 10}
G(0) = 0
G(1) = 1
G(2) = 3
G(3) = 2

37
Gray Code Generation
 A three-bit Gray code can be generated from two two-bit Gray codes: {000, 001, 011, 010, 110, 111, 101, 100}
G(0) = 0
G(1) = 1
G(2) = 3
G(3) = 2
G(4) = 6
G(5) = 7
G(6) = 5
G(7) = 4

38
Gray Code Generation
G(i):  G(0) = 0, G(1) = 1, G(2) = 3, G(3) = 2, G(4) = 6, G(5) = 7, G(6) = 5, G(7) = 4
G⁻¹(i): G⁻¹(0) = 0, G⁻¹(1) = 1, G⁻¹(2) = 3, G⁻¹(3) = 2, G⁻¹(4) = 7, G⁻¹(5) = 6, G⁻¹(6) = 4, G⁻¹(7) = 5
successor(i): successor(0) = 1, successor(1) = 3, successor(2) = 6, successor(3) = 2,
successor(4) = 0, successor(5) = 4, successor(6) = 7, successor(7) = 5
The ring traversal through the hypercube is 0-1-3-2-6-7-5-4-0.
successor(i) = ?
X = G⁻¹(i)
Y = X + 1
successor(i) = G(Y) = G(X + 1) = G(G⁻¹(i) + 1)

Gray Code G(i)
Input: ring position i. Output: hypercube node.
Inverse Gray Code G⁻¹(i)
Input: hypercube node i. Output: ring position.
G⁻¹(i) = j if and only if G(j) = i.
Successor(i)
i is a hypercube node.
successor(i) = 0 if G⁻¹(i) = 2^d − 1, and G(G⁻¹(i) + 1) otherwise.
39
Gray Code Generation

40
Gray Code Generation
i        i/2        i XOR i/2        G(i)
0: 000 0: 000 000: 0 G(0) = 0
1: 001 0: 000 001: 1 G(1) = 1
2: 010 1: 001 011: 3 G(2) = 3
3: 011 1: 001 010: 2 G(3) = 2
4: 100 2: 010 110: 6 G(4) = 6
5: 101 2: 010 111: 7 G(5) = 7
6: 110 3: 011 101: 5 G(6) = 5
7: 111 3: 011 100: 4 G(7) = 4
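The table shows that G(i) is simply i XOR ⌊i/2⌋, a standard closed form for the binary-reflected Gray code, which gives a one-line implementation:

```python
def gray(i):
    """Binary-reflected Gray code: G(i) = i XOR floor(i / 2)."""
    return i ^ (i >> 1)
```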

41
Gray-1 Code Generation
The inverse is computed iteratively: start with Ans = i and mask = i shifted right by
one bit, then repeat Ans = Ans XOR mask, mask = mask >> 1, until mask is 0.
For example, for i = 7: Ans goes 111 → 100 → 101 while mask goes 011 → 001 → 000,
so G⁻¹(7) = 5.
G⁻¹(0) = 0
G⁻¹(1) = 1
G⁻¹(2) = 3
G⁻¹(3) = 2
G⁻¹(4) = 7
G⁻¹(5) = 6
G⁻¹(6) = 4
G⁻¹(7) = 5
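The shift-and-XOR loop traced above, together with the successor function from the previous slides, can be sketched as follows (function names are ours; the modulo handles the wrap from the last ring position back to position 0):

```python
def gray(i):
    # G(i) = i XOR floor(i/2)
    return i ^ (i >> 1)

def gray_inverse(i):
    """Inverse Gray code: fold in right-shifted copies of the input."""
    ans, mask = i, i >> 1
    while mask:
        ans ^= mask
        mask >>= 1
    return ans

def successor(i, d):
    """Hypercube node holding the next ring position after node i,
    in a 2**d-node ring embedded by the Gray code."""
    return gray((gray_inverse(i) + 1) % (1 << d))
```

Starting from node 0 and applying `successor` repeatedly reproduces the traversal 0-1-3-2-6-7-5-4 and then returns to 0.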

42
Mesh into Hypercube
 The use of Gray codes yields a straightforward
solution, with the constraint that the size of the
mesh in each dimension must be a power of 2.
 Each dimension of the mesh is assigned an
appropriate number of bit positions of the
encoding string.
 Traversing mesh nodes along that
dimension yields a cycle.
 Gray codes determine the values assigned to
the bit field.

43
Mesh into Hypercube
 For example, consider mapping a 4 x 8 mesh into a 32-node hypercube.
 Two bit positions are reserved for the row and three for the column.
 Let us assume that the first two bit positions are used for the row. The 2-bit Gray
code {00, 01, 11, 10} corresponds to a traversal through rows 0, 1, 2, and 3.
 The 3-bit Gray code {000, 001, 011, 010, 110, 111, 101, 100} corresponds to a
traversal through columns 0, 1, 2, 3, 4, 5, 6, and 7.
 Hence we have the following mapping of a 4 x 8 mesh into a 32-node hypercube:
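The mapping can be sketched by concatenating the two Gray codes; the helper name `mesh_to_hypercube` and the placement of the row bits in the high-order positions follow the slide's assumption:

```python
def gray(i):
    # Binary-reflected Gray code: i XOR floor(i/2).
    return i ^ (i >> 1)

def mesh_to_hypercube(row, col, col_bits=3):
    """Hypercube node for mesh position (row, col): the row Gray code
    in the high-order bits, the column Gray code in the low-order bits."""
    return (gray(row) << col_bits) | gray(col)
```

Mesh neighbours then map to hypercube neighbours: moving one row or one column changes exactly one Gray-code bit, so the embedding has dilation 1.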

44
Mesh into Hypercube

Theorem 5.11:
• Any two-dimensional mesh with n
vertices can be embedded in a
⌈log n⌉-dimensional hypercube with
dilation 2 (Leighton 1992).
45
Mesh into Hypercube

46

 For multicomputers it has been assumed that if data were
distributed evenly among the local memories of the
processors, the processors' workloads would be balanced
for the entire computation.
 In many cases this assumption is true, but not always.
 If nothing is done to change the size of each processor's area
of responsibility, the processors' workloads may become
severely imbalanced.
47
Dynamic Load Balancing in Multi-computer

• Dynamic load balancing is the process of making changes to
the distribution of work among the processors at run time.
• The measure of success of dynamic load balancing is the net
reduction of execution time achieved by applying the load
balancing algorithm.
• Dynamic load balancing may increase the execution time of the
parallel algorithm if the time spent performing the load
balancing is more than the time saved by reducing the variance
in the execution time of tasks on the various processors.
Dynamic Load Balancing
48
Dynamic Load Balancing in Multi-computer

49
Centralized Load Balancing Algorithms
 Centralized algorithms make a global decision about the
reallocation of work to processors.
 Some centralized algorithms assign the maintenance
of the system's global state information to a particular
node.
 Global state information can allow the algorithm to do
a good job balancing the work among the processors.
 However, this approach does not scale well, because
the amount of information increases linearly with the
number of processors.

50
Fully Distributed Load Balancing Algorithms
 Each processor builds its own view of the state of the system.
 Processors exchange information with neighbouring processors and
use this information to make local changes in the allocation of work.
 A fully distributed algorithm has the advantage of lower scheduling
overhead.
 However, since processors have only local state information, the
workload may not be balanced as well as it would be by a centralized
algorithm.
51
Semi Distributed Load Balancing Algorithms
 A semi-distributed load balancing algorithm divides the
processors into regions.
 Within each region a centralized algorithm distributes the
workload among the processors.
 A higher level scheduling mechanism balances the work load
between regions.

52
Dynamic Load Balancing
Load Balancing
Sender Initiated: a processor with too much work sends some work to
another processor. Sender-initiated algorithms perform better in an
environment with a light to medium workload per processor.
Receiver Initiated: a processor with too little work takes some work
from another processor. Receiver-initiated algorithms perform better
in an environment with a heavy workload per processor. Task migration
can be expensive if the receiver grabs a partially completed task.

53

54
Scheduling on UMA Multiprocessors

First
• Static scheduling sometimes results in lower execution times than
dynamic scheduling.
Second
• Static scheduling can allow the generation of only one process per
processor, reducing process creation, synchronization, and termination
overhead.
Third
• Static scheduling can be used to predict the speedup that can be achieved
by a particular parallel algorithm on a target machine, assuming no
preemption of processes occurs.
55
Static Vs. Dynamic Scheduling

 One way to view a parallel algorithm is as a collection of
tasks, some of which must be completed before others begin.
 In a deterministic model, the execution time needed by each
task and the precedence relations between the tasks are
fixed and known in advance.
 This information can be represented by a directed graph
called a task graph, which ignores variances in tasks'
execution times due to interrupts, contention for shared
memory, etc.
 Task graphs do provide a basis for the static allocation of
tasks to processors.
56
Deterministic Static Scheduling

Schedule
• A schedule is an allocation of tasks to processors. Schedules are often
illustrated with Gantt charts.
Gantt Chart
• A Gantt chart indicates the time each task spends in
execution, as well as the processor on which it
executes. A desirable feature of Gantt charts is that
they graphically illustrate the utilization of each
processor (the percentage of time spent executing tasks).
57
Deterministic Static Scheduling

58
Deterministic Static Scheduling
 Some simple scheduling problems are solvable in polynomial time, while other
problems, only slightly more complex, are intractable.
 For example, if all of the tasks take unit time, and the task graph is a forest, then a
polynomial time algorithm exists to find an optimal schedule (Hu 1961).
 If all of the tasks take unit time, and the number of processors is two, then a
polynomial time algorithm exists to find an optimal schedule (Coffman and Graham 1972).
 If the task lengths vary at all, or if there are more than two processors, then the
problem of finding an optimal schedule is NP-hard (Ullman 1975), meaning the only
known algorithms that find an optimal schedule require exponential time in the
worst case.

59
Deterministic Static Scheduling
 In general we are interested in scheduling arbitrary task graphs
onto a reasonable number of processors, with polynomial time
scheduling algorithms that do a good, but not perfect, job.
 Given a list of tasks ordered by their relative priority, it is possible to
assign tasks to processors by always giving each available processor
the first unassigned task on the list whose predecessor tasks have
already finished execution.
 This list-scheduling algorithm was proposed by Graham (1966,
1969, 1972), and we formalize it next.

 Let T = {T1, T2, ..., Tn} be a set of tasks.
 Let τ : T → (0, ∞) be a function that associates an execution time with each task.
 We are also given a partial order ≺ on T.
 Let L be a list of the tasks in T.
 Whenever a processor has no work to do, it instantaneously removes from L
the first ready task; that is, an unscheduled task whose predecessors under ≺
have all completed execution.
 If two or more processors simultaneously attempt to execute the same task,
the processor with the lowest index succeeds, and the other processors look
for another suitable task.
60
Graham‘s List scheduling Algorithm
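The rule above can be simulated directly. This is a minimal sketch (data-structure choices and the name `list_schedule` are ours; since the processors are identical, assigning ready tasks in list order gives the same schedule length as breaking ties by processor index):

```python
import heapq

def list_schedule(times, preds, order, p):
    """Graham's list scheduling: times[t] is task t's execution time,
    preds[t] its predecessor set, order the priority list L, and p the
    number of processors. Returns the length of the schedule."""
    done = set()
    running = []                      # heap of (finish_time, task)
    unscheduled = list(order)
    free, now = p, 0.0
    while unscheduled or running:
        # Give each idle processor the first ready task on the list.
        for t in list(unscheduled):
            if free == 0:
                break
            if preds[t] <= done:
                unscheduled.remove(t)
                heapq.heappush(running, (now + times[t], t))
                free -= 1
        # Advance time to the next task completion.
        now, t = heapq.heappop(running)
        done.add(t)
        free += 1
    return now
```

For a hypothetical task set with times {A: 2, B: 2, C: 1} and C depending on A and B, two processors finish at time 3 (A and B in parallel, then C), while one processor needs time 5.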

61
Graham‘s List scheduling Algorithm
L = (T1, T2, T3, T4, T5 , T6 , T7 )
P = 3
[Gantt chart: the schedule of tasks T1–T7 on processors P0, P1, and P2 over time steps 1–9; T1 runs first, T2–T6 overlap in the middle, and T7 finishes at time 9.]

Graham‘s List scheduling Algorithm
It is expected that the length of the schedule should decrease by
 increasing the number of processors,
 decreasing the execution times of one or more tasks, or
 eliminating some of the precedence constraints.
62

Graham‘s List scheduling Algorithm
63
An example illustrating that increasing the number of processors can increase
the length of schedule generated using Graham’s heuristic.

Graham‘s List scheduling Algorithm
64
An example illustrating that decreasing the execution times of one or more tasks
can increase the length of schedule generated using Graham’s heuristic.

Graham‘s List scheduling Algorithm
65
An example illustrating that eliminating some of the precedence constraints can
increase the length of schedule generated using Graham’s heuristic.

 Graham's list-scheduling algorithm depends upon a
prioritized list of tasks to execute.
 A well-known and intuitive scheduling algorithm due to
Coffman and Graham (1972) constructs the list of tasks for
the simple case when all tasks take the same amount of time.
 Once this list L has been constructed, the algorithm applies
Graham's list-scheduling algorithm, already described.
Coffman & Graham’s List Construction Algo.
66

 Let T = {T1, T2, ..., Tn} be a set of n unit-time tasks to be executed on p processors.
 Let ≺ be a partial order on T that specifies which tasks must complete before
other tasks begin.
 If Ti ≺ Tj then task Ti is an immediate predecessor of task Tj and Tj is an
immediate successor of Ti.
 Let S(Ti) denote the set of immediate successors of Ti.
 Let α(T) be an integer label assigned to T.
 N(T) denotes the decreasing sequence of integers formed by
ordering the set {α(T') | T' ∈ S(T)}.
Coffman & Graham’s List Construction Algo.
67

Coffman & Graham’s List Construction Algo.
68

Coffman & Graham’s List Construction Algo.
69

Coffman & Graham’s List Construction Algo.
70

Coffman & Graham’s List Construction Algo.
• If  is the length of a schedule produced by the Coffman-Graham
algorithm and 0 is the length of an optimal schedule, then /0 <2 -
2/p, where p is the number of processors and p 2 (see Lam and
Sethi 1977).
Theorem 5.12.
• The Coffman-Graham algorithm generates an optimal schedule if the
number of processors is two (see Lan and Sethi 1977).
Corollary 5.1.
71

 In a nondeterministic model, the execution time of a task
is represented by a random variable, making the
scheduling problem more difficult.
 This subsection summarizes mathematics developed by
Robinson (1979) that allow an estimate of the execution
time of parallel programs with "simple" task graphs on
UMA multiprocessors.
72
Nondeterministic Model

Initial Tasks: Tasks with no predecessors are called initial tasks.
Independent Tasks: A set of tasks is independent if, for every pair of tasks Ti and Tj in the set,
neither is a predecessor of the other.
Width: The width of a task graph is the size of the maximal set of independent tasks.
Chain: A chain is a totally ordered task graph.
Chain Length: The length of a chain is the number of tasks in the chain.
Level: The level of a task T in a task graph G is the maximum chain length in G from
an initial task to T.
Depth: The depth of a task graph G is the maximum level of any task in G.
73
Nondeterministic Model

74
Nondeterministic Model
Definition 5.5:
• Given a task graph G, let C1, C2, ..., Cm be all the m chains from initial
to final tasks in G.
• For every chain Ci, consisting of tasks Ti1, Ti2, ..., Tij, let Xi be the
product xi1 xi2 ... xij, where x1, x2, ..., xn are polynomial variables.
• Then G is a simple task graph if the polynomial X1 + X2 + ... + Xm can
be factored so that every variable appears exactly once (see
Robinson 1979).

75
Nondeterministic Model
x1x2x4 + x1x3x4 + x1x3x5
x1[x2x4 + x3x4 + x3x5]
Simple Graph

76
Nondeterministic Model
x1x2x4 + x1x3x4
x1[x2 + x3] x4
Simple Graph

77
Nondeterministic Model
x1x3 + x1x4 + x2x3 + x2x4
x1[x3 + x4] + x2[x3 + x4]
[x1 + x2] [x3 + x4]
Simple Graph

78
Nondeterministic Model
x1x3 + x1x4 + x2x4
x1[x3 + x4] + x2x4
Non-simple Graph

79
Nondeterministic Model

80

 A set of active concurrent processes is said to be deadlocked
if each holds non-pre-emptible resources that must be
acquired by some other process in order to proceed.
 The potential for deadlock exists whenever multiple
processes share resources in an unsupervised way.
 Hence deadlock can exist in multi-programmed operating
systems as well as in multiprocessors and multi-computers.
81
Deadlock
 Consider the two processes executing simultaneously.
 Each process attempts to lock on two resources.
 Note that lock and unlock correspond to P and V
operations on binary semaphores.
 Process 1 locks A while process 2 locks B.
 Process 1 is blocked when it tries to lock B; likewise,
process 2 is blocked when it tries to lock A. Neither
process can proceed.
 If neither of the processes can be made to back up and yield its
semaphore, the two processes will remain deadlocked indefinitely.
82
Deadlock
Proc.-1        Proc.-2
  ...            ...
lock(A)        lock(B)
  ...            ...
lock(B)        lock(A)
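One standard prevention, noted later in these slides, is to impose a global ordering on the locks. A small sketch with Python threads (names are illustrative): both workers acquire A before B, so the cycle of waiting shown above cannot arise:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def worker(name):
    # Every thread requests the locks in the same global order: A, then B.
    # This breaks the "cycle of waiting processes" condition, so the
    # deadlock in the table above cannot occur.
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("Proc-1", "Proc-2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```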
 Consider a multicomputer in which processors communicate asynchronously.
 When a processor sends a message to another processor, the message is stored in
a system buffer until the receiving processor reads it.
 Suppose so many processors are sending data to processor 0 that its system buffer fills up.
 Further attempts to send data are blocked until processor 0 reads one or more
messages, making room in the system buffer.
 Let processor i be one of the processors unable to send its message to processor 0.
 If processor 0 attempts to read the message sent by processor i, it will block until
the data appears in the system buffer.
 But processor i is blocked until processor 0 removes one or more messages from
the system buffer.
 The two processors are deadlocked.
83
Buffer - Deadlock

84
Buffer - Deadlock

85
Deadlock
Four necessary conditions for deadlock to occur (Coffman and Denning 1973):
Mutual exclusion: each process has exclusive use of its resources.
Non-pre-emption: a process never releases resources it holds until it is through using them.
Resource waiting: each process holds resources while waiting for other processes to release theirs.
A cycle of waiting processes: each process in the cycle waits for resources that the next process
owns and will not relinquish.

The problem of deadlock is commonly addressed in one of three ways.
• One approach is to detect deadlocks when they occur and try to
recover from them.
• Another approach is to avoid deadlocks by using advance
information about requests for resources to control allocation, so
that the next allocation of a resource will not cause processes to
enter a situation in which deadlock may occur.
• The third approach is to prevent deadlock by forbidding one of the
last three conditions listed above.
86
Deadlock

87
Deadlock
 A cycle of waiting processes can be prevented by ordering
shared resources and forcing processes to request
resources in that order.
 Deadlock can also be prevented by requiring processes to
acquire all their resources at once.
 The second approach often leads to underutilization of
resources.
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
graphhoc
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
Fransiskeran
 
SuperGraph visualization
SuperGraph visualizationSuperGraph visualization
SuperGraph visualization
Universidade de São Paulo
 
P2P Supernodes
P2P SupernodesP2P Supernodes
P2P Supernodes
Kevin Regan
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
ivaderivader
 
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Universitat Politècnica de Catalunya
 
for sbi so Ds c c++ unix rdbms sql cn os
for sbi so   Ds c c++ unix rdbms sql cn osfor sbi so   Ds c c++ unix rdbms sql cn os
for sbi so Ds c c++ unix rdbms sql cn os
alisha230390
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
GiselleginaGloria
 
Siegel
SiegelSiegel
Siegel
Joel Siegel
 
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
IJERA Editor
 
Analysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topologyAnalysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topology
Amir Masoud Sefidian
 
NS - CUK Seminar: S.T.Nguyen, Review on "Hypergraph Neural Networks", AAAI 2019
NS - CUK Seminar: S.T.Nguyen, Review on "Hypergraph Neural Networks", AAAI 2019NS - CUK Seminar: S.T.Nguyen, Review on "Hypergraph Neural Networks", AAAI 2019
NS - CUK Seminar: S.T.Nguyen, Review on "Hypergraph Neural Networks", AAAI 2019
ssuser4b1f48
 
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATIONSCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
aftab alam
 
mesh generation techniqure of structured gridpdf
mesh generation techniqure of structured gridpdfmesh generation techniqure of structured gridpdf
mesh generation techniqure of structured gridpdf
ChrisLenard92
 
Arithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued LogicArithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued Logic
VLSICS Design
 
Arithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued LogicArithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued Logic
VLSICS Design
 
Arithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued Logic   Arithmetic Operations in Multi-Valued Logic
Arithmetic Operations in Multi-Valued Logic
VLSICS Design
 
cs201-tree-graph_DATAStructur_Algorithm.ppt
cs201-tree-graph_DATAStructur_Algorithm.pptcs201-tree-graph_DATAStructur_Algorithm.ppt
cs201-tree-graph_DATAStructur_Algorithm.ppt
abhaysharma999437
 
Embedding of Poly Honeycomb Networks and the Metric dimension of Star of Davi...
Embedding of Poly Honeycomb Networks and the Metric dimension of Star of Davi...Embedding of Poly Honeycomb Networks and the Metric dimension of Star of Davi...
Embedding of Poly Honeycomb Networks and the Metric dimension of Star of Davi...
GiselleginaGloria
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
GiselleginaGloria
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
graphhoc
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
Fransiskeran
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
ivaderivader
 
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Universitat Politècnica de Catalunya
 
for sbi so Ds c c++ unix rdbms sql cn os
for sbi so   Ds c c++ unix rdbms sql cn osfor sbi so   Ds c c++ unix rdbms sql cn os
for sbi so Ds c c++ unix rdbms sql cn os
alisha230390
 
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor NetworksThe Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
The Neighborhood Broadcast Problem in Wireless Ad Hoc Sensor Networks
GiselleginaGloria
 
Ad

More from Heman Pathak (13)

Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
Heman Pathak
 
Central processing unit
Central processing unitCentral processing unit
Central processing unit
Heman Pathak
 
Registers and counters
Registers and countersRegisters and counters
Registers and counters
Heman Pathak
 
Sequential Circuit
Sequential CircuitSequential Circuit
Sequential Circuit
Heman Pathak
 
Combinational logic 2
Combinational logic 2Combinational logic 2
Combinational logic 2
Heman Pathak
 
Combinational logic 1
Combinational logic 1Combinational logic 1
Combinational logic 1
Heman Pathak
 
Simplification of Boolean Function
Simplification of Boolean FunctionSimplification of Boolean Function
Simplification of Boolean Function
Heman Pathak
 
Chapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic GatesChapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic Gates
Heman Pathak
 
Chapter 7: Matrix Multiplication
Chapter 7: Matrix MultiplicationChapter 7: Matrix Multiplication
Chapter 7: Matrix Multiplication
Heman Pathak
 
Cost optimal algorithm
Cost optimal algorithmCost optimal algorithm
Cost optimal algorithm
Heman Pathak
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
Heman Pathak
 
Parallel Algorithm for Graph Coloring
Parallel Algorithm for Graph Coloring Parallel Algorithm for Graph Coloring
Parallel Algorithm for Graph Coloring
Heman Pathak
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
Heman Pathak
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
Heman Pathak
 
Central processing unit
Central processing unitCentral processing unit
Central processing unit
Heman Pathak
 
Registers and counters
Registers and countersRegisters and counters
Registers and counters
Heman Pathak
 
Sequential Circuit
Sequential CircuitSequential Circuit
Sequential Circuit
Heman Pathak
 
Combinational logic 2
Combinational logic 2Combinational logic 2
Combinational logic 2
Heman Pathak
 
Combinational logic 1
Combinational logic 1Combinational logic 1
Combinational logic 1
Heman Pathak
 
Simplification of Boolean Function
Simplification of Boolean FunctionSimplification of Boolean Function
Simplification of Boolean Function
Heman Pathak
 
Chapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic GatesChapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic Gates
Heman Pathak
 
Chapter 7: Matrix Multiplication
Chapter 7: Matrix MultiplicationChapter 7: Matrix Multiplication
Chapter 7: Matrix Multiplication
Heman Pathak
 
Cost optimal algorithm
Cost optimal algorithmCost optimal algorithm
Cost optimal algorithm
Heman Pathak
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
Heman Pathak
 
Parallel Algorithm for Graph Coloring
Parallel Algorithm for Graph Coloring Parallel Algorithm for Graph Coloring
Parallel Algorithm for Graph Coloring
Heman Pathak
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
Heman Pathak
 
Ad

Recently uploaded (20)

Slide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptxSlide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptx
vvsasane
 
Lecture - 7 Canals of the topic of the civil engineering
Lecture - 7  Canals of the topic of the civil engineeringLecture - 7  Canals of the topic of the civil engineering
Lecture - 7 Canals of the topic of the civil engineering
MJawadkhan1
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdfIBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
VigneshPalaniappanM
 
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
Reflections on Morality, Philosophy, and History
 
vtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdfvtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdf
RaghavaGD1
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Physical and Physic-Chemical Based Optimization Methods: A Review
Physical and Physic-Chemical Based Optimization Methods: A ReviewPhysical and Physic-Chemical Based Optimization Methods: A Review
Physical and Physic-Chemical Based Optimization Methods: A Review
Journal of Soft Computing in Civil Engineering
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Artificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptxArtificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptx
rakshanatarajan005
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
Guru Nanak Technical Institutions
 
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning ModelsMode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Journal of Soft Computing in Civil Engineering
 
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation RateModeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Journal of Soft Computing in Civil Engineering
 
Slide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptxSlide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptx
vvsasane
 
Lecture - 7 Canals of the topic of the civil engineering
Lecture - 7  Canals of the topic of the civil engineeringLecture - 7  Canals of the topic of the civil engineering
Lecture - 7 Canals of the topic of the civil engineering
MJawadkhan1
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdfIBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
VigneshPalaniappanM
 
vtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdfvtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdf
RaghavaGD1
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Artificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptxArtificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptx
rakshanatarajan005
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 

Chapter 5: Mapping and Scheduling

  • 1. 1
  • 2.  Mapping Data to Processors 2  Processor arrays and multi-computers are characterized by a non- uniform memory structure:  each processor is able to get data from local memory much faster than from nonlocal memory.  When designing algorithms for these machines, it makes sense to have processors manipulate local data as much as possible.  For this reason, the distribution of parallel data structures often dictates which processor is responsible for performing a particular operation.
  • 3.   An algorithm's data manipulation patterns can be represented as a graph:  each vertex represents a data subset allocated to the same local memory,  and each edge represents a computation involving data from two data sets.  These graphs are often regular. An important goal of a parallel algorithm designer is to map the algorithm graph into the corresponding graph of the target machine's processor organization (Bokhari 1981). Mapping Data to Processors 3
  • 4. Performance may suffer if the algorithm graph is not a sub-graph of the parallel architecture's processor organization.  On a multicomputer two connected vertices in the algorithm graph map to a pair of vertices distance two apart on the machine.  Passing a message from one processor to the other requires roughly twice the time it takes to pass a message between adjacent processors. Mapping Data to Processors 4
  • 5.  A parallel algorithm is implemented on a multicomputer with circuit-switched routing.  Different edges in the algorithm graph map to a shared link on the machine.  If simultaneous communications occur between both pairs of nodes, the speed of the communication will be affected by the shared use of the same link. Mapping Data to Processors 5  On some systems one message would be blocked until the other message had finished.  On other systems the physical link would be multiplexed between the two virtual links. Cutting the bandwidth of each link in half. In either case, performance is lower because each message does not have exclusive use of a communication path.
Mapping Data to Processors
6
Definition 5.1:
 An embedding of a graph G = (V, E) into a graph G' = (V', E') is a function φ from V to V'.
Definition 5.2:
 Let φ be the function that embeds graph G = (V, E) into graph G' = (V', E'). The dilation of the embedding is defined as follows:
dil(φ) = max{dist(φ(u), φ(v)) : (u, v) ∈ E}
 where dist(a, b) is the distance between vertices a and b in G'.
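The two definitions can be made concrete with a short sketch (the graph representation and all names here are illustrative, not from the text):

```python
from collections import deque

def distances_from(adj, src):
    """BFS distances from src in a graph given as {vertex: [neighbours]}."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def dilation(edges_G, adj_Gp, phi):
    """dil(phi): maximum over edges (u, v) of G of dist(phi(u), phi(v)) in G'."""
    worst = 0
    for u, v in edges_G:
        worst = max(worst, distances_from(adj_Gp, phi[u])[phi[v]])
    return worst

# Toy example: embed the 3-node path a-b-c into the 4-cycle 0-1-2-3-0.
path_edges = [("a", "b"), ("b", "c")]
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(dilation(path_edges, cycle, {"a": 0, "b": 1, "c": 3}))  # edge (b, c) maps two hops apart
```

With the subgraph embedding {"a": 0, "b": 1, "c": 2} the dilation drops to 1, matching the dilation-1 remark on the next slide.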
Mapping Data to Processors
7
 There exists a dilation-1 embedding of G into G' if G is a sub-graph of G'.
[Figure: a dilation-3 embedding of G into G', and a dilation-1 embedding]
Mapping Data to Processors
8
Definition 5.3:
 The load of an embedding φ : G → G' is the maximum number of vertices of G that are mapped to a single vertex of G'.
Embedding a Ring into 2D-Mesh
10
 The ring and the mesh have the same number of vertices.
 A dilation-1 embedding exists if the mesh has an even number of rows and/or columns.
 A mesh with an odd number of rows and columns has no way to embed a ring without increasing the dilation.
[Figure: a 20-node ring embedded in a 2-D mesh]
Embedding a 2D-Mesh into 2D-Mesh
11
 Ar : the number of rows in the algorithm graph
 Ac : the number of columns in the algorithm graph
 Mr : the number of rows in the machine graph
 Mc : the number of columns in the machine graph
 The algorithm graph can be embedded with dilation-1 in the machine graph if and only if
Ar <= Mr and Ac <= Mc, or
Ac <= Mr and Ar <= Mc
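The condition amounts to checking whether the algorithm mesh fits in either orientation; a one-line sketch (function name is my own):

```python
def mesh_embeds_dilation1(Ar, Ac, Mr, Mc):
    """True iff an Ar x Ac algorithm mesh has a dilation-1 embedding in an
    Mr x Mc machine mesh: it must fit as-is or rotated 90 degrees."""
    return (Ar <= Mr and Ac <= Mc) or (Ac <= Mr and Ar <= Mc)

print(mesh_embeds_dilation1(3, 8, 8, 4))  # True: fits after rotation
print(mesh_embeds_dilation1(5, 5, 4, 8))  # False: a side of 5 exceeds 4 either way
```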
Embedding a 2D-Mesh into 2D-Mesh
12
[Figure: dilation-1 embeddings for the two cases, Ar <= Mr and Ac <= Mc, and Ac <= Mr and Ar <= Mc]
Complete Binary Tree into 2D-Mesh
13
[Figure: a dilation-1 embedding of a complete binary tree of height 3 into a 2-D mesh]
Complete Binary Tree into 2D-Mesh
14
Theorem 5.1:
 A complete binary tree of height greater than 4 cannot be embedded in a 2-D mesh without increasing the dilation beyond 1.
Proof:
 The total number of mesh points k or fewer jumps away from an arbitrary point in a 2D-mesh is 2k^2 + 2k + 1.
 The total number of nodes in a complete binary tree of depth k is 2^(k+1) - 1.
 2^(k+1) - 1 > 2k^2 + 2k + 1 for k > 4.
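The counting argument is easy to check numerically (a quick sketch; function names are my own):

```python
def mesh_points_within(k):
    # Points at distance <= k from an arbitrary point in a large 2-D mesh.
    return 2 * k * k + 2 * k + 1

def tree_nodes(k):
    # Nodes in a complete binary tree of depth k.
    return 2 ** (k + 1) - 1

# The tree first outgrows the reachable mesh neighbourhood at k = 5.
for k in range(1, 8):
    print(k, tree_nodes(k), mesh_points_within(k), tree_nodes(k) > mesh_points_within(k))
```

The loop shows the inequality failing through k = 4 (31 vs 41) and holding from k = 5 on (63 vs 61), which is exactly why height 4 is the cutoff.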
Complete Binary Tree into 2D-Mesh
15
 The H-tree is a common way of embedding a complete binary tree into a 2-D mesh. The name H-tree arises from the recursive "H" pattern used to construct a large H-tree out of four smaller H-trees.
Complete Binary Tree into 2D-Mesh
16
Theorem 5.2:
 A complete binary tree of height n has a dilation-⌈n/2⌉ embedding in a 2-D mesh.
Proof:
 We use H-trees to map binary trees to nodes of a 2-D mesh. The longest edges in an H-tree are the edges from the root to its two children; these edges have the same length. The length of the root edges in an H-tree of height n is ⌈n/2⌉.
Binomial Tree into 2D-Mesh
17
[Figure: construction of a binomial tree of height k using two binomial trees of height k-1]
Binomial Tree into 2D-Mesh
18
[Figure: a dilation-1 embedding of a binomial tree of height 4 into a 2-D mesh]
Binomial Tree into 2D-Mesh
19
Theorem 5.3:
 A binomial tree of height greater than 4 cannot be embedded in a 2-D mesh without increasing the dilation beyond 1.
Proof:
 The root node of a binomial tree of height d is connected to d other nodes. No node in a 2-D mesh has more than 4 neighbours. Hence a binomial tree of height greater than 4 cannot be embedded in a 2-D mesh without increasing the dilation beyond 1.
Binomial Tree into 2D-Mesh
20
Theorem 5.4:
 A binomial tree of height n has a dilation-⌈n/2⌉ embedding in a 2-D mesh.
Proof:
 The construction is illustrated in the figure.
Embedding Graphs Into Hypercubes
23
Definition 5.4:
 A graph G is cubical if there is a dilation-1 embedding of G into a hypercube.
Theorem 5.5:
 The problem of determining whether an arbitrary graph G is cubical is NP-complete (Afrati et al. 1985; Cybenko et al. 1986).
Embedding Graphs Into Hypercubes
24
Theorem 5.6:
 A dilation-1 embedding of a connected graph G into a hypercube with n nodes exists if and only if it is possible to label the edges of G with the integers {1, ..., n} such that:
 1. Edges incident with a common vertex have different labels.
 2. In every path of G at least one label appears an odd number of times.
 3. In every cycle of G no label appears an odd number of times.
Complete Binary Tree Into Hypercube
26
Theorem 5.7:
 A dilation-1 embedding of a complete binary tree of height n into a hypercube of dimension n + 1 does not exist if n > 1.
Proof:
 A complete binary tree of height n has 2^(n+1) - 1 nodes.
 A hypercube of dimension n + 1 has 2^(n+1) nodes.
 Let the root of the tree be mapped to node X of the cube, and let the height of the tree be k.
Complete Binary Tree Into Hypercube
27
Proof (continued):
 A hypercube is a bipartite graph:
 half the nodes can be reached only by following an even number of edges from X;
 half the nodes can be reached only by following an odd number of edges from X.
 If k is odd, then more than half the nodes of the binary tree are an odd distance away from the root of the binary tree.
 If k is even, then more than half the nodes of the binary tree are an even distance away from the root of the binary tree.
 In either case, there is no way to embed the binary tree into the hypercube and keep the dilation at 1, because there are not enough hypercube nodes to accommodate the leaves of the tree while maintaining parity (odd or even) with the interior nodes.
Complete Binary Tree Into Hypercube
28
Theorem 5.8:
 A balanced binary tree of height n has a dilation-1 embedding into a hypercube of dimension n + 2 (see Nebesky 1974).
Theorem 5.9:
 A complete binary tree of height n has a dilation-2 embedding in a hypercube of dimension n + 1, for all n > 1 (see Leighton 1992).
Binomial Tree Into Hypercube
29
Theorem 5.10:
 A binomial tree of height n can be embedded in a hypercube of dimension n such that the dilation is 1.
Proof:
 Organize the sub-trees so that the nodes that are roots of larger sub-trees appear to the left of nodes that are roots of smaller sub-trees.
 Give the edge to the leftmost child of the root node the label 1, the edge to the second child of the root node the label 2, and so on, to the edge to the last child, which gets label n.
 For all remaining interior nodes of the tree, if the edge above a node has label i, then the edges to the k children of that node are given labels i + 1, i + 2, ..., i + k, from left to right.
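One common indexing scheme consistent with Theorem 5.10 (my own choice of layout, not necessarily the text's labeling) maps node i of the binomial tree directly to hypercube node i: the parent of a node is obtained by clearing its highest set bit, so every tree edge joins hypercube nodes differing in exactly one bit.

```python
def parent(i):
    """Parent of node i (> 0) when B_n is laid out on hypercube nodes
    0 .. 2^n - 1: clear the highest set bit of i."""
    return i & ~(1 << (i.bit_length() - 1))

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

n = 4
# Every tree edge (parent(i), i) joins hypercube neighbours, i.e. dilation 1.
print(all(hamming(i, parent(i)) == 1 for i in range(1, 2 ** n)))
```

Under this layout the root 0 has children 1, 2, 4, ..., 2^(n-1), giving it degree n, as the binomial tree requires.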
Binomial Tree Into Hypercube
31
[Figure: a binomial tree of height 3 embedded in a 3-dimensional hypercube]
Ring Into Hypercube
32
[Figure: an 8-node ring embedded in a 3-dimensional hypercube]
Ring Into Hypercube
33
 Assume the hypercube contains p = 2^d processors.
 Let G(i) be the number of the processor assigned to ring position i, where 0 <= i < p.
 G(0) = 0, G(1) = 1, G(2) = 2, ..., G(p-1) = p-1 will not work.
 The encoding must have the following properties:
 1. The values must be unique; in other words, G(i) ≠ G(j) if i ≠ j.
 2. G(i) and G(i+1) must differ in exactly one bit position, for all i, where 0 <= i < p-1.
 3. G(p-1) and G(0) must differ in exactly one bit position.
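The three properties can be verified for the standard reflected binary Gray code (a sketch; variable names are my own):

```python
def G(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)

d = 4
p = 2 ** d
codes = [G(i) for i in range(p)]
one_bit = lambda a, b: bin(a ^ b).count("1") == 1

print(len(set(codes)) == p)                                    # property 1: unique
print(all(one_bit(codes[i], codes[i + 1]) for i in range(p - 1)))  # property 2: neighbours
print(one_bit(codes[-1], codes[0]))                            # property 3: wraparound
```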
Ring Into Hypercube
34
Gray Code:
 A Gray code is a binary numeral system in which two successive values differ in only one bit position.
 There are many possible n-bit Gray codes.
Gray Code Generation:
 Longer Gray codes can be constructed from shorter Gray codes.
 Given a d-bit Gray code, a (d + 1)-bit Gray code can be constructed by listing the d-bit Gray code with the prefix 0, followed by the d-bit Gray code in reverse order with the prefix 1.
Gray Code Generation
35
 A one-bit Gray code is {0, 1}:
G(0) = 0
G(1) = 1
Gray Code Generation
36
 A two-bit Gray code can be generated from two one-bit Gray codes:
0 0 -> G(0) = 0
0 1 -> G(1) = 1
1 1 -> G(2) = 3
1 0 -> G(3) = 2
Gray Code Generation
37
 A three-bit Gray code can be generated from two two-bit Gray codes:
0 0 0 -> G(0) = 0
0 0 1 -> G(1) = 1
0 1 1 -> G(2) = 3
0 1 0 -> G(3) = 2
1 1 0 -> G(4) = 6
1 1 1 -> G(5) = 7
1 0 1 -> G(6) = 5
1 0 0 -> G(7) = 4
Gray Code Generation
38
G(0) = 0, G(1) = 1, G(2) = 3, G(3) = 2, G(4) = 6, G(5) = 7, G(6) = 5, G(7) = 4
G^-1(0) = 0, G^-1(1) = 1, G^-1(2) = 3, G^-1(3) = 2, G^-1(4) = 7, G^-1(5) = 6, G^-1(6) = 4, G^-1(7) = 5
 The ring traversal of hypercube nodes is 0-1-3-2-6-7-5-4-0, so:
successor(0) = 1, successor(1) = 3, successor(2) = 6, successor(3) = 2, successor(4) = 0, successor(5) = 4, successor(6) = 7, successor(7) = 5
 Successor(i) = ?  Let X = G^-1(i) and Y = X + 1; then Successor(i) = G(Y) = G(X + 1) = G(G^-1(i) + 1).
Gray Code Generation
39
 Gray code G(i): input is a ring node; output is a hypercube node.
 Inverse Gray code G^-1(i): input is a hypercube node; output is a ring node. G^-1(i) = j if and only if G(j) = i.
 Successor(i), where i is a hypercube node:
Successor(i) = 0 if G^-1(i) = 2^d - 1
Successor(i) = G(G^-1(i) + 1) otherwise
Gray Code Generation
40
 G(i) = i XOR floor(i/2):
 i        floor(i/2)   i XOR floor(i/2)
 0: 000   0: 000       000: 0   G(0) = 0
 1: 001   0: 000       001: 1   G(1) = 1
 2: 010   1: 001       011: 3   G(2) = 3
 3: 011   1: 001       010: 2   G(3) = 2
 4: 100   2: 010       110: 6   G(4) = 6
 5: 101   2: 010       111: 7   G(5) = 7
 6: 110   3: 011       101: 5   G(6) = 5
 7: 111   3: 011       100: 4   G(7) = 4
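The rule tabulated above is a single XOR with a one-bit right shift (a sketch):

```python
def G(i):
    # G(i) = i XOR floor(i/2), computed with a bitwise shift
    return i ^ (i >> 1)

# Reproduce the 3-bit table: G = 0, 1, 3, 2, 6, 7, 5, 4
for i in range(8):
    print(f"{i}: {i:03b}  {i >> 1}: {i >> 1:03b}  {G(i):03b}: {G(i)}")
```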
Gray^-1 Code Generation
41
 G^-1(i) is computed by starting with ans = i and mask = i shifted right by one bit, then repeatedly XOR-ing mask into ans and shifting mask right until it is zero. For example, for i = 4:
ans = 100, mask = 010 -> ans = 110
ans = 110, mask = 001 -> ans = 111
mask = 000, so G^-1(4) = 111 = 7
 The resulting values are:
G^-1(0) = 0, G^-1(1) = 1, G^-1(2) = 3, G^-1(3) = 2, G^-1(4) = 7, G^-1(5) = 6, G^-1(6) = 4, G^-1(7) = 5
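The inverse loop and the successor rule can be sketched together (function names are my own; note the wraparound test is applied to the ring position G^-1(i), which is what makes successor(4) = 0 in the 3-bit tables):

```python
def G(i):
    """Gray code: ring position -> hypercube node."""
    return i ^ (i >> 1)

def G_inv(i):
    """Invert the Gray code by XOR-ing in right-shifted copies of i,
    exactly the mask sequence traced in the table above."""
    ans, mask = i, i >> 1
    while mask:
        ans ^= mask
        mask >>= 1
    return ans

def successor(i, d):
    """Hypercube node holding the next ring position after node i."""
    if G_inv(i) == 2 ** d - 1:   # last ring position wraps around to G(0) = 0
        return 0
    return G(G_inv(i) + 1)

print([G_inv(i) for i in range(8)])        # [0, 1, 3, 2, 7, 6, 4, 5]
print([successor(i, 3) for i in range(8)]) # [1, 3, 6, 2, 0, 4, 7, 5]
```

Both printed lists match the G^-1 and successor tables on the preceding slides.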
Mesh into Hypercube
42
 The use of Gray codes yields a straightforward solution, with the constraint that the size of the mesh in each dimension must be a power of 2.
 Each dimension of the mesh is assigned an appropriate number of bit positions in the encoding string.
 Traversing the mesh nodes along a dimension yields a cycle.
 Gray codes determine the values assigned to each bit field.
  • 43. Mesh into Hypercube — For example, consider mapping a 4 × 8 mesh into a 32-node hypercube. Two bit positions are reserved for the row and three for the column. Assume the first two bit positions are used for the row. The 2-bit Gray code {00, 01, 11, 10} corresponds to a traversal through rows 0, 1, 2, and 3. The 3-bit Gray code {000, 001, 011, 010, 110, 111, 101, 100} corresponds to a traversal through columns 0, 1, 2, 3, 4, 5, 6, and 7. Hence we have the following mapping of a 4 × 8 mesh into a 32-node hypercube:
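The row/column encoding just described can be sketched as follows (the helper name and the bit-field layout, row bits first, are assumptions matching the example):

```python
def gray(i):
    # Binary-reflected Gray code
    return i ^ (i >> 1)

def mesh_to_hypercube(row, col, col_bits=3):
    # 4 x 8 mesh -> 32-node hypercube: the two leading bits carry the
    # row's Gray code, the trailing col_bits carry the column's.
    return (gray(row) << col_bits) | gray(col)

# Neighbouring mesh nodes land on hypercube nodes one bit apart,
# i.e. on adjacent hypercube processors.
```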
  • 45. Mesh into Hypercube — Theorem 5.11: Any two-dimensional mesh with n vertices can be embedded in a ⌈log n⌉-dimensional hypercube with dilation 2 (Leighton 1992).
  • 47. Dynamic Load Balancing in Multi-computers — It has been assumed that if data were distributed evenly among the local memories of a multicomputer's processors, the processors' workloads would be balanced for the entire computation. In many cases this assumption holds, but not always. If nothing is done to change the size of each processor's area of responsibility, the processors' workloads may become severely imbalanced.
  • 48.  • Dynamic load balancing is the process of making changes to the distribution of work among the processors at run time. • The measure of success of dynamic load balancing is the net reduction of execution time achieved by applying the load balancing algorithm. • Dynamic load balancing may increase the execution time of the parallel algorithm if the time spent performing the load balancing is more than the time saved by reducing the variance in the execution time of tasks on the various processors. Dynamic Load Balancing 48 Dynamic Load Balancing in Multi-computer
  • 49. Centralized Load Balancing Algorithms — These algorithms make a global decision about the reallocation of work to processors. Some centralized algorithms assign the maintenance of the system's global state information to a particular node. Global state information can allow the algorithm to do a good job balancing the work among the processors. However, this approach does not scale well, because the amount of information increases linearly with the number of processors.
  • 50. Fully Distributed Load Balancing Algorithms — Each processor builds its own view of the state of the system. Processors exchange information with neighbouring processors and use this information to make local changes in the allocation of work. A fully distributed algorithm has the advantage of lower scheduling overhead. However, since processors have only local state information, the workload may not be balanced as well as it would be by a centralized algorithm.
  • 51. Semi-Distributed Load Balancing Algorithms — A semi-distributed load balancing algorithm divides the processors into regions. Within each region, a centralized algorithm distributes the workload among the processors. A higher-level scheduling mechanism balances the workload between regions.
  • 52. Dynamic Load Balancing — Load-balancing algorithms can be sender initiated or receiver initiated. Sender initiated: a processor with too much work sends some of it to another processor; these algorithms perform better in an environment with light to medium workload per processor. Receiver initiated: a processor with too little work takes work from another processor; these perform better under heavy workload per processor, though task migration can be expensive if the receiver grabs a partially completed task.
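A toy sender-initiated scheme can be sketched as follows (illustrative only: the threshold value and the "send surplus to the lightest peer" policy are assumptions, not from the slides):

```python
THRESHOLD = 4  # assumed per-processor queue limit

def balance(queues):
    # Sender initiated: each overloaded queue pushes surplus tasks
    # to the currently lightest queue until it is back under THRESHOLD.
    for q in queues:
        while len(q) > THRESHOLD:
            lightest = min(queues, key=len)
            if lightest is q:
                break
            lightest.append(q.pop())

queues = [list(range(10)), [], []]
balance(queues)
print([len(q) for q in queues])  # -> [4, 3, 3]
```

A receiver-initiated variant would instead have light queues pull tasks from the heaviest peer; the trade-off between the two is exactly the workload regime described above.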
  • 54.  54 Scheduling on UMA Multiprocessors
  • 55. Static vs. Dynamic Scheduling — First, static scheduling sometimes results in lower execution times than dynamic scheduling. Second, static scheduling can allow the generation of only one process per processor, reducing process creation, synchronization, and termination overhead. Third, static scheduling can be used to predict the speedup that can be achieved by a particular parallel algorithm on a target machine, assuming no preemption of processes occurs.
  • 56.   One way to view a parallel algorithm is as a collection of tasks, some of which must be completed before others begin.  In a deterministic model, the execution time needed by each task and the precedence relations between the tasks are fixed and known in advance.  This information can be represented by a directed graph called a task graph, which ignores variances in tasks' execution times due to interrupts, contention for shared memory, etc.  Task graphs do provide a basis for the static allocation of tasks to processors. 56 Deterministic Static Scheduling
  • 57. Deterministic Static Scheduling — Schedule: a schedule is an allocation of tasks to processors. Schedules are often illustrated with Gantt charts. Gantt chart: a Gantt chart indicates the time each task spends in execution, as well as the processor on which it executes. A desirable feature of Gantt charts is that they graphically illustrate the utilization of each processor (the percentage of time spent executing tasks).
  • 58. Deterministic Static Scheduling — Some simple scheduling problems are solvable in polynomial time, while other problems, only slightly more complex, are intractable. For example, if all of the tasks take unit time and the task graph is a forest, then a polynomial-time algorithm exists to find an optimal schedule (Hu 1961). If all of the tasks take unit time and the number of processors is two, then a polynomial-time algorithm exists to find an optimal schedule (Coffman and Graham 1972). If the task lengths vary at all, or if there are more than two processors, then the problem of finding an optimal schedule is NP-hard (Ullman 1975), meaning the only known algorithms that find an optimal schedule require exponential time in the worst case.
  • 59.  59 Deterministic Static Scheduling  In general we are interested in scheduling arbitrary task graphs onto a reasonable number of processors, with polynomial time scheduling algorithms that do a good, but not perfect, job.  Given a list of tasks ordered by their relative priority, it is possible to assign tasks to processors by always giving each available processor the first unassigned task on the list whose predecessor tasks have already finished execution.  This list-scheduling algorithm was proposed by Graham (1966, 1969, 1972), and we formalize it next.
  • 60. Graham's List Scheduling Algorithm — Let T = {T1, T2, ..., Tn} be a set of tasks. Let τ : T → (0, ∞) be a function that associates an execution time with each task. We are also given a partial order ≺ on T. Let L be a list of the tasks in T. Whenever a processor has no work to do, it instantaneously removes from L the first ready task, that is, an unscheduled task whose predecessors under ≺ have all completed execution. If two or more processors simultaneously attempt to execute the same task, the processor with the lowest index succeeds, and the other processors look for another suitable task.
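The rule just stated can be sketched in a few dozen lines (a simplified model; the simultaneous-access tie is broken by lowest processor index, and all names are illustrative):

```python
import heapq

def list_schedule(times, preds, order, p):
    # Graham's list scheduling: whenever a processor is idle, give it the
    # first ready task on the priority list L (`order`).
    free = list(range(p))          # idle processors
    busy = []                      # heap of (finish_time, processor, task)
    finished, schedule, now = set(), {}, 0
    remaining = list(order)
    while remaining:
        progressed = True
        while progressed:          # dispatch every ready task, in list order
            progressed = False
            for t in remaining:
                if free and preds[t] <= finished:
                    proc = min(free)           # lowest-index processor wins
                    free.remove(proc)
                    schedule[t] = (proc, now)
                    heapq.heappush(busy, (now + times[t], proc, t))
                    remaining.remove(t)
                    progressed = True
                    break
        now, proc, t = heapq.heappop(busy)     # advance to next completion
        finished.add(t)
        free.append(proc)
        while busy and busy[0][0] == now:      # simultaneous completions
            _, proc2, t2 = heapq.heappop(busy)
            finished.add(t2)
            free.append(proc2)
    makespan = now
    while busy:                    # drain tasks still running at the end
        makespan = heapq.heappop(busy)[0]
    return schedule, makespan
```

For a chain T1 ≺ T2 ≺ T3 of unit-time tasks on two processors this yields a makespan of 3, since the precedence constraints leave no parallelism to exploit.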
  • 61. Graham's List Scheduling Algorithm — Example with L = (T1, T2, T3, T4, T5, T6, T7) and p = 3 processors (P0, P1, P2). The resulting Gantt chart, by time step: 1–2: T1; 3: T2, T3, T4; 4: T2, T4; 5: T2, T6; 6–7: T5, T6; 8: T5; 9: T7.
  • 62. Graham's List Scheduling Algorithm — It is expected that the length of the schedule should decrease by: increasing the number of processors; decreasing the execution times of one or more tasks; or eliminating some of the precedence constraints.
  • 63.  Graham‘s List scheduling Algorithm 63 An example illustrating that increasing the number of processors can increase the length of schedule generated using Graham’s heuristic.
  • 64.  Graham‘s List scheduling Algorithm 64 An example illustrating that decreasing the execution times of one or more tasks can increase the length of schedule generated using Graham’s heuristic.
  • 65.  Graham‘s List scheduling Algorithm 65 An example illustrating that eliminating some of the precedence constraints can increase the length of schedule generated using Graham’s heuristic.
  • 66.   Graham's list-scheduling algorithm depends upon a prioritized list of tasks to execute.  A well-known and intuitive scheduling algorithm due to Coffman and Graham (1972) constructs the list of tasks for the simple case when all tasks take the same amount of time.  Once this list L has been constructed, the algorithm applies Graham's list-scheduling algorithm, already described. Coffman & Graham’s List Construction Algo. 66
  • 67. Coffman & Graham's List Construction Algo. — Let T = {T1, T2, ..., Tn} be a set of n unit-time tasks to be executed on p processors. Let ≺ be a partial order on T that specifies which tasks must complete before other tasks begin. If Ti ≺ Tj, then task Ti is an immediate predecessor of task Tj and Tj is an immediate successor of Ti. Let S(Ti) denote the set of immediate successors of Ti. Let α(T) be an integer label assigned to T. N(T) denotes the decreasing sequence of integers formed by ordering the set {α(T′) | T′ ∈ S(T)}.
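Under these definitions, the label construction can be sketched as follows (an illustrative implementation; the tie-break among tasks with equal N(T) sequences simply takes the first candidate in task order):

```python
def coffman_graham_list(tasks, succ):
    # Coffman-Graham label construction for unit-time tasks.
    # `succ` maps each task to the set of its immediate successors.
    # Returns the priority list L: tasks in decreasing label order.
    label = {}
    for k in range(1, len(tasks) + 1):
        # Candidates: unlabeled tasks whose successors are all labeled.
        candidates = [t for t in tasks
                      if t not in label
                      and all(s in label for s in succ[t])]
        # N(T): successor labels in decreasing order; assign label k to a
        # candidate whose N(T) is lexicographically smallest.
        best = min(candidates,
                   key=lambda t: sorted((label[s] for s in succ[t]),
                                        reverse=True))
        label[best] = k
    return sorted(tasks, key=lambda t: -label[t])
```

For a chain a ≺ b ≺ c this labels c = 1, b = 2, a = 3 and returns L = (a, b, c); the list is then fed to Graham's list-scheduling algorithm as described above.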
  • 68.  Coffman & Graham’s List Construction Algo. 68
  • 69.  Coffman & Graham’s List Construction Algo. 69
  • 70.  Coffman & Graham’s List Construction Algo. 70
  • 71. Coffman & Graham's List Construction Algo. — Theorem 5.12: If ω is the length of a schedule produced by the Coffman-Graham algorithm and ω0 is the length of an optimal schedule, then ω/ω0 ≤ 2 − 2/p, where p is the number of processors and p ≥ 2 (see Lam and Sethi 1977). Corollary 5.1: The Coffman-Graham algorithm generates an optimal schedule if the number of processors is two (see Lam and Sethi 1977).
  • 72.   In a nondeterministic model, the execution time of a task is represented by a random variable, making the scheduling problem more difficult.  This subsection summarizes mathematics developed by Robinson (1979) that allow an estimate of the execution time of parallel programs with "simple" task graphs on UMA multiprocessors. 72 Nondeterministic Model
  • 73.  Initial Tasks Tasks with no predecessors are called initial tasks. Independent Tasks A set of tasks is independent if, for every pair of tasks Ti and Tj in the set, neither is a predecessor of the other. Width The width of a task graph is the size of the maximal set of independent tasks. Chain A chain is a totally ordered task graph. Chain Length The length of a chain is the number of tasks in the chain. Level The level of a task T in a task graph G is the maximum chain length in G from an initial task to T. Depth The depth of a task graph G is the maximum level of any task in G. 73 Nondeterministic Model
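The level and depth definitions above translate directly into a short recursive computation (a sketch; width is omitted, since it requires finding a maximum set of independent tasks):

```python
def levels(preds):
    # Level of task T: the maximum chain length (counted in tasks) from an
    # initial task to T. Depth of the graph: the maximum level of any task.
    level = {}
    def lv(t):
        if t not in level:
            ps = preds[t]
            level[t] = 1 if not ps else 1 + max(lv(p) for p in ps)
        return level[t]
    for t in preds:
        lv(t)
    return level, max(level.values())
```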
  • 74. Nondeterministic Model — Definition 5.5: Given a task graph G, let C1, C2, ..., Cm be all the m chains from initial to final tasks in G. For every chain Ci consisting of tasks Ti1, Ti2, ..., Tij, let Xi be the product xi1 xi2 ··· xij, where x1, x2, ..., xn are polynomial variables. Then G is a simple task graph if the polynomial X1 + X2 + ··· + Xm can be factored so that every variable appears exactly once (see Robinson 1979).
  • 75.  75 Nondeterministic Model x1x2x4 + x1x3x4 + x1x3x5 x1[x2x4 + x3x4 + x3x5] Simple Graph
  • 76.  76 Nondeterministic Model x1x2x4 + x1x3x4 x1[x2 + x3] x4 Simple Graph
  • 77.  77 Nondeterministic Model x1x3 + x1x4 + x2x3 + x2x4 x1[x3 + x4] + x2[x3 + x4] [x1 + x2] [x3 + x4] Simple Graph
  • 78.  78 Nondeterministic Model x1x3 + x1x4 + x2x4 x1[x3 + x4] + x2x4 Non-simple Graph
  • 81.   A set of active concurrent processes is said to be deadlocked if each holds non-pre-emptible resources that must be acquired by some other process in order to proceed.  The potential for deadlock exists whenever multiple processes share resources in an unsupervised way.  Hence deadlock can exist in multi-programmed operating systems as well as in multiprocessors and multi-computers. 81 Deadlock
  • 82. Deadlock — Consider two processes executing simultaneously, each of which attempts to lock two resources. (Note that lock and unlock correspond to P and V operations on binary semaphores.)

  Proc. 1        Proc. 2
  ...            ...
  lock(A)        lock(B)
  ...            ...
  lock(B)        lock(A)

  Process 1 locks A while process 2 locks B. Process 1 is blocked when it tries to lock B; likewise, process 2 is blocked when it tries to lock A. Neither process can proceed. If neither process can be made to back up and yield its semaphore, the two processes will remain deadlocked indefinitely.
  • 83. Buffer Deadlock — Consider a multicomputer in which processors communicate asynchronously. When a processor sends a message to another processor, the message is stored in a system buffer until the receiving processor reads it. Suppose so many processors send data to processor 0 that its system buffer fills up; further attempts to send data are blocked until processor 0 reads one or more messages, making room in the buffer. Let processor i be one of the processors unable to send its message to processor 0. If processor 0 attempts to read the message sent by processor i, it will block until the data appears in the system buffer, but processor i is blocked until processor 0 removes one or more messages from that buffer. The two processors are deadlocked.
  • 85.  85 Deadlock Four necessary conditions for deadlock to occur (Coffman and Denning 1973) Mutual exclusion: each process has exclusive use of its resources Non-pre-emption: a process never releases resources it holds until it is through using them Resource waiting: each process holds resources while waiting for other processes to release theirs A cycle of waiting processes: each process in the cycle waits for resources that the next process owns and will not relinquish
  • 86.  The problem of deadlock is commonly addressed in one of three ways. • One approach is to detect deadlocks when they occur and try to recover from them. • Another approach is to avoid deadlocks by using advance information about requests for resources to control allocation, so that the next allocation of a resource will not cause processes to enter a situation in which deadlock may occur. • The third approach is to prevent deadlock by forbidding one of the last three conditions listed above. 86 Deadlock
  • 87.  87 Deadlock  A cycle of waiting processes can be prevented by ordering shared resources and forcing processes to request resources in that order.  Deadlock can also be prevented by requiring processes to acquire all their resources at once.  The second approach often leads to underutilization of resources.
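The resource-ordering idea can be sketched with two threads that both acquire locks in the same fixed order, so the wait-for cycle from the earlier lock(A)/lock(B) example cannot form (the lock names and the chosen order are illustrative):

```python
import threading

a_lock, b_lock = threading.Lock(), threading.Lock()
ORDER = [a_lock, b_lock]   # assumed global order: A before B

def worker(result, tag):
    for lk in ORDER:            # both threads lock A first, then B
        lk.acquire()
    result.append(tag)          # critical section using both resources
    for lk in reversed(ORDER):
        lk.release()

result = []
threads = [threading.Thread(target=worker, args=(result, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(result))  # -> [0, 1]
```

Had the second thread acquired B before A, the two threads could each hold one lock while waiting for the other, reproducing the deadlock on slide 82.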