High Performance
Computing
Introduction to Parallel Computing
Acknowledgements
Content of the following presentation is
borrowed from
The Lawrence Livermore National
Laboratory
https://hpc.llnl.gov/training/tutorials
Serial Computing
• Traditionally, software has been
written for serial computation:
• A problem is broken into a
discrete series of instructions
• Instructions are executed
sequentially one after another
• Executed on a single processor
• Only one instruction may execute
at any moment in time
Parallel Computing
▪ Simultaneous use of multiple
compute resources to solve a
computational problem.
▪ Run on multiple CPUs
▪ Problem is decomposed into
multiple parts that can be solved
concurrently.
▪ Each part is decomposed into a
set of instructions.
▪ Instructions are executed
simultaneously on different CPUs
Parallel Computer Architecture
Example: Networks connect multiple stand-
alone computers (nodes) to make larger parallel
computer clusters.
Compute Resources
• Single Computer with
multiple processors.
• Arbitrary Number of
Computers connected
by a network.
• Combination of both
Parallel Computer Memory Architectures
Shared Memory
[Diagram: multiple CPUs connected through an interconnect to a common memory - all sharing the same address space.]
Parallel Computer Memory Architectures
Shared Memory
Advantages:
• Global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is both fast and uniform due to the proximity of
memory to CPUs
Disadvantages:
• Lack of scalability between memory and CPUs.
• Programmer responsibility for synchronization constructs that ensure "correct"
access of global memory.
Parallel Computer Memory Architectures
Distributed Memory
Distributed memory systems use a communication network to
connect inter-processor memory.
Processors have their own local memory. Memory addresses
in one processor do not map to another processor, so there
is no concept of a global address space across all processors.
Changes a processor makes to its local memory have no effect
on the memory of other processors.
It is the task of the programmer to explicitly define how and
when data is communicated. Synchronization between tasks is
likewise the programmer's responsibility.
The network "fabric" used for data transfer varies widely.
[Diagram: four CPUs, each with its own local memory, connected by a network.]
Parallel Computer Memory Architectures
Distributed Memory
Advantages:
• Scalable: total memory grows with the number of processors.
• Each processor can rapidly access its own memory.
• Cost effective: can use commodity, off-the-shelf processors and networking.
Disadvantages:
• Programmer is responsible for details associated with data communication
between processors.
• It may be difficult to map existing data structures, based on global memory, to
this memory organization.
• Non-uniform memory access times - data residing on a remote node takes longer
to access than node local data.
Parallel Computer Memory Architectures
Hybrid Memory
The largest and fastest computers in the world today
employ both shared and distributed memory architectures.
The shared memory component can be a shared memory
machine and/or graphics processing units (GPUs).
The distributed memory component is the networking of
multiple shared memory/GPU machines, which know only
about their own memory - not the memory on another
machine.
Network communications are required to move data from
one machine to another.
Trends indicate that this type of memory architecture will
prevail and increase at the high end of computing for the
foreseeable future.
[Diagram: two shared-memory nodes, each with four CPUs sharing local memory, connected to each other by a network.]
Parallel Computing Terminology
Supercomputing / High Performance Computing (HPC) : Using the
world's fastest and largest computers to solve large problems.
Node : a standalone "computer in a box". Usually comprised of multiple
CPUs/processors/cores, memory, network interfaces, etc. Nodes are
networked together to comprise a supercomputer.
CPU / Processor / Core : It Depends..... (A Node has multiple Cores or
Processors)
Logical View of a Supercomputer
Limits and Costs of Parallel Programming
Observed speedup of a code which has
been parallelized, defined as:

speedup = wall-clock time of serial execution / wall-clock time of parallel execution

Parallel programs contain
• Serial sections
• Parallel sections
Amdahl's Law
[Figure: a job with 5 units of parallelizable work (Time = 5 units) and 3 units of serial work (Time = 3 units); on 5 processors the parallelizable part completes in 1 time unit, while the serial part still takes 3.]
Speedup is limited by the non-parallelizable (serial) portion of the work.
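The figure's arithmetic can be checked with a small helper. This is an illustrative sketch (not part of the original slides); the 5/8 parallel fraction matches the 5 parallelizable + 3 serial time units above.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with parallel fraction p on n processors.

    The serial fraction (1 - p) is unaffected by adding processors,
    so it bounds the overall speedup at 1 / (1 - p) as n grows.
    """
    return 1.0 / ((1.0 - p) + p / n)

# The slide's example: 5 of 8 time units are parallelizable (p = 5/8).
# On 5 processors the parallel part shrinks to 1 unit: 8 / (3 + 1) = 2x.
print(amdahl_speedup(5 / 8, 5))        # 2.0
# No matter how many processors we add, speedup stays below 1/(3/8) = 8/3:
print(round(amdahl_speedup(5 / 8, 10_000), 2))
```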
Gustafson's Law
Speedup when the problem size scales with the number of processors (weak scaling).

Amdahl's Law & Gustafson's Law
Amdahl's Law / Strong Scaling: How quickly can we complete analysis on a particular data set by increasing the processor count?
Gustafson's Law / Weak Scaling: Can we analyze more data in approximately the same amount of time by increasing the processor count?
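The two scaling questions can be compared numerically. A sketch (not from the slides) using the standard formulas: Amdahl gives S = 1/((1-p) + p/N) for a fixed problem with parallel fraction p, while Gustafson gives scaled speedup S = N - s(N-1), where s is the serial fraction of the parallel run.

```python
def strong_scaling_speedup(p, n):
    """Amdahl's law: fixed problem size; p is the parallelizable fraction."""
    return 1.0 / ((1.0 - p) + p / n)

def weak_scaling_speedup(s, n):
    """Gustafson's law: problem size grows with n; s is the serial
    fraction of the parallel run, so scaled speedup is n - s*(n - 1)."""
    return n - s * (n - 1)

# Same 10% serial work, 64 processors:
print(strong_scaling_speedup(0.9, 64))  # capped near 1/0.1 = 10
print(weak_scaling_speedup(0.1, 64))    # keeps growing with n
```

The contrast explains why weak scaling is the usual goal on large machines: instead of shrinking a fixed job, each added processor brings its own share of a larger job.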
Moving from Serial to Parallel Code
• Can the problem be parallelized?
• Example: calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...)
by use of the formula F(n) = F(n-1) + F(n-2) is hard to parallelize,
because each term depends on the two previous terms.
• Identify the program's hotspots
• Identify bottlenecks in the program
• Identify Data Dependencies and Task Dependencies (inhibitors to
parallelism)
• Investigate other algorithms if possible & take advantage of
optimized third-party parallel software.
Moving from Serial to Parallel Code
Problem to solve
1. Decomposition (data decomposition or task decomposition): break the problem into subproblems/tasks.
2. Assignment: specify the mechanism to divide work up among processes/parallel worker threads (static or dynamic).
3. Orchestration: identify & resolve data and control dependencies.
4. Mapping: place tasks on processor cores for execution on the parallel machine.
Moving from Serial to Parallel Code : Decomposition
Decomposition
Divide computation into smaller parts (tasks) that can be executed concurrently
There are two basic ways to partition computational work among parallel tasks
1. Domain/Data Decomposition
Data associated with the problem
is decomposed
More on Data Decomposition
For problems that operate on large amounts of data.
Data is divided up between CPUs: each CPU has its own chunk of the
dataset to operate upon, and then the results are collated.
Which data should we partition?
• Input data
• Output data
• Intermediate data
Ensure load balancing: equal-sized tasks do not necessarily mean
equal-sized data sets.
• Static load balancing
• Dynamic load balancing
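Static load balancing can be sketched as a block partition that keeps chunk sizes within one element of each other. This helper is illustrative, not part of the slides:

```python
def block_partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous chunks whose sizes
    differ by at most one, so no worker gets a disproportionate share."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more
        chunks.append(range(start, start + size))
        start += size
    return chunks

print(block_partition(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

Dynamic load balancing instead hands out chunks from a shared work queue as workers finish, which suits tasks whose cost per item is unpredictable.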
2. Functional/Task Decomposition
The problem is decomposed according to the work that must be done.
Each task then performs a portion of the overall work.
Moving from Serial to Parallel Code: Decomposition
Decomposition: Example
Dense Matrix-Vector Multiplication
Computing y[i] uses only the ith row of A and the vector b
Task => computing one y[i]
Task size is uniform
No dependence between tasks
All tasks need b
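The row-wise decomposition above can be sketched directly: each task computes the y[i] for its own rows, touching only those rows of A plus the shared vector b. Illustrative code, not from the slides:

```python
def matvec_task(A, b, rows):
    """One task of the row-wise decomposition: compute y[i] for i in rows.
    Needs only those rows of A, plus all of b (b is shared by every task)."""
    return {i: sum(A[i][j] * b[j] for j in range(len(b))) for i in rows}

A = [[1, 2], [3, 4], [5, 6]]
b = [1, 1]

# Three independent, uniform tasks - no dependence between them,
# so they could run concurrently; here we just run them in a loop.
y = {}
for task_rows in ([0], [1], [2]):
    y.update(matvec_task(A, b, task_rows))
print([y[i] for i in range(3)])  # [3, 7, 11]
```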
Decomposition: What to look for
• Does the partition define (at least an order of magnitude) more tasks
than there are processors in your target computer? (Flexibility)
• Does the partition avoid redundant computation and storage
requirements? (Scalability)
• Are tasks of comparable size? (Load Balancing)
• Does the number of tasks scale with problem size? An increase in
problem size should increase the number of tasks rather than the
size of individual tasks.
• Have you identified alternative partitions? You can maximize flexibility
in subsequent design stages by considering alternatives now.
• Investigate both domain and functional decompositions.
Data Dependencies
• The order of statement execution affects the results of the program.
• Multiple use of the same location(s) in storage by different tasks.
Loop carried dependence:

DO J = MYSTART, MYEND
   A(J) = A(J-1) * 2.0
END DO

If task 2 has A(J) and task 1 has A(J-1):
Distributed memory architecture - task 2 must obtain the value of A(J-1)
from task 1 after task 1 finishes its computation.
Shared memory architecture - task 2 must read A(J-1) after task 1 updates it.

Loop independent data dependency:

Task 1        Task 2
------        ------
X = 2         X = 4
 .             .
Y = X**2      Y = X**3

The value of Y is dependent on (race condition):
Distributed memory architecture - if or when the value of X is
communicated between the tasks.
Shared memory architecture - which task last stores the value of X.
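The loop-carried case can be restated in a minimal sketch (illustrative, not from the slides): every iteration consumes the previous iteration's result, so iterations cannot run concurrently without one task communicating its value to the next.

```python
def doubled_chain(a0, n):
    """Serial version of the loop-carried dependence A[j] = A[j-1] * 2.
    Each iteration needs the previous one's result, so the iterations
    form a chain and cannot simply be split across independent tasks."""
    A = [a0]
    for j in range(1, n):
        A.append(A[j - 1] * 2.0)
    return A

print(doubled_chain(1.0, 5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```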
Handling Dependencies
• Distributed memory architectures - communicate required data
at synchronization points.
• Shared memory architectures - synchronize read/write
operations between tasks.
• Data dependencies: mutual exclusion via locks & critical sections.
• Task dependencies: explicit or implicit synchronization points
called barriers.
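These two mechanisms can be sketched with Python's threading primitives - a Lock for mutual exclusion around a shared counter, and a Barrier as an explicit synchronization point. This is an illustration only; on HPC systems the same roles are played by, e.g., OpenMP critical sections and barriers, or message-passing synchronization in MPI.

```python
import threading

def run_demo(n_tasks=4, increments=1000):
    """Each task increments a shared counter under a lock (mutual
    exclusion), then waits at a barrier so no task proceeds until
    all tasks have finished their increments."""
    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(n_tasks)

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:            # critical section: one task at a time
                counter += 1
        barrier.wait()            # synchronization point for all tasks

    threads = [threading.Thread(target=worker) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run_demo())  # 4000: no updates are lost thanks to the lock
```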
Mapping
GOAL: assign the tasks/processes to processors while
minimizing parallel processing overheads.
• Maximize data locality
• Minimize volume of data-exchange
• Minimize frequency of interactions
• Minimize contention and hot spots
• Overlap computation with interactions
• Selective data and computation replication
Thank You
HPC Solution Stack
[Diagram: layered HPC solution stack - Custom User Interface; Application Libraries (MPI/OpenMP), Pthreads; Cluster Management, Job Scheduling, Billing & Reporting; Debuggers, Dev and Perf Tools; Compute Servers, Management Servers, Storage, Interconnect; deployed Cloud/In-house.]
Ad

More Related Content

What's hot (20)

Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
Burhan Ahmed
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
abhiritva
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
Kishan Panara
 
Course outline of parallel and distributed computing
Course outline of parallel and distributed computingCourse outline of parallel and distributed computing
Course outline of parallel and distributed computing
National College of Business Administration & Economics ( NCBA&E)
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
Ameya Waghmare
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
Pooja Dixit
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
Pankaj Kumar Jain
 
6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
Gd Goenka University
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
vtunotesbysree
 
Parallel processing
Parallel processingParallel processing
Parallel processing
Syed Zaid Irshad
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
Dr Sandeep Kumar Poonia
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
Ashish Kumar
 
Multi processor scheduling
Multi  processor schedulingMulti  processor scheduling
Multi processor scheduling
Shashank Kapoor
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
Uday Sharma
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
Page Maker
 
Distributed Computing ppt
Distributed Computing pptDistributed Computing ppt
Distributed Computing ppt
OECLIB Odisha Electronics Control Library
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query Processing
Mythili Kannan
 
Machine learning
Machine learningMachine learning
Machine learning
Amit Kumar Rathi
 
Chpt7
Chpt7Chpt7
Chpt7
RohitKeshari
 
CS9222 Advanced Operating System
CS9222 Advanced Operating SystemCS9222 Advanced Operating System
CS9222 Advanced Operating System
Kathirvel Ayyaswamy
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
Burhan Ahmed
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
abhiritva
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
Kishan Panara
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
Pooja Dixit
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
Pankaj Kumar Jain
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
vtunotesbysree
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
Ashish Kumar
 
Multi processor scheduling
Multi  processor schedulingMulti  processor scheduling
Multi processor scheduling
Shashank Kapoor
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
Uday Sharma
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
Page Maker
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query Processing
Mythili Kannan
 
CS9222 Advanced Operating System
CS9222 Advanced Operating SystemCS9222 Advanced Operating System
CS9222 Advanced Operating System
Kathirvel Ayyaswamy
 

Similar to Introduction to Parallel Computing (20)

CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
AbcvDef
 
Chap 2 classification of parralel architecture and introduction to parllel p...
Chap 2  classification of parralel architecture and introduction to parllel p...Chap 2  classification of parralel architecture and introduction to parllel p...
Chap 2 classification of parralel architecture and introduction to parllel p...
Malobe Lottin Cyrille Marcel
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
Oleg Tsal-Tsalko
 
Parallel processing
Parallel processingParallel processing
Parallel processing
Praveen Kumar
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)
Sudarshan Mondal
 
Lecture1
Lecture1Lecture1
Lecture1
tt_aljobory
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
krnaween
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
Murtadha Alsabbagh
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
RubenGabrielHernande
 
Distributive operating system
Distributive operating systemDistributive operating system
Distributive operating system
Muhammad Adeel Rajput
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
Balaji Vignesh
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
KishaKiddo
 
parallel computing.ppt
parallel computing.pptparallel computing.ppt
parallel computing.ppt
ssuser413a98
 
Thread
ThreadThread
Thread
Syed Zaid Irshad
 
Underlying principles of parallel and distributed computing
Underlying principles of parallel and distributed computingUnderlying principles of parallel and distributed computing
Underlying principles of parallel and distributed computing
GOVERNMENT COLLEGE OF ENGINEERING,TIRUNELVELI
 
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbxPP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
nairatarek3
 
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c91 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
JamesSalcedo2
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
Kathirvel Ayyaswamy
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
AbcvDef
 
Chap 2 classification of parralel architecture and introduction to parllel p...
Chap 2  classification of parralel architecture and introduction to parllel p...Chap 2  classification of parralel architecture and introduction to parllel p...
Chap 2 classification of parralel architecture and introduction to parllel p...
Malobe Lottin Cyrille Marcel
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
Oleg Tsal-Tsalko
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)
Sudarshan Mondal
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
krnaween
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
Murtadha Alsabbagh
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
RubenGabrielHernande
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
Balaji Vignesh
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
KishaKiddo
 
parallel computing.ppt
parallel computing.pptparallel computing.ppt
parallel computing.ppt
ssuser413a98
 
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbxPP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
PP - CH01 (2).pptxhhsjoshhshhshhhshhshsbx
nairatarek3
 
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c91 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
1 1c291nx981n98nun1nnc120102cn190n u90cn19nc 1c9
JamesSalcedo2
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
Kathirvel Ayyaswamy
 
Ad

More from Akhila Prabhakaran (9)

Re Imagining Education
Re Imagining EducationRe Imagining Education
Re Imagining Education
Akhila Prabhakaran
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
Akhila Prabhakaran
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
Akhila Prabhakaran
 
Hypothesis testing Part1
Hypothesis testing Part1Hypothesis testing Part1
Hypothesis testing Part1
Akhila Prabhakaran
 
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- III
Akhila Prabhakaran
 
Statistical Analysis with R -II
Statistical Analysis with R -IIStatistical Analysis with R -II
Statistical Analysis with R -II
Akhila Prabhakaran
 
Statistical Analysis with R -I
Statistical Analysis with R -IStatistical Analysis with R -I
Statistical Analysis with R -I
Akhila Prabhakaran
 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPI
Akhila Prabhakaran
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
Akhila Prabhakaran
 
Ad

Recently uploaded (20)

Black hole and its division and categories
Black hole and its division and categoriesBlack hole and its division and categories
Black hole and its division and categories
MSafiullahALawi
 
Mycology:Characteristics of Ascomycetes Fungi
Mycology:Characteristics of Ascomycetes FungiMycology:Characteristics of Ascomycetes Fungi
Mycology:Characteristics of Ascomycetes Fungi
SAYANTANMALLICK5
 
Freshwater Biome Types, Characteristics and Factors
Freshwater Biome Types, Characteristics and FactorsFreshwater Biome Types, Characteristics and Factors
Freshwater Biome Types, Characteristics and Factors
mytriplemonlineshop
 
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptxCleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
zainab98aug
 
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Professional Content Writing's
 
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Professional Content Writing's
 
What Are Dendritic Cells and Their Role in Immunobiology?
What Are Dendritic Cells and Their Role in Immunobiology?What Are Dendritic Cells and Their Role in Immunobiology?
What Are Dendritic Cells and Their Role in Immunobiology?
Kosheeka : Primary Cells for Research
 
Somato_Sensory _ somatomotor_Nervous_System.pptx
Somato_Sensory _ somatomotor_Nervous_System.pptxSomato_Sensory _ somatomotor_Nervous_System.pptx
Somato_Sensory _ somatomotor_Nervous_System.pptx
klynct
 
Controls over genes.ppt. Gene Expression
Controls over genes.ppt. Gene ExpressionControls over genes.ppt. Gene Expression
Controls over genes.ppt. Gene Expression
NABIHANAEEM2
 
Water Pollution control using microorganisms
Water Pollution control using microorganismsWater Pollution control using microorganisms
Water Pollution control using microorganisms
gerefam247
 
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
Sérgio Sacani
 
Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)
memesologiesxd
 
Pharmacologically active constituents.pdf
Pharmacologically active constituents.pdfPharmacologically active constituents.pdf
Pharmacologically active constituents.pdf
Nistarini College, Purulia (W.B) India
 
Components of the Human Circulatory System.pptx
Components of the Human  Circulatory System.pptxComponents of the Human  Circulatory System.pptx
Components of the Human Circulatory System.pptx
autumnstreaks
 
Anti fungal agents Medicinal Chemistry III
Anti fungal agents Medicinal Chemistry  IIIAnti fungal agents Medicinal Chemistry  III
Anti fungal agents Medicinal Chemistry III
HRUTUJA WAGH
 
Issues in using AI in academic publishing.pdf
Issues in using AI in academic publishing.pdfIssues in using AI in academic publishing.pdf
Issues in using AI in academic publishing.pdf
Angelo Salatino
 
Animal Models for Biological and Clinical Research ppt 2.pptx
Animal Models for Biological and Clinical Research ppt 2.pptxAnimal Models for Biological and Clinical Research ppt 2.pptx
Animal Models for Biological and Clinical Research ppt 2.pptx
MahitaLaveti
 
Reticular formation_groups_organization_
Reticular formation_groups_organization_Reticular formation_groups_organization_
Reticular formation_groups_organization_
klynct
 
Hypothalamus_structure_nuclei_ functions.pptx
Hypothalamus_structure_nuclei_ functions.pptxHypothalamus_structure_nuclei_ functions.pptx
Hypothalamus_structure_nuclei_ functions.pptx
klynct
 
Carboxylic-Acid-Derivatives.lecture.presentation
Carboxylic-Acid-Derivatives.lecture.presentationCarboxylic-Acid-Derivatives.lecture.presentation
Carboxylic-Acid-Derivatives.lecture.presentation
GLAEXISAJULGA
 
Black hole and its division and categories
Black hole and its division and categoriesBlack hole and its division and categories
Black hole and its division and categories
MSafiullahALawi
 
Mycology:Characteristics of Ascomycetes Fungi
Mycology:Characteristics of Ascomycetes FungiMycology:Characteristics of Ascomycetes Fungi
Mycology:Characteristics of Ascomycetes Fungi
SAYANTANMALLICK5
 
Freshwater Biome Types, Characteristics and Factors
Freshwater Biome Types, Characteristics and FactorsFreshwater Biome Types, Characteristics and Factors
Freshwater Biome Types, Characteristics and Factors
mytriplemonlineshop
 
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptxCleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
Cleaned_Expanded_Metal_Nanoparticles_Presentation.pptx
zainab98aug
 
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Professional Content Writing's
 
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Professional Content Writing's
 
Somato_Sensory _ somatomotor_Nervous_System.pptx
Somato_Sensory _ somatomotor_Nervous_System.pptxSomato_Sensory _ somatomotor_Nervous_System.pptx
Somato_Sensory _ somatomotor_Nervous_System.pptx
klynct
 
Controls over genes.ppt. Gene Expression
Controls over genes.ppt. Gene ExpressionControls over genes.ppt. Gene Expression
Controls over genes.ppt. Gene Expression
NABIHANAEEM2
 
Water Pollution control using microorganisms
Water Pollution control using microorganismsWater Pollution control using microorganisms
Water Pollution control using microorganisms
gerefam247
 
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
Sérgio Sacani
 
Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)
memesologiesxd
 
Components of the Human Circulatory System.pptx
Components of the Human  Circulatory System.pptxComponents of the Human  Circulatory System.pptx
Components of the Human Circulatory System.pptx
autumnstreaks
 
Anti fungal agents Medicinal Chemistry III
Anti fungal agents Medicinal Chemistry  IIIAnti fungal agents Medicinal Chemistry  III
Anti fungal agents Medicinal Chemistry III
HRUTUJA WAGH
 
Issues in using AI in academic publishing.pdf
Issues in using AI in academic publishing.pdfIssues in using AI in academic publishing.pdf
Issues in using AI in academic publishing.pdf
Angelo Salatino
 
Animal Models for Biological and Clinical Research ppt 2.pptx
Animal Models for Biological and Clinical Research ppt 2.pptxAnimal Models for Biological and Clinical Research ppt 2.pptx
Animal Models for Biological and Clinical Research ppt 2.pptx
MahitaLaveti
 
Reticular formation_groups_organization_
Reticular formation_groups_organization_Reticular formation_groups_organization_
Reticular formation_groups_organization_
klynct
 
Hypothalamus_structure_nuclei_ functions.pptx
Hypothalamus_structure_nuclei_ functions.pptxHypothalamus_structure_nuclei_ functions.pptx
Hypothalamus_structure_nuclei_ functions.pptx
klynct
 
Carboxylic-Acid-Derivatives.lecture.presentation
Carboxylic-Acid-Derivatives.lecture.presentationCarboxylic-Acid-Derivatives.lecture.presentation
Carboxylic-Acid-Derivatives.lecture.presentation
GLAEXISAJULGA
 

Introduction to Parallel Computing

  • 2. Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials
  • 3. Serial Computing • Traditionally, software has been written for serial computation: • A problem is broken into a discrete series of instructions • Instructions are executed sequentially one after another • Executed on a single processor • Only one instruction may execute at any moment in time
  • 4. Parallel Computing ▪ Simultaneous use of multiple compute resources to solve a computational problem. ▪ Run on multiple CPUs ▪ Problem is decomposed into multiple parts that can be solved concurrently. ▪ Each part is decomposed into a set of instructions. ▪ Instructions are executed simultaneously on different CPUs
  • 5. Parallel Computer Architecture Example: Networks connect multiple stand- alone computers (nodes) to make larger parallel computer clusters. Compute Resources • Single Computer with multiple processors. • Arbitrary Number of Computers connected by a network. • Combination of both
  • 6. Parallel Computer Memory Architectures Shared Memory Interconnect Sharing the same address space
  • 7. Parallel Computer Memory Architectures Shared Memory Advantages: • Global address space provides a user-friendly programming perspective to memory • Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs Disadvantages: • Lack of scalability between memory and CPUs. • Programmer responsibility for synchronization constructs that ensure "correct" access of global memory.
  • 8. Parallel Computer Memory Architectures Distributed Memory A communication network to connect inter-processor memory. Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors. Changes it makes to its local memory have no effect on the memory of other processors. Task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility. The network "fabric" used for data transfer varies widely, Memory Memory Memory Memory CPU CPU CPU CPU
  • 9. Parallel Computer Memory Architectures Distributed Memory Advantages: • Scalable Memory: is scalable with the number of processors. • Each processor can rapidly access its own memory. • Cost effective: can use commodity, off-the-shelf processors and networking. Disadvantages: • Programmer is responsible for details associated with data communication between processors. • It may be difficult to map existing data structures, based on global memory, to this memory organization. • Non-uniform memory access times - data residing on a remote node takes longer to access than node local data.
  • 10. Parallel Computer Memory Architectures Hybrid Memory The largest and fastest computers in the world today employ both shared and distributed memory architectures. The shared memory component can be a shared memory machine and/or graphics processing units (GPUs). The distributed memory component is the networking of multiple shared memory/GPU machines, which know only about their own memory - not the memory on another machine. Network communications are required to move data from one machine to another. Trends indicate that this type of memory architecture will prevail and increase at the high end of computing for the foreseeable future.
  • 11. The Parallel Computing Terminology Supercomputing / High Performance Computing (HPC) : Using the world's fastest and largest computers to solve large problems. Node : a standalone "computer in a box". Usually comprised of multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to comprise a supercomputer. CPU / Processor / Core : It Depends..... (A Node has multiple Cores or Processors)
  • 12. Logical View of a Supercomputer
  • 13. Limits and Costs of Parallel Programming Observed speedup of a code which has been parallelized, defined as: speedup = wall-clock time of serial execution / wall-clock time of parallel execution. Parallel programs contain  Serial Section  Parallel Section
  • 14. Amdahl’s Law Parallelizable portion: time = 5 units. Serial portion: time = 3 units. Speedup is limited by the non-parallelizable/serial portion of the work.
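The limit above can be sketched numerically. A minimal Python version of Amdahl's Law (the function name is illustrative), using the slide's 5 parallelizable units and 3 serial units, i.e. p = 5/8:

```python
def amdahl_speedup(p, n):
    """Maximum speedup when fraction p of the run time is parallelizable over n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# From the slide: 5 units parallelizable, 3 units serial, so p = 5/8.
print(amdahl_speedup(5 / 8, 4))       # 4 processors
print(amdahl_speedup(5 / 8, 10**6))   # speedup plateaus near 1 / (3/8) ≈ 2.67
```

No matter how many processors are added, the serial 3/8 of the work caps the speedup below 2.67x.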
  • 16. Amdahl’s Law & Gustafson's Law How quickly can we complete analysis on a particular data set by increasing processor count? Amdahl’s Law / Strong Scaling. Can we analyze more data in approximately the same amount of time by increasing processor count? Gustafson’s Law / Weak Scaling.
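Gustafson's weak-scaling view can be sketched the same way (function name illustrative): instead of fixing the problem size, the parallel portion of the workload is assumed to grow with the processor count while the serial fraction s stays fixed.

```python
def gustafson_speedup(s, n):
    """Scaled (weak-scaling) speedup: parallel workload grows with n; serial fraction s is fixed."""
    return s + (1.0 - s) * n

# With a 37.5% serial fraction, quadrupling processors lets us handle
# nearly 2.9x the work in roughly the same wall-clock time.
print(gustafson_speedup(0.375, 4))
```

Unlike Amdahl's bound, this scaled speedup keeps growing with n, which is why weak scaling is the relevant measure when more processors are used to tackle bigger data sets.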
  • 17. Moving from Serial to Parallel Code • Can the problem be parallelized? • Example: calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the formula F(n) = F(n-1) + F(n-2) cannot be, since each term depends on the two preceding terms. • Identify the program's hotspots • Identify bottlenecks in the program • Identify data dependencies and task dependencies (inhibitors to parallelism) • Investigate other algorithms if possible & take advantage of optimized third-party parallel software.
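The Fibonacci example above can be made concrete with a short sketch. The recurrence itself is the inhibitor: each term is an input to the next, so the loop body forms a serial dependency chain.

```python
def fib(n):
    # F(n) = F(n-1) + F(n-2): every term needs the two terms before it,
    # so the sequence forms a serial dependency chain - no term-level parallelism.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(i) for i in range(8)])  # → [0, 1, 1, 2, 3, 5, 8, 13]
```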
  • 18. Moving from Serial to Parallel Code Problem to solve Decomposition Subproblems/Tasks Parallel worker threads Assignment Identify & Resolve Data & Control Dependencies Mapping Processor Cores Execution on Parallel Machine Data Decomposition Task Decomposition Orchestration Specifying mechanism to divide work up among processes. (Static or Dynamic)
  • 19. Moving from Serial to Parallel Code : Decomposition Decomposition Divide computation into smaller parts (tasks) that can be executed concurrently There are two basic ways to partition computational work among parallel tasks 1. Domain/Data Decomposition Data associated with the problem is decomposed
  • 20. More on Data Decomposition For problems that operate on large amounts of data Data is divided up between CPUs: each CPU has its own chunk of the dataset to operate upon, and then the results are collated. Which data should we partition? Input Data Output Data Intermediate Data Ensure load balancing: equal-sized tasks do not necessarily mean equal-sized data sets. • Static Load Balancing • Dynamic Load Balancing
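A minimal sketch of static load balancing for data decomposition (the helper name is illustrative): the input is split into contiguous, near-equal chunks, one per worker, before any computation starts.

```python
def static_partition(data, n_workers):
    """Split data into n_workers contiguous, near-equal chunks (static load balancing)."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # the first `extra` workers take one extra item
        chunks.append(data[start:start + size])
        start += size
    return chunks

# 10 items over 3 workers: chunk sizes are as even as possible.
print([len(c) for c in static_partition(list(range(10)), 3)])  # → [4, 3, 3]
```

Dynamic load balancing would instead hand out smaller chunks on demand as workers finish, at the cost of extra coordination.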
  • 21. 2. Functional/Task Decomposition The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work. Moving from Serial to Parallel Code: Decomposition
  • 22. Decomposition : Example Dense Matrix-Vector Multiplication Computing y[i] uses only the ith row of A and the vector b Task => computing y[i] Task size is uniform No dependence between tasks All tasks need b
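A small sketch of this decomposition (written serially; the point is the task structure): each y[i] reads only row i of A plus the shared vector b, so the per-row tasks are uniform and mutually independent - exactly what makes this kernel easy to parallelize.

```python
def matvec_rowwise(A, b):
    # Each y[i] is an independent, uniform-sized task:
    # it reads only row i of A and the shared vector b.
    return [sum(a * x for a, x in zip(row, b)) for row in A]

A = [[1, 2],
     [3, 4]]
b = [1, 1]
print(matvec_rowwise(A, b))  # → [3, 7]
```

In a real parallel version, each worker would be assigned a block of rows and b would be replicated (or shared) across all workers.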
  • 23. • Does the partition define (at least an order of magnitude) more tasks than there are processors in your target computer? (Flexibility) • Does the partition avoid redundant computation and storage requirements? (Scalability) • Are tasks of comparable size? (Load Balancing) • Does the number of tasks scale with problem size? Increase in problem size should increase the number of tasks rather than the size of individual tasks. • Have you identified alternative partitions? You can maximize flexibility in subsequent design stages by considering alternatives now. • Investigate both domain and functional decompositions. Decomposition : What to look for
  • 24. Data Dependencies • The order of statement execution affects the results of the program. • Multiple use of the same location(s) in storage by different tasks.
  Loop carried dependence:
  DO J = MYSTART,MYEND
    A(J) = A(J-1) * 2.0
  END DO
  Loop independent data dependence:
  Task 1: X = 2 ... Y = X**2
  Task 2: X = 4 ... Y = X**3
  • 25. Data Dependencies
  DO J = MYSTART,MYEND
    A(J) = A(J-1) * 2.0
  END DO
  If task 2 has A(J) and task 1 has A(J-1): Distributed memory architecture - task 2 must obtain the value of A(J-1) from task 1 after task 1 finishes its computation. Shared memory architecture - task 2 must read A(J-1) after task 1 updates it.
  Task 1: X = 2 ... Y = X**2
  Task 2: X = 4 ... Y = X**3
  (Race condition) The value of Y is dependent on: Distributed memory architecture - if or when the value of X is communicated between the tasks. Shared memory architecture - which task last stores the value of X.
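The loop-carried dependence above can be restated in a short Python sketch (the function name is illustrative): iteration j reads the value that iteration j-1 just wrote, so the iterations cannot safely be split across tasks without ordering them.

```python
def doubling_chain(a):
    # A(J) = A(J-1) * 2.0 rewritten in Python: iteration j reads the value
    # iteration j-1 just wrote, so the loop cannot be parallelized naively.
    for j in range(1, len(a)):
        a[j] = a[j - 1] * 2.0
    return a

print(doubling_chain([1.0, 0.0, 0.0, 0.0]))  # → [1.0, 2.0, 4.0, 8.0]
```

Running the iterations out of order (or concurrently without synchronization) would read stale values of a[j-1] and produce wrong results.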
  • 26. • Distributed memory architectures - communicate required data at synchronization points. • Shared memory architectures - synchronize read/write operations between tasks. • Data Dependencies:- Mutual Exclusion Locks & Critical Sections • Task Dependencies:- Explicit or Implicit Synchronization points called Barriers Handling Dependencies
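A minimal shared-memory sketch of a mutual-exclusion lock guarding a critical section, using Python's `threading.Lock` (variable names are illustrative): without the lock, the four threads' read-modify-write updates could interleave and lose increments.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:       # critical section: only one thread updates counter at a time
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()             # join acts as the barrier: all tasks finished before we read
print(counter)           # → 40000
```

On distributed memory systems the analogous roles are played by message exchange at synchronization points (e.g. MPI sends/receives) and collective barriers.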
  • 27. GOAL : Assigning the tasks/ processes to Processors while Minimizing Parallel Processing Overheads • Maximize data locality • Minimize volume of data-exchange • Minimize frequency of interactions • Minimize contention and hot spots • Overlap computation with interactions • Selective data and computation replication Mapping
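Two common task-to-processor mappings illustrate the locality goal above (function names are illustrative): block mapping keeps consecutive tasks, which often share data, on the same processor, while cyclic mapping spreads uneven work at the cost of scattering neighbours.

```python
def block_mapping(n_tasks, n_procs):
    """Keep consecutive tasks (which often share data) on the same processor."""
    per_proc = -(-n_tasks // n_procs)  # ceiling division
    return [t // per_proc for t in range(n_tasks)]

def cyclic_mapping(n_tasks, n_procs):
    """Round-robin: balances uneven work but scatters neighbouring tasks."""
    return [t % n_procs for t in range(n_tasks)]

print(block_mapping(8, 4))   # → [0, 0, 1, 1, 2, 2, 3, 3]
print(cyclic_mapping(8, 4))  # → [0, 1, 2, 3, 0, 1, 2, 3]
```

Block mapping maximizes data locality and minimizes data-exchange volume; cyclic mapping helps when task costs vary systematically along the index range.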
  • 29. HPC Solution Stack: Custom User Interface; Application Libraries (MPI/OpenMP, Pthreads); Debuggers, Dev and Perf Tools; Cluster Management, Job Scheduling, Billing & Reporting; Compute Servers, Storage, Interconnect, Management Servers; Cloud/In-house