Parallel Architecture




     Dr. Doug L. Hoffman
     Computer Science 330
     Spring 2002
Parallel Computers

  • Definition: "A parallel computer is a
    collection of processing elements that
    cooperate and communicate to solve
    large problems fast."
  • Questions about parallel computers:
    - How large a collection?
    - How powerful are the processing elements?
    - How do they cooperate and communicate?
    - How are data transmitted?
    - What type of interconnection?
    - What are the HW and SW primitives for the programmer?
    - Does it translate into performance?
Parallel Processors "Religion"
 • The dream of computer architects since 1960:
   replicate processors to add performance vs. design
   a faster processor
 • Led to innovative organizations tied to particular
   programming models, since
   "uniprocessors can't keep going"
    - e.g., predictions that uniprocessors must stop getting faster due to
      the speed-of-light limit: 1972, … , 1989
    - Borders on religious fervor: you must believe!
    - Fervor damped somewhat when companies went out of
      business in the 1990s: Thinking Machines, Kendall Square, ...
 • The argument now is the "pull" of the opportunity for
   scalable performance, not the "push" of a
   uniprocessor performance plateau
Opportunities: Scientific Computing
• Nearly unlimited demand (Grand Challenge):

  App                   Perf (GFLOPS)   Memory (GB)
  48-hour weather                 0.1           0.1
  72-hour weather                   3             1
  Pharmaceutical design           100            10
  Global change, genome          1000          1000

• Successes in some real industries:
   - Petroleum: reservoir modeling
   - Automotive: crash simulation, drag analysis, engine
   - Aeronautics: airflow analysis, engine, structural mechanics
   - Pharmaceuticals: molecular modeling
   - Entertainment: full-length movies ("Toy Story")
Opportunities: Commercial Computing
• Throughput (transactions per minute) vs. number of processors (1996):

  Processors                    1      4      8     16     32     64    112
  IBM RS6000 (tpm)            735   1438   3119
    speedup                  1.00   1.96   4.24
  Tandem Himalaya (tpm)                           3043   6067  12021  20918
    speedup                                       1.00   1.99   3.95   6.87

   - IBM performance hit going from 1 => 4 processors, good from 4 => 8
   - Tandem scales: ideal speedup from 16 to 112 processors is 112/16 = 7.0 (6.87 measured)
• Others: file servers, electronic CAD simulation (multiple
  processes), WWW search engines
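The speedup rows in the commercial-computing table are throughput divided by the throughput of the smallest configuration measured; efficiency then divides that speedup by the growth in processor count. A minimal Python sketch of that arithmetic, using only the 1996 figures from the slide (the function names are illustrative, not from the source):

```python
# Relative speedup and parallel efficiency from measured throughput.
# Throughput values are the 1996 transactions-per-minute figures from the slide.

def relative_speedup(throughput):
    """Speedup of each configuration relative to the first (smallest) one."""
    base = throughput[0]
    return [t / base for t in throughput]


def efficiency(speedup, processors):
    """Measured speedup divided by the ideal speedup (processor-count ratio)."""
    base = processors[0]
    return [s / (p / base) for s, p in zip(speedup, processors)]


ibm_procs = [1, 4, 8]
ibm_tpm = [735, 1438, 3119]
tandem_procs = [16, 32, 64, 112]
tandem_tpm = [3043, 6067, 12021, 20918]

ibm_speedup = relative_speedup(ibm_tpm)
tandem_speedup = relative_speedup(tandem_tpm)
ibm_eff = efficiency(ibm_speedup, ibm_procs)
tandem_eff = efficiency(tandem_speedup, tandem_procs)

for name, procs, sp, ef in [("IBM RS6000", ibm_procs, ibm_speedup, ibm_eff),
                            ("Tandem Himalaya", tandem_procs, tandem_speedup, tandem_eff)]:
    for p, s, e in zip(procs, sp, ef):
        print(f"{name:16s} {p:4d} procs  speedup {s:5.2f}  efficiency {e:4.2f}")
```

On these numbers the IBM system runs at under 50% efficiency on 4 processors, yet going from 4 to 8 more than doubles throughput (3119/1438 is about 2.17), while Tandem stays at roughly 98% of the ideal 7.0 from 16 to 112 processors.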
What Level of Parallelism?
  • Bit-level parallelism: 1970 to 1985
     - 4-bit, 8-bit, 16-bit, 32-bit microprocessors
  • Instruction-level parallelism (ILP):
    1985 through today
     - Pipelining
     - Superscalar
     - VLIW
     - Out-of-order execution
     - Limits to benefits of ILP?
  • Process-level or thread-level parallelism:
    mainstream for general-purpose computing?
     - Servers are parallel
     - High-end desktop dual-processor PC soon??
Parallel Architecture

• Parallel architecture extends traditional
  computer architecture with a
  communication architecture:
  - abstractions (HW/SW interface)
  - organizational structure to realize abstraction
    efficiently
Fundamental Issues

 • Three issues characterize parallel
   machines:
 1) Naming
 2) Synchronization
 3) Latency and bandwidth
Parallel Framework

 • Layers:
   - Programming model:
     * Multiprogramming: lots of jobs, no communication
     * Shared address space: communicate via memory
     * Message passing: send and receive messages
     * Data parallel: several agents operate on several data sets
         simultaneously and then exchange information globally and
         simultaneously (shared or message passing)

   - Communication abstraction:
     * Shared address space: e.g., load, store, atomic swap
     * Message passing: e.g., send, receive library calls
     * Debate over this topic (ease of programming, scaling)
       => many hardware designs map 1:1 to a programming model
Shared Address/Memory Multiprocessor Model
• Communicate via load and store
   - Oldest and most popular model
• Based on timesharing: processes on multiple processors vs. sharing a
  single processor
• Process: a virtual address space
  and 1 thread of control
   - Multiple processes can overlap (share), but ALL threads share a
      process address space
• Writes to the shared address space by one thread are visible to reads
  of other threads
   - Usual model: shared code, private stack, some shared heap,
      some private heap
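A minimal sketch of the shared-address-space model, assuming Python threads as stand-ins for processors: all threads name the same heap objects (`partial`, `shared`), each keeps private locals on its own stack, and they communicate only through loads and stores. The lock around the shared read-modify-write is an added assumption; such locks are commonly built on a primitive like the atomic swap listed under the communication abstraction. Because of CPython's global interpreter lock, this shows the programming model, not parallel speedup.

```python
import threading

NUM_THREADS = 4
partial = [0] * NUM_THREADS   # shared heap: one slot per thread
shared = {"total": 0}         # shared heap: one accumulator every thread updates
lock = threading.Lock()


def worker(tid, n):
    local = 0                 # private: lives in this thread's own stack frame
    for i in range(n):
        local += i
    partial[tid] = local      # store to a shared slot that only this thread writes
    with lock:                # read-modify-write of a location all threads write
        shared["total"] += local


threads = [threading.Thread(target=worker, args=(t, 1000 * (t + 1)))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()                  # after join, the other threads' stores are visible here

print(partial, shared["total"])
```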
Example: Small-Scale MP Designs
• Memory: centralized with uniform memory access
  time ("UMA") and bus interconnect, I/O
• Examples: Sun Enterprise 6000, SGI Challenge, Intel
  SystemPro
SMP Interconnect

• Processors to memory AND to I/O
• Bus based: all memory locations have equal access time, so
  SMP = "Symmetric MP"
   - Limited bandwidth is shared as processors and I/O are added
   - (see Chapter 1, Figs. 1-18/19, pages 42-43 of
     [CSG96])
• Crossbar: expensive to expand
• Multistage network (less expensive to expand than
  a crossbar, with more BW)
• "Dance hall" designs: all processors on the left, all
  memories on the right
Small-Scale Shared Memory

• Caches serve to:
  - Increase bandwidth versus bus/memory
  - Reduce latency of access
  - Valuable for both private data and shared data
• What about cache consistency?
What Does Coherency Mean?
• Informally:
   - "Any read must return the most recent write"
   - Too strict and too difficult to implement
• Better:
   - "Any write must eventually be seen by a read"
   - All writes are seen in proper order ("serialization")
• Two rules to ensure this:
   - "If P writes x and P1 reads it, P's write will be seen by P1 if the
     read and write are sufficiently far apart"
   - Writes to a single location are serialized:
     seen in one order
       * Latest write will be seen
       * Otherwise could see writes in illogical order
          (could see older value after a newer value)
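The write-serialization rule can be checked mechanically for one location: once all writes are placed in a single order, each processor's successive reads must never move backward in that order. A minimal sketch, assuming every write stores a distinct value so that a read identifies the write it observed; the function name and trace format are illustrative. It tests only this necessary condition, not full coherence.

```python
def reads_respect_serialization(write_order, reads_by_proc):
    """Return True if no processor sees an older value after a newer one.

    write_order:   values written to one location, in serialization order
                   (put the initial value first if reads may return it)
    reads_by_proc: {processor: [values it read, in program order]}
    Assumes every write stores a distinct value.
    """
    position = {value: i for i, value in enumerate(write_order)}
    for reads in reads_by_proc.values():
        last = -1
        for value in reads:
            pos = position[value]
            if pos < last:          # moved backward: older value after a newer one
                return False
            last = pos
    return True


# Writes to x are serialized as 1, then 2, then 3.
ok = reads_respect_serialization([1, 2, 3], {"P1": [1, 1, 3], "P2": [2, 3]})
bad = reads_respect_serialization([1, 2, 3], {"P1": [2, 1]})
print(ok, bad)
```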
Potential HW Coherency Solutions
• Snooping solution (snoopy bus):
  - Send all requests for data to all processors
  - Processors snoop to see if they have a copy and respond
    accordingly
  - Requires broadcast, since caching information is at processors
  - Works well with bus (natural broadcast medium)
  - Dominates for small-scale machines (most of the market)
• Directory-based schemes:
  - Keep track of what is being shared in one centralized place
  - Distributed memory => distributed directory for scalability
    (avoids bottlenecks)
  - Send point-to-point requests to processors via network
  - Scales better than snooping
  - Actually existed BEFORE snooping-based schemes
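To make snooping concrete, here is a minimal simulation of a write-invalidate snooping protocol with three per-block states, Modified, Shared, and Invalid (MSI). The slide does not name a protocol; MSI is assumed here as the simplest common example, and the class and method names are hypothetical. Every miss is broadcast on the bus and every other cache snoops it.

```python
class SnoopyBus:
    """Broadcast medium: every transaction is seen by every attached cache."""

    def __init__(self, memory):
        self.memory = memory    # address -> value
        self.caches = []

    def broadcast(self, requester, kind, addr):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(kind, addr)


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}         # addr -> [state, value], state in "M", "S", "I"
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, ["I", None])[0]

    def read(self, addr):
        if self.state(addr) == "I":                    # read miss
            self.bus.broadcast(self, "BusRd", addr)    # a Modified owner writes back
            self.lines[addr] = ["S", self.bus.memory[addr]]
        return self.lines[addr][1]

    def write(self, addr, value):
        if self.state(addr) != "M":                    # need exclusive ownership
            # Simplification: a write hit in S also uses BusRdX here;
            # real protocols often use a cheaper upgrade transaction.
            self.bus.broadcast(self, "BusRdX", addr)   # invalidate other copies
        self.lines[addr] = ["M", value]

    def snoop(self, kind, addr):
        state, value = self.lines.get(addr, ["I", None])
        if state == "I":
            return                                     # no copy: nothing to do
        if state == "M":
            self.bus.memory[addr] = value              # write back dirty data
        if kind == "BusRdX":
            self.lines[addr] = ["I", None]             # another cache will write
        else:
            self.lines[addr] = ["S", value]            # another cache is reading


bus = SnoopyBus({"x": 0})
p0, p1 = Cache("P0", bus), Cache("P1", bus)
p0.read("x")
p1.read("x")            # both caches hold x in Shared
p0.write("x", 42)       # P0 -> Modified, P1's copy is invalidated
v = p1.read("x")        # P0 writes back and drops to Shared; P1 reads 42
print(v, p0.state("x"), p1.state("x"))
```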
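Large-Scale MP Designs
• Memory: distributed with non-uniform memory
  access time ("NUMA") and scalable interconnect
  (distributed memory)

[Figure: distributed-memory NUMA organization, annotated with access latencies of 1 cycle, 40 cycles, and 100 cycles; interconnect labeled "Low Latency, High Reliability"]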
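The 1-, 40-, and 100-cycle figures on this slide show why data placement matters on a NUMA machine. A minimal sketch of the average access time, assuming (hypothetically) that the three latencies correspond to a cache hit, a local-memory access, and a remote-memory access; the access mixes are made-up numbers chosen only for illustration.

```python
def average_access_cycles(frac_cache, frac_local, frac_remote,
                          cache=1, local=40, remote=100):
    """Weighted mean access time in cycles; the three fractions must sum to 1.

    The default latencies are the slide's 1/40/100-cycle figures, under the
    assumed mapping cache / local memory / remote memory.
    """
    if abs(frac_cache + frac_local + frac_remote - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return frac_cache * cache + frac_local * local + frac_remote * remote


# Hypothetical mixes: same 90% cache hit rate, different placement of misses.
good_placement = average_access_cycles(0.90, 0.08, 0.02)   # misses mostly local
poor_placement = average_access_cycles(0.90, 0.02, 0.08)   # misses mostly remote
print(good_placement, poor_placement)
```

With identical cache behavior, moving most misses from local to remote memory raises the average from about 6.1 to about 9.7 cycles under these assumptions.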
Shared Address Model Summary
 • Each processor can name every physical location in the
   machine
 • Each process can name all data it shares with other processes
 • Data transfer via load and store
 • Data size: byte, word, ... or cache blocks
 • Uses virtual memory to map virtual addresses to local or remote physical addresses
 • Memory hierarchy model applies: now communication moves
   data to local proc. cache (as load moves data from memory to
   cache)
    - Latency, BW (cache block?),
      scalability when communicating?
Message Passing Model

• Whole computers (CPU, memory, I/O devices)
  communicate as explicit I/O operations
   - Essentially NUMA but integrated at I/O devices vs.
      memory system
• Send specifies local buffer + receiving process on remote
  computer
• Receive specifies sending process on remote computer +
  local buffer to place data
   - Usually send includes process tag
      and receive has rule on tag: match 1, match any
   - Synch: when send completes, when buffer free, when
      request accepted, receive waits for send
• Send+receive => memory-memory copy, where each
  supplies local address,
  AND does pairwise synchronization!
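The send/receive semantics on this slide (a send names a destination and a tag; a receive names a source and either one tag or "match any") can be sketched with one mailbox per process. This is a minimal Python illustration with threads standing in for separate computers; `Mailbox`, `send`, `recv`, and `ANY_TAG` are hypothetical names, not the API of any message-passing library. Here a receive blocks until a matching message arrives, while a send returns as soon as its message is buffered, one of the synchronization choices the slide lists.

```python
import threading

ANY_TAG = None  # wildcard: "match any" tag


class Mailbox:
    """Receive-side buffer for one process, with tag matching."""

    def __init__(self):
        self.messages = []                 # buffered (source, tag, data) tuples
        self.cond = threading.Condition()

    def deliver(self, source, tag, data):
        with self.cond:
            self.messages.append((source, tag, data))
            self.cond.notify_all()         # wake any receiver waiting for a match

    def take(self, source, tag):
        with self.cond:
            while True:
                for i, (src, t, data) in enumerate(self.messages):
                    if src == source and (tag is ANY_TAG or t == tag):
                        del self.messages[i]
                        return t, data
                self.cond.wait()           # block until another message arrives


mailboxes = {rank: Mailbox() for rank in range(2)}


def send(me, dest, tag, data):
    # Copy the local buffer, so later changes by the sender cannot leak through.
    mailboxes[dest].deliver(me, tag, list(data))


def recv(me, source, tag=ANY_TAG):
    return mailboxes[me].take(source, tag)


results = []


def process0():
    send(0, 1, tag=7, data=[1, 2, 3])
    send(0, 1, tag=9, data=[4, 5])


def process1():
    # Ask for tag 9 first: the tag-7 message stays buffered until requested.
    results.append(recv(1, source=0, tag=9))
    results.append(recv(1, source=0))      # match any tag


threads = [threading.Thread(target=process1), threading.Thread(target=process0)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```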
Message Passing Model
• Send+receive => memory-memory copy, synchronization on OS
  even on 1 processor
• History of message passing:
   - Network topology important because could only send to
      immediate neighbor
   - Typically synchronous, blocking send & receive
   - Later DMA with non-blocking sends, DMA for receive into
      buffer until processor does receive, and then data is
      transferred to local memory
   - Later SW libraries to allow arbitrary communication
• Example: IBM SP-2, RS6000 workstations in racks
   - Network interface card has Intel 960
   - 8x8 crossbar switch as communication building block
   - 40 MByte/sec per link
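A simple way to reason about the SP-2's 40 MByte/sec per-link figure is the common linear cost model: transfer time = fixed per-message overhead + message size / link bandwidth. In this sketch the bandwidth comes from the slide, but the 40-microsecond overhead is a hypothetical placeholder, not a measured SP-2 number.

```python
LINK_BANDWIDTH = 40e6   # bytes per second per link (from the slide)
OVERHEAD = 40e-6        # seconds per message: hypothetical, not a measured value


def transfer_time(nbytes, overhead=OVERHEAD, bandwidth=LINK_BANDWIDTH):
    """Linear model: fixed start-up cost plus size divided by bandwidth."""
    return overhead + nbytes / bandwidth


def effective_bandwidth(nbytes):
    """Bytes per second actually achieved once the overhead is included."""
    return nbytes / transfer_time(nbytes)


for size in (100, 10_000, 1_000_000):
    print(f"{size:>9} bytes: {transfer_time(size) * 1e6:9.1f} us, "
          f"{effective_bandwidth(size) / 1e6:5.1f} MB/s")
```

Under these assumptions a message must be about overhead × bandwidth = 1600 bytes just to reach half the link bandwidth, so many small messages cost far more than one large one.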
Communication Models
• Shared memory
   - Processors communicate with shared address space
   - Easy on small-scale machines
   - Advantages:
       * Model of choice for uniprocessors, small-scale MPs
       * Ease of programming
       * Lower latency
       * Easier to use hardware-controlled caching
• Message passing
   - Processors have private memories,
     communicate via messages
   - Advantages:
       * Less hardware, easier to design
       * Focuses attention on costly non-local operations
• Can support either SW model on either HW base
Popular Flynn Categories
(roughly the MPP counterpart of RAID levels)
• SISD (Single Instruction Single Data)
   - Uniprocessors
• MISD (Multiple Instruction Single Data)
   - ???
• SIMD (Single Instruction Multiple Data)
   - Examples: Illiac-IV, CM-2
      * Simple programming model
      * Low overhead
      * Flexibility
      * All custom integrated circuits
• MIMD (Multiple Instruction Multiple Data)
   - Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
      * Flexible
      * Use off-the-shelf micros
Data Parallel Model

 • Operations can be performed in parallel on each element
   of a large regular data structure, such as an array
 • 1 control processor broadcasts to many PEs (see Ch. 1,
   Fig. 1-26, page 51 of [CSG96])
    - When computers were large, could amortize the
       control portion over many replicated PEs
 • Condition flag per PE so that a PE can skip an operation
 • Data distributed in each memory
 • Early 1980s VLSI => SIMD rebirth:
   32 1-bit PEs + memory on a chip was the PE
 • Data parallel programming languages lay out data to
   processors
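The control-processor/PE arrangement and the per-PE condition flag can be illustrated with a small simulation: one instruction is broadcast to every PE, each PE applies it to the element in its own memory, and PEs whose flag is clear skip that instruction. This is a minimal Python sketch; the `broadcast` helper and the lambda "instructions" are hypothetical stand-ins for a real SIMD instruction set.

```python
def broadcast(instruction, memories, flags):
    """Control processor issues one instruction; every enabled PE executes it."""
    for pe, enabled in enumerate(flags):
        if enabled:                       # condition flag clear => this PE skips
            memories[pe] = instruction(memories[pe])


# Each PE holds one array element in its local memory.
memories = [3, -1, 4, -1, 5, -9]

# abs() SIMD-style: set flags where the element is negative, then broadcast
# "negate"; PEs with a clear flag sit idle for that step.
flags = [x < 0 for x in memories]
broadcast(lambda x: -x, memories, flags)

# A second broadcast with every flag set.
broadcast(lambda x: x * 10, memories, [True] * len(memories))
print(memories)
```

A masked step occupies the whole machine for a full broadcast even when only a few PEs are enabled, one reason SIMD designs suit large, regular data structures.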
Data Parallel Model
 • Vector processors have similar ISAs,
   but no data placement restriction
 • SIMD led to data parallel programming languages
 • Advancing VLSI led to single-chip FPUs and whole fast
   µProcs (SIMD less attractive)
 • SIMD programming model led to
   Single Program Multiple Data (SPMD) model
    - All processors execute identical program
 • Data parallel programming languages still useful, do
   communication all at once:
    "Bulk synchronous" phases in which all communicate
   after a global barrier
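The SPMD and bulk-synchronous ideas combine naturally: every worker runs the same function on its own slice of the data, then all wait at a global barrier before a communication phase. A minimal sketch using Python's `threading.Barrier`; the specific two-phase structure (local sum, barrier, then every worker reads all partial sums) is an illustrative assumption, not the semantics of any particular data parallel language.

```python
import threading

NUM_WORKERS = 4
data = list(range(1, 101))            # 1..100, split evenly across workers
partial = [0] * NUM_WORKERS           # written in phase 1, read in phase 2
totals = [0] * NUM_WORKERS
barrier = threading.Barrier(NUM_WORKERS)


def spmd_program(rank):
    # Same program on every worker; 'rank' selects this worker's slice.
    chunk = len(data) // NUM_WORKERS
    mine = data[rank * chunk:(rank + 1) * chunk]

    # Phase 1: purely local computation.
    partial[rank] = sum(mine)

    # Global barrier: no worker reads partial[] until all have written it.
    barrier.wait()

    # Phase 2: communication, every worker reads all partial results.
    totals[rank] = sum(partial)


workers = [threading.Thread(target=spmd_program, args=(r,)) for r in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(partial, totals)
```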
Convergence in Parallel Architecture
• Complete computers connected to scalable network via
  communication assist
• Different programming models place different
  requirements on communication assist
   - Shared address space: tight integration with memory
      to capture memory events that interact with others +
      to accept requests from other nodes
   - Message passing: send messages quickly and respond
      to incoming messages: tag match, allocate buffer,
      transfer data, wait for receive posting
   - Data parallel: fast global synchronization
• High Performance Fortran (HPF): shared-memory, data parallel;
  Message Passing Interface (MPI): message-passing library;
  both work on many machines, with different implementations
Summary: Parallel Framework

[Figure: layer stack, top to bottom: Programming Model, Communication Abstraction, Interconnection SW/OS, Interconnection HW]

• Layers:
  - Programming model:
     * Multiprogramming: lots of jobs, no communication
     * Shared address space: communicate via memory
     * Message passing: send and receive messages
     * Data parallel: several agents operate on several data sets
         simultaneously and then exchange information globally and
         simultaneously (shared or message passing)
  - Communication abstraction:
     * Shared address space: e.g., load, store, atomic swap
     * Message passing: e.g., send, receive library calls
     * Debate over this topic (ease of programming, scaling)
       => many hardware designs map 1:1 to a programming model
Summary: Small-Scale MP Designs
• Memory: centralized with uniform access time
  ("UMA") and bus interconnect
• Examples: Sun Enterprise 5000, SGI Challenge,
  Intel SystemPro
Summary

• Caches contain all information on state of cached
  memory blocks
• Snooping and directory protocols similar; bus makes
  snooping easier because of broadcast (snooping =>
  uniform memory access)
• Directory has extra data structure to keep track of state
  of all cache blocks
• Distributing directory => scalable shared address
  multiprocessor => cache coherent, non-uniform
  memory access
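The directory's extra data structure can be pictured as one entry per memory block that records a state and the set of caches holding a copy; requests then go point-to-point to exactly those caches instead of being broadcast. A minimal sketch, assuming a simple invalidation-based protocol with states Uncached, Shared, and Exclusive; the class, method names, and message tuples are hypothetical.

```python
class Directory:
    """Per-block sharing state kept at the block's home memory node."""

    def __init__(self):
        self.entries = {}    # block -> {"state": "U"|"S"|"E", "sharers": set()}
        self.messages = []   # point-to-point messages sent, kept for inspection

    def _entry(self, block):
        return self.entries.setdefault(block, {"state": "U", "sharers": set()})

    def read_miss(self, node, block):
        entry = self._entry(block)
        if entry["state"] == "E":
            (owner,) = entry["sharers"]
            # Contact only the owner: fetch its dirty copy; it keeps a shared copy.
            self.messages.append(("fetch", owner, block))
        entry["state"] = "S"
        entry["sharers"].add(node)

    def write_miss(self, node, block):
        entry = self._entry(block)
        # Invalidate only caches the directory knows hold a copy
        # (for an Exclusive block, that is the owner).
        for sharer in sorted(entry["sharers"] - {node}):
            self.messages.append(("invalidate", sharer, block))
        entry["state"] = "E"
        entry["sharers"] = {node}


d = Directory()
d.read_miss(0, "A")
d.read_miss(2, "A")
d.write_miss(1, "A")     # invalidations go only to nodes 0 and 2
d.read_miss(3, "A")      # owner 1 is asked for the dirty data
print(d.entries["A"], d.messages)
```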
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptxIn-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
aptyai
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
HusseinMalikMammadli
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
Dark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanizationDark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanization
Jakub Šimek
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
Top 5 Qualities to Look for in Salesforce Partners in 2025
Top 5 Qualities to Look for in Salesforce Partners in 2025Top 5 Qualities to Look for in Salesforce Partners in 2025
Top 5 Qualities to Look for in Salesforce Partners in 2025
Damco Salesforce Services
 
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdf
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdfGoogle DeepMind’s New AI Coding Agent AlphaEvolve.pdf
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdf
derrickjswork
 
Right to liberty and security of a person.pdf
Right to liberty and security of a person.pdfRight to liberty and security of a person.pdf
Right to liberty and security of a person.pdf
danielbraico197
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdfICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
Eryk Budi Pratama
 
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Preeti Jha
 
AI-proof your career by Olivier Vroom and David WIlliamson
AI-proof your career by Olivier Vroom and David WIlliamsonAI-proof your career by Olivier Vroom and David WIlliamson
AI-proof your career by Olivier Vroom and David WIlliamson
UXPA Boston
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 

Parallel architecture

  • 1. Parallel Architecture. Dr. Doug L. Hoffman, Computer Science 330, Spring 2002
  • 2. Parallel Computers
    - Definition: “A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.”
    - Questions about parallel computers:
      - How large a collection?
      - How powerful are the processing elements?
      - How do they cooperate and communicate?
      - How are data transmitted?
      - What type of interconnection?
      - What are the HW and SW primitives for the programmer?
      - Does it translate into performance?
  • 3. Parallel Processors “Religion”
    - The dream of computer architects since 1960: replicate processors to add performance instead of designing a faster processor
    - Led to innovative organizations tied to particular programming models, since “uniprocessors can’t keep going”
      - e.g., uniprocessors must stop getting faster due to the limit of the speed of light (predicted in 1972, … , 1989)
      - Borders on religious fervor: you must believe!
      - Fervor was damped somewhat when companies went out of business in the 1990s: Thinking Machines, Kendall Square, ...
    - The argument instead is the “pull” of the opportunity for scalable performance, not the “push” of a uniprocessor performance plateau
  • 4. Opportunities: Scientific Computing
    - Nearly unlimited demand (Grand Challenge):

        App                       Perf (GFLOPS)  Memory (GB)
        48-hour weather                     0.1          0.1
        72-hour weather                       3            1
        Pharmaceutical design               100           10
        Global Change, Genome              1000         1000

    - Successes in some real industries:
      - Petroleum: reservoir modeling
      - Automotive: crash simulation, drag analysis, engine
      - Aeronautics: airflow analysis, engine, structural mechanics
      - Pharmaceuticals: molecular modeling
      - Entertainment: full-length movies (“Toy Story”)
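  • 5. Opportunities: Commercial Computing
    - Throughput (transactions per minute, tpm) vs. number of processors (1996):

        Processors                    1      4      8     16     32     64    112
        IBM RS6000 (tpm)            735   1438   3119
        IBM RS6000 speedup         1.00   1.96   4.24
        Tandem Himalaya (tpm)                           3043   6067  12021  20918
        Tandem Himalaya speedup                         1.00   1.99   3.95   6.87

    - Speedup is relative to the smallest configuration measured: 1 processor for IBM, 16 for Tandem
    - IBM: performance hit going from 1 to 4 processors, good scaling from 4 to 8
    - Tandem scales: ideal speedup from 16 to 112 processors is 112/16 = 7.0, and it achieves 6.87
    - Others: file servers, electronic CAD simulation (multiple processes), WWW search engines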
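The speedup rows can be recomputed from the raw throughput. A minimal Python sketch, assuming speedup is measured against the smallest configuration each vendor reported (1 processor for IBM, 16 for Tandem); the dictionary and function names are illustrative, not from the slide:

```python
# Throughput figures (transactions per minute) copied from the table above,
# keyed by processor count.
IBM_RS6000 = {1: 735, 4: 1438, 8: 3119}
TANDEM_HIMALAYA = {16: 3043, 32: 6067, 64: 12021, 112: 20918}


def speedups(throughput):
    """Speedup of each configuration relative to the smallest one measured."""
    base_procs = min(throughput)
    base = throughput[base_procs]
    return {p: round(tpm / base, 2) for p, tpm in sorted(throughput.items())}


def efficiency(throughput):
    """Achieved speedup divided by the ideal (linear) speedup."""
    base_procs = min(throughput)
    base = throughput[base_procs]
    return {p: round((tpm / base) / (p / base_procs), 2)
            for p, tpm in sorted(throughput.items())}


if __name__ == "__main__":
    print(speedups(IBM_RS6000))         # {1: 1.0, 4: 1.96, 8: 4.24}
    print(speedups(TANDEM_HIMALAYA))    # {16: 1.0, 32: 1.99, 64: 3.95, 112: 6.87}
    print(efficiency(IBM_RS6000))       # {1: 1.0, 4: 0.49, 8: 0.53}
    print(efficiency(TANDEM_HIMALAYA))  # {16: 1.0, 32: 1.0, 64: 0.99, 112: 0.98}
```

The efficiency column makes the slide's two observations concrete: IBM runs at roughly half of linear speedup on 4 processors, while Tandem stays near 98% of linear at 112.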
  • 6. What Level of Parallelism?
    - Bit-level parallelism: 1970 to 1985
      - 4-bit, 8-bit, 16-bit, 32-bit microprocessors
    - Instruction-level parallelism (ILP): 1985 through today
      - Pipelining
      - Superscalar
      - VLIW
      - Out-of-order execution
      - Limits to the benefits of ILP?
    - Process-level or thread-level parallelism: mainstream for general-purpose computing?
      - Servers are parallel
      - High-end desktop dual-processor PC soon?
  • 7. Parallel Architecture
    - Parallel architecture extends traditional computer architecture with a communication architecture:
      - abstractions (HW/SW interface)
      - organizational structure to realize the abstractions efficiently
  • 8. Fundamental Issues
    - Three issues characterize parallel machines:
      1) Naming
      2) Synchronization
      3) Latency and bandwidth
  • 9. Parallel Framework
    - Layers:
      - Programming model:
        - Multiprogramming: lots of jobs, no communication
        - Shared address space: communicate via memory
        - Message passing: send and receive messages
        - Data parallel: several agents operate on several data sets simultaneously and then exchange information globally and simultaneously (shared or message passing)
      - Communication abstraction:
        - Shared address space: e.g., load, store, atomic swap
        - Message passing: e.g., send, receive library calls
        - Debate over this topic (ease of programming, scaling) => many hardware designs map 1:1 to a programming model
  • 10. Shared Address/Memory Multiprocessor Model
    - Communicate via load and store
      - Oldest and most popular model
    - Based on timesharing: processes on multiple processors vs. sharing a single processor
    - Process: a virtual address space and 1 thread of control
      - Multiple processes can overlap (share), but ALL threads share a process address space
    - Writes to the shared address space by one thread are visible to reads by other threads
      - Usual model: shared code, private stack, some shared heap, some private heap
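A minimal sketch of the shared-address-space model using Python threads, which all live in one process address space. The names (`run`, `worker`, `shared`, `results`) are hypothetical, and `threading.Lock` stands in for an atomic primitive such as atomic swap. Because of CPython's global interpreter lock, this illustrates the communication model, not parallel speedup:

```python
import threading


def run(num_threads=4, iterations=10_000):
    shared = {"counter": 0}       # shared heap: one location every thread can name
    results = [0] * num_threads   # shared heap: one slot per thread
    lock = threading.Lock()       # stands in for an atomic primitive (e.g. atomic swap)

    def worker(tid):
        done = 0                  # private: lives in this thread's own stack frame
        for _ in range(iterations):
            done += 1
            with lock:            # make the load-add-store on the shared location atomic
                shared["counter"] += 1
        results[tid] = done       # a plain store, later read by the main thread

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # after join, every thread's stores are visible here
    return shared["counter"], results


if __name__ == "__main__":
    print(run())  # (40000, [10000, 10000, 10000, 10000])
```

No data is ever sent explicitly: threads communicate only by loading and storing locations they can all name, which is the defining property of this model.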
  • 11. Example: Small-Scale MP Designs
    - Memory: centralized with uniform memory access time (“uma”) and bus interconnect, I/O
    - Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro
  • 12. SMP Interconnect
    - Connects processors to memory AND to I/O
    - Bus-based: all memory locations have equal access time, so SMP = “Symmetric MP”
      - Sharing limits bandwidth as processors and I/O are added
      - (see Chapter 1, Figs. 1-18/19, pages 42-43 of [CSG96])
    - Crossbar: expensive to expand
    - Multistage network: less expensive to expand than a crossbar, with more bandwidth
    - “Dance hall” designs: all processors on the left, all memories on the right
  • 13. Small-Scale Shared Memory
    - Caches serve to:
      - Increase bandwidth versus bus/memory
      - Reduce latency of access
      - Valuable for both private data and shared data
    - What about cache consistency?
  • 14. What Does Coherency Mean?
    - Informally:
      - “Any read must return the most recent write”
      - Too strict and too difficult to implement
    - Better:
      - “Any write must eventually be seen by a read”
      - All writes are seen in proper order (“serialization”)
    - Two rules to ensure this:
      - “If P writes x and P1 reads it, P’s write will be seen by P1 if the read and write are sufficiently far apart”
      - Writes to a single location are serialized: seen in one order
        - The latest write will be seen
        - Otherwise a processor could see writes in an illogical order (an older value after a newer value)
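Write serialization can be stated as a checkable property: for a single location, every processor must observe the writes in one agreed order, though it may skip some. A small Python sketch, assuming the written values are distinct and treating the initial value as the first entry of the write order; the function name is hypothetical:

```python
def respects_write_serialization(write_order, observed):
    """Return True if a processor's reads of one location never go backwards.

    write_order: values written to the location, in the single serialized
                 order all processors must agree on (initial value first).
    observed:    values one processor read, in program order. Repeats are
                 allowed (no newer write seen yet), and so are skipped writes.
    """
    position = {value: i for i, value in enumerate(write_order)}
    last = -1
    for value in observed:
        i = position[value]
        if i < last:
            return False  # saw an older value after a newer one
        last = i
    return True


if __name__ == "__main__":
    print(respects_write_serialization([0, 1, 2], [0, 0, 1, 2, 2]))  # True
    print(respects_write_serialization([0, 1, 2], [0, 2]))           # True
    print(respects_write_serialization([0, 1, 2], [0, 2, 1]))        # False
```

The last case is exactly the “illogical order” the slide rules out: value 1 is read after the newer value 2.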
  • 15. Potential HW Coherency Solutions
    - Snooping solution (snoopy bus):
      - Send all requests for data to all processors
      - Processors snoop to see if they have a copy and respond accordingly
      - Requires broadcast, since caching information is at the processors
      - Works well with a bus (natural broadcast medium)
      - Dominates for small-scale machines (most of the market)
    - Directory-based schemes:
      - Keep track of what is being shared in one centralized place
      - Distributed memory => distributed directory for scalability (avoids bottlenecks)
      - Send point-to-point requests to processors via the network
      - Scales better than snooping
      - Actually existed BEFORE snooping-based schemes
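The slide does not name a specific snooping protocol. As one common example, here is a toy write-invalidate MSI (Modified/Shared/Invalid) protocol in Python, assuming one-word cache lines, a single bus that serializes all transactions, and write-back caches. The class and method names are hypothetical, and timing and races are ignored:

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class Bus:
    """Shared bus: every transaction is seen (snooped) by every other cache."""

    def __init__(self, memory):
        self.memory = memory  # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read_miss(self, requester, addr):
        # Broadcast the read; a cache holding the line in M writes it back first.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(addr)
        return self.memory.get(addr, 0)

    def invalidate(self, requester, addr):
        # Broadcast an invalidation so the requester becomes the only holder.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_invalidate(addr)


class Cache:
    def __init__(self, bus):
        self.lines = {}  # address -> (State, value); absent means INVALID
        self.bus = bus
        bus.attach(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        if self.state(addr) is State.INVALID:
            value = self.bus.read_miss(self, addr)
            self.lines[addr] = (State.SHARED, value)
        return self.lines[addr][1]

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            # Write-invalidate: remove every other copy before writing.
            self.bus.invalidate(self, addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop_read(self, addr):
        st, value = self.lines.get(addr, (State.INVALID, None))
        if st is State.MODIFIED:
            self.bus.memory[addr] = value  # write back dirty data
            self.lines[addr] = (State.SHARED, value)

    def snoop_invalidate(self, addr):
        st, value = self.lines.pop(addr, (State.INVALID, None))
        if st is State.MODIFIED:
            self.bus.memory[addr] = value  # write back before dropping the line


if __name__ == "__main__":
    memory = {0x10: 5}
    bus = Bus(memory)
    p0, p1 = Cache(bus), Cache(bus)
    print(p0.read(0x10), p1.read(0x10))  # 5 5  (both SHARED)
    p0.write(0x10, 7)                    # p1's copy is invalidated
    print(p1.state(0x10))                # State.INVALID
    print(p1.read(0x10), memory[0x10])   # 7 7  (p0 wrote back, now SHARED)
```

Every coherence action goes through `Bus` to all caches, which is why snooping relies on a broadcast medium and stops scaling once the bus saturates.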
  • 16. Large-Scale MP Designs
    - Memory: distributed with non-uniform memory access time (“numa”) and scalable interconnect (distributed memory)
    - [Figure: distributed-memory multiprocessor; labels: 1 cycle, 40 cycles, 100 cycles; Low Latency, High Reliability]
  • 17. Shared Address Model Summary
    - Each processor can name every physical location in the machine
    - Each process can name all data it shares with other processes
    - Data transfer via load and store
    - Data size: byte, word, ... or cache blocks
    - Uses virtual memory to map virtual addresses to local or remote physical addresses
    - The memory hierarchy model applies: now communication moves data to the local processor's cache (as a load moves data from memory to cache)
      - Latency, BW (cache block?), scalability when communicating?
  • 18. Message Passing Model
    - Whole computers (CPU, memory, I/O devices) communicate as explicit I/O operations
      - Essentially NUMA, but integrated at the I/O devices rather than the memory system
    - Send specifies a local buffer + the receiving process on the remote computer
    - Receive specifies the sending process on the remote computer + a local buffer to place the data
      - Usually send includes a process tag, and receive has a rule on the tag: match one, match any
      - Synchronization: when send completes, when the buffer is free, when the request is accepted, receive waits for send
    - Send+receive => memory-memory copy, where each side supplies a local address, AND does pairwise synchronization!
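A minimal Python sketch of tagged send/receive with “match one” and “match any” rules. The `Mailbox`, `send`, and `ANY` names are hypothetical. Here send is buffered (it completes once the message is queued) and receive blocks until a matching message arrives, which is one of the synchronization options listed above. The payload is copied so sender and receiver share no memory, mimicking the memory-to-memory copy:

```python
import threading
from collections import deque

ANY = None  # wildcard: "match any" tag or source


class Mailbox:
    """Incoming-message queue of one process; entries are (source, tag, payload)."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def deliver(self, source, tag, payload):
        with self._cond:
            self._messages.append((source, tag, payload))
            self._cond.notify_all()

    def receive(self, tag=ANY, source=ANY):
        """Block until a message matching `tag` and `source` arrives; remove and return it."""
        with self._cond:
            while True:
                for msg in self._messages:
                    src, t, _ = msg
                    if (tag is ANY or t == tag) and (source is ANY or src == source):
                        self._messages.remove(msg)
                        return msg
                self._cond.wait()  # releases the lock until a new message arrives


def send(mailboxes, source, dest, tag, payload):
    """Buffered send: copy the payload so sender and receiver share no memory."""
    mailboxes[dest].deliver(source, tag, list(payload))


if __name__ == "__main__":
    boxes = {0: Mailbox(), 1: Mailbox()}

    def process0():
        send(boxes, source=0, dest=1, tag=7, payload=[1, 2, 3])
        send(boxes, source=0, dest=1, tag=9, payload=[4])

    sender = threading.Thread(target=process0)
    sender.start()
    print(boxes[1].receive(tag=9))  # (0, 9, [4])  waits for tag 9 specifically
    print(boxes[1].receive())       # (0, 7, [1, 2, 3])  "match any"
    sender.join()
```

The tag-7 message arrives first but is left in the queue until a receive that matches it is posted, which is the tag-matching work the slide assigns to the receiver.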
  • 19. Message Passing Model
    - Send+receive => memory-memory copy, with synchronization through the OS even on 1 processor
    - History of message passing:
      - Network topology was important because a node could only send to its immediate neighbors
      - Typically synchronous, blocking send & receive
      - Later, DMA with non-blocking sends; DMA for receive into a buffer until the processor does a receive, and then data is transferred to local memory
      - Later, SW libraries to allow arbitrary communication
    - Example: IBM SP-2, RS6000 workstations in racks
      - Network interface card has an Intel 960
      - 8x8 crossbar switch as the communication building block
      - 40 MByte/sec per link
  • 20. Communication Models
    - Shared memory
      - Processors communicate with a shared address space
      - Easy on small-scale machines
      - Advantages:
        - Model of choice for uniprocessors, small-scale MPs
        - Ease of programming
        - Lower latency
        - Easier to use hardware-controlled caching
    - Message passing
      - Processors have private memories, communicate via messages
      - Advantages:
        - Less hardware, easier to design
        - Focuses attention on costly non-local operations
    - Can support either SW model on either HW base
  • 21. Popular Flynn Categories (analogous to RAID levels, but for MPPs)
    - SISD (Single Instruction Single Data)
      - Uniprocessors
    - MISD (Multiple Instruction Single Data)
      - ???
    - SIMD (Single Instruction Multiple Data)
      - Examples: Illiac-IV, CM-2
        - Simple programming model
        - Low overhead
        - Flexibility
        - All custom integrated circuits
    - MIMD (Multiple Instruction Multiple Data)
      - Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
        - Flexible
        - Use off-the-shelf micros
  • 22. Data Parallel Model
    - Operations can be performed in parallel on each element of a large regular data structure, such as an array
    - One control processor broadcasts to many PEs (see Ch. 1, Fig. 1-26, page 51 of [CSG96])
      - When computers were large, the control portion could be amortized over many replicated PEs
    - Condition flag per PE so that a PE can skip an operation
    - Data distributed in each memory
    - Early 1980s VLSI => SIMD rebirth: 32 1-bit PEs + memory on a chip was the PE
    - Data parallel programming languages lay out data to processors
  • 23. Data Parallel Model
    - Vector processors have similar ISAs, but no data placement restriction
    - SIMD led to data parallel programming languages
    - Advancing VLSI led to single-chip FPUs and whole fast µProcs (SIMD less attractive)
    - The SIMD programming model led to the Single Program Multiple Data (SPMD) model
      - All processors execute an identical program
    - Data parallel programming languages are still useful; they do communication all at once: “Bulk Synchronous” phases in which all processors communicate after a global barrier
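The SPMD and bulk-synchronous ideas can be sketched with Python threads: every rank runs the same function on its own block of the data, then all ranks wait at a global barrier before the combine phase. `threading.Barrier` is the standard-library barrier; `spmd_sum` and the block partitioning are illustrative assumptions:

```python
import threading


def spmd_sum(data, num_procs=4):
    """Every 'processor' runs the same program on its own block of data (SPMD),
    then all meet at a global barrier before a combine step (bulk synchronous)."""
    partial = [0] * num_procs          # one result slot per rank
    result = {}
    barrier = threading.Barrier(num_procs)

    def program(rank):
        # Phase 1: local computation on this rank's contiguous block.
        chunk = len(data) // num_procs
        start = rank * chunk
        end = len(data) if rank == num_procs - 1 else start + chunk
        partial[rank] = sum(data[start:end])

        # Global barrier: no rank combines until every rank has computed.
        barrier.wait()

        # Phase 2: communication/combine; rank 0 gathers the partial sums.
        if rank == 0:
            result["total"] = sum(partial)

    threads = [threading.Thread(target=program, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"], partial


if __name__ == "__main__":
    print(spmd_sum(list(range(10))))  # (45, [1, 5, 9, 30])
```

Without the barrier, rank 0 could read `partial` before the other ranks had written their slots; the barrier is what separates the compute phase from the communicate phase.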
  • 24. Convergence in Parallel Architecture
    - Complete computers connected to a scalable network via a communication assist
    - Different programming models place different requirements on the communication assist
      - Shared address space: tight integration with memory to capture memory events that interact with others + to accept requests from other nodes
      - Message passing: send messages quickly and respond to incoming messages: tag match, allocate buffer, transfer data, wait for receive posting
      - Data parallel: fast global synchronization
    - High Performance Fortran (HPF): shared-memory, data parallel; Message Passing Interface (MPI): message-passing library; both work on many machines, with different implementations
  • 25. Summary: Parallel Framework
    - [Figure: layer stack, top to bottom: Programming Model, Communication Abstraction, Interconnection SW/OS, Interconnection HW]
    - Layers:
      - Programming model:
        - Multiprogramming: lots of jobs, no communication
        - Shared address space: communicate via memory
        - Message passing: send and receive messages
        - Data parallel: several agents operate on several data sets simultaneously and then exchange information globally and simultaneously (shared or message passing)
      - Communication abstraction:
        - Shared address space: e.g., load, store, atomic swap
        - Message passing: e.g., send, receive library calls
        - Debate over this topic (ease of programming, scaling) => many hardware designs map 1:1 to a programming model
  • 26. Summary: Small-Scale MP Designs
    - Memory: centralized with uniform access time (“uma”) and bus interconnect
    - Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro
  • 27. Summary
    - Caches contain all information on the state of cached memory blocks
    - Snooping and directory protocols are similar; a bus makes snooping easier because of broadcast (snooping => uniform memory access)
    - A directory has an extra data structure to keep track of the state of all cache blocks
    - Distributing the directory => scalable shared address multiprocessor => cache coherent, non-uniform memory access (CC-NUMA)
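A directory replaces broadcast with a per-block record of which nodes hold a copy. A toy Python sketch of the home-node bookkeeping, assuming three states (UNCACHED, SHARED, EXCLUSIVE) as in common textbook presentations. The names are hypothetical, data movement is omitted, and only the point-to-point messages that would be sent are logged:

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Home-node record for one memory block."""
    state: str = "UNCACHED"                    # UNCACHED, SHARED, or EXCLUSIVE
    sharers: set = field(default_factory=set)  # ids of nodes holding a copy


class Directory:
    def __init__(self):
        self.entries = {}   # block address -> DirectoryEntry
        self.messages = []  # point-to-point messages the home node would send

    def entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read_miss(self, node, block):
        e = self.entry(block)
        if e.state == "EXCLUSIVE":
            (owner,) = e.sharers
            # Only the owner is contacted to supply/write back the block.
            self.messages.append(("fetch", owner, block))
        e.state = "SHARED"
        e.sharers.add(node)

    def write_miss(self, node, block):
        e = self.entry(block)
        # Invalidate exactly the recorded sharers; no broadcast needed.
        for other in sorted(e.sharers - {node}):
            self.messages.append(("invalidate", other, block))
        e.state = "EXCLUSIVE"
        e.sharers = {node}


if __name__ == "__main__":
    d = Directory()
    for node in (1, 2, 3):
        d.read_miss(node, 0x40)
    d.write_miss(2, 0x40)
    print(d.messages)  # [('invalidate', 1, 64), ('invalidate', 3, 64)]
    d.read_miss(1, 0x40)
    print(d.messages[-1], d.entry(0x40).sharers)  # ('fetch', 2, 64) {1, 2}
```

The message count depends on the number of sharers, not on the number of nodes in the machine, which is why distributing these entries across the memories yields a scalable cache-coherent NUMA design.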