SlideShare a Scribd company logo
Insertion Tree PhasersEfficient and Scalable Barrier Synchronization for Fine-grained ParallelismStefan MarrS. Verhaegen, B. De Fraine, T. D’Hondt, W. De MeuterSoftware Languages LabVrijeUniversiteitBrussel
AgendaIntroductionBarriers, PhasersInsertion Tree PhasersInsertion TreePhaser AlgorithmEvaluationSummary9/26/102
BarriersSynchronizing parallel activitiesHigh productivity: easy to get right Mostly for scientific computingMany-core evolutionSynchronizing dynamic and irregular problemsRequires low-overhead dynamic hierarchical barriers9/26/103Introductiont1p1t2p1t3p1t1p2t2p2t3p2t1p3t2p3t3p3
t1p1Phasers9/26/104IntroductionExtension of X10 clocksClocks: dynamic two-phase barrier for fork/join parallelismRegistration modes for barrierEnables expression of producer/consumer relationSingle statementsExecuted only by single thread, avoids duplicated barrier operationst1p2t2p2t3p2t2p2t3p2t2p3t3p3
Hierarchical Phasers9/26/105IntroductionShirako & Sarkar in Proc. of IEEE IPDPS 2010 [1]Array accessList accessFirst scalable implementation strategyPredefined tree structureDegree, i.e., tree arityMax. number of tiers, i.e., heightComposed from phasersProblematicNone dynamic structureTwo-phase support incompleteLeaves design decisions open PhaserTier 0subsubTier 1subsubsubsubTier 2(leafs)sigsigsigsigsigsigsigsigA1A2A3A4A5A6A7A8
Open Questions withHierarchical PhasersDynamic tree construction, or on initialization?Tradeoffs for atomic operations, overhead of joining/leaving phaserHow are operations synchronized?Tradeoffs for overheads and restrictions on parallelismGarbage collection problem for dropped participantsKeeps list of synchronization objects incl. dropped participantsAfter reaching max. #participantsIs the tree rebalanced? (Hint at it for dropped nodes)Two-phase barrier support does not hide latency for original phasers9/26/106Introduction
Insertion Tree Phasers9/26/107
Design GoalSupport for full generality of Phaser propertiesTwo-phase supportSignal-only/wait-only for producers/consumersSingle statementFull dynamicity: fine-grained hierarchical fork/joinAdaptation of existing, scalable approachesDissemination barrier not adaptableRemaining are tree-based approaches9/26/108Insertion TreePhaserAlgorithm
Insertion TreeGoalsStable, i.e., minimized tree modificationsAvoid inconsistent synchronization informationMaximizing parallel operationsSolution: Insertion TreeInverted treeNo removalComplete smallest subtree first9/26/109Insertion TreePhaserAlgorithm1/2
Insertion Tree9/26/1010Insertion TreePhaserAlgorithm2/2
Insertion Tree9/26/1011Insertion TreePhaserAlgorithm2/21
Insertion Tree9/26/1012Insertion TreePhaserAlgorithm2/2h112
Insertion Tree9/26/1013Insertion TreePhaserAlgorithm2/2h2h1123
Insertion Tree9/26/1014Insertion TreePhaserAlgorithm2/2h2h1h31234
Insertion Tree9/26/1015Insertion TreePhaserAlgorithm2/2h4h2h1h312345
Insertion Tree9/26/1016Insertion TreePhaserAlgorithm2/2h4h2h6h1h3h5h712345678
Determining the Insertion PointdefgetNextInsertNode(tree):  result = tree.lastNodei = tree.numLeaveswhileimod 2 == 0:    result = result.parenti = i/2return result  # this is for 2-ary trees  # is adaptable for n-ary trees, too9/26/1017Insertion TreePhaserAlgorithm
Synchronization Tree*9/26/1018Insertion TreePhaserAlgorithmPhaserphase:  000Phase counter0000woHelper nodesWait-only flagPhase counter0000rsmdParticipant nodesResume flag*)	is simplified, leaves out registration modesA1A2A3A4
Announcing Synchronization9/26/1019Insertion Tree Phaser AlgorithmPhaserphase:  00000000000A1A2A3A4
Announcing Synchronization9/26/1020Insertion Tree Phaser AlgorithmPhaserphase:  0000110001rsmd1rsmdA1A2A3A4
Announcing Synchronization9/26/1021Insertion Tree Phaser AlgorithmPhaserphase:  00011111rsmd1rsmd1rsmd1rsmdA1A2A3A4
Announcing Synchronization9/26/1022Insertion Tree Phaser AlgorithmPhaserphase:  00111111rsmd1rsmd1rsmd1rsmdA1A2A3A4
Announcing Synchronization9/26/1023Insertion Tree Phaser AlgorithmPhaserphase:  01111111rsmd1rsmd1rsmd1rsmdA1A2A3A4
Announcing Synchronization9/26/1024Insertion Tree Phaser AlgorithmSynchronization reached.Continue to next phase.Phaserphase:  11111111rsmd1rsmd1rsmd1rsmdA1A2A3A4
Dropping Participants9/26/1025Insertion TreePhaserAlgorithmPhaserphase:  0010011001rsmd1rsmdA1A2A3A4
Dropping Participants9/26/1026Insertion TreePhaserAlgorithmPhaserphase:  0010wo1101rsmd1rsmdA1A2A3A4
h1:RDropping Participants9/26/1027Insertion TreePhaserAlgorithmPhaserphase:  001wowo111rsmd1rsmdA1A2A3A4
h1:RDropping Participants9/26/1028Insertion TreePhaserAlgorithmPhaserphase:  0wo1wowo111rsmd1rsmdA1A2A3A4
Dropping Participants9/26/1029Insertion TreePhaserAlgorithmSynchronization reached.Continue to next phase.Phaserphase:  1h1:Rwo1wowo111rsmd1rsmdA1A2A3A4
h1:RDropping Participants9/26/1030Insertion TreePhaserAlgorithmPhaserphase:  1wo1h1:Lwowo111rsmd1rsmdA1A2A3A4
Adding New Participants9/26/1031Insertion TreePhaserAlgorithmPhaserphase:  898899rsmd89rsmd8A1A2A3A4
Adding New Participants9/26/1032Insertion TreePhaserAlgorithmPhaserphase:  89888899rsmd89rsmd8A1A2A3A4
Adding New Participants9/26/1033Insertion TreePhaserAlgorithmPhaserphase:  8-188+198899rsmd89rsmd8A1A2A3A4
Adding New Participants9/26/1034Insertion TreePhaserAlgorithmPhaserphase:  888propagate phase count98899rsmd89rsmd8A1A2A3A4
Evaluation9/26/1035
Two-Phaser Barrier Operation9/26/1036Evaluation
Overhead: Two-Phase vs. Classic9/26/1037Evaluation
Use as Drop-In Replacement for SPLASH-2Speedup compared to TmcSpinBarrier9/26/1038Evaluation
SummaryScalable and efficient approach to PhasersDocuments implementationBased on fully dynamic insertion treeOvercomes limitations of existing approachesUsable as drop-in replacementFuture workScalability beyond 59 coresOptimization for other memory architectures9/26/1039Stefan Marr, IEEE HPCC 2010, Insertion TreePhasers

More Related Content

Similar to Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism (20)

High Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
High Accuracy Distance Measurement for Bluetooth Based on Phase RangingHigh Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
High Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
Ealwan Lee
 
8085 interrupts
8085 interrupts8085 interrupts
8085 interrupts
Meena Rathore
 
MPC8313E PowerQUICC II Pro Processor
MPC8313E PowerQUICC II Pro ProcessorMPC8313E PowerQUICC II Pro Processor
MPC8313E PowerQUICC II Pro Processor
Premier Farnell
 
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET Journal
 
Aw25293296
Aw25293296Aw25293296
Aw25293296
IJERA Editor
 
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate TopologyIRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET Journal
 
ADS1256 library documentation
ADS1256 library documentationADS1256 library documentation
ADS1256 library documentation
CuriousScientist
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET Journal
 
Crash course in verilog
Crash course in verilogCrash course in verilog
Crash course in verilog
Pantech ProLabs India Pvt Ltd
 
361
361361
361
dhandeesh
 
Efficient Design of Reversible Multiplexers with Low Quantum Cost
Efficient Design of Reversible Multiplexers with Low Quantum CostEfficient Design of Reversible Multiplexers with Low Quantum Cost
Efficient Design of Reversible Multiplexers with Low Quantum Cost
IJERA Editor
 
Signal descriptors of 8086
Signal descriptors of 8086Signal descriptors of 8086
Signal descriptors of 8086
aviban
 
ICIECA 2014 Paper 10
ICIECA 2014 Paper 10ICIECA 2014 Paper 10
ICIECA 2014 Paper 10
Association of Scientists, Developers and Faculties
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Malik Tauqir Hasan
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
Hideyuki Tanaka
 
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET Journal
 
Ad4103173176
Ad4103173176Ad4103173176
Ad4103173176
IJERA Editor
 
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Masashi Imano
 
Site Operation Manual for a Typical Air Monitoring Site
Site Operation Manual for a Typical Air Monitoring SiteSite Operation Manual for a Typical Air Monitoring Site
Site Operation Manual for a Typical Air Monitoring Site
TAMUK
 
Building communication platforms for the IoT
Building communication platforms for the IoTBuilding communication platforms for the IoT
Building communication platforms for the IoT
Troels Brødsgaard
 
High Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
High Accuracy Distance Measurement for Bluetooth Based on Phase RangingHigh Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
High Accuracy Distance Measurement for Bluetooth Based on Phase Ranging
Ealwan Lee
 
MPC8313E PowerQUICC II Pro Processor
MPC8313E PowerQUICC II Pro ProcessorMPC8313E PowerQUICC II Pro Processor
MPC8313E PowerQUICC II Pro Processor
Premier Farnell
 
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET- MASH 1-2 Delta Sigma Modulator with Quantizer for Fractional-N Frequen...
IRJET Journal
 
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate TopologyIRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET Journal
 
ADS1256 library documentation
ADS1256 library documentationADS1256 library documentation
ADS1256 library documentation
CuriousScientist
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET Journal
 
Efficient Design of Reversible Multiplexers with Low Quantum Cost
Efficient Design of Reversible Multiplexers with Low Quantum CostEfficient Design of Reversible Multiplexers with Low Quantum Cost
Efficient Design of Reversible Multiplexers with Low Quantum Cost
IJERA Editor
 
Signal descriptors of 8086
Signal descriptors of 8086Signal descriptors of 8086
Signal descriptors of 8086
aviban
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Malik Tauqir Hasan
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
Hideyuki Tanaka
 
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET Journal
 
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...
Masashi Imano
 
Site Operation Manual for a Typical Air Monitoring Site
Site Operation Manual for a Typical Air Monitoring SiteSite Operation Manual for a Typical Air Monitoring Site
Site Operation Manual for a Typical Air Monitoring Site
TAMUK
 
Building communication platforms for the IoT
Building communication platforms for the IoTBuilding communication platforms for the IoT
Building communication platforms for the IoT
Troels Brødsgaard
 

More from Stefan Marr (20)

Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Stefan Marr
 
Seminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent ProgrammingSeminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent Programming
Stefan Marr
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
Stefan Marr
 
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Stefan Marr
 
Why Is Concurrent Programming Hard? And What Can We Do about It?
Why Is Concurrent Programming Hard? And What Can We Do about It?Why Is Concurrent Programming Hard? And What Can We Do about It?
Why Is Concurrent Programming Hard? And What Can We Do about It?
Stefan Marr
 
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Stefan Marr
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low Effort
Stefan Marr
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Stefan Marr
 
Supporting Concurrency Abstractions in High-level Language Virtual Machines
Supporting Concurrency Abstractions in High-level Language Virtual MachinesSupporting Concurrency Abstractions in High-level Language Virtual Machines
Supporting Concurrency Abstractions in High-level Language Virtual Machines
Stefan Marr
 
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Stefan Marr
 
Sly and the RoarVM: Parallel Programming with Smalltalk
Sly and the RoarVM: Parallel Programming with SmalltalkSly and the RoarVM: Parallel Programming with Smalltalk
Sly and the RoarVM: Parallel Programming with Smalltalk
Stefan Marr
 
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Stefan Marr
 
Sly and the RoarVM: Exploring the Manycore Future of Programming
Sly and the RoarVM: Exploring the Manycore Future of ProgrammingSly and the RoarVM: Exploring the Manycore Future of Programming
Sly and the RoarVM: Exploring the Manycore Future of Programming
Stefan Marr
 
PHP.next: Traits
PHP.next: TraitsPHP.next: Traits
PHP.next: Traits
Stefan Marr
 
The Price of the Free Lunch: Programming in the Multicore Era
The Price of the Free Lunch: Programming in the Multicore EraThe Price of the Free Lunch: Programming in the Multicore Era
The Price of the Free Lunch: Programming in the Multicore Era
Stefan Marr
 
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Stefan Marr
 
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Stefan Marr
 
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Stefan Marr
 
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Stefan Marr
 
VMADL: An Architecture Definition Language for Variability and Composition ...
VMADL: An Architecture Definition Language  for Variability and Composition  ...VMADL: An Architecture Definition Language  for Variability and Composition  ...
VMADL: An Architecture Definition Language for Variability and Composition ...
Stefan Marr
 
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Stefan Marr
 
Seminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent ProgrammingSeminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent Programming
Stefan Marr
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
Stefan Marr
 
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Stefan Marr
 
Why Is Concurrent Programming Hard? And What Can We Do about It?
Why Is Concurrent Programming Hard? And What Can We Do about It?Why Is Concurrent Programming Hard? And What Can We Do about It?
Why Is Concurrent Programming Hard? And What Can We Do about It?
Stefan Marr
 
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Stefan Marr
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low Effort
Stefan Marr
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Stefan Marr
 
Supporting Concurrency Abstractions in High-level Language Virtual Machines
Supporting Concurrency Abstractions in High-level Language Virtual MachinesSupporting Concurrency Abstractions in High-level Language Virtual Machines
Supporting Concurrency Abstractions in High-level Language Virtual Machines
Stefan Marr
 
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...
Stefan Marr
 
Sly and the RoarVM: Parallel Programming with Smalltalk
Sly and the RoarVM: Parallel Programming with SmalltalkSly and the RoarVM: Parallel Programming with Smalltalk
Sly and the RoarVM: Parallel Programming with Smalltalk
Stefan Marr
 
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...
Stefan Marr
 
Sly and the RoarVM: Exploring the Manycore Future of Programming
Sly and the RoarVM: Exploring the Manycore Future of ProgrammingSly and the RoarVM: Exploring the Manycore Future of Programming
Sly and the RoarVM: Exploring the Manycore Future of Programming
Stefan Marr
 
PHP.next: Traits
PHP.next: TraitsPHP.next: Traits
PHP.next: Traits
Stefan Marr
 
The Price of the Free Lunch: Programming in the Multicore Era
The Price of the Free Lunch: Programming in the Multicore EraThe Price of the Free Lunch: Programming in the Multicore Era
The Price of the Free Lunch: Programming in the Multicore Era
Stefan Marr
 
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...
Stefan Marr
 
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...
Stefan Marr
 
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...
Stefan Marr
 
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...
Stefan Marr
 
VMADL: An Architecture Definition Language for Variability and Composition ...
VMADL: An Architecture Definition Language  for Variability and Composition  ...VMADL: An Architecture Definition Language  for Variability and Composition  ...
VMADL: An Architecture Definition Language for Variability and Composition ...
Stefan Marr
 

Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism

Editor's Notes

  • #5: Shirako et al.X10 Vijay Saraswat
  • #6: Shirako + Sarkar
  • #8: So I went to the whiteboard drew a tree and figured out how to do it slightly different
  • #10: How to build a tree to synchronize dynamic parallelism?
  • #19: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #20: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #21: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #22: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #23: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #24: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #25: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #26: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #27: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #28: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #29: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #30: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #31: Example tree like in paper, briefly the different properties, and that they are aggregations of the subtree
  • #32: In the general case: - propagate the phase count minimum up the tree - while doing this, wait for racing values, by checking that the found value is the expected from the last visited node, if it is not, wait until it is, thus the racing activity passed
  • #33: In the general case: - propagate the phase count minimum up the tree - while doing this, wait for racing values, by checking that the found value is the expected from the last visited node, if it is not, wait until it is, thus the racing activity passed
  • #34: In the general case: - propagate the phase count minimum up the tree - while doing this, wait for racing values, by checking that the found value is the expected from the last visited node, if it is not, wait until it is, thus the racing activity passed
  • #35: In the general case: - propagate the phase count minimum up the tree - while doing this, wait for racing values, by checking that the found value is the expected from the last visited node, if it is not, wait until it is, thus the racing activity passed
  翻译: