SlideShare a Scribd company logo
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
A Homomorphism-based MapReduce Framework
for Systematic Parallel Programming
Yu Liu
The Graduate University for Advanced Studies
Jan 12, 2011
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Outline
1 Motivations
2 Brief introduction of MapReduce
3 The Homomorphism-based Framework
4 Case Study: Parallel sum, Maximum prefix sum, Variance of
numbers
5 Experimental Results
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Motivation of This Talk
Show how to make programming with MapReduce easier.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Motivation of This Talk
Show how to make programming with MapReduce easier.
Introduce an approach of automatic parallel program
generating.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
MapReduce Programming model
The Computation of MapReduce Framework
Google’s MapReduce is a parallel-distributed programming model,
together with an associated implementation, for processing very
large data sets in a massively parallel manner.
Components of a MapReduce program (Hadoop)
A Mapper;
A Partitioner that can be used shuffling data;
A Combiner that can be used doing local reduction;
A Reducer ;
A Comparator can be used while sorting or grouping;
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
MapReduce Programming model
MapReduce Data-processing flow
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
MapReduce Programming model
A simple functional specifcation of the MapReduce framework
Function mapS is a set version of the map function. Function
groupByKey :: {[(k, v)]} → {(k, [v])} takes a set of list of
key-value pairs (each pair is called a record) and groups the values
of the same key into a list.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
Maximum Prefix Sum problem
The Maximum Prefix Sum problem (mps) is to find the maximum
prefix-summation in a list:
3, −1, 4, 1, −5, 9, 2, −6, 5
This problem seems not obvious to solve this problem efficiently
with MapReduce.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
List Homomorphism
Function h is said to be a list homomorphism
If there are a function f and an associated operator such that
for any list x and list y
h [a] = f a
h (x ++ y) = h(x) h(y).
Where ++ is the list concatenation.
For instance, the function sum can be described as a list
homomorphism
sum [a] = a
sum (x ++ y) = sum x + sum y.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
List Homomorphism and Homomorphism Theorems
Leftwards function
Function h is leftwards if it is defined in the following form with
function f and operator ⊕,
h [a] = f a
h ([a] ++ x) = a ⊕ h x.
Rightwards function
Function h is rightwards if it is defined in the following form with
function f and operator ⊗,
h [a] = f a
h (x ++ [a]) = h x ⊗ a.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
List Homomorphism and Homomorphism Theorems
Map and Reduce
For a given function f , the function of the form ([[·] ◦ f , ++ ]) is a
map function, and is written as map f .
————————————————————————————
The function of the form ([id, ]) for some is a reduce function,
and is written as reduce ( ).
The First Homomorphism Theorem
Any homomorphism can be written as the composition of a map
and a reduce:
([f , ]) = reduce ( ) ◦ map f .
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Programming Paradigm of MapReduce
List Homomorphism and Homomorphism Theorems
List Homomorphism and Homomorphism Theorems
The Third Homomorphism Theorem
Function h can be described as a list homomorphism, iff ∃ and
∃ f such that:
h = ([f , ])
if and only if there exist f , ⊕, and ⊕ such that
h [a] = f a
h ([a] ++ x) = a ⊕ h x
h (x ++ [b]) = h x ⊗ b.
The third homomorphism gives a necessary and sufficient condition
for the existence of a list homomorphism.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
To make it easy for resolving problems such as mps by
MapReduce. We using the knowledge of homomorphism especially
the third homomorphism theorem to wrapping MapReduce model.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
Basic Homomorphism-Programming Interface
filter :: a → b
aggregator :: b → b → b.
The implementlation on Hadoop
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
A simple example of using this interface for computing the sum of
a list
The implementlation on Hadoop
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
Programming Interface with Right Inverse
fold :: [a] → b
unfold :: b → [a].
The implementlation on Hadoop
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
A simple example of using this interface for computing the sum of
a list
The implementlation on Hadoop
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
A homomorphism-based framework wrapping MapReduce
Requirements of using this interface in addition to the right-inverse
property of unfold over fold.
Both leftwards and rightwards functions exist
fold([a] ++ x) = fold([a] ++ unfold(fold(x)))
fold(x ++ [a]) = fold(unfold(fold(x)) ++ [a]).
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
The implementation of homomorphism framework upon
Hadoop
To implement our programming interface with Hadoop, we need to
consider how to represent lists in a distributed manner.
Set of pairs as list
We use integer as the index’s type, the list [a, b, c, d, e] is
represented by {(3, d), (1, b), (2, c), (0, a), (4, e)}.
Set of pairs as distributed List
We can represent the above list as two sub-sets
{((0, 1), b), ((0, 2), c), ((0, 0), a)} and {((1, 3), d), ((1, 4), e)}, each
in different data-nodes
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
The implementation of homomorphism framework upon
Hadoop
The first homomorphism theorem implies that a list
homomorphism can be implemented by MapReduce, at least two
passes of MapReduce.
Defination of homMR
homMR :: (α → β) → (β → β → β) → {(ID, α)} → β
homMR f (⊕) = getValue ◦ MapReduce mapper2 reducer2
◦ MapReduce mapper1 reducer1
where
mapper1 :: (ID, α)) → [((ID, ID), β))]
mapper1 (i, a) = [((pid, i), b)]
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
The implementation of homomorphism framework upon
Hadoop
Defination of homMR
reducer1 :: (ID, ID) → [β] → β
reducer1 ((p, j), ias)) = hom f (⊕) ias
mapper2 :: ((ID, ID), β) → [((ID, ID), β)]
mapper2 ((p, j), b) = [((0, j), b)]
reducer2 :: (ID, ID) → [β] → β
reducer2 ((0, k), jbs) = hom (⊕) jbs
getValue {(0, b)} = b
Where, hom f (⊕) denotes a sequential version of ([f , ⊕]).
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
The leftwards and rightwardsfunction
Derivation by right inverse
leftwards([a] ++ x) = fold([a] ++ unfold(fold(x)))
rightwards(x ++ [a]) = fold(unfold(fold x) ++ [a]).
Now if for all xs,
rightwards xs = leftwards xs, (1)
then a list homomorphism ([f , ⊕]) that computes fold can be
obtained automatically, where f and ⊕ are defined as follows:
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
The leftwards and rightwardsfunction
Derivation by right inverse
f a = fold([a])
a ⊕ b = fold(unfold a ++ unfold b).
Equation (1) should be satisfied.
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
Programming with this homomorphism framework
MPS
A sequential program
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Automatic Parallelization
Case Study
Programming with this homomorphism framework
MPS
A tupled function
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
MPS
(mps sum) [a] = (a ↑ 0, a)
(mps sum) (x + +[a]) = let (m, s) = (mps sum) x in (m ↑ (s + a
We use this tupled function as the fold function. The right inverse
of the tupled function, (mps sum)◦:
(mps sum)◦
(m, s) = [m, s − m]
Noting that for the any result (m, s) of the tupled function the
inequality m s hold,
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
The implementation of homomorphism framework upon
Hadoop
performance tests
Environment:Hardware
COE cluster in Tokyo University which has 192 computing nodes.
We choose 16 , 8 , 4 , 2 and 1 node to run the MapReduce-MPS
program. Each node has 2 Xeon(Nocona) CPU with 2GB RAM.
Environment:Software
Linux2.6.26 ,Hadoop0.20.2 +HDFS
Hadoop configuration: heap size= 1024MB
maximum mapper pre node: 2
maximum reducer pre node: 2
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Performance
The input data
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Performance
The time consuming of calculate 100 million-long list
(SequenceFile, Pair < Long >):
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Performance
The speedup of 2-16 nodes:
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Performance
Comparison of 2 version SUM
Comparison of 2-16 nodes:
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
Performance
Conclusions
The time curve indicate the system scalability with the number of
computing nodes. The curve between x-axis 2 and 8 has biggest
slope, when the curve reaches to 16, the slope decreased, that is
because when there are more nodes, the overhead of
communication increased. Totally, the curve shows the scalability
is near-linear.
Overhead of 2 phases Map-Reduce.
Overhead of Java reflection.
Not support local reduction now (not implemented yet).
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Outline
Motivation
Brief introduction of background
The Design of Homomorphism-based Framework on MapReduce
Case Study
Performance Evaluation
The end
Questions?
?
Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
Ad

More Related Content

What's hot (19)

Query optimisation
Query optimisationQuery optimisation
Query optimisation
WBUTTUTORIALS
 
Lec5 Pagerank
Lec5 PagerankLec5 Pagerank
Lec5 Pagerank
Jeff Hammerbacher
 
RBM from Scratch
RBM from Scratch RBM from Scratch
RBM from Scratch
Hadi Sinaee
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
tugrulh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
J Singh
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 
Pagerank (from Google)
Pagerank (from Google)Pagerank (from Google)
Pagerank (from Google)
Sri Prasanna
 
Lec5 pagerank
Lec5 pagerankLec5 pagerank
Lec5 pagerank
Carlos
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)
Bablu Shofi
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
Astar algorithm
Astar algorithmAstar algorithm
Astar algorithm
Shuqing Zhang
 
Lecture 11 Informed Search
Lecture 11 Informed SearchLecture 11 Informed Search
Lecture 11 Informed Search
Hema Kashyap
 
Heuristic search
Heuristic searchHeuristic search
Heuristic search
Soheil Khodayari
 
ch13
ch13ch13
ch13
KITE www.kitecolleges.com
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19
ngamou
 
Lgm pakdd2011 public
Lgm pakdd2011 publicLgm pakdd2011 public
Lgm pakdd2011 public
Yasuo Tabei
 
Dstar Lite
Dstar LiteDstar Lite
Dstar Lite
Adrian Sotelo
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
QUT_SEF
 
RBM from Scratch
RBM from Scratch RBM from Scratch
RBM from Scratch
Hadi Sinaee
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
tugrulh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
J Singh
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 
Pagerank (from Google)
Pagerank (from Google)Pagerank (from Google)
Pagerank (from Google)
Sri Prasanna
 
Lec5 pagerank
Lec5 pagerankLec5 pagerank
Lec5 pagerank
Carlos
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)
Bablu Shofi
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
Lecture 11 Informed Search
Lecture 11 Informed SearchLecture 11 Informed Search
Lecture 11 Informed Search
Hema Kashyap
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19
ngamou
 
Lgm pakdd2011 public
Lgm pakdd2011 publicLgm pakdd2011 public
Lgm pakdd2011 public
Yasuo Tabei
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
QUT_SEF
 

Similar to A Homomorphism-based MapReduce Framework for Systematic Parallel Programming (20)

Towards Systematic Parallel Programming over MapReduce
Towards Systematic Parallel Programming over MapReduceTowards Systematic Parallel Programming over MapReduce
Towards Systematic Parallel Programming over MapReduce
Yu Liu
 
MapReduce in Cloud Computing
MapReduce in Cloud ComputingMapReduce in Cloud Computing
MapReduce in Cloud Computing
Mohammad Mustaqeem
 
Map/Reduce intro
Map/Reduce introMap/Reduce intro
Map/Reduce intro
CARLOS III UNIVERSITY OF MADRID
 
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel   - Dr. Shailesh KumarMap reduce and the art of Thinking Parallel   - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Hyderabad Scalability Meetup
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
Ilayaraja P
 
MapReduce
MapReduceMapReduce
MapReduce
Amir Payberah
 
Map reduce
Map reduceMap reduce
Map reduce
xydii
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
coolmirza143
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
Nishant Gandhi
 
Big data shim
Big data shimBig data shim
Big data shim
tistrue
 
Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
Shubham Bansal
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
Map reduce
Map reduceMap reduce
Map reduce
Shahbaz Sidhu
 
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
Victor Giannakouris
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
TSANKARARAO
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
NelakurthyVasanthRed1
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
Harisankar H
 
Lec2 Mapred
Lec2 MapredLec2 Mapred
Lec2 Mapred
mobius.cn
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
David Gleich
 
Data_Prep_Techniques_Challenges_Methods.pdf
Data_Prep_Techniques_Challenges_Methods.pdfData_Prep_Techniques_Challenges_Methods.pdf
Data_Prep_Techniques_Challenges_Methods.pdf
Shailja Thakur
 
Towards Systematic Parallel Programming over MapReduce
Towards Systematic Parallel Programming over MapReduceTowards Systematic Parallel Programming over MapReduce
Towards Systematic Parallel Programming over MapReduce
Yu Liu
 
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel   - Dr. Shailesh KumarMap reduce and the art of Thinking Parallel   - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Hyderabad Scalability Meetup
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
Ilayaraja P
 
Map reduce
Map reduceMap reduce
Map reduce
xydii
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
coolmirza143
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
Nishant Gandhi
 
Big data shim
Big data shimBig data shim
Big data shim
tistrue
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
Victor Giannakouris
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
TSANKARARAO
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
David Gleich
 
Data_Prep_Techniques_Challenges_Methods.pdf
Data_Prep_Techniques_Challenges_Methods.pdfData_Prep_Techniques_Challenges_Methods.pdf
Data_Prep_Techniques_Challenges_Methods.pdf
Shailja Thakur
 
Ad

More from Yu Liu (20)

A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
Yu Liu
 
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and SolutionsCloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Yu Liu
 
Introduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDocIntroduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDoc
Yu Liu
 
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
Yu Liu
 
Survey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search EnginesSurvey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search Engines
Yu Liu
 
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded TreewidthPaper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Yu Liu
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Yu Liu
 
An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013
Yu Liu
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
Yu Liu
 
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
Yu Liu
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
Yu Liu
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
Yu Liu
 
Introduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming FrameworkIntroduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming Framework
Yu Liu
 
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Yu Liu
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
Yu Liu
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
Yu Liu
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
Yu Liu
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
Yu Liu
 
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on HadoopScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
Yu Liu
 
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on HadoopImplementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Yu Liu
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
Yu Liu
 
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and SolutionsCloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Yu Liu
 
Introduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDocIntroduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDoc
Yu Liu
 
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
Yu Liu
 
Survey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search EnginesSurvey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search Engines
Yu Liu
 
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded TreewidthPaper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Yu Liu
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Yu Liu
 
An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013
Yu Liu
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
Yu Liu
 
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
Yu Liu
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
Yu Liu
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
Yu Liu
 
Introduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming FrameworkIntroduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming Framework
Yu Liu
 
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Yu Liu
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
Yu Liu
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
Yu Liu
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
Yu Liu
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
Yu Liu
 
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on HadoopScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
Yu Liu
 
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on HadoopImplementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Yu Liu
 
Ad

Recently uploaded (20)

DNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in NepalDNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in Nepal
ICT Frame Magazine Pvt. Ltd.
 
Stretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacentersStretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacenters
ShapeBlue
 
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UXPA Boston
 
Breaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP DevelopersBreaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP Developers
pmeth1
 
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
UXPA Boston
 
CloudStack + KVM: Your Local Cloud Lab
CloudStack + KVM:   Your Local Cloud LabCloudStack + KVM:   Your Local Cloud Lab
CloudStack + KVM: Your Local Cloud Lab
ShapeBlue
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
SOFTTECHHUB
 
Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025
Scott Keck-Warren
 
Pushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K HostsPushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K Hosts
ShapeBlue
 
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptxUiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
anabulhac
 
I’d like to resell your CloudStack services, but...
I’d like to resell your CloudStack services, but...I’d like to resell your CloudStack services, but...
I’d like to resell your CloudStack services, but...
ShapeBlue
 
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UXPA Boston
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
Planetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile BrochurePlanetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile Brochure
Planetek Italia Srl
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
TAFs on WebDriver API - By - Pallavi Sharma.pdf
TAFs on WebDriver API - By - Pallavi Sharma.pdfTAFs on WebDriver API - By - Pallavi Sharma.pdf
TAFs on WebDriver API - By - Pallavi Sharma.pdf
Pallavi Sharma
 
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World TipsMuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
Patryk Bandurski
 
Stretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacentersStretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacenters
ShapeBlue
 
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UX Change Fatigue: Building Resilient Teams in Times of Transformation by Mal...
UXPA Boston
 
Breaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP DevelopersBreaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP Developers
pmeth1
 
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
UXPA Boston
 
CloudStack + KVM: Your Local Cloud Lab
CloudStack + KVM:   Your Local Cloud LabCloudStack + KVM:   Your Local Cloud Lab
CloudStack + KVM: Your Local Cloud Lab
ShapeBlue
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
SOFTTECHHUB
 
Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025
Scott Keck-Warren
 
Pushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K HostsPushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K Hosts
ShapeBlue
 
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptxUiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
UiPath AgentHack - Build the AI agents of tomorrow_Enablement 1.pptx
anabulhac
 
I’d like to resell your CloudStack services, but...
I’d like to resell your CloudStack services, but...I’d like to resell your CloudStack services, but...
I’d like to resell your CloudStack services, but...
ShapeBlue
 
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UXPA Boston
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
Planetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile BrochurePlanetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile Brochure
Planetek Italia Srl
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
TAFs on WebDriver API - By - Pallavi Sharma.pdf
TAFs on WebDriver API - By - Pallavi Sharma.pdfTAFs on WebDriver API - By - Pallavi Sharma.pdf
TAFs on WebDriver API - By - Pallavi Sharma.pdf
Pallavi Sharma
 
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World TipsMuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World Tips
Patryk Bandurski
 

A Homomorphism-based MapReduce Framework for Systematic Parallel Programming

  • 1. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation A Homomorphism-based MapReduce Framework for Systematic Parallel Programming Yu Liu The Graduate University for Advanced Studies Jan 12, 2011 Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 2. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Outline 1 Motivations 2 Brief introduction of MapReduce 3 The Homomorphism-based Framework 4 Case Study: Parallel sum, Maximum prefix sum, Variance of numbers 5 Experimental Results Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 3. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Motivation of This Talk Show how to make programming with MapReduce easier. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 4. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Motivation of This Talk Show how to make programming with MapReduce easier. Introduce an approach of automatic parallel program generating. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 5. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems MapReduce Programming model The Computation of MapReduce Framework Google’s MapReduce is a parallel-distributed programming model, together with an associated implementation, for processing very large data sets in a massively parallel manner. Components of a MapReduce program (Hadoop) A Mapper; A Partitioner that can be used shuffling data; A Combiner that can be used doing local reduction; A Reducer ; A Comparator can be used while sorting or grouping; Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 6. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems MapReduce Programming model MapReduce Data-processing flow Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 7. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems MapReduce Programming model A simple functional specifcation of the MapReduce framework Function mapS is a set version of the map function. Function groupByKey :: {[(k, v)]} → {(k, [v])} takes a set of list of key-value pairs (each pair is called a record) and groups the values of the same key into a list. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 8. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems Maximum Prefix Sum problem The Maximum Prefix Sum problem (mps) is to find the maximum prefix-summation in a list: 3, −1, 4, 1, −5, 9, 2, −6, 5 This problem seems not obvious to solve this problem efficiently with MapReduce. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 9. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems List Homomorphism Function h is said to be a list homomorphism If there are a function f and an associated operator such that for any list x and list y h [a] = f a h (x ++ y) = h(x) h(y). Where ++ is the list concatenation. For instance, the function sum can be described as a list homomorphism sum [a] = a sum (x ++ y) = sum x + sum y. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 10. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems List Homomorphism and Homomorphism Theorems Leftwards function Function h is leftwards if it is defined in the following form with function f and operator ⊕, h [a] = f a h ([a] ++ x) = a ⊕ h x. Rightwards function Function h is rightwards if it is defined in the following form with function f and operator ⊗, h [a] = f a h (x ++ [a]) = h x ⊗ a. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 11. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems List Homomorphism and Homomorphism Theorems Map and Reduce For a given function f , the function of the form ([[·] ◦ f , ++ ]) is a map function, and is written as map f . ———————————————————————————— The function of the form ([id, ]) for some is a reduce function, and is written as reduce ( ). The First Homomorphism Theorem Any homomorphism can be written as the composition of a map and a reduce: ([f , ]) = reduce ( ) ◦ map f . Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 12. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Programming Paradigm of MapReduce List Homomorphism and Homomorphism Theorems List Homomorphism and Homomorphism Theorems The Third Homomorphism Theorem Function h can be described as a list homomorphism, iff ∃ and ∃ f such that: h = ([f , ]) if and only if there exist f , ⊕, and ⊕ such that h [a] = f a h ([a] ++ x) = a ⊕ h x h (x ++ [b]) = h x ⊗ b. The third homomorphism gives a necessary and sufficient condition for the existence of a list homomorphism. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 13. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce To make it easy for resolving problems such as mps by MapReduce. We using the knowledge of homomorphism especially the third homomorphism theorem to wrapping MapReduce model. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 14. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce Basic Homomorphism-Programming Interface filter :: a → b aggregator :: b → b → b. The implementlation on Hadoop Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 15. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce A simple example of using this interface for computing the sum of a list The implementlation on Hadoop Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 16. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce Programming Interface with Right Inverse fold :: [a] → b unfold :: b → [a]. The implementlation on Hadoop Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 17. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce A simple example of using this interface for computing the sum of a list The implementlation on Hadoop Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 18. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study A homomorphism-based framework wrapping MapReduce Requirements of using this interface in addition to the right-inverse property of unfold over fold. Both leftwards and rightwards functions exist fold([a] ++ x) = fold([a] ++ unfold(fold(x))) fold(x ++ [a]) = fold(unfold(fold(x)) ++ [a]). Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 19. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study The implementation of homomorphism framework upon Hadoop To implement our programming interface with Hadoop, we need to consider how to represent lists in a distributed manner. Set of pairs as list We use integer as the index’s type, the list [a, b, c, d, e] is represented by {(3, d), (1, b), (2, c), (0, a), (4, e)}. Set of pairs as distributed List We can represent the above list as two sub-sets {((0, 1), b), ((0, 2), c), ((0, 0), a)} and {((1, 3), d), ((1, 4), e)}, each in different data-nodes Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 20. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study The implementation of homomorphism framework upon Hadoop The first homomorphism theorem implies that a list homomorphism can be implemented by MapReduce, at least two passes of MapReduce. Defination of homMR homMR :: (α → β) → (β → β → β) → {(ID, α)} → β homMR f (⊕) = getValue ◦ MapReduce mapper2 reducer2 ◦ MapReduce mapper1 reducer1 where mapper1 :: (ID, α)) → [((ID, ID), β))] mapper1 (i, a) = [((pid, i), b)] Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 21. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study The implementation of homomorphism framework upon Hadoop Defination of homMR reducer1 :: (ID, ID) → [β] → β reducer1 ((p, j), ias)) = hom f (⊕) ias mapper2 :: ((ID, ID), β) → [((ID, ID), β)] mapper2 ((p, j), b) = [((0, j), b)] reducer2 :: (ID, ID) → [β] → β reducer2 ((0, k), jbs) = hom (⊕) jbs getValue {(0, b)} = b Where, hom f (⊕) denotes a sequential version of ([f , ⊕]). Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 22. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study The leftwards and rightwardsfunction Derivation by right inverse leftwards([a] ++ x) = fold([a] ++ unfold(fold(x))) rightwards(x ++ [a]) = fold(unfold(fold x) ++ [a]). Now if for all xs, rightwards xs = leftwards xs, (1) then a list homomorphism ([f , ⊕]) that computes fold can be obtained automatically, where f and ⊕ are defined as follows: Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 23. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study The leftwards and rightwardsfunction Derivation by right inverse f a = fold([a]) a ⊕ b = fold(unfold a ++ unfold b). Equation (1) should be satisfied. Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 24. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study Programming with this homomorphism framework MPS A sequential program Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 25. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Automatic Parallelization Case Study Programming with this homomorphism framework MPS A tupled function Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 26. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation MPS (mps sum) [a] = (a ↑ 0, a) (mps sum) (x + +[a]) = let (m, s) = (mps sum) x in (m ↑ (s + a We use this tupled function as the fold function. The right inverse of the tupled function, (mps sum)◦: (mps sum)◦ (m, s) = [m, s − m] Noting that for the any result (m, s) of the tupled function the inequality m s hold, Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 27. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation The implementation of homomorphism framework upon Hadoop performance tests Environment:Hardware COE cluster in Tokyo University which has 192 computing nodes. We choose 16 , 8 , 4 , 2 and 1 node to run the MapReduce-MPS program. Each node has 2 Xeon(Nocona) CPU with 2GB RAM. Environment:Software Linux2.6.26 ,Hadoop0.20.2 +HDFS Hadoop configuration: heap size= 1024MB maximum mapper pre node: 2 maximum reducer pre node: 2 Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 28. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Performance The input data Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 29. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Performance The time consuming of calculate 100 million-long list (SequenceFile, Pair < Long >): Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 30. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Performance The speedup of 2-16 nodes: Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 31. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Performance Comparison of 2 version SUM Comparison of 2-16 nodes: Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 32. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation Performance Conclusions The time curve indicate the system scalability with the number of computing nodes. The curve between x-axis 2 and 8 has biggest slope, when the curve reaches to 16, the slope decreased, that is because when there are more nodes, the overhead of communication increased. Totally, the curve shows the scalability is near-linear. Overhead of 2 phases Map-Reduce. Overhead of Java reflection. Not support local reduction now (not implemented yet). Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  • 33. Outline Motivation Brief introduction of background The Design of Homomorphism-based Framework on MapReduce Case Study Performance Evaluation The end Questions? ? Yu Liu A Homomorphism-based MapReduce Framework for Systematic P
  翻译: