NOSQL introduction for big data analytics
Map-Reduce
• Why Map Reduce?
• The rise of aggregate-oriented databases is in large part due to the
growth of clusters.
• Running on a cluster means you have to make your tradeoffs in data
storage differently than when running on a single machine.
• Clusters don’t just change the rules for data storage—they also change
the rules for computation.
• If you store lots of data on a cluster, processing that data efficiently
means you have to think differently about how you organize your
processing.
Centralized Database
• With a centralized database, there are generally two ways you can run the
processing logic against it: either on the database server itself or on a client
machine.
• Running it on a client machine gives you more flexibility in choosing a
programming environment, which usually makes for programs that are easier
to create or extend.
• This comes at the cost of having to remove lots of data from the database
server.
• If you need to hit a lot of data, then it makes sense to do the processing on
the server, paying the price in programming convenience and increasing the
load on the database server.
Cluster
• When you have a cluster, there is good news immediately—you have
lots of machines to spread the computation over.
• However, you also still need to try to reduce the amount of data that
needs to be transferred across the network by doing as much
processing as you can on the same node as the data it needs.
Map-Reduce
• The map-reduce pattern is a way to organize processing in such a way as
to take advantage of multiple machines on a cluster while keeping as
much processing and the data it needs together on the same machine.
• It first gained prominence with Google’s MapReduce framework.
• A widely used open-source implementation is part of the Hadoop project,
although several databases include their own implementations.
• As with most patterns, there are differences in detail between these
implementations.
• The name “map-reduce” reveals its inspiration from the map and reduce
operations on collections in functional programming languages.
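• For illustration, here is how those two operations look in an ordinary Java
collection pipeline (the numbers are made up):

    import java.util.List;

    public class MapReduceNaming {
        public static void main(String[] args) {
            List<Integer> quantities = List.of(2, 3, 5);
            int totalRevenue = quantities.stream()
                    .map(q -> q * 10)           // map: transform each element (unit price 10)
                    .reduce(0, Integer::sum);   // reduce: fold the results into one value
            System.out.println(totalRevenue);   // prints 100
        }
    }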
Basic Map-Reduce
• Basic Idea: Let’s assume we have chosen orders as our aggregate, with each
order having line items. Each line item has a product ID, quantity, and the price
charged. This aggregate makes a lot of sense, as people usually want to see the
whole order in one access. We have lots of orders, so we’ve sharded the dataset
over many machines.
• However, sales analysis people want to see a product and its total
revenue for the last seven days. This report doesn’t fit the aggregate
structure that we have—which is the downside of using aggregates. In
order to get the product revenue report, you’ll have to visit every
machine in the cluster and examine many records on each machine.
Contd..
• The first stage in a map-reduce job is the map.
• A map is a function whose input is a single aggregate and whose output is a
bunch of key-value pairs. In this case, the input would be an order.
• The output would be key-value pairs corresponding to the line items. Each
one would have the product ID as the key and an embedded map with the
quantity and price as the values.
Map Function
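• A minimal sketch of such a map function for the orders example; the Order and
LineItem types and the emit callback are hypothetical, not tied to any particular
framework:

    import java.util.List;
    import java.util.function.BiConsumer;

    public class OrderMap {
        // Hypothetical aggregate types for the orders example.
        record LineItem(String productId, int quantity, double price) {}
        record Order(List<LineItem> lineItems) {}

        // Map: input is a single aggregate (one order); output is one key-value
        // pair per line item, keyed by product ID, with quantity and price as the value.
        static void map(Order order, BiConsumer<String, LineItem> emit) {
            for (LineItem item : order.lineItems()) {
                emit.accept(item.productId(), item);
            }
        }
    }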
Reduce Function
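• A matching sketch of the reduce function: the framework gathers every value
emitted under one key and hands them to the reducer together (same hypothetical
LineItem type as in the map sketch):

    import java.util.List;

    public class OrderReduce {
        record LineItem(String productId, int quantity, double price) {}

        // Reduce: receives all values for one product ID and boils them down
        // to a single output, here the total revenue for that product.
        static double reduce(String productId, List<LineItem> values) {
            double revenue = 0;
            for (LineItem v : values) {
                revenue += v.quantity() * v.price();
            }
            return revenue;
        }
    }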
Partitioning
Combining
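• Partitioning groups the map output by key so that several reducers can work in
parallel, while combining does a partial reduce on each map node so that less data
crosses the network. A reducer can serve as a combiner when its input and output
have the same form, as in this sketch of per-product revenue totals (continuing the
hypothetical example above):

    public class RevenueCombiner {
        // Combinable: takes revenue totals and returns a revenue total, so it
        // can run once per node and then again over the partial results.
        static double combine(double[] partialRevenues) {
            double total = 0;
            for (double r : partialRevenues) {
                total += r;
            }
            return total;
        }
    }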
Composing Map-Reduce Calculations
A Two Stage Map-Reduce Example
Creating records for monthly sales of a product
The second stage mapper creates base records for year-on-year comparisons
The reduction step is a merge of incomplete records
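• One way the two stages might fit together, assuming the first stage has already
produced monthly sales records (all types here are hypothetical):

    import java.util.List;
    import java.util.Map;

    public class YearOnYear {
        // Output of the first stage and input to the second: sales of one
        // product in one month.
        record MonthlySales(String productId, int year, int month, double revenue) {}

        // Second-stage mapper: re-key each record by product and calendar month,
        // so this year's and last year's figures arrive at the same reducer.
        static Map.Entry<String, MonthlySales> map(MonthlySales s) {
            return Map.entry(s.productId() + "/" + s.month(), s);
        }

        // Second-stage reducer: merge the incomplete records for one key into a
        // single year-on-year comparison.
        static String reduce(String key, List<MonthlySales> values, int thisYear) {
            double current = 0, prior = 0;
            for (MonthlySales s : values) {
                if (s.year() == thisYear) current += s.revenue();
                else if (s.year() == thisYear - 1) prior += s.revenue();
            }
            return key + ": " + current + " this year vs " + prior + " last year";
        }
    }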
Incremental Map-Reduce
• The examples we’ve discussed so far are complete map-reduce
computations, where we start with raw inputs and create a final output.
• Many map-reduce computations take a while to perform, even with
clustered hardware, and new data keeps coming in, which means we need to
rerun the computation to keep the output up to date.
• Starting from scratch each time can take too long, so often it’s useful to
structure a map-reduce computation to allow incremental updates, so that
only the minimum computation needs to be done.
• The map stages of a map-reduce are easy to handle incrementally—only if
the input data changes does the mapper need to be rerun. Since maps are
isolated from each other, incremental updates are straightforward.
Contd..
• The more complex case is the reduce step, since it pulls together the
outputs from many maps and any change in the map outputs could
trigger a new reduction.
• This recomputation can be lessened depending on how parallel the
reduce step is.
• If we are partitioning the data for reduction, then any partition that’s
unchanged does not need to be re-reduced.
• Similarly, if there’s a combiner step, it doesn’t need to be rerun if its
source data hasn’t changed.
Contd..
• If our reducer is combinable, there are further opportunities for
computation avoidance.
• If the changes are additive—that is, if we are only adding new records
but are not changing or deleting any old records—then we can just
run the reduce with the existing result and the new additions.
• If there are destructive changes, that is, updates and deletes, then we
can avoid some recomputation by breaking up the reduce operation
into steps and only recalculating those steps whose inputs have
changed: essentially, using a Dependency Network to organize the
computation.
Key points
⚫Map-reduce is a pattern to allow computations to be parallelized over a cluster.
⚫The map task reads data from an aggregate and boils it down to relevant key-value pairs.
Maps only read a single record at a time and can thus be parallelized and run on the node
that stores the record.
⚫Reduce tasks take many values for a single key output from map tasks and summarize them
into a single output. Each reducer operates on the result of a single key, so it can be
parallelized by key.
⚫Reducers that have the same form for input and output can be combined into pipelines.
This improves parallelism and reduces the amount of data to be transferred.
⚫Map-reduce operations can be composed into pipelines where the output of one reduce is
the input to another operation’s map.
⚫If the result of a map-reduce computation is widely used, it can be stored as a materialized
view.
⚫Materialized views can be updated through incremental map-reduce operations that only
compute changes to the view instead of recomputing everything from scratch.
Key-Value Databases
• A key-value store is a simple hash table, primarily used when all
access to the database is via primary key.
• Think of a table in a traditional RDBMS with two columns, such as ID
and NAME, the ID column being the key and NAME column storing
the value.
• In an RDBMS, the NAME column is restricted to storing data of type
string. The application can provide an ID and VALUE and persist the
pair; if the ID already exists, the current value is overwritten;
otherwise a new entry is created.
Oracle vs Riak
What Is a Key-Value Store
• Key-value stores are the simplest NoSQL data stores to use from an
API perspective.
• The client can either get the value for the key, put a value for a key, or
delete a key from the data store.
• The value is a blob that the data store just stores, without caring or
knowing what’s inside; it’s the responsibility of the application to
understand what was stored.
• Since key-value stores always use primary-key access, they generally
have great performance and can be easily scaled.
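• The entire client-facing surface of such a store can be sketched as three
operations (an illustrative interface, not any particular product’s API):

    // Minimal key-value store API: the value is an opaque blob to the store.
    interface KeyValueStore {
        byte[] get(String key);             // fetch the value stored under a key
        void put(String key, byte[] value); // store or overwrite the value for a key
        void delete(String key);            // remove the key and its value
    }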
Popular key-value databases
• Riak [Riak]
• Redis (often referred to as Data Structure server) [Redis]
• Memcached DB and its flavors [Memcached]
• Berkeley DB [Berkeley DB]
• HamsterDB (especially suited for embedded use) [HamsterDB]
• Amazon DynamoDB [Amazon’s Dynamo] (not open-source), and
• Project Voldemort [Project Voldemort] (an open-source implementation of
Amazon’s Dynamo).
Storing all the data in a single bucket
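• When everything shares one bucket, the keys themselves have to distinguish the
object types. A sketch in the style of the Riak Java client snippet shown below
(the key prefixes are assumptions):

    Bucket bucket = client.fetchBucket("applicationData").execute();
    // One namespace for everything, so the key carries the object type.
    bucket.store("userProfile_" + userId, userProfile).execute();
    bucket.store("shoppingCart_" + userId, shoppingCart).execute();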
Domain Buckets
• We could also create buckets which store specific data. In Riak, they are
known as domain buckets, allowing the serialization and deserialization to
be handled by the client driver.
• Using domain buckets or different buckets for different objects (such as
UserProfile and ShoppingCart) segments the data across different buckets
allowing you to read only the object you need without having to change key
design.
• Example (Riak Java client):

    Bucket bucket = client.fetchBucket(bucketName).execute();
    DomainBucket<UserProfile> profileBucket =
        DomainBucket.builder(bucket, UserProfile.class).build();
Key-Value Store Features
• Consistency
• Transactions
• Query Features
• Structure of Data
• Scaling
Consistency
• Consistency is applicable only for operations on a single key, since these
operations are either a get, put, or delete on a single key.
• Optimistic writes can be performed, but are very expensive to implement,
because a change in value cannot be determined by the data store.
• In distributed key-value store implementations like Riak, the eventually
consistent model of consistency is implemented.
• Since the value may have already been replicated to other nodes, Riak has
two ways of resolving update conflicts: either the newest write wins and
older writes lose, or both (all) values are returned, allowing the client to
resolve the conflict.
Contd..
• In Riak, these options can be set up during the bucket creation.
Buckets are just a way to namespace keys so that key collisions can
be reduced—for example, all customer keys may reside in the
customer bucket.
• When creating a bucket, default values for consistency can be
provided, for example that a write is considered good only when
the data is consistent across all the nodes where the data is stored.
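• A sketch of setting such defaults at bucket creation with the same client style
(method names vary across client versions, so treat this as illustrative):

    Bucket bucket = client.createBucket(bucketName)
            .nVal(5)              // replicate each value to five nodes
            .allowSiblings(true)  // keep conflicting writes for the client to resolve
            .execute();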
Transactions
• Different products of the key-value store kind have different specifications of
transactions. Generally speaking, there are no guarantees on the writes. Many data
stores do implement transactions in different ways.
• Riak uses the concept of quorum, implemented by using the W value (the write
quorum) during the write API call.
• Assume we have a Riak cluster with a replication factor of 5 and we supply the W
value of 3.
• When writing, the write is reported as successful only when it is written and reported
as a success on at least three of the nodes.
• This allows Riak to have write tolerance; in our example, with N equal to 5 and a
W value of 3, the cluster can tolerate N - W = 2 nodes being down for write
operations, though we would still have lost some data on those nodes for read
operations.
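• A write with an explicit quorum might look like this in the same client style
(again a sketch, not verified against a specific client version):

    // Reported successful only once three of the five replicas confirm the write.
    bucket.store(key, value).w(3).execute();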
Query Features
• All key-value stores can query by the key—and that’s about it.
• If you have requirements to query by using some attribute of the value column, it’s
not possible to use the database: Your application needs to read the value to figure
out if the attribute meets the conditions.
• Query by key also has an interesting side effect.
• What if we don’t know the key, especially during ad-hoc querying during debugging?
Most of the data stores will not give you a list of all the primary keys; even if they
did, retrieving lists of keys and then querying for the value would be very
cumbersome.
• Some key-value databases get around this by providing the ability to search inside
the value, such as Riak Search, which allows you to query the data just as you
would with indexes.
Contd..
• While using key-value stores, lots of thought has to be given to the design of
the key.
• Can the key be generated using some algorithm? Can the key be provided by
the user (user ID, email, etc.)? Or derived from timestamps or other data
generated outside the database?
• These query characteristics make key-value stores likely candidates for storing
session data (with the session ID as the key), shopping cart data, user profiles,
and so on.
• The expiry_secs property can be used to expire keys after a certain time
interval, especially for session/shopping cart objects.
Structure of Data
• Key-value databases don’t care what is stored in the value
part of the key-value pair.
• The value can be a blob, text, JSON, XML, and so on.
• In Riak, we can use the Content-Type in the POST request
to specify the data type.
Scaling
• Many key-value stores scale by using sharding.
• With sharding, the value of the key determines on which node the key is
stored.
• Let’s assume we are sharding by the first character of the key; if the key is
f4b19d79587d, which starts with an f, it will be sent to a different node than the
key ad9c7a396542.
• This kind of sharding setup can increase performance as more nodes are
added to the cluster.
• Sharding also introduces some problems. If the node used to store f goes down,
the data stored on that node becomes unavailable, and new data with keys that
start with f cannot be written.
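• A sketch of the routing logic behind such a setup: the first-character scheme from
the example, next to the hash-based variant commonly used to spread keys more
evenly (names are illustrative):

    public class ShardRouter {
        // Naive scheme from the example: route on the key's first character.
        static int nodeForFirstChar(String key, int nodeCount) {
            return key.charAt(0) % nodeCount;
        }

        // More common scheme: hash the whole key so load spreads evenly and
        // similar keys don't pile onto one node.
        static int nodeForHash(String key, int nodeCount) {
            return Math.floorMod(key.hashCode(), nodeCount);
        }
    }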
Contd..
• Data stores such as Riak allow you to control the aspects
of the CAP Theorem:
• N (number of nodes to store the key-value replicas),
• R (number of nodes that have to have the data being
fetched before the read is considered successful), and
• W (the number of nodes the write has to be written to
before it is considered successful).
Contd..
• Let’s assume we have a 5-node Riak cluster.
• Setting N to 3 means that all data is replicated to at least three nodes, setting R
to 2 means any two nodes must reply to a GET request for it to be considered
successful, and setting W to 2 ensures that the PUT request is written to two
nodes before the write is considered successful.
• These settings allow us to fine-tune node failures for read or write operations.
Based on our need, we can change these values for better read availability or
write availability.
• Generally speaking, choose a W value to match your consistency needs; these
values can be set as defaults during bucket creation.
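• Expressed as bucket-creation defaults in the same illustrative client style, the
example above might read:

    Bucket bucket = client.createBucket("shoppingCart")
            .nVal(3)  // three replicas of every value
            .r(2)     // two nodes must answer a read
            .w(2)     // two nodes must confirm a write
            .execute();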
Suitable Use Cases
• Storing Session Information
• User Profiles, Preferences
• Shopping Cart Data
Storing Session Information
• Generally, every web session is unique and is assigned a unique
session id value.
• Applications that store the session id on disk or in an RDBMS will
greatly benefit from moving to a key-value store, since everything
about the session can be stored by a single PUT request or retrieved
using GET.
• This single-request operation makes it very fast, as everything about
the session is stored in a single object.
• Solutions such as Memcached are used by many web applications,
and Riak can be used when availability is important.
User Profiles, Preferences
• Almost every user has a unique user ID, username, or some other
attribute, as well as preferences such as language, color, time zone,
which products the user has access to, and so on.
• This can all be put into an object, so getting preferences of a user
takes a single GET operation.
• Similarly, product profiles can be stored.
Shopping Cart Data
• E-commerce websites have shopping carts tied to the user.
• As we want the shopping carts to be available all the time, across browsers,
machines, and sessions, all the shopping information can be put into the
value where the key is the user ID.
• A Riak cluster would be best suited for these kinds of applications.
When Not to Use
• Relationships among Data: If you need to have relationships between different sets of
data, or correlate the data between different sets of keys, key-value stores are not the
best solution to use, even though some key-value stores provide link-walking features.
• Multioperation Transactions: If you’re saving multiple keys and there is a failure to save
any one of them, and you want to revert or roll back the rest of the operations, key-value
stores are not the best solution to be used.
• Query by Data: If you need to search the keys based on something found in the value
part of the key-value pairs, then key-value stores are not going to perform well for you.
There is no way to inspect the value on the database side, with the exception of some
products like Riak Search or indexing engines like Lucene or Solr.
• Operations by Sets: Since operations are limited to one key at a time, there is no way to
operate upon multiple keys at the same time. If you need to operate upon multiple keys,
you have to handle this from the client side.