SlideShare a Scribd company logo
Using MongoDB for BigData
May, 2017
... In 20 minutes
Agenda
• What is MongoDB?
• Replica Sets
• Sharding and Sharded Cluster
• MapReduce
• Spark and MongoDB
What is MongoDB?
• Open-source
• NoSQL
• Community / Enterprise versions
• Developed by MongoDB Inc. (formerly 10gen) in C++, C and JavaScript
• Cross-platform: Windows, Linux, OS X, Solaris, FreeBSD
• Document-oriented: stores extended binary JSON = BSON documents
• Stores any binary data like videos, pictures ... in GridFS
• Database development in JavaScript (standard libraries and user defined functions)
• Deploy, monitor, back up and scale MongoDB: Ops Manager
• Use MongoDB as a data source for your SQL-based BI: MongoDB Connector for BI,
SlamData
• Cross-platform UI for development: Robomongo
• Hosted MongoDB as a service: MongoDB Atlas
• Hosted platform for managing MongoDB: MongoDB Cloud Manager
• An another cloud provider: mLab
Some Organizations Rely on MongoDB
DB-Engines.com Top 10
Most popular NoSQL database
Terminology
RDBMS MongoDB
Database Database
Table Collection
Row Document
Index Index
Join Embedding
Partition Shard
Partition Key Shard Key
use blogdb
db.blog.insert({
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2016-08-24T21:12:09.982Z")
});
db.blog.find()
{ "_id" : ObjectId("591be0f4c79fa21e08c2e24e"), "title" : "My Blog Post", "content" :
"Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z") }
Some Examples
collection object
method
current database object
generated universally unique primary key
db.blog.update(
{"_id" : ObjectId("591be0f4c79fa21e08c2e24e")},
{$push:
{"comments":
{"user":"usr1", "date":ISODate("2016-08-24T22:12:09.982Z"), text:"first comment"}
}})
db.blog.find().pretty()
{
"_id" : ObjectId("591be0f4c79fa21e08c2e24e"),
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2016-08-24T21:12:09.982Z"),
"comments" : [
{
"user" : "usr1",
"date" : ISODate("2016-08-24T22:12:09.982Z"),
"text" : "first comment"
}
]
}
Some Examples
embedding
orders collection:
MapReduce Example
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
Replica Sets: high availability
Read (optional) Read (optional)
Replication
Replica sets: automatic failover
Sharded Cluster
Shards are replica sets
Sharding
Ranged
Hashed
Zone
Manually associate shard key ranges to zones (groups of shards)
MongoDB connector to Apache Spark
Can be sharded
clusters too!
Data can be filtered,
aggregated at MongoDB
level
• Speedy
• Highly available
• Flexible data model
• Simple to use
• Infinite data size
BUT
• Sharded Cluster deployment requires planning!
Summary
• Install a MongoDB server / sign up to a free hosted MongoDB service like mLab sandbox
• Load the postcodes.zip data file using the mongoimport utility. If you use a MongoDB service, you will
need to install MongoDB client on your machine first.
• Create a Btree index on place.name, postal_code, place.name + place.country and
place.country fields
• Create a 2dsphere index on place.loc
• Add the {"postal_code" : "38116", "place" : { "name" : "Graceland", "country"
:"US", "state" : "Memphis", "loc" : [ 19.0419, 47.5328 ] } } document to the
collection
• Change the place.loc field of the same document to [-90.02604930000001, 35.0476912]
• Add the field owner: Lisa Marie Presley to the same document. Observe that the structure of
the document is different from the other documents of the collection.
Send me the queries that answer to the following questions:
• What is the value of the postal code of Graceland/Memphis? We need only the {"postal_code" :
"38116"} document, fields other than postal_code are not acceptable!
• How many postal_codes are in Budapest/Hungary?
• When was the "59199cdff0269ea12235e9dc" ObjectId created?
• Top 5 countries by number of documents in descending order
• Which places are within 20km around longitude -90.02604930000001 and latitude 35.0476912
(Graceland)? The result must be sorted in alphabetical order and each place appear in the result only
once (distinct).
Homework
Questions
?
Thank You!
Ad

More Related Content

What's hot (20)

MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
MongoDB
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
William LaForest
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB
 
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It StartsRedis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Itamar Haber
 
Introduction to mongoDB
Introduction to mongoDBIntroduction to mongoDB
Introduction to mongoDB
Cuelogic Technologies Pvt. Ltd.
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive
 
MongoDB NYC Python
MongoDB NYC PythonMongoDB NYC Python
MongoDB NYC Python
Mike Dirolf
 
Mongodb
MongodbMongodb
Mongodb
foliba
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
foliba
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
Joe Drumgoole
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
Shridhar Joshi
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
Itamar Haber
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
NodeXperts
 
MongoDB Hadoop DC
MongoDB Hadoop DCMongoDB Hadoop DC
MongoDB Hadoop DC
Mike Dirolf
 
Monogo db in-action
Monogo db in-actionMonogo db in-action
Monogo db in-action
Chi Lee
 
MongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & PerformanceMongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & Performance
Sasidhar Gogulapati
 
Mongodb
MongodbMongodb
Mongodb
Apurva Vyas
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
Pierre Baillet
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
Introduction to MongoDB (Webinar Jan 2011)
Introduction to MongoDB (Webinar Jan 2011)Introduction to MongoDB (Webinar Jan 2011)
Introduction to MongoDB (Webinar Jan 2011)
MongoDB
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
MongoDB
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
William LaForest
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB
 
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It StartsRedis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Itamar Haber
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive
 
MongoDB NYC Python
MongoDB NYC PythonMongoDB NYC Python
MongoDB NYC Python
Mike Dirolf
 
Mongodb
MongodbMongodb
Mongodb
foliba
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
foliba
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
Joe Drumgoole
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
Shridhar Joshi
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
Itamar Haber
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
NodeXperts
 
MongoDB Hadoop DC
MongoDB Hadoop DCMongoDB Hadoop DC
MongoDB Hadoop DC
Mike Dirolf
 
Monogo db in-action
Monogo db in-actionMonogo db in-action
Monogo db in-action
Chi Lee
 
MongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & PerformanceMongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & Performance
Sasidhar Gogulapati
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
Pierre Baillet
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
Introduction to MongoDB (Webinar Jan 2011)
Introduction to MongoDB (Webinar Jan 2011)Introduction to MongoDB (Webinar Jan 2011)
Introduction to MongoDB (Webinar Jan 2011)
MongoDB
 

Similar to Using MongoDB For BigData in 20 Minutes (20)

Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - final
MongoDB
 
Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharp
Serdar Buyuktemiz
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
Andrew Brust
 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and Cassasdra
Brian Enochson
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js Tutorial
PHP Support
 
MongoDB
MongoDBMongoDB
MongoDB
Albin John
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDB
Norberto Leite
 
MongoDB
MongoDBMongoDB
MongoDB
Serdar Buyuktemiz
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
Jimmy Ray
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
Hyphen Call
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
MongoDB
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Introduction to MongoDB Basics from SQL to NoSQL
Introduction to MongoDB Basics from SQL to NoSQLIntroduction to MongoDB Basics from SQL to NoSQL
Introduction to MongoDB Basics from SQL to NoSQL
Mayur Patil
 
Back to Basics German 3: Einführung ins Sharding
Back to Basics German 3: Einführung ins ShardingBack to Basics German 3: Einführung ins Sharding
Back to Basics German 3: Einführung ins Sharding
MongoDB
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Rails
rfischer20
 
MongoDB is a document database. It stores data in a type of JSON format calle...
MongoDB is a document database. It stores data in a type of JSON format calle...MongoDB is a document database. It stores data in a type of JSON format calle...
MongoDB is a document database. It stores data in a type of JSON format calle...
amintafernandos
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
HabileLabs
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
Michael Bright
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
Mike Bright
 
Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - final
MongoDB
 
Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharp
Serdar Buyuktemiz
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
Andrew Brust
 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and Cassasdra
Brian Enochson
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js Tutorial
PHP Support
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDB
Norberto Leite
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
Jimmy Ray
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
Hyphen Call
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
MongoDB
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
MongoDB
 
Introduction to MongoDB Basics from SQL to NoSQL
Introduction to MongoDB Basics from SQL to NoSQLIntroduction to MongoDB Basics from SQL to NoSQL
Introduction to MongoDB Basics from SQL to NoSQL
Mayur Patil
 
Back to Basics German 3: Einführung ins Sharding
Back to Basics German 3: Einführung ins ShardingBack to Basics German 3: Einführung ins Sharding
Back to Basics German 3: Einführung ins Sharding
MongoDB
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Rails
rfischer20
 
MongoDB is a document database. It stores data in a type of JSON format calle...
MongoDB is a document database. It stores data in a type of JSON format calle...MongoDB is a document database. It stores data in a type of JSON format calle...
MongoDB is a document database. It stores data in a type of JSON format calle...
amintafernandos
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
HabileLabs
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
Michael Bright
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
Mike Bright
 
Ad

Recently uploaded (20)

Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
Chapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptxChapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptx
PermissionTafadzwaCh
 
Voice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjgVoice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjg
4mg22ec401
 
Lagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdfLagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdf
benuju2016
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
Ann Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdfAnn Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdf
আন্ নাসের নাবিল
 
Automation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success storyAutomation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success story
Process mining Evangelist
 
50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd
emir73065
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Multi-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline OrchestrationMulti-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline Orchestration
Romi Kuntsman
 
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
Taqyea
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 
Adopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use caseAdopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use case
Process mining Evangelist
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
Agricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptxAgricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptx
mostafaahammed38
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
Controlling Financial Processes at a Municipality
Controlling Financial Processes at a MunicipalityControlling Financial Processes at a Municipality
Controlling Financial Processes at a Municipality
Process mining Evangelist
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
Chapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptxChapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptx
PermissionTafadzwaCh
 
Voice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjgVoice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjg
4mg22ec401
 
Lagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdfLagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdf
benuju2016
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
Automation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success storyAutomation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success story
Process mining Evangelist
 
50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd
emir73065
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Multi-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline OrchestrationMulti-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline Orchestration
Romi Kuntsman
 
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
Taqyea
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 
Adopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use caseAdopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use case
Process mining Evangelist
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
Agricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptxAgricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptx
mostafaahammed38
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
Controlling Financial Processes at a Municipality
Controlling Financial Processes at a MunicipalityControlling Financial Processes at a Municipality
Controlling Financial Processes at a Municipality
Process mining Evangelist
 
Ad

Using MongoDB For BigData in 20 Minutes

  • 1. Using MongoDB for BigData May, 2017 ... In 20 minutes
  • 2. Agenda • What is MongoDB? • Replica Sets • Sharding and Sharded Cluster • MapReduce • Spark and MongoDB
  • 3. What is MongoDB? • Open-source • NoSQL • Community / Enterprise versions • Developed by MongoDB Inc. (formerly 10gen) in C++, C and JavaScript • Cross-platform: Windows, Linux, OS X, Solaris, FreeBSD • Document-oriented: stores extended binary JSON = BSON documents • Stores any binary data like videos, pictures ... in GridFS • Database development in JavaScript (standard libraries and user defined functions) • Deploy, monitor, back up and scale MongoDB: Ops Manager • Use MongoDB as a data source for your SQL-based BI: MongoDB Connector for BI, SlamData • Cross-platform UI for development: Robomongo • Hosted MongoDB as a service: MongoDB Atlas • Hosted platform for managing MongoDB: MongoDB Cloud Manager • An another cloud provider: mLab
  • 5. DB-Engines.com Top 10 Most popular NoSQL database
  • 6. Terminology RDBMS MongoDB Database Database Table Collection Row Document Index Index Join Embedding Partition Shard Partition Key Shard Key
  • 7. use blogdb db.blog.insert({ "title" : "My Blog Post", "content" : "Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z") }); db.blog.find() { "_id" : ObjectId("591be0f4c79fa21e08c2e24e"), "title" : "My Blog Post", "content" : "Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z") } Some Examples collection object method current database object generated universally unique primary key
  • 8. db.blog.update( {"_id" : ObjectId("591be0f4c79fa21e08c2e24e")}, {$push: {"comments": {"user":"usr1", "date":ISODate("2016-08-24T22:12:09.982Z"), text:"first comment"} }}) db.blog.find().pretty() { "_id" : ObjectId("591be0f4c79fa21e08c2e24e"), "title" : "My Blog Post", "content" : "Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z"), "comments" : [ { "user" : "usr1", "date" : ISODate("2016-08-24T22:12:09.982Z"), "text" : "first comment" } ] } Some Examples embedding
  • 9. orders collection: MapReduce Example { _id: ObjectId("50a8240b927d5d8b5891743c"), cust_id: "abc123", ord_date: new Date("Oct 04, 2012"), status: 'A', price: 25, items: [ { sku: "mmm", qty: 5, price: 2.5 }, { sku: "nnn", qty: 5, price: 2.5 } ] } var mapFunction1 = function() { emit(this.cust_id, this.price); }; var reduceFunction1 = function(keyCustId, valuesPrices) { return Array.sum(valuesPrices); }; db.orders.mapReduce( mapFunction1, reduceFunction1, { out: "map_reduce_example" } )
  • 10. Replica Sets: high availability Read (optional) Read (optional) Replication
  • 13. Sharding Ranged Hashed Zone Manually associate shard key ranges to zones (groups of shards)
  • 14. MongoDB connector to Apache Spark Can be sharded clusters too! Data can be filtered, aggregated at MongoDB level
  • 15. • Speedy • Highly available • Flexible data model • Simple to use • Infinite data size BUT • Sharded Cluster deployment requires planning! Summary
  • 16. • Install a MongoDB server / sign up to a free hosted MongoDB service like mLab sandbox • Load the postcodes.zip data file using the mongoimport utility. If you use a MongoDB service, you will need to install MongoDB client on your machine first. • Create a Btree index on place.name, postal_code, place.name + place.country and place.country fields • Create a 2dsphere index on place.loc • Add the {"postal_code" : "38116", "place" : { "name" : "Graceland", "country" :"US", "state" : "Memphis", "loc" : [ 19.0419, 47.5328 ] } } document to the collection • Change the place.loc field of the same document to [-90.02604930000001, 35.0476912] • Add the field owner: Lisa Marie Presley to the same document. Observe that the structure of the document is different from the other documents of the collection. Send me the queries that answer to the following questions: • What is the value of the postal code of Graceland/Memphis? We need only the {"postal_code" : "38116"} document, fields other than postal_code are not acceptable! • How many postal_codes are in Budapest/Hungary? • When was the "59199cdff0269ea12235e9dc" ObjectId created? • Top 5 countries by number of documents in descending order • Which places are within 20km around longitude -90.02604930000001 and latitude 35.0476912 (Graceland)? The result must be sorted in alphabetical order and each place appear in the result only once (distinct). Homework
  翻译: