SlideShare a Scribd company logo
MongoDB and
the Oplog
EVAN BRODER @ebroder
AGENDA
INTRO TO THE OPLOG
EXAMPLE APPLICATIONS
INTRO
TO THE OPLOG
PRIMARY
SECONDARIES
APPLICATION
APPLICATION
save
{_id: 1, a: 2}
THINGS I’VE DONE:
- save {_id: 1, a: 2}
APPLICATION
update where
{a: 2},
{$set: {a: 3}}
THINGS I’VE DONE:
- save {_id: 1, a: 2}
- update {_id: 1},
{$set: {a: 3}}
THINGS I’VE DONE:
- save {_id: 1, a: 2}
- update {_id: 1},
{$set: {a: 3}}
- insert…
- delete…
- delete…
- save…
- update…
THINGS I’VE
DONE:
save
…
THINGS I’VE
DONE:
THINGS I’VE
DONE:
save
…
THINGS I’VE
DONE:
save
…
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
TRIGGERS
GOAL:
EVENT PROCESSING
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
GOAL:
DETECT INSERTIONS
Building Real Time Systems on MongoDB Using the Oplog at Stripe
WARNING
THIS CODE IS NOT
PRODUCTION-READY
oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.each do |op|
puts op['o']['_id']
end
end
oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.each do |op|
puts op['o']['_id']
end
end
oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.each do |op|
puts op['o']['_id']
end
end
oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.each do |op|
puts op['o']['_id']
end
end
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
'op' => 'i',
'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
'op' => 'i',
'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
'op' => 'i',
'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
'op' => 'i',
'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
DATA
TRANSFORMATIONS
GOAL:
MONGODB TO POSTGRESQL
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
'op' => 'i',
'ns' => ns}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
start_entry = oplog.find_one({},
{:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start}}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
loop do
cursor.each do |op|
puts op['o']['_id']
end
end
end
cursor.each do |op|
puts op['o']['_id']
end
cursor.each do |op|
case op['op']
when 'i'
puts op['o']['_id']
else
# ¯_(ツ)_/¯
end
end
cursor.each do |op|
case op['op']
when 'i'
query = "INSERT INTO #{op['ns']} (" +
op['o'].keys.join(', ') +
') VALUES (' +
op['o'].values.map(&:inspect).join(', ') + ')'
else
# ¯_(ツ)_/¯
end
end
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
cursor.each do |op|
case op['op']
when 'i'
query = "INSERT INTO #{op['ns']} (" +
op['o'].keys.join(', ') +
') VALUES (' +
op['o'].values.map(&:inspect).join(', ') + ')'
when 'd'
query = "DELETE FROM #{op['ns']} WHERE _id=" +
op['o']['_id'].inspect
else
# ¯_(ツ)_/¯
end
end
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
query = "DELETE FROM #{op['ns']} WHERE _id=" +
op['o']['_id'].inspect
when 'u'
query = "UPDATE #{op['ns']} SET"
updates = op['o']['$set'] ? op['o']['$set'] : op['o']
updates.each do |k, v|
query += " #{k}=#{v.inspect}"
end
query += " WHERE _id="
query += op['o2']['_id'].inspect
else
# ¯_(ツ)_/¯
end
end
cursor.each do |op|
case op['op']
when 'i'
query = "INSERT INTO #{op['ns']} (" +
op['o'].keys.join(', ') + ') VALUES (' +
op['o'].values.map(&:inspect).join(', ') + ')'
when 'd'
query = "DELETE FROM #{op['ns']} WHERE _id=" +
op['o']['_id'].inspect
when 'u'
query = "UPDATE #{op['ns']} SET"
updates = op['o']['$set'] ? op['o']['$set'] : op['o']
updates.each do |k, v|
query += " #{k}=#{v.inspect}"
end
query += " WHERE _id=" + op['o2']['_id'].inspect
else
# ¯_(ツ)_/¯
end
end
github.com/stripe/mosql
github.com/stripe/zerowing
cursor.each do |op|
case op['op']
when 'i'
query = "INSERT INTO #{op['ns']} (" +
op['o'].keys.join(', ') + ') VALUES (' +
op['o'].values.map(&:inspect).join(', ') + ')'
when 'd'
query = "DELETE FROM #{op['ns']} WHERE _id=" +
op['o']['_id'].inspect
when 'u'
query = "UPDATE #{op['ns']} SET"
updates = op['o']['$set'] ? op['o']['$set'] : op['o']
updates.each do |k, v|
query += " #{k}=#{v.inspect}"
end
query += " WHERE _id=" + op['o2']['_id'].inspect
else
# ¯_(ツ)_/¯
end
end
DISASTER
RECOVERY
task = collection.find_one({'finished' => nil}
# do something with task…
collection.update({'_id' => task.id},
{'$set' => {'finished' => Time.now.to_i}})
loop do
collection.remove(
{'finished' => {'$lt' => Time.now.to_i - 30}})
sleep(10)
end
evan@caron:~$ mongo
MongoDB shell version: 2.4.10
connecting to: test
normal:PRIMARY> null < (Date.now() / 1000) - 30
true
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
THINGS I’VE
DONE:
insert
delete
…
THINGS I’VE
DONE:
> db.getReplicationInfo()
{
"logSizeMB" : 48964.3541015625,
"usedMB" : 46116.4,
"timeDiff" : 316550,
"timeDiffHours" : 87.93,
"tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)",
"tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)",
"now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)"
}
> db.getReplicationInfo()
{
"logSizeMB" : 48964.3541015625,
"usedMB" : 46116.4,
"timeDiff" : 316550,
"timeDiffHours" : 87.93,
"tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)",
"tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)",
"now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)"
}
new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.each do |op|
if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
old_task = old_tasks.find_one({'_id' => op['o']['_id']})
if old_task['finished'] == nil
# found one!
# save old_task to a file, and we'll re-queue it later
end
end
old_connection['admin'].command({'applyOps' => [op]})
end
end
new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.each do |op|
if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
old_task = old_tasks.find_one({'_id' => op['o']['_id']})
if old_task['finished'] == false
# found one!
# save old_task to a file, and we'll re-queue it later
end
end
old_connection['admin'].command({'applyOps' => [op]})
end
end
new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
cursor.each do |op|
if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
old_task = old_tasks.find_one({'_id' => op['o']['_id']})
if old_task['finished'] == false
# found one!
# save old_task to a file, and we'll re-queue it later
end
end
old_connection['admin'].command({'applyOps' => [op]})
end
end
THINGS I’VE
DONE:
save
…
THINGS I’VE
DONE:
save
…
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
QUESTIONS?
Ad

More Related Content

What's hot (7)

Abstract data types (adt) intro to data structure part 2
Abstract data types (adt)   intro to data structure part 2Abstract data types (adt)   intro to data structure part 2
Abstract data types (adt) intro to data structure part 2
Self-Employed
 
Stack using Linked List
Stack using Linked ListStack using Linked List
Stack using Linked List
Sayantan Sur
 
Stack push pop
Stack push popStack push pop
Stack push pop
A. S. M. Shafi
 
Heaps
HeapsHeaps
Heaps
Hafiz Atif Amin
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
Queue as data_structure
Queue as data_structureQueue as data_structure
Queue as data_structure
eShikshak
 
Insertion sort bubble sort selection sort
Insertion sort bubble sort  selection sortInsertion sort bubble sort  selection sort
Insertion sort bubble sort selection sort
Ummar Hayat
 
Abstract data types (adt) intro to data structure part 2
Abstract data types (adt)   intro to data structure part 2Abstract data types (adt)   intro to data structure part 2
Abstract data types (adt) intro to data structure part 2
Self-Employed
 
Stack using Linked List
Stack using Linked ListStack using Linked List
Stack using Linked List
Sayantan Sur
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
Queue as data_structure
Queue as data_structureQueue as data_structure
Queue as data_structure
eShikshak
 
Insertion sort bubble sort selection sort
Insertion sort bubble sort  selection sortInsertion sort bubble sort  selection sort
Insertion sort bubble sort selection sort
Ummar Hayat
 

Viewers also liked (20)

Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
Dan Harvey
 
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at StripeBuilding Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
MongoDB
 
Stripe CTF3 wrap-up
Stripe CTF3 wrap-upStripe CTF3 wrap-up
Stripe CTF3 wrap-up
Stripe
 
Payments Made Easy with Stripe
Payments Made Easy with StripePayments Made Easy with Stripe
Payments Made Easy with Stripe
Shawn Hooper
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
confluent
 
Exploring the replication in MongoDB
Exploring the replication in MongoDBExploring the replication in MongoDB
Exploring the replication in MongoDB
Igor Donchovski
 
Introduction to Reactive Microservices Architecture.
Introduction to Reactive Microservices Architecture.Introduction to Reactive Microservices Architecture.
Introduction to Reactive Microservices Architecture.
Richard Langlois P. Eng.
 
New merchantapplication
New merchantapplicationNew merchantapplication
New merchantapplication
rgater
 
Walking the Walk: Developing the MongoDB Backup Service with MongoDB
Walking the Walk: Developing the MongoDB Backup Service with MongoDBWalking the Walk: Developing the MongoDB Backup Service with MongoDB
Walking the Walk: Developing the MongoDB Backup Service with MongoDB
MongoDB
 
Unified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified logUnified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified log
Alexander Dean
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDB
MongoDB
 
Magento Stripe Payments
Magento Stripe PaymentsMagento Stripe Payments
Magento Stripe Payments
Webkul Software Pvt. Ltd.
 
Web payment system
Web payment system Web payment system
Web payment system
Syed Waqar Hussain Shah
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Reaching Out Through Live Streaming
Reaching Out Through Live StreamingReaching Out Through Live Streaming
Reaching Out Through Live Streaming
Inner Ear
 
Schema Design by Example ~ MongoSF 2012
Schema Design by Example ~ MongoSF 2012Schema Design by Example ~ MongoSF 2012
Schema Design by Example ~ MongoSF 2012
hungarianhc
 
Webinar: MongoDB for Content Management
Webinar: MongoDB for Content ManagementWebinar: MongoDB for Content Management
Webinar: MongoDB for Content Management
MongoDB
 
Constructing Web APIs with Rack, Sinatra and MongoDB
Constructing Web APIs with Rack, Sinatra and MongoDBConstructing Web APIs with Rack, Sinatra and MongoDB
Constructing Web APIs with Rack, Sinatra and MongoDB
Oisin Hurley
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema Design
MongoDB
 
[daddly] Stripe勉強会 運用編 2016/11/30
[daddly] Stripe勉強会 運用編 2016/11/30[daddly] Stripe勉強会 運用編 2016/11/30
[daddly] Stripe勉強会 運用編 2016/11/30
Naoshi ONO
 
Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
Dan Harvey
 
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at StripeBuilding Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
MongoDB
 
Stripe CTF3 wrap-up
Stripe CTF3 wrap-upStripe CTF3 wrap-up
Stripe CTF3 wrap-up
Stripe
 
Payments Made Easy with Stripe
Payments Made Easy with StripePayments Made Easy with Stripe
Payments Made Easy with Stripe
Shawn Hooper
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
confluent
 
Exploring the replication in MongoDB
Exploring the replication in MongoDBExploring the replication in MongoDB
Exploring the replication in MongoDB
Igor Donchovski
 
Introduction to Reactive Microservices Architecture.
Introduction to Reactive Microservices Architecture.Introduction to Reactive Microservices Architecture.
Introduction to Reactive Microservices Architecture.
Richard Langlois P. Eng.
 
New merchantapplication
New merchantapplicationNew merchantapplication
New merchantapplication
rgater
 
Walking the Walk: Developing the MongoDB Backup Service with MongoDB
Walking the Walk: Developing the MongoDB Backup Service with MongoDBWalking the Walk: Developing the MongoDB Backup Service with MongoDB
Walking the Walk: Developing the MongoDB Backup Service with MongoDB
MongoDB
 
Unified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified logUnified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified log
Alexander Dean
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDB
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Reaching Out Through Live Streaming
Reaching Out Through Live StreamingReaching Out Through Live Streaming
Reaching Out Through Live Streaming
Inner Ear
 
Schema Design by Example ~ MongoSF 2012
Schema Design by Example ~ MongoSF 2012Schema Design by Example ~ MongoSF 2012
Schema Design by Example ~ MongoSF 2012
hungarianhc
 
Webinar: MongoDB for Content Management
Webinar: MongoDB for Content ManagementWebinar: MongoDB for Content Management
Webinar: MongoDB for Content Management
MongoDB
 
Constructing Web APIs with Rack, Sinatra and MongoDB
Constructing Web APIs with Rack, Sinatra and MongoDBConstructing Web APIs with Rack, Sinatra and MongoDB
Constructing Web APIs with Rack, Sinatra and MongoDB
Oisin Hurley
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema Design
MongoDB
 
[daddly] Stripe勉強会 運用編 2016/11/30
[daddly] Stripe勉強会 運用編 2016/11/30[daddly] Stripe勉強会 運用編 2016/11/30
[daddly] Stripe勉強会 運用編 2016/11/30
Naoshi ONO
 
Ad

Similar to Building Real Time Systems on MongoDB Using the Oplog at Stripe (20)

Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at StripeBuilding Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
MongoDB
 
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Masahiro Nagano
 
Functional Programming with Groovy
Functional Programming with GroovyFunctional Programming with Groovy
Functional Programming with Groovy
Arturo Herrero
 
Hacking ansible
Hacking ansibleHacking ansible
Hacking ansible
bcoca
 
CoffeeScript
CoffeeScriptCoffeeScript
CoffeeScript
Scott Leberknight
 
Rails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Rails-like JavaScript Using CoffeeScript, Backbone.js and JasmineRails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Rails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Raimonds Simanovskis
 
Elm: give it a try
Elm: give it a tryElm: give it a try
Elm: give it a try
Eugene Zharkov
 
Functional Pe(a)rls version 2
Functional Pe(a)rls version 2Functional Pe(a)rls version 2
Functional Pe(a)rls version 2
osfameron
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
MongoSF
 
Mongoskin - Guilin
Mongoskin - GuilinMongoskin - Guilin
Mongoskin - Guilin
Jackson Tian
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
ZIO: Powerful and Principled Functional Programming in Scala
ZIO: Powerful and Principled Functional Programming in ScalaZIO: Powerful and Principled Functional Programming in Scala
ZIO: Powerful and Principled Functional Programming in Scala
Wiem Zine Elabidine
 
Introduction to Groovy
Introduction to GroovyIntroduction to Groovy
Introduction to Groovy
André Faria Gomes
 
Pre-Bootcamp introduction to Elixir
Pre-Bootcamp introduction to ElixirPre-Bootcamp introduction to Elixir
Pre-Bootcamp introduction to Elixir
Paweł Dawczak
 
Damn Fine CoffeeScript
Damn Fine CoffeeScriptDamn Fine CoffeeScript
Damn Fine CoffeeScript
niklal
 
Code Generation in PHP - PHPConf 2015
Code Generation in PHP - PHPConf 2015Code Generation in PHP - PHPConf 2015
Code Generation in PHP - PHPConf 2015
Lin Yo-An
 
Composition in JavaScript
Composition in JavaScriptComposition in JavaScript
Composition in JavaScript
Josh Mock
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love Affair
Mark
 
Advanced php testing in action
Advanced php testing in actionAdvanced php testing in action
Advanced php testing in action
Jace Ju
 
From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)
Night Sailer
 
Building Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at StripeBuilding Real Time Systems on MongoDB Using the Oplog at Stripe
Building Real Time Systems on MongoDB Using the Oplog at Stripe
MongoDB
 
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Designing Opeation Oriented Web Applications / YAPC::Asia Tokyo 2011
Masahiro Nagano
 
Functional Programming with Groovy
Functional Programming with GroovyFunctional Programming with Groovy
Functional Programming with Groovy
Arturo Herrero
 
Hacking ansible
Hacking ansibleHacking ansible
Hacking ansible
bcoca
 
Rails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Rails-like JavaScript Using CoffeeScript, Backbone.js and JasmineRails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Rails-like JavaScript Using CoffeeScript, Backbone.js and Jasmine
Raimonds Simanovskis
 
Functional Pe(a)rls version 2
Functional Pe(a)rls version 2Functional Pe(a)rls version 2
Functional Pe(a)rls version 2
osfameron
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
MongoSF
 
Mongoskin - Guilin
Mongoskin - GuilinMongoskin - Guilin
Mongoskin - Guilin
Jackson Tian
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
ZIO: Powerful and Principled Functional Programming in Scala
ZIO: Powerful and Principled Functional Programming in ScalaZIO: Powerful and Principled Functional Programming in Scala
ZIO: Powerful and Principled Functional Programming in Scala
Wiem Zine Elabidine
 
Pre-Bootcamp introduction to Elixir
Pre-Bootcamp introduction to ElixirPre-Bootcamp introduction to Elixir
Pre-Bootcamp introduction to Elixir
Paweł Dawczak
 
Damn Fine CoffeeScript
Damn Fine CoffeeScriptDamn Fine CoffeeScript
Damn Fine CoffeeScript
niklal
 
Code Generation in PHP - PHPConf 2015
Code Generation in PHP - PHPConf 2015Code Generation in PHP - PHPConf 2015
Code Generation in PHP - PHPConf 2015
Lin Yo-An
 
Composition in JavaScript
Composition in JavaScriptComposition in JavaScript
Composition in JavaScript
Josh Mock
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love Affair
Mark
 
Advanced php testing in action
Advanced php testing in actionAdvanced php testing in action
Advanced php testing in action
Jace Ju
 
From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)
Night Sailer
 
Ad

Recently uploaded (20)

A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??
Web Designer
 
Solar-wind hybrid engery a system sustainable power
Solar-wind  hybrid engery a system sustainable powerSolar-wind  hybrid engery a system sustainable power
Solar-wind hybrid engery a system sustainable power
bhoomigowda12345
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
AEM User Group DACH - 2025 Inaugural Meeting
AEM User Group DACH - 2025 Inaugural MeetingAEM User Group DACH - 2025 Inaugural Meeting
AEM User Group DACH - 2025 Inaugural Meeting
jennaf3
 
What Do Candidates Really Think About AI-Powered Recruitment Tools?
What Do Candidates Really Think About AI-Powered Recruitment Tools?What Do Candidates Really Think About AI-Powered Recruitment Tools?
What Do Candidates Really Think About AI-Powered Recruitment Tools?
HireME
 
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdfTop Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
evrigsolution
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
sequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineeringsequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineering
aashrithakondapalli8
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
Adobe Media Encoder Crack FREE Download 2025
Adobe Media Encoder  Crack FREE Download 2025Adobe Media Encoder  Crack FREE Download 2025
Adobe Media Encoder Crack FREE Download 2025
zafranwaqar90
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Why Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card ProvidersWhy Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card Providers
Tapitag
 
Unit Two - Java Architecture and OOPS
Unit Two  -   Java Architecture and OOPSUnit Two  -   Java Architecture and OOPS
Unit Two - Java Architecture and OOPS
Nabin Dhakal
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??
Web Designer
 
Solar-wind hybrid engery a system sustainable power
Solar-wind  hybrid engery a system sustainable powerSolar-wind  hybrid engery a system sustainable power
Solar-wind hybrid engery a system sustainable power
bhoomigowda12345
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
AEM User Group DACH - 2025 Inaugural Meeting
AEM User Group DACH - 2025 Inaugural MeetingAEM User Group DACH - 2025 Inaugural Meeting
AEM User Group DACH - 2025 Inaugural Meeting
jennaf3
 
What Do Candidates Really Think About AI-Powered Recruitment Tools?
What Do Candidates Really Think About AI-Powered Recruitment Tools?What Do Candidates Really Think About AI-Powered Recruitment Tools?
What Do Candidates Really Think About AI-Powered Recruitment Tools?
HireME
 
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdfTop Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
evrigsolution
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
sequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineeringsequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineering
aashrithakondapalli8
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
Adobe Media Encoder Crack FREE Download 2025
Adobe Media Encoder  Crack FREE Download 2025Adobe Media Encoder  Crack FREE Download 2025
Adobe Media Encoder Crack FREE Download 2025
zafranwaqar90
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Why Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card ProvidersWhy Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card Providers
Tapitag
 
Unit Two - Java Architecture and OOPS
Unit Two  -   Java Architecture and OOPSUnit Two  -   Java Architecture and OOPS
Unit Two - Java Architecture and OOPS
Nabin Dhakal
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 

Building Real Time Systems on MongoDB Using the Oplog at Stripe

  • 1. MongoDB and the Oplog EVAN BRODER @ebroder
  • 2. AGENDA INTRO TO THE OPLOG EXAMPLE APPLICATIONS
  • 6. THINGS I’VE DONE: - save {_id: 1, a: 2}
  • 8. THINGS I’VE DONE: - save {_id: 1, a: 2} - update {_id: 1}, {$set: {a: 3}}
  • 9. THINGS I’VE DONE: - save {_id: 1, a: 2} - update {_id: 1}, {$set: {a: 3}} - insert… - delete… - delete… - save… - update…
  • 27. WARNING THIS CODE IS NOT PRODUCTION-READY
  • 28. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  • 29. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  • 30. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  • 31. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  • 32. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 33. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 34. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 35. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 42. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end
  • 43. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id']
  • 44. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 45. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 48. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 49. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  • 50. cursor.each do |op| puts op['o']['_id'] end
  • 51. cursor.each do |op| case op['op'] when 'i' puts op['o']['_id'] else # ¯_(ツ)_/¯ end end
  • 52. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' else # ¯_(ツ)_/¯ end end
  • 56. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect else # ¯_(ツ)_/¯ end end
  • 60. query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" query += op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  • 61. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" + op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  • 64. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" + op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  • 66. task = collection.find_one({'finished' => nil} # do something with task… collection.update({'_id' => task.id}, {'$set' => {'finished' => Time.now.to_i}})
  • 67. loop do collection.remove( {'finished' => {'$lt' => Time.now.to_i - 30}}) sleep(10) end
  • 68. evan@caron:~$ mongo MongoDB shell version: 2.4.10 connecting to: test normal:PRIMARY> null < (Date.now() / 1000) - 30 true
  • 72. > db.getReplicationInfo() { "logSizeMB" : 48964.3541015625, "usedMB" : 46116.4, "timeDiff" : 316550, "timeDiffHours" : 87.93, "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)", "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)", "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)" }
  • 73. > db.getReplicationInfo() { "logSizeMB" : 48964.3541015625, "usedMB" : 46116.4, "timeDiff" : 316550, "timeDiffHours" : 87.93, "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)", "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)", "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)" }
  • 74. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == nil # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end
  • 75. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == false # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end
  • 76. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == false # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end

Editor's Notes

  • #2: Hi folks Today I’m going to talk about the MongoDB oplog, and how you can use the oplog as a tool for application development
  • #3: This talk can basically be broken down into 2 parts First, I’m going to explain what the oplog is. We’ll bring everyone up to speed on the basics so that hopefully the rest of the talk will be accessible, even to people who have never looked at the oplog before. Then, I’ve got 3 different oplog-based applications that we’ve built and used in production at Stripe. We’ll recreate them here, and show the basic code structure that makes them tick. My goal is that by the end, you’ll learn enough of the basics that you can take this knowledge and build your own oplog-based applications.
  • #4: So first, let’s do a quick crash course on what the oplog is, and how MongoDB uses it.
  • #5: Because you’re replicating your production data, (...you are replicating your production data, right? Great) then you all know, with MongoDB replication, applications write changes to a node in the replset designated primary and the primary replicates those changes to the secondaries. If the primary fails, one of the secondaries takes over as the new primaries, and all is right with the world. You know that this happens, but you may not know how it happens. The answer is that it happens via the oplog.
  • #6: When the MongoDB primary receives a write from the application, it first updates its own copy of the data...
  • #7: ...but then adds a record of that write to its operations log, or oplog. You can think of the oplog as a “list of things I’ve done”. That’s not entirely accurate, but it’s a good starting point.
  • #8: As the application continues to write data and make changes...
  • #9: ...those operations are added to the oplog...
  • #10: ...until eventually there’s a long history of changes to your data. How does this help with replication?
  • #11: As it turns out, a log of “things I’ve done to my data” is basically a TODO list for the secondary. Secondaries look at that list, find things that they haven’t done, and, well, do them.
  • #12: Really quickly, let’s cover a couple details that will become important later: - The oplog is “row-based” replication, so multi-inserts, -updates, and -deletes become multiple oplog entries. - The oplog represents actual changes to data, not the operation requested, so things like findAndModify will get drastically simplified. We’ll see more of that later But that is, at a high level, how the oplog fits into replication. But it’s still not very useful to you as an application developer, so let’s dig a bit deeper into the format of the oplog.
  • #13: Here’s what an oplog looks like. This is from a replset that I had just created, so it’s pretty short - only 3 entries
  • #14: Let’s zoom in on the last few operations. If you look, you might recognize some of the information. For instance, the first operation is...
  • #15: ...an insert...
  • #16: ...into the “test” collection of the “test” namespace...
  • #17: ...of an object with “_id” of 1 and the “a” property set to 2
  • #18: That is a self-contained instruction that a secondary could follow.
  • #19: But the cool thing about the oplog is that I didn’t have to fight with any internals of MongoDB to pull this up. Unlike, say, MySQL, where you need direct disk access to access the binlog...
  • #20: ...the oplog is just a (mostly) normal collection It’s accessible to any client, local or remote, using the normal MongoDB drivers So when I describe the oplog entry as a self-contained instruction that a secondary could follow, it’s really a self-contained instruction that *we* could follow. And that’s basically what we’re going to do for the rest of this talk: building applications that take the instructions in the oplog and follow them, just maybe not in the way it was originally intended. So let’s build some apps.
  • #21: For the first example, we’re going to build a simple trigger system. A trigger system is probably the simplest thing you can build using the oplog If you haven’t used triggers, they are basically a piece of code executed in response to certain DB events They’re useful for decoupling code which changes data and code that wants to respond to those changes. A lot of SQL servers support triggers, but because they’re hosted on the SQL server, the scalability story isn’t great, and sometimes triggers are synchronous, slowing down other operations. MongoDB doesn’t have built-in triggers, but we can use the oplog to build our own. Because they’re standalone applications and not built-in to the server, the scalability story is a lot better. What kind of trigger system do we want to build?
  • #22: Well, I modeled this example after a problem we have at Stripe. We model a lot of our batch processing using a queue consumer model with queue consumers that are asynchronous but close-to-realtime We record events for a lot of actions within Stripe Any time someone logs in, we create an event, and write it to MongoDB. Any time someone makes a payment, we create an event, and write it to MongoDB. In response to those events, we do things that don’t need to complete synchronously with the payment, or operation For instance, we asynchronously...
  • #23: ...update the graphs and totals on the user’s dashboard,...
  • #24: ...or send emails to the customer that just purchased something. Plus a host of other actions, including some internal applications which won’t mean much to you. We call this event system...
  • #25: ...monster And it’s proven very popular at Stripe. Queues are, in general, a great model. They’re easy for developers to think about. As generalized abstractions go, they’re comparatively easy to manage and scale. We have over 250 different types of events representing all sorts of automated and human-generated activity. And over 150 consumers look at them. We built monster using the oplog. Now, monster is not a simple system, and we don’t have enough time to build a full pub-sub system. So instead, I’m going to focus on the core oplog-driven system in monster.
  • #26: So here’s the core problem: monster needs to detect when new events are inserted We could poll every few seconds, but that introduces an almost guaranteed delay (and it wouldn’t make for a good talk about the oplog) So instead, we’ll use the oplog, which allows us to react more or less immediately. Specifically, we’ll build the simplest thing I could come up with: a script which watches the oplog for insertions and prints the _id of the resulting record The application is simple, but it lets us cover the basics of building against the oplog.
  • #27: As a reminder before we start, here’s the kind of oplog record we’re going to be interacting with. Conceptually, we want records with “op” set to “i”, “ns” set to our collection name and then we’ll look at the “o” field and extract the “_id”
  • #28: Before I show you any code, want to be *very* clear that none of this code should be run as-is Some of the code I show today will have extremely dangerous security vulnerabilities or it might just take your system down All of the code is for pedagogy, not production.
  • #29: Ok, with that out of the way, let’s look at some code. Remember, we want to find all inserts into a particular collection. This is how we’d do it assuming the oplog was just a normal collection I’ve written the code in Ruby, using the raw mongodb-ruby-driver, but everything I do should be supported by all the official MongoDB drivers. Hopefully this looks familiar. All we’re doing is...
  • #30: ...making a query...
  • #31: ...getting a cursor back for that query, and iterating over the results.
  • #32: Because the oplog is just a normal collection, this code will work as written. But by accessing the oplog as a normal collection, you’ll only see things that have already happened Once you get to the end of the oplog (i.e. the present time), the cursor returns. That’s not actually what we want, so we’ll have to invoke a few unusual MongoDB features...
  • #33: Specifically, we’re going to need to use query options. Query options are flags you set on a cursor before you start pulling results which modify the behavior of the query
  • #34: Our first flag is the “tailable” flag, which only works with capped collections. This tells MongoDB to keep the query cursor open and hold its place in the cursor, even if it thinks there are no more results. Later on, you can attempt to pull more results from the cursor, and if any new results have come in, you’ll get them. In general, cursor iterators will break when there are no more results, even if the cursor is left open So we have to wrap looping over the cursor in another loop.
  • #35: The next flag is the “await data” flag This tells MongoDB that, before returning, if you think there are no results, wait a few seconds just in case any show up. This means that instead of super aggressively polling your mongod (and likely spiking CPU utilization on both the server and client) you’ll get proactively notified when new data shows up.
  • #36: With those options, we have a cursor that stays open indefinitely, and continues to push notifications of insertions as they happen. But there’s still one problem: where do we start? None of these options has fundamentally changed how queries work So we’ll still start at the beginning of time, every time When what we actually want, is to never show historical data, and only show events that happen after our script starts
  • #37: In order to do that, we’re going to have to explore some fields in the oplog we glossed over earlier
  • #38: Specifically, the “ts”, or timestamp, field The timestamp uses BSON’s, well, timestamp type I you’ve ever read the docs about types in MongoDB, you might remember the timestamp type from its note that timestamps are for internal MongoDB use only Fortunately, we’re mucking around with MongoDB internals When writing the oplog, the MongoDB primary assigns a timestamp to each operation These timestamps are constructed to be monotonically increasing, so that entries in the oplog are in timestamp order. The timestamp type is broken into two pieces...
  • #39: ...a UNIX timestamp, which ticks up once every second...
  • #40: ...and in “ordinal”, which resets to 1 at the beginning of the second, and is incremented for every operation within that second.
  • #41: This results in a timestamp that (a) is monotonically increasing, and (b) uniquely identifies a specific op
  • #42: Because it is, effectively, both an ordering and a unique identifier,
  • #43: ...we can use the timestamp as a placeholder What I’ve done here is find the last entry in the oplog, grab the timestamp from that entry, and then only show oplog entries greater than that timestamp. In general, instead of starting at the end of the oplog, you might want to have your program store the timestamp for the last oplog entry it looked at, and start from there next time. But unfortunately, this code won’t work very well on a database with a long oplog - it’ll just hang Because the oplog has no indexes, it has to do a full collection scan to find records that match our “$gt” query In order to fix this, we need to add our last magic query option...
  • #44: The “oplog replay” flag is something of a special case for, well, oplog replays It tells the query planner to assume that entries are in oplog order, and instead of scanning the whole collection, it starts at the end and scans backwards until it finds the beginning of the range you were looking for It makes the oplog scan much more efficient (especially since oplog tailers generally start from near the end)
  • #45: Once we put everything together, we have a whole program that can tail the oplog and print when new records are inserted. Again, this code isn’t really production ready. There are a bunch of things you’d want to do - You might want to handle failures - You might want to store progress so that you could resume rather than always skipping to the end. But this is really about what all oplog-driven applications look like. The only thing that really changes from app to app is...
  • #46: ...this bit - the actual action you take in response to an oplog entry. The implementation here is about as simple as I could imagine. I think the only thing simpler would be to do nothing. But what if we allow ourselves some more complexity? Can we solve more interesting problems? Can I ask more leading questions?
  • #47: As Stripe has grown, we’ve found that there are certain usage patterns that MongoDB is less well-suited for Bulk analytics that you want to express with joins and aggregation is the most obvious use case. While a lot of our MongoDB-driven applications depend on the reliability and uptime of replica sets and failovers, analytics applications don’t usually have the same requirements. We found ourselves saying, “what if we wanted all of our data from MongoDB, but in PostgreSQL instead” Or HBase Or ElasticSearch In general, as long as you can tolerate some slight lag, having multiple copies of your data in different datastores lets you trade off the advantages and disadvantages depending on the needs of your application
  • #48: So let’s say we want to get our entire MongoDB database into PostgreSQL. How would we do that? We could do an ETL process with nightly imports. But you already know that’s not what this talk is about. Instead, we’re going to follow the oplog, and every time there’s a change in MongoDB, we’ll make an equivalent change in Postgres.
  • #49: We’ll start with the trigger code from the last example, but we’ll clean up some of the stuff that’s not needed here...
  • #50: ...like the constraints on collection or op type. And we are going to need some more space, so I’m going to go ahead and clear a lot of this code out of the way
  • #51: In every iteration of this loop, we get passed some change to the MongoDB database. We want to make a corresponding change to our Postgres database. Let’s look at inserts first, since those are simple...
  • #52: When we get an insert in MongoDB…
  • #53: ...we do an insert in Postgres It might look like this There are a bunch of assumptions here: - All data types are scalars (there are no embedded objects or arrays) - Your schema is consistent (all records have the same keys, and all values for a key are of the same type) - Someone figured out that schema and pre-created the Postgres table. All of these are simplifying assumptions, but they’re not required. This code is also grossly negligent about escaping. Like, it doesn’t do any Please don’t write code that doesn’t do any escaping. And don’t use this code - there’s better code coming up soon. In any case, we can handle inserts. What do other event types look like?
  • #54: Here’s a new oplog type we’ve never seen before.
  • #55: It’s a delete op, as evidenced by the letter “d”
  • #56: Unlike insert ops, which included the full record being inserted, delete ops include the _id of the object to delete. Now, you might think this looks like a query and in some sense it is But MongoDB always uses the _id field in updates and deletes, so even if you express a delete with a complex query, MongoDB will record it in the oplog by the _id of the record that it finds. This is important for idempotency - a complex query might not always match the same record if applied multiple times But an _id query will.
  • #57: Using that knowledge, we can write the logic to handle deletions. This means that inserts and deletes will be propagated from our MongoDB database to our Postgres database. That’s 2 of the 3 core data operations in the oplog.
  • #58: Which finally brings us to updates. They’re definitely the most complicated of the 3 basic operations, mostly because they have the most variation. First off, update ops have a field we haven’t seen before...
  • #59: ...the “o2” field, or the “selector” This determines which record to update. Like deletes, it always finds the record to modify by _id for idempotency.
  • #60: The “o” field, which previously contained the record to insert, or the query to delete now contains the updates to apply. It looks like something you could pass to the 2nd argument of MongoDB’s update function. If you’re saving a whole object, the object shows up in the “o” field. If you’re instead doing partial updates with $set or $unset, you’ll see those fields in the “o” property. If you use $inc, $push, $pop, or any other special operations, those will *also* show up as $set and $unset Again, that’s to ensure the oplog is idempotent - applying the operation multiple times should have the same affect. This turns out to make our lives a lot easier as application developers, too.
  • #61: Here’s a quick sketch of what we might do with an update op in order to apply it to our Postgres database Again, this cuts some pretty significant corners, but hopefully it gets you the broad strokes. Fundamentally, it shows how you can take an MongoDB oplog entry and transform it into an operation on a different datastore.
  • #62: Unfortunately, I have to shrink the font more to fit everything on here So again, I’m sorry if you’re at the back and this is hard to read. But still - this is 20 lines of code, and with it we have something that can (at least in theory) apply changes from a MongoDB database to a Postgres database in real time. Now, if this is actually something you want to do...
  • #63: ...you should check out MoSQL, which we open-sourced last year. It’s all of the logic I’ve described in the last few slides plus, well, correctness plus robustness plus some configuration hooks plus support for non-trivial schemas But, again, the cool thing here isn’t just that you can take MongoDB and replicate it into Postgres It’s that you can take MongoDB and replicate it into *anything* Say, for instance...
  • #64: HBase. We’ve also open-sourced a project called zerowing (so-called because “all your HBase are belong to us” - I’m sorry, it’s not my fault) Zerowing uses the oplog to mirror MongoDB into HBase. We also do the same thing with ElasticSearch, and even though we haven’t open-sourced any of that code, the concepts are all identical to what we’ve been doing.
  • #65: The idea that you can transform your data ...that you can inject your own logic into the stream of changes to your database is, to me, really powerful. And to show how powerful it can be, I want to tell one last story about a time at Stripe where I used the oplog for...
  • #66: ...recovering from a bit of an operational disaster. This involved the monster system I talked about earlier. Like I mentioned, monster as we use it is a bit more complicated than the simple trigger system we designed.
  • #67: There was a component that looked something like this. We had a collection with a list of tasks, and workers which would grab unfinished tasks (tasks with the “finished” field unset) and mark them as finished when they were done. For various reasons, we wanted to keep those task records around for a bit, but clean them up eventually. We had a sweeper that would periodically delete sufficiently old records
  • #68: I rolled out a small change to that cleanup sweeper that looked something like this. It seems simple enough - every 10 seconds, find and delete all tasks that finished more than 30 seconds ago. Unfortunately, this code has a bug, and it took me 10 hours to spot. Do you see it?
  • #69: As it turns out, in MongoDB’s sorting order, “null” is less than all integers...including the integer timestamp of “30 seconds ago”
  • #70: So unfortunately, that sweeper went on a somewhat Cybermen-esque deletion spree deleting *all* tasks, not just those that had been completed. Now, as I mentioned, we use monster for a huge number of our batch processes.
  • #71: Which meant that this bug made for...not a great day. Fortunately, though, we had hourly backups of our database going back for a week. How is that helpful? Well, you mostly just need to think about replication a little differently.
  • #72: Instead of thinking about primaries and secondaries, let’s think about new backups and old backups The new backup has a bunch of changes that the old backup doesn’t The new backup is like a primary, while the older backup is like a secondary. In particular, the older backup doesn’t have the deletions. So if we could “replicate” from the new backup to the old backup, but, before applying any deletions, see whether we actually *wanted* to delete the record we can find records that shouldn’t have been deleted.
  • #73: Now, in order for this to work, we have to have the entire opload for the outage By default, MongoDB allocates 5% of your disk space to the oplog. In our experience, that’s a lot of oplog (although every once and a while we’ll bump it) I grabbed a snapshot from that time period to see how much time the oplog covered. Here’s the bit that matters:
  • #74: A single backup has about 4 days of oplog history But we caught the bug in about 10 hours. That’s more than enough time!
  • #75: Here’s a slightly simplified version of the code I used to clean up. We’re going to scan through events in the newer oplog (starting a little before the last oplog entry in the older snapshot), and for each oplog entry, if it’s a delete from the collection we care about, we load the record from the *old* database (which hasn’t “caught up” to the deletion yet) and figure out whether that record was deleted incorrectly. If it was, we save it somewhere Either way, in order to keep the old database consistent, you then need to apply that change (even if it was wrong) to the old database To do that, we’re going to introduce one new command...
  • #76: ...the “applyOps” admin command It does basically what it says on the tin: you give it a list of oplog entries, and it applies them, in order, to its local copy of the database.
  • #77: So, again, this took remarkably little code - about 14 lines But combining that with some good backups, we were able to recover from what would have been a huge disaster in about a weekend
  • #78: In summary, the oplog was built to support replication But because of the way it’s exposed to users, it’s not just limited to that At Stripe, we’ve used the oplog in many of our internal applications, including...
  • #79: ...trigger systems...
  • #80: ...data transformations and replication into alternative data stores...
  • #81: ...and even data recovery.
  • #82: I put this talk together because I think that the oplog is underappreciated I hope that some of the examples will give other people ideas for new and exciting things to do with the oplog. If I can share any other information, or if you have applications for the oplog, let me know! I always like geeking out over this stuff.
  • #83: But for now, can I answer any questions?
  翻译: