MONGO DB

MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

Main features

Ad-hoc queries

MongoDB supports field queries, range queries, and regular-expression searches.[28] Queries can return specific fields of documents and can also include user-defined JavaScript functions. Queries can also be configured to return a random sample of results of a given size.
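As a rough illustration of the three search styles above, the filters below are written as plain Python dictionaries in MongoDB's query-document syntax ($gte, $lt, $regex, and the $sample aggregation stage are real operators). The field names, the sample data, and the matches helper are hypothetical; the helper is only a toy evaluator for these three filters and is not how the server evaluates queries.

```python
import re

# Filter documents in MongoDB's query syntax, written as plain dicts.
field_query = {"status": "active"}               # exact field match
range_query = {"age": {"$gte": 18, "$lt": 65}}   # range on one field
regex_query = {"name": {"$regex": "^Sm"}}        # regular-expression match
projection = {"name": 1, "_id": 0}               # return only the name field
sample_stage = {"$sample": {"size": 3}}          # aggregation stage: random sample of 3


def matches(doc, query):
    """Toy evaluator for the filters above (hypothetical helper)."""
    for field, cond in query.items():
        value = doc.get(field)
        if not isinstance(cond, dict):
            # Plain value: equality match on the field.
            if value != cond:
                return False
            continue
        for op, arg in cond.items():
            if value is None:
                return False
            if op == "$gte":
                ok = value >= arg
            elif op == "$lt":
                ok = value < arg
            elif op == "$regex":
                ok = re.search(arg, value) is not None
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


# Hypothetical sample documents.
people = [
    {"name": "Smriti", "age": 30, "status": "active"},
    {"name": "Alex", "age": 70, "status": "inactive"},
]
```

With a driver, the same dictionaries would typically be passed as the filter and projection arguments of a find call; here they are only evaluated locally.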

Indexing

Fields in a MongoDB document can be indexed with primary and secondary indices.

Replication

MongoDB provides high availability with replica sets.[29] A replica set consists of two or more copies of the data. Each replica-set member may act in the role of primary or secondary replica at any time. All writes and reads are done on the primary replica by default. Secondary replicas maintain a copy of the data of the primary using built-in replication. When a primary replica fails, the replica set automatically conducts an election process to determine which secondary should become the primary. Secondaries can optionally serve read operations, but that data is only eventually consistent by default.

If the replicated MongoDB deployment only has a single secondary member, a separate daemon called an arbiter must be added to the set. It has a single responsibility, which is to resolve the election of the new primary.[30] As a consequence, an idealized distributed MongoDB deployment requires at least three separate servers, even in the case of just one primary and one secondary.[30]
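The arithmetic behind the arbiter requirement can be sketched as follows, assuming that an election needs the votes of a strict majority of the voting members. The function names are hypothetical and the sketch ignores priorities, timeouts, and other election details.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable members can still form a majority."""
    return reachable >= majority(voting_members)


# Primary + one secondary: losing either node leaves 1 of 2 votes.
two_node_survives = can_elect_primary(voting_members=2, reachable=1)

# Primary + secondary + arbiter: losing one node leaves 2 of 3 votes.
three_node_survives = can_elect_primary(voting_members=3, reachable=2)
```

Under this assumption, a two-member set cannot elect a new primary after a single failure, while adding an arbiter as a third voter restores a majority.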

Load balancing

MongoDB scales horizontally using sharding.[31] The user chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more replicas.) Alternatively, the shard key can be hashed to map to a shard, enabling an even data distribution.

MongoDB can run over multiple servers, balancing the load or duplicating data to keep the system up and running in case of hardware failure.
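The difference between range-based and hashed distribution can be sketched in a few lines. Everything here is an illustrative assumption: the shard names, the range boundaries, and the use of MD5 with a modulo are hypothetical and do not reproduce MongoDB's actual chunk management or balancer.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Range-based: keys below "h" go to shard0, keys from "h" up to (not
# including) "q" go to shard1, everything else goes to shard2.
RANGE_BOUNDS = ["h", "q"]


def shard_by_range(key: str) -> str:
    """Pick a shard by locating the key within sorted range boundaries."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


def shard_by_hash(key: str) -> str:
    """Pick a shard from a hash of the key, spreading nearby keys apart."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Range-based placement keeps neighbouring keys on the same shard, which suits range queries but can create hot spots for monotonically increasing keys; hashing trades that locality for a more even spread.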

File storage

MongoDB can be used as a file system through GridFS, which provides load balancing and data replication over multiple machines for storing files.

This function, called grid file system,[32] is included with MongoDB drivers. MongoDB exposes functions for manipulating files and their content to developers. GridFS can be accessed using the mongofiles utility or plugins for Nginx[33] and lighttpd.[34] GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document.[35]
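The chunking idea can be sketched without a database. The document shape below (files_id, n, data) and the 255 KiB default follow the GridFS convention as commonly documented, but the helper functions are hypothetical and a real driver also stores a separate metadata document for each file.

```python
CHUNK_SIZE = 255 * 1024  # assumed default chunk size (255 KiB)


def to_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a byte string into chunk documents numbered from 0."""
    return [
        {"files_id": file_id, "n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]


def from_chunks(chunks):
    """Reassemble the original bytes by ordering chunks on their index."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Because each chunk carries its own sequence number, the chunks can be stored and fetched in any order and still be reassembled correctly.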

Aggregation

MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.[36]

Map-reduce can be used for batch processing of data and aggregation operations. However, according to MongoDB's documentation, the aggregation pipeline provides better performance for most aggregation operations.[37]

The aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY clause is used. Aggregation operators can be strung together to form a pipeline – analogous to Unix pipes. The aggregation framework includes the $lookup operator which can join documents from multiple collections, as well as statistical operators such as standard deviation.
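To make the GROUP BY analogy concrete, the sketch below shows a pipeline written as a list of stage documents ($match, $group, $sum, and $sort are real operators) next to a plain-Python function that computes the same result locally. The collection contents and the field names cust_id, amount, and status are hypothetical, and run_locally is only an illustration of what the stages mean, not of how the server executes them.

```python
from collections import defaultdict

# Pipeline as it would be passed to a driver's aggregate call.
# Roughly: SELECT cust_id, SUM(amount) AS total FROM orders
#          WHERE status = 'A' GROUP BY cust_id ORDER BY total DESC
pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def run_locally(orders):
    """Plain-Python equivalent of the three stages above."""
    totals = defaultdict(int)
    for order in orders:
        if order["status"] == "A":                     # $match
            totals[order["cust_id"]] += order["amount"]  # $group + $sum
    rows = [{"_id": key, "total": value} for key, value in totals.items()]
    return sorted(rows, key=lambda r: r["total"], reverse=True)  # $sort


# Hypothetical sample documents.
orders = [
    {"cust_id": "a1", "amount": 50, "status": "A"},
    {"cust_id": "a1", "amount": 25, "status": "A"},
    {"cust_id": "b2", "amount": 200, "status": "A"},
    {"cust_id": "b2", "amount": 10, "status": "D"},
]
```

Each stage consumes the output of the previous one, which is what makes the Unix-pipe comparison apt.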

Server-side JavaScript execution

JavaScript can be used in queries and aggregation functions (such as MapReduce), and can be sent directly to the database to be executed.

Capped collections

MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.
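The circular-queue behavior can be imitated with a bounded deque from Python's standard library. This is only an analogy: the limit here is a document count chosen for illustration, whereas a capped collection is sized in bytes, with an optional document limit.

```python
from collections import deque

# A bounded queue holding at most three documents (illustrative limit).
capped = deque(maxlen=3)

for i in range(5):
    # Once the queue is full, each append silently drops the oldest entry.
    capped.append({"seq": i})

# Remaining documents, still in insertion order.
remaining = [doc["seq"] for doc in capped]
```

After five inserts only the three most recent documents remain, in the order they were written.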

Transactions

MongoDB claims to support multi-document ACID transactions since the 4.0 release in June 2018.[38] This claim was found not to hold, as MongoDB violates snapshot isolation.[39]

Editions

MongoDB Community Server

The MongoDB Community Edition is free and available for Windows, Linux, and OS X.[40]

MongoDB Enterprise Server

MongoDB Enterprise Server is the commercial edition of MongoDB, available as part of the MongoDB Enterprise Advanced subscription.[41]

MongoDB Atlas

MongoDB is also available as an on-demand fully managed service. MongoDB Atlas runs on AWS, Microsoft Azure, and Google Cloud Platform.[42]

Architecture

Programming language accessibility

MongoDB has official drivers for major programming languages and development environments.[43] There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.

Serverless access

Management and graphical front-ends


Figure: Record insertion in MongoDB with Robomongo 0.8.5

The primary interface to the database has been the mongo shell. Since MongoDB 3.2, MongoDB Compass has been available as the native GUI. There are also products and third-party projects that offer user interfaces for administration and data viewing.[44]

Licensing

MongoDB Community Server

As of October 2018, MongoDB is released under the Server Side Public License (SSPL), a license developed by the project. It replaces the GNU Affero General Public License, and is nearly identical to the GNU General Public License version 3, but requires those making the software publicly available as part of a "service" to make the service's entire source code available under this license.[45][46] The SSPL was submitted for certification to the Open Source Initiative but later withdrawn.[47] The language drivers are available under an Apache License. In addition, MongoDB Inc. offers proprietary licenses for MongoDB. The last versions licensed as AGPL version 3 are 4.0.3 (stable) and 4.1.4.

MongoDB has been removed from the Debian, Fedora and Red Hat Enterprise Linux distributions due to the licensing change. Fedora determined that the SSPL version 1 is not a free software license because it is "intentionally crafted to be aggressively discriminatory" towards commercial users.[48][49]

Bug reports and criticisms

Security

Because the default security configuration of MongoDB allowed anyone to have full access to the database, data from tens of thousands of MongoDB installations has been stolen. Furthermore, many MongoDB servers have been held for ransom.[50][51]

In an official response published in September 2017 and updated in January 2018, Davi Ottenheimer, lead of Product Security at MongoDB, stated that MongoDB had taken measures to defend against these risks.[52]

From the MongoDB 2.6 release onwards, the binaries from the official MongoDB RPM and DEB packages bind to localhost by default. From MongoDB 3.6, this default behavior was extended to all MongoDB packages across all platforms. As a result, all networked connections to the database will be denied unless explicitly configured by an administrator.[53]

Technical criticisms

In some failure scenarios where an application can access two distinct MongoDB processes, but these processes cannot access each other, it is possible for MongoDB to return stale reads. In this scenario it is also possible for MongoDB to roll back writes that have been acknowledged.[54] This issue was addressed in version 3.4.0, released in November 2016[55] (and back-ported to v3.2.12).[56]

Before version 2.2, locks were implemented on a per-server-process basis. With version 2.2, locks were implemented at the database level.[57] Version 3.0 introduced pluggable storage engines, and each storage engine may implement locks differently.[58] With MongoDB 3.0, locks are implemented at the collection level for the MMAPv1 storage engine,[59] while the WiredTiger storage engine uses an optimistic concurrency protocol that effectively provides document-level locking.[60] Even with versions prior to 3.0, one approach to increasing concurrency is to use sharding.[61] In some situations, reads and writes will yield their locks. If MongoDB predicts a page is unlikely to be in memory, operations will yield their lock while the pages load. The use of lock yielding expanded greatly in 2.2.[62]

Up until version 3.3.11, MongoDB could not do collation-based sorting and was limited to byte-wise comparison via memcmp, which does not provide correct ordering for many non-English languages when used with a Unicode encoding. The issue was fixed on August 23, 2016.

Prior to MongoDB 4.0, queries against an index were not atomic. Documents which were being updated while the query was running could be missed.[63] The introduction of the snapshot read concern in MongoDB 4.0 eliminated this phenomenon.[64]

MongoDB claims in an undated article entitled "MongoDB and Jepsen"[65] that its database passed the tests of the distributed systems safety research company Jepsen, which it called "the industry's toughest data safety, correctness, and consistency Tests". However, Jepsen published an article in May 2020 stating that MongoDB 3.6.4 had in fact failed its tests, and that the newer MongoDB 4.2.6 has more problems, including "retrocausal transactions", in which a transaction reverses order so that a read can see the result of a future write.[66][67] Jepsen noted in its report that MongoDB omitted any mention of these findings on its "MongoDB and Jepsen" page.
