Observations for Working with NoSQL Technologies

All that glitters is not gold… (William Shakespeare)

Coming from a background where relational (RDBMS) systems were a given, and low-level raw disk storage was part of the CS curriculum, my first exposure to non-relational stores was somewhat of a mind-blowing experience. It’s like throwing out much of what you have learned and starting over.

In this article I will share a few thoughts on non-relational stores, including a few areas where things did not go quite as planned and the lessons that came out of them.

Let’s model a Customer entity to serve as an example for this discussion. The Customer entity is the ‘document’ we are storing, along with several child collections of related contact information such as phone numbers and email addresses. We will also store preferences and an order history, which has the potential to be quite large; at least it would be for me if this were how Amazon modeled my personal data! Finally, we will introduce a series of attributes that identify which subscription level the customer belongs to. This is where we will make a series of changes to how the data is interpreted over the lifecycle of the project.


Customer model with contact information, preferences, and order history modeled as child document collections
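
A rough sketch of what such a Customer document might look like, using purely illustrative field names and values:

// Illustrative Customer document (field names and values are hypothetical)
{
  _id: ObjectId("..."),
  name: "Jane Smith",
  phoneNumbers: [ { type: "mobile", number: "555-0100" } ],
  emailAddresses: [ { type: "personal", address: "jane@example.com" } ],
  preferences: { newsletter: true, currency: "USD" },
  orderHistory: [
    { orderId: "A-1001", total: 42.50, placedAt: ISODate("2021-03-14T00:00:00Z") }
    // ...potentially hundreds more entries
  ],
  membershipLevel: "member"
}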

Efficient data fetching

My first hands-on exposure to MongoDB was on a side project I had signed up for. I elected to run with a Ruby on Rails and MongoDB stack. The project was a web-based application used to display products and their associated community-generated reviews. The rapid development cycles seemed like a no-brainer when you are trying to get an MVP up over long nights and a few long weekends.

The development experience was great, until we performed a data dump from our source into the development environment and performance tanked. My bad for not working with the full data set from day one. Long story short, the document used to represent a product, its description, and all the associated reviews turned out to be rather large. Operations such as listing all products, or listing products based on some filter criteria, performed the worst, because the default approach for these fetch operations grabs the entire object, whereas we typically only needed a subset.

The correct approach would have been to indicate which attributes we actually need for each operation, reducing the amount of data returned. An alternative would have been to restructure the data so that the description and the associated reviews reside in their own dedicated collections.
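
In MongoDB, for example, that means passing a projection so that only the fields needed for the listing come back. Here is a minimal sketch, with assumed collection and field names:

// Fetch only the fields needed for a product listing,
// rather than the entire product document with its embedded reviews
db.products.find(
  { category: "electronics" },            // filter criteria
  { name: 1, price: 1, thumbnailUrl: 1 }  // projection: return only these fields
).limit(20)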

There are a lot of discussions about how to model data in non-relational stores, and whether to normalize or denormalize. Denormalized, potentially large documents are a convenience, while normalized models tend to create additional overhead when it comes to maintaining relations between the associated documents. There are a number of factors that can impact your design in your particular scenario which I won’t attempt to cover here.

The important takeaway here is that you should limit the data retrieved by fetch operations, and not just blindly grab the entire document.

A Lesson Learned from Others

While I can look back at my previous adventure and laugh, it turns out that other people are learning the same lessons. A few years later I jumped on a project where MongoDB was used as the persistent store. The site was actually one of the two largest community-driven sites for an industry niche. I was a bit surprised when I first observed the object model: very coarse-grained, with several collections containing documents I would describe as unnecessarily large. For example, the company document collection contained all of the associated reviews as sub-collections.

Fetch operations were similarly coarse, and on the React side of the application the results were simply dumped into the Redux store. Inefficient, but it did provide a benefit down the line, as a lot of data ended up cached on the client, which created an overall illusion of performance.

From this experience I learned that a lot of initially well-intended efforts to model data in a NoSQL solution can easily lead to bloated documents and suboptimal approaches to retrieving data.

Working with changing data

It is a reasonable expectation that a given document model will change over time. This is easy to do, as there really isn’t a strict schema being enforced. In contrast, when a schema is modified in an RDBMS, explicit schema changes are required, quite possibly along with refactoring of previous records. From a developer’s perspective, when grabbing some data for development purposes you might be tempted to select a sample set to work with. When doing so, make sure you are grabbing from the end of the collection in your development environment. This is where data that follows the new document structure is most likely to live.

Grabbing data from the oldest records first

db.collection.find().limit(n)

Accessing more recent data from the end of the collection

db.yourCollectionName.find().sort({ $natural: -1 }).limit(3)

A bit of a rookie move, I know, but on rare occasions I still find myself making this mistake ;)

Refactoring (or not) the Document Structure

Given the original Customer model in the diagram above, let’s focus on the information used to define the membership level for any given customer. Imagine that the model originally started with two levels, ‘public’ and ‘member’, with the latter implemented as a paid subscription carrying some type of discount incentive.

Now let’s say that over time the requirements change, and a new condition is added to the membership model.

And at the application level, we now establish logic to enable features based on the customer’s subscription level. So far so good. 

Now let’s fast forward a few years. Developers have come and gone, and a few changes have been introduced that impact the subscription information, as follows (a sketch of the resulting document shape appears after the list):

  • A new flag was introduced to determine whether the user has access to enter promotion codes
  • A new business tier was introduced, with access to additional features
  • Finally, an enterprise tier was introduced. Upon the initial offering, all business tier customers were grandfathered into the enterprise tier at the same price point; however, their membership level is still defined as the ‘business’ tier, even though they have access to the next tier’s features.
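
To make the creep concrete, the subscription-related portion of a Customer document might drift into something like the following (field names are hypothetical, but the shape is typical):

// Original shape
{ membershipLevel: "member" }

// After the changes above, with attributes bolted on one at a time
{
  membershipLevel: "business",  // still reads "business" for grandfathered customers
  canUsePromoCodes: true,       // newer flag; simply absent on older records
  existingCust: true            // marks accounts grandfathered into enterprise features
}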

In the above example, the data related to membership levels and the corresponding logic were changed in a manner that required minimal refactoring, and why not? This is easy to do in a store that does not enforce a schema.

The problem here is what I refer to as attribute scope creep, which I define as a structure gradually degrading over time under a series of changes. The original logic to determine whether a user has access to a certain feature looked something like the following:

IF USER HAS MEMBERSHIP LEVEL SUBSCRIPTION
    [ENABLE FEATURE]
END IF

But since the changes were applied in such a manner that older records were not universally updated, we simply bolt on more attributes to support the desired end behavior, and the logic to conditionally enable a feature grows into something like the following:

IF USER HAS A ‘BASIC MEMBER’ OR ‘BUSINESS MEMBER’ OR ‘ENTERPRISE LEVEL’ MEMBERSHIP
    IF (MEMBERSHIP LEVEL == ‘BUSINESS’ AND THE EXISTING_CUST FLAG IS ‘TRUE’) OR (MEMBERSHIP LEVEL == ‘ENTERPRISE’) THEN
        [ENABLE FEATURE]
    END IF
END IF

We can see the complexity quickly begin to grow out of hand, to the point where the meaning of the flags and membership levels becomes ambiguous. Attribute scope creep in action! I typically observe this type of modeling in older systems (AS/400, for example) or in environments where that type of hacking was considered standard practice.

The above example is actually based on a past project experience. Changes to the logic were applied and records were updated on a going-forward basis, but not retroactively. In addition to the conditional checks, null checks needed to be applied. To make matters worse, there were actually about three additional flags applied over time, each of which was a knee-jerk reaction to correct the previous technical debt. It is easy to imagine how quickly this can get out of hand. Add in the fact that the above logic was distributed across the application rather than centralized, and you have a hot mess on your hands.

What raises a flag with me personally is when nobody can consistently explain the desired logic from memory, or when you need to find an example in the source to validate how it is implemented, only to discover that the logic is duplicated in other parts of the application and implemented slightly differently for whatever reason.

A recommended approach

When modifying the implied schema for a given document collection, spend some cycles reviewing the long-term impact of the changes. It might be entirely appropriate to modify the structure and then write a process to retroactively update older records, so that there is a consistent approach to modeling a given attribute. RDBMS systems tend to force this issue, since schema changes impact existing records.
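
As a sketch, a retroactive backfill for the membership scenario above might look something like this in the mongo shell (the collection, field names, and values are assumptions for illustration):

// Promote grandfathered business customers to an explicit enterprise level
db.customers.updateMany(
  { membershipLevel: "business", existingCust: true },
  { $set: { membershipLevel: "enterprise" } }
)

// Give older records an explicit default instead of a missing flag
db.customers.updateMany(
  { canUsePromoCodes: { $exists: false } },
  { $set: { canUsePromoCodes: false } }
)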

Yes, this is additional work, which is probably why it was never performed as the logic evolved over the life of the project. However, not taking it into consideration is kicking the can down the road, and it definitely contributes to technical debt.

Encapsulating the document

The barrier to entry for storing and working with JavaScript and MongoDB is pretty darn low! You define objects in JSON, consume them as such, and there is not a lot of work involved. Should you find yourself working in a TypeScript- or Java-based world, additional work is required to strongly type the document.

In the above example we discussed storing information related to membership levels across several attributes, and we can see that there is complexity in that representation. When working with the data at the application level, you will really be performing operations such as the following:

  • Determining if the user is a member
  • Determining if the user has a trial subscription, and if the trial has expired
  • Determining if the user has access to a feature only available to top-tier enterprise members and those whose accounts have been grandfathered in.

These operations translate well into business requirements. As a developer, you really want to interact with the data in such a way that you can ask these questions at the coding level and have that logic encapsulated within the object. Additionally, you may want to implement a few convenience features.

Wrapping the data in an object is an excellent option, providing the ability to expose convenience methods that let other developers execute logic at a more abstract level. For example, determining whether a customer is a member can be handled by establishing a method such as ‘isCustomer()’. Clear, concise, and an approach that eliminates the need to worry about the low-level state of the data driving the decision.
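
As a sketch, a thin JavaScript wrapper along these lines might look like the following (method and field names are illustrative, reusing the hypothetical attributes from earlier):

// Wraps the raw document and exposes business-level questions
class Customer {
  constructor(doc) {
    this.doc = doc; // the document as fetched from MongoDB
  }

  isMember() {
    return ["member", "business", "enterprise"].includes(this.doc.membershipLevel);
  }

  isTrialExpired() {
    return this.doc.trialExpires != null && new Date(this.doc.trialExpires) < new Date();
  }

  hasEnterpriseFeatures() {
    // Enterprise customers, plus business customers grandfathered in
    return this.doc.membershipLevel === "enterprise" ||
      (this.doc.membershipLevel === "business" && this.doc.existingCust === true);
  }
}

// Usage: callers never need to know about the underlying flags
// const customer = new Customer(customerDoc);
// if (customer.hasEnterpriseFeatures()) { enableFeature(); }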

Summary

  • Specify concise criteria when querying data, limiting the results to only the attributes you actually need.
  • Refactor your object models as appropriate when new usage patterns emerge.
  • Consider retroactively updating attributes when the schema changes, in order to avoid scenarios where the client needs to perform additional work to interpret the data.
  • Wrap potentially complex operations that depend on the state of certain flags, translating them into more concise business-level checks by encapsulating the document.

Working with NoSQL-based persistent stores is a great solution for certain scenarios. I personally feel, however, that it is often not the right approach for a project and the technology ends up misused, especially when you escape the requirements of a store that demands a schema definition and then end up with document soup: objects that morph so much that the inconsistency makes interpreting the data more difficult than it should be.

Check out more on my blog here:

https://www.matthewdalby.dev

#SoftwareEngineering #MongoDB #NoSQL


With over two decades of experience in software engineering, Matthew Dalby is a seasoned professional who has consistently leveraged cutting-edge technology to solve complex problems and deliver impactful solutions. Matthew is always seeking new opportunities to innovate and collaborate on forward-thinking projects. Recently, he has focused on sharing his personal experience and knowledge with the larger community.

