Relational and Document Data Modeling (50 years ago, 25 years ago, and 2025)
Relational data modeling or document data modeling? Under different names, this question has run through 50 years of database technology, and the answer mirrors the wisdom of database consultants: it depends. The best data modeling method depends on how you organize your application development. When building a central repository for your enterprise, you should create a data-centric model that many current and future applications will share. When you have divided your software factory into domains and services, as in modern software architecture, you should adopt an application-centric model instead.
50 years ago
In the 1970s, the debate was between CODASYL (from the Data Base Task Group) and relational (from the IBM researchers E.F. Codd and C.J. Date). The consensus was that a logical schema should be presented to the application. This interface must describe the logical data structure, the schema or model, and provide a language for manipulating data through this view. The question was whether it should be a static model derived from the business entities or a dynamic model shaped by application data access.
The relational model proposed by E.F. Codd is a mathematical representation of data that considers neither present nor future applications and is independent of physical properties. The model is agnostic about how data is accessed: users query tables, not indexes, and join sets of rows instead of navigating through pointers. It is also independent of physical storage: the SQL database user is unaware of the physical layout, colocation, partitioning, or ordering of data, and the language is declarative, never specifying how to access the data. The relational database system optimizes data access using metadata stored in a central catalog, which describes the mapping of the logical relational model to the physical layout.
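To illustrate this physical data independence, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical two-table schema. The query is declarative: it names tables and a join condition, never an index or an access path, so it runs unchanged whether or not an index exists.

```python
import sqlite3

# Hypothetical schema for illustration; any SQL database behaves the same way.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts  (account_id  INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers,
                            balance     NUMERIC);
""")
db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO accounts  VALUES (10, 1, 500)")

# The application joins sets of rows; it never navigates pointers.
query = """
    SELECT c.name, a.balance
    FROM   customers c JOIN accounts a ON a.customer_id = c.customer_id
"""
print(db.execute(query).fetchall())   # [('Ada', 500)]

# Adding an index changes the physical access path chosen by the optimizer,
# but the statement and its result are identical.
db.execute("CREATE INDEX accounts_by_customer ON accounts(customer_id)")
print(db.execute(query).fetchall())   # same result, same SQL
```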
Interestingly, even though the relational model is independent of the physical implementation, the choice of representing data as independent two-dimensional tables (normalization) was partly driven by what users were accustomed to: output went to small screens (you didn't drill down into nested structures by clicking a mouse button at that time) and input came from flat files (there was no XML or JSON then). The Date/Codd paper "The Relational and Network Approaches: Comparison of the Application Programming Interfaces" mentions:
One other point which may be made with respect to the simplicity of the relational model is its closeness to traditional 'flat files'
At that time, we used a data-centric approach to data modeling. When I began my career in IT in 1994, this model was an ERD (or rather MERISE, since I was in France), designed before looking at the application functions. Later, we used UML class diagrams, but even though they were intended for objects, the data was analyzed without considering methods, navigation, and use cases. The application classes were described in a different model, and object-relational mapping declared the transformation between the two.
25 years ago
In the early 2000s, I worked with Oracle Database and read AskTom daily. On asktom.oracle.com, Tom Kyte answered users' questions, and this is where I learned a lot: not only about the database technology, which became the foundation of my expertise as a consultant, but also about the method of learning, understanding, and explaining that now defines my role as a developer advocate. There are two articles I remember well enough to find quickly with a single search term. One is about write consistency (my search term is 'almost afraid to publish'), and the other is about data modeling (my search term is 'myoptic'). This article focuses on the second one: Relational vs. Object-Oriented Database Design. You can replace 'Object-Oriented' with 'Document'; they are similar: a hierarchical structure of objects where nested structures are allowed (non-1NF).
Here is the quote that summarizes the debate between storing data in relational tables versus tables that can include nested tables or object-oriented data types:
The problem is more that OO tends to be application focused, but data isn't for "an application" in general, it is for "many many many applications". Applications come, applications go - the data stays forever. We need many views of the same information and OO tends to give a rather myoptic view of that data.
The debate surrounding CODASYL databases was similar: should you design your data model for the applications or for the data? In master data management, where the enterprise information system serves as a central database, a data-centric model is necessary, one that all current and future applications can share. That was the reasoning behind the relational model, and many legacy applications still work that way.
Today (2025)
Do you build applications with a monolithic database today, like you did 25 years ago? The adage "Applications come, applications go - data stays forever" still holds, but today, data rarely exists in a single database shared by all applications.
In the discussion, Tom Kyte gave an example from a banking application. For a teller serving a customer, the following document is a perfect model:
The Object perfect for that application is a "person" object identified by name or account number. That person object has all of the attributes of a person in a bank, a name, an address, a list of accounts with all of the account attributes, a list of recent transactions, and so on
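As a concrete sketch, here is what such a "person" document could look like, inserted with PyMongo. The collection name and field names are my assumptions for illustration, not Tom Kyte's:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Assumed local deployment and a hypothetical "bank.customers" collection.
customers = MongoClient()["bank"]["customers"]

customers.insert_one({
    "_id": "ACC-0001",                       # account number as natural key
    "name": "Ada Lovelace",
    "address": {"street": "12 High St", "city": "London"},
    "accounts": [
        {"type": "checking", "balance": 500},
        {"type": "savings",  "balance": 2500},
    ],
    "recentTransactions": [
        {"ts": datetime(2025, 1, 15, tzinfo=timezone.utc),
         "amount": -42, "account": "checking"},
    ],
})

# One read returns everything the teller needs to serve the customer.
print(customers.find_one({"_id": "ACC-0001"}))
```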
However, a relational model is justified by the other use cases working on the same database:
at the end of the day, there is someone else who has a totally different view of the data -- they need to count how many transactions happened that day. How much money is in the bank. They have a transaction oriented, not person oriented, perspective of the data.
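That transaction-oriented view can still be computed over the person documents, though it must unwind the nested arrays. A sketch with MongoDB's aggregation pipeline, reusing the hypothetical fields from the document above ($dateTrunc requires MongoDB 5.0 or later):

```python
from pymongo import MongoClient

customers = MongoClient()["bank"]["customers"]

# Count the day's transactions and the net amount moved, across all customers.
pipeline = [
    {"$unwind": "$recentTransactions"},
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$recentTransactions.ts",
                               "unit": "day"}},
        "transactionCount": {"$sum": 1},
        "netAmount": {"$sum": "$recentTransactions.amount"},
    }},
    {"$sort": {"_id": 1}},
]
for day in customers.aggregate(pipeline):
    print(day)
```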
Kyte's argument still holds if you build a monolithic core database that supports all your business use cases. However, when modernizing your IT, especially during the transition to the cloud, it is preferable to split it into several decoupled services. Tracking daily transactions and the amount of money in the bank is another service, operating on a separate database designed for end-of-day analytics. Naturally, this requires transferring data between the operational database, which contains customer-oriented documents, and the analytical service, which is transaction-focused. You might be concerned about data gravity and the cost of moving data to another database service, but this is how modern applications work. Even with a monolithic database, I've seen many banks duplicate it at the end of the day to obtain a pre-midnight snapshot on which they can run heavy analytical queries. Another reason is to keep the operational database on-premises for data privacy while running the analytical database in the cloud, since it does not need the customers' personal data.
Your application services rely on various databases, and you don't need to design them all with a uniform, application-agnostic relational schema. Remember, too, that technology has advanced: moving data for every analytic requirement is unnecessary. A document database can offer multiple perspectives on the data thanks to its powerful indexing capabilities. I have recently shown how the document model can handle queries that filter on many dimensions in MongoDB.
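For example, a compound index can cover filters on several dimensions at once, including fields nested inside an array (a multikey index). A sketch against the hypothetical collection above:

```python
from pymongo import MongoClient, ASCENDING

customers = MongoClient()["bank"]["customers"]

# Compound multikey index: both array fields come from the same "accounts"
# array, which MongoDB allows in a single compound index.
customers.create_index([("address.city", ASCENDING),
                        ("accounts.type", ASCENDING),
                        ("accounts.balance", ASCENDING)])

# Filters on several dimensions of the document, served by one index.
cursor = customers.find({
    "address.city": "London",
    "accounts": {"$elemMatch": {"type": "savings",
                                "balance": {"$gte": 1000}}},
})
for doc in cursor:
    print(doc["name"])
```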
Enter Domain-Driven Design and Microservices, which aim to decouple applications, so multiple teams within your organization can progress independently. A shared data model is no longer practical. Each domain has its database with a distinct schema, and data sharing between services is achieved through application interfaces rather than by querying the same database.
The data model is optimized for the domain and application services' use cases. The "myopic view of that data" is now a feature designed to achieve maximum efficiency for those applications. It also enables development teams to build with their application data structures rather than mapping them to a normalized, use-case-agnostic one.
Document data modeling can apply to all databases, since most SQL databases can store objects and nested tables. However, it is crucial that the database system be optimized for those access patterns and provide indexing on both document and sub-document attributes, as MongoDB does. These systems must also offer efficient ways to transfer data to other services, often through asynchronous streaming, since the goal is to decouple the services.
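For instance, MongoDB change streams let a downstream service subscribe to committed changes asynchronously instead of querying the operational database. A minimal sketch (collection names are assumptions; change streams require a replica set):

```python
from pymongo import MongoClient

customers = MongoClient()["bank"]["customers"]

# Blocks and yields one event per committed change. A real service would
# persist the resume token so it can continue after a restart.
with customers.watch(full_document="updateLookup") as stream:
    for event in stream:
        print(event["operationType"], event["documentKey"])
```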
Modern applications built per domain and service
Unless they use a monolithic central database, modern applications work well with a document data modeling approach. The document model stores the data and acts as the API between the application and the database. SQL is designed for interactive use: the application starts a transaction at a specific isolation level, executes operations (INSERT, SELECT, UPDATE, DELETE) sequentially across the tables of a relational schema, interleaved with application logic, and finally commits to make the changes visible and durable. The document API follows a different approach, with one database call per business transaction.
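Here is a sketch of that interactive SQL pattern, using sqlite3 and a hypothetical banking schema (an accounts table plus a transactions table): several statements and some application logic run inside one transaction before the commit.

```python
import sqlite3

# isolation_level=None lets us manage the transaction explicitly.
db = sqlite3.connect("bank.db", isolation_level=None)
try:
    db.execute("BEGIN")
    (balance,) = db.execute(
        "SELECT balance FROM accounts WHERE account_id = ?", (10,)
    ).fetchone()
    if balance < 42:
        raise ValueError("insufficient funds")        # application logic
    db.execute("UPDATE accounts SET balance = balance - 42"
               " WHERE account_id = ?", (10,))
    db.execute("INSERT INTO transactions (account_id, amount)"
               " VALUES (?, ?)", (10, -42))
    db.execute("COMMIT")                              # changes become visible
except Exception:
    db.execute("ROLLBACK")
    raise
```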
Another term to search for on AskTom, which returns interesting discussions about the API between the application and the database, is 'TAPI'. It means 'Table API' and refers to a model where CRUD (Create, Retrieve, Update, Delete) operations are performed on each table row. Tom Kyte criticizes this approach and advocates 'XAPI' (Transaction API), where a single business transaction encapsulates multiple operations (INSERT, SELECT, UPDATE, DELETE) within a stored procedure on the database server. This resembles a document database API more than interactive SQL, even if some SQL databases can expose a REST endpoint.
Developers today often prefer not to deploy business logic in proprietary database languages like PL/SQL, and they avoid stored procedures. Document data modeling provides the same atomicity as a transaction API. With a document API like MongoDB's, a single call describes the transaction logic and can update a document that maps to one business transaction, and the call integrates with the application language. With an application-centric data modeling approach, there is no need to send multiple text statements, as you would with a set of normalized tables in a SQL database.
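A sketch of the same business transaction as one document API call, reusing the hypothetical document above: update_one is atomic on the document, and the filter itself enforces the business rule, so no explicit transaction management is needed.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

customers = MongoClient()["bank"]["customers"]

result = customers.update_one(
    # The filter only matches if the checking account has sufficient funds.
    {"_id": "ACC-0001",
     "accounts": {"$elemMatch": {"type": "checking",
                                 "balance": {"$gte": 42}}}},
    # One atomic call debits the account and records the transaction.
    {"$inc": {"accounts.$.balance": -42},
     "$push": {"recentTransactions":
               {"ts": datetime.now(timezone.utc),
                "amount": -42, "account": "checking"}}},
)
if result.modified_count == 0:
    raise ValueError("insufficient funds or unknown account")
```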
In conclusion, database modeling has evolved over five decades to meet changing software development needs. The relational model, designed for centralized databases, emphasized normalization. Developers could use the database without understanding physical storage or concurrent access, but they had to learn a language and concepts distinct from their application code.
Today, microservices and domain-driven design favor decoupled, application-centric document modeling, and developers prefer more control over data access, performance, and consistency. The document model encourages them to view a database call as a single CRUD operation carrying all the data of one business transaction, rather than engaging in row-by-row, table-by-table interactive statements that require additional caching and locking to guarantee performance and consistency. Powerful indexing in document databases helps optimize the multiple microservices of a domain.
The debate centers around two models: a relational data model for a central monolithic database serving all applications, versus a document data model tailored for specific domains, development teams, and application services.