OSDU(3): To SQL or NoSQL, That is the Question
SQL vs. NoSQL: The Industry Landscape
OSDU relies on Elasticsearch, a well-known NoSQL document store and search engine, for indexing and search. Meanwhile, most industry applications still rely on traditional SQL databases like MySQL, MS SQL Server, or PostgreSQL. These SQL solutions are mature and robust, with well-established controls for authentication and authorization.
SQL Databases: Strength in Structure
SQL databases excel at managing structured data, where tables and relationships are clearly defined. However, recent progress in computer vision and natural language processing has enabled us to represent more complex data types (such as images and text) in semi-structured formats, like tokens or JSON. As a result, the need to support semi-structured or unstructured data has increased rapidly, fueling the rise of NoSQL databases.
A Real-World Example: Managing Wellbore Data
In practice, the landscape is more complex than this simple distinction. Consider a wellbore, a critical asset for oil & gas operators. Each borehole may have several drilling runs, each associated with a Gamma Ray curve measurement. Gamma Ray tools measure the random natural radiation emissions of the formation, capturing readings over time. For a single borehole, dozens of Gamma Ray curves could be collected from different tool runs, repeated sections, varying conditions (open hole vs. cased hole), and before or after environmental corrections. This type of semi-structured data is challenging to manage with traditional SQL data models, such as PPDM. OSDU and Elasticsearch, on the other hand, offer the flexibility to capture and ingest this data. However, if you ask, "Give me the Gamma Ray curve of Well-1A," neither NoSQL nor AI alone can give an instant answer, because many curves match the request; subject matter experts (SMEs) still need to do significant data cleaning work to decide which one is meant.
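To make the "Give me the Gamma Ray curve of Well-1A" problem concrete, here is a minimal Python sketch. The record layout, the field names (well, mnemonic, run, hole, corrected), and the ranking rule are all assumptions made for illustration; they are not the OSDU schema or any SME-approved rule.

```python
# Hypothetical curve records; field names are illustrative, not the OSDU schema.
curves = [
    {"well": "Well-1A", "mnemonic": "GR", "run": 1, "hole": "open", "corrected": False},
    {"well": "Well-1A", "mnemonic": "GR", "run": 2, "hole": "open", "corrected": True},
    {"well": "Well-1A", "mnemonic": "GR", "run": 3, "hole": "cased"},  # "corrected" is missing
    {"well": "Well-1A", "mnemonic": "RHOB", "run": 1, "hole": "open"},
    {"well": "Well-2B", "mnemonic": "GR", "run": 1, "hole": "open", "corrected": True},
]


def find_curves(records, well, mnemonic):
    """Return every record that matches the well name and curve mnemonic."""
    return [r for r in records if r.get("well") == well and r.get("mnemonic") == mnemonic]


def pick_preferred(candidates):
    """Apply an assumed business rule: corrected first, then open hole, then latest run."""
    def score(record):
        return (
            record.get("corrected", False),  # a missing flag is treated as uncorrected
            record.get("hole") == "open",
            record.get("run", 0),
        )
    return max(candidates, key=score) if candidates else None


candidates = find_curves(curves, "Well-1A", "GR")
preferred = pick_preferred(candidates)
print(len(candidates), "candidate curves; preferred run:", preferred["run"])
```

The flexible records are easy to ingest, but the query still returns several candidates. Choosing one depends on a rule that someone has to define and maintain, which is the data cleaning work that remains with the SMEs.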
The Cartoon Analogy: SQL vs. NoSQL
The cartoon analogy provides a helpful visualization: SQL databases are excellent at managing toys in standardized shapes with well-defined logical relationships. NoSQL databases, by contrast, are more flexible and capable of handling toys of various shapes and categories. In this analogy, a Schema acts as a 'virtual container' that helps collect and label these diverse items for further indexing and categorization, a crucial practice in data management that makes data organized and searchable. But this doesn't mean all problems are solved. In daily use, an SQL database is typically more rigid than a NoSQL database: every record must fit a table definition agreed in advance, whereas a NoSQL store accepts records of differing shapes.
NoSQL: Challenges for System Architects
NoSQL presents significant challenges for system architects. A common solution is to let SQL and NoSQL databases coexist. Consider the resistance one would face when attempting to completely remove Microsoft Excel from an organization and move to a cloud database solution; a similar mindset applies here. Technology is trending increasingly towards NoSQL, including specialized needs such as knowledge base capture (graph databases) and AI embedding models (vector databases).
At the same time, there are numerous challenges in migrating operations to NoSQL, such as legacy database migration, database maintenance and recovery, and deployment costs. The transition is not straightforward.
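The coexistence idea can be sketched in a few lines of Python. In this sketch, an in-memory SQLite table holds the structured well master data, while a plain list of JSON strings stands in for a document store. The table name, column names, identifiers, and document fields are hypothetical and chosen only for this example.

```python
import json
import sqlite3

# Structured side: well master data with a fixed, enforced schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE well ("
    "well_id TEXT PRIMARY KEY, name TEXT NOT NULL, operator TEXT NOT NULL)"
)
conn.execute("INSERT INTO well VALUES (?, ?, ?)", ("W001", "Well-1A", "ExampleCo"))

# Semi-structured side: a stand-in for a document store.
# Documents share a well_id but are free to differ in shape.
documents = [
    json.dumps({"well_id": "W001", "mnemonic": "GR", "run": 1, "hole": "open"}),
    json.dumps({"well_id": "W001", "mnemonic": "GR", "run": 2,
                "tool": {"vendor": "X", "corrected": True}}),
]


def curves_for_well(name):
    """Resolve the well in SQL, then collect its documents by the shared key."""
    row = conn.execute("SELECT well_id FROM well WHERE name = ?", (name,)).fetchone()
    if row is None:
        return []
    well_id = row[0]
    return [doc for doc in map(json.loads, documents) if doc["well_id"] == well_id]


result = curves_for_well("Well-1A")
print(len(result), "curve documents linked to Well-1A")
```

The shared key (well_id here) is what holds the two sides together. The relational side rejects incomplete master records, while the document side accepts curve metadata in whatever shape it arrives.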
Takeaways
The technical roadmap from SQL to NoSQL is often a discovery journey—from "knowing what we know" to "not knowing what we don't know." The great Chinese philosopher Confucius summed up this learning curve: "To know what you know and what you do not know, that is True Knowledge."
Architects must propose a minimal-change transition plan and aim for a balanced ecosystem in which SQL and NoSQL databases coexist.