Evolution of the Modern Database
This article is the first in a series in which I will discuss the evolution of information management and how we arrived at the modern database platform. My intended audience is technically-adjacent and non-technical professionals who are interested in learning more about database technology.
Clay, Bones, and Paper
For more than 6,000 years, humans have devised ways to record information. Whether people used clay tablets, animal bones, or papyrus, early methods of record-keeping were both fragile and cumbersome, making organization, storage, and retrieval of information difficult. Furthermore, at some point in the history of every civilization, the number of records that needed to be kept grew so large that organizing and finding information became extremely laborious. We see some of the first attempts at organizing stored data in the creation of concordances (alphabetical lists of records) during the 5th century and in page numbers and indexes in books (1470) around the invention of the printing press.
Libraries and Classification Systems
As printed information became widely accessible, a larger population became hungry to extract information from books and documents. Eventually the volume of information people wanted to access grew so large that finding anything in an alphabetical list became extremely labor intensive. This problem was especially acute in public libraries. The next advance in data management was therefore the classification system, which, when combined with a physical storage method, allows the user to find data more quickly. Two such systems, the Dewey Decimal System in 1876 and the Library of Congress Classification (LOCC) system in 1897, gained wide adoption in public libraries, colleges, and universities in the United States. While somewhat different in design, both Dewey and LOCC categorize information by topic and require that information on the same topic be physically stored together. All of a sudden, a library-goer using Dewey could see, all on one page, the 10 major topics of information in the library. This same person could simply walk to the physical shelves where books on geography or painting were stored. People saved time and accessed information more quickly.
These classification systems rely on a “call number” system that assigns a number to every book in a library. Each book had a corresponding card in a card catalog, which allowed a user to “browse” books without actually going to the physical shelves. It’s quite amazing to think about now: an entire library was documented on small individual cards which took up lots of space and could be easily removed from the card catalog by library-goers seeking a book. This system, while user-friendly, was difficult to keep up-to-date and error-free. It’s no surprise, then, that in the decades following the advent of computers, libraries began transcribing their card catalogs into digital databases. Many of us who attended grade school in the 1980s can remember learning how to use the physical card catalog and, later, its digitized version.
Digital Revolution
Surprisingly, the earliest electronic databases, created in the 1960s, wouldn’t have worked well for the average library-goer. These early databases, dubbed hierarchical databases, were very basic and consisted of digital lists that were essentially copies of their physical counterparts. While hierarchical databases were an improvement over physical paper lists in space, durability, and consistency, they were not searchable beyond basic categories. Extracting information from them was difficult, since a user was limited to certain “parent” categories for searches.
I asked my colleague (and rockstar software engineer), Obioma Anomnachi, about hierarchical databases. Obioma said, “In the library analogy, [with hierarchical database structure] you could still get a list of all of the books by a single author or maybe book series that are then broken down by book - the hierarchical database has a tree like structure. However, the format can't deal with books with 2+ authors. It would either have to have an author row containing all the authors and have that data be separated from any of the authors’ individual lists or it would have to duplicate data, storing the book in each one of the contributing authors lists.” Duplicated lists take up extra physical (disk) space, and any addition or change has to be made in every applicable list.
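For readers who like to see the idea in code, here is a minimal Python sketch of the duplication problem Obioma describes. It is only an illustration, not how any real hierarchical database was implemented; the author names, book titles, and helper functions are all made up for the example.

```python
# A toy "hierarchical" catalog: each author is a parent, each book a child.
# All names and titles here are hypothetical, for illustration only.
catalog = {
    "Author A": [
        {"title": "Shared Book", "year": 1990},
    ],
    "Author B": [
        {"title": "Shared Book", "year": 1990},  # duplicate copy of the same book
        {"title": "Solo Book", "year": 1995},
    ],
}


def books_by(author):
    """Listing one author's books is easy: walk down from the parent."""
    return [book["title"] for book in catalog.get(author, [])]


def correct_year(title, year):
    """Fixing one book means visiting every author's list that holds a copy.

    Returns the number of copies that had to be changed.
    """
    changed = 0
    for books in catalog.values():
        for book in books:
            if book["title"] == title:
                book["year"] = year
                changed += 1
    return changed


copies_changed = correct_year("Shared Book", 1991)
```

Because the co-authored book lives under both parents, a single correction touches two copies, and forgetting one of them would leave the catalog inconsistent.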
Relational Databases and SQL
While innovative, early database models only provided a partial solution to the information management problem. To be truly useful, database design needed to improve to the point that retrieving and using the data became easier. In the 1970s, IBM and other companies would usher in an innovation which would do just that. With the advent of the “relational” database, each piece of information could be stored once in a single database and linked to related records (which made storage much more efficient), and users were able to query the data. Along with this new database design, a language called Structured Query Language (SQL, pronounced ‘sequel’) was developed, which would become, and still is, the standard language used in relational databases.
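To make the relational idea concrete, here is a small sketch using Python's built-in sqlite3 module and a throwaway in-memory database. The table names, column names, and sample rows are assumptions chosen for the library analogy, not a schema from any particular product. Each book is stored once, and a separate linking table records which authors wrote it.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: books and authors are stored once each,
# and book_authors links them (one row per author of a book).
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE book_authors (
    book_id INTEGER REFERENCES books(id),
    author_id INTEGER REFERENCES authors(id)
);
INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO books VALUES (1, 'Shared Book', 1990), (2, 'Solo Book', 1995);
INSERT INTO book_authors VALUES (1, 1), (1, 2), (2, 2);
""")


def titles_by(author_name):
    """Query: which books did this author write? Joins the three tables."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books AS b
        JOIN book_authors AS ba ON ba.book_id = b.id
        JOIN authors AS a ON a.id = ba.author_id
        WHERE a.name = ?
        ORDER BY b.title
        """,
        (author_name,),
    ).fetchall()
    return [row[0] for row in rows]


# Correcting the co-authored book changes exactly one stored row.
rows_changed = conn.execute(
    "UPDATE books SET year = 1991 WHERE title = 'Shared Book'"
).rowcount
```

The co-authored book still shows up when either author is queried, yet it exists in only one place, so a correction is made once rather than in every author's list.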
For decades, different flavors of relational databases thrived (e.g., Oracle Database, MariaDB, Microsoft SQL Server). However, history would repeat itself as the amount of data produced quickly outpaced the storage method and design. In my next article, I will discuss the “big data” revolution, including how hardware innovations allowed for nearly limitless amounts of data storage which, when combined with a public accustomed to blazingly fast processing speeds, established the need for a new type of database design.
Written by Allison Sheppard Nokes with help from Nicholas Brackley and Obioma Anomnachi for Anant Corporation.