Dan's Take
ArangoDB Combines Several Data Models in One NoSQL Database
Is it something new or a repeat of an old idea?
- By Dan Kusnetzky
- 10/31/2017
A representative of ArangoDB reached out to present what  the company thinks is a new idea. I was told that "We are working on the native  multi-model idea, meaning that we found a way to efficiently combine three data  models (key/value, documents, graphs) into one NoSQL database core and let  users access their data with one declarative query language: AQL."
  Reviewing the history of databases, we quickly see that many different approaches have been applied to address developers' needs to store and retrieve different types of data. I can recall several that were widely used (the following list certainly can't be considered comprehensive):
Various Database Approaches
  One approach was called a navigational database. Another was developed by the Conference on Data Systems Languages, and was known both by the name "CODASYL database" and "Network database." Yet another approach was known as a "Key/Value database," which was used in both the MUMPS and the PICK systems. The "Relational database" was next. Now we see NoSQL databases becoming popular. Let's look at each of these in turn.
  The earliest approach to constructing a useful data store  that would allow applications to find records based upon a number of criteria  was a navigational database. It was made up of a number of separate files that  contained indices, and a single file that contained the actual data. The index  files contained a search field and a record number in the data file.  Applications could search one of the indices to find the needed record, then go  to the data file and jump directly to the needed record. The indices were kept  sorted in order so that the data file could be read in order, based on any of  the indices. Later, the index files were brought back into the data file. This  was called an "Indexed sequential file." Purists would point out that this  really can't be considered a true database, even though many early  transactional systems used this approach for data storage.
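  The index-plus-data-file arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data file is modeled as an in-memory list, each index as a sorted list of (search value, record number) pairs, and the patient records and field names are invented for the example.

```python
import bisect

# The "data file": records addressed by record number (list position).
# All sample records and field names here are hypothetical.
data_file = [
    {"patient_id": 1007, "name": "Okafor", "ward": "C"},
    {"patient_id": 1002, "name": "Lindqvist", "ward": "A"},
    {"patient_id": 1010, "name": "Abbasi", "ward": "B"},
]

def build_index(records, field):
    """Return a sorted list of (search value, record number) pairs."""
    return sorted((rec[field], recno) for recno, rec in enumerate(records))

def lookup(index, records, value):
    """Binary-search the index, then jump straight to the record."""
    pos = bisect.bisect_left(index, (value,))
    if pos < len(index) and index[pos][0] == value:
        return records[index[pos][1]]
    return None

def read_in_order(index, records):
    """Read the data file in the order given by one index."""
    return [records[recno] for _, recno in index]

name_index = build_index(data_file, "name")
id_index = build_index(data_file, "patient_id")
```

  Calling lookup(name_index, data_file, "Abbasi") finds the record through the name index, while read_in_order(id_index, data_file) walks the same data file in patient-number order without sorting the data itself.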
  This approach was fairly simple and easy to implement on  what would be considered tiny systems today. I used this approach for a  hospital information system on a system that only had 16KB of main memory and three  6MB disks.
  Unfortunately, it also was easy for the indexes and main  data file to get out of synchronization.   Once this issue was discovered, it was fairly straightforward to go  through the data file and recreate the corrupted index.
  One challenge when one or more indices became corrupted  was uncovering when the corruption happened, and fixing all updates to the main  data file that may have been done erroneously. Another challenge was restoring  this type of database after a system crash. If backup procedures didn't capture  all the indices, IT staff would be forced to first discover that the index was  missing, then run through a procedure to re-create it.
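  The synchronization problem, and the repair by rescanning the data file, might look like the following sketch. It assumes the same simplified in-memory model (a list for the data file, a sorted list of pairs for the index); the function names and sample records are hypothetical.

```python
def build_index(records, field):
    """Scan the whole data file and produce a fresh, sorted index."""
    return sorted((rec[field], recno) for recno, rec in enumerate(records))

def index_is_consistent(index, records, field):
    """True if the index matches what a fresh scan of the data file yields."""
    return index == build_index(records, field)

# Hypothetical sample data.
records = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]
id_index = build_index(records, "id")

# Simulate the failure mode: the data file is updated but the index is not.
records.append({"id": 2, "name": "b"})
stale = not index_is_consistent(id_index, records, "id")

# Repair: go through the data file and recreate the index.
id_index = build_index(records, "id")
```

  The check only shows that an index is out of date. It says nothing about when the corruption happened or which updates were made in error, which is the harder problem noted above.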
  In 1959, a consortium -- the Conference on Data Systems Languages (CODASYL) -- was formed to guide the development of both a standard programming language and a standard way to store "lists." The COBOL language was one of this group's projects, and a "network database" was the result of its work on a standard way to store and retrieve data.
  Without going into minute detail, the committee developed  the concept of a Data Description Language (DDL) that was used to define the  items that were to be stored and the relationships between and among them. A  Data Manipulation Language (DML) was used by developers to give commands to the  database, allowing programs to store, retrieve and update data.
Upside-Down Trees
  This database model made it possible for multiple records to  be linked to multiple owner records and vice-versa. Some described this as an  "upside-down tree" in which each record was linked to one or more owners and,  potentially, other data items in a "mesh." While powerful, this approach was  hard to understand and quite complex. I am aware of a situation in which a  malicious developer, who was planning to leave the company, created a program  that exploited a badly designed mesh. This program would make database queries  that could never be satisfied, so that all the processing power of the host  system would be consumed following links back and forth and up and down the  mesh. Not funny.
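  To make the "mesh" concrete, here is a minimal sketch in which each record simply lists the records it is linked to. The record names are invented, and this is not CODASYL DDL or DML. The search keeps a set of records already visited, which is one common way to guarantee that a query over a mesh with circular links terminates even when it can't be satisfied.

```python
from collections import deque

# Each record lists the records it links to (owners and members alike).
# The record names are hypothetical.
mesh = {
    "order-1": ["customer-A", "part-X"],
    "customer-A": ["order-1", "region-N"],
    "part-X": ["order-1", "supplier-S"],
    "supplier-S": ["part-X", "region-N"],
    "region-N": ["customer-A", "supplier-S"],
}

def find(mesh, start, target):
    """Breadth-first search that visits each record at most once."""
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in mesh.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # every reachable record was checked once; no match
```

  Without the seen set, a search for a record that doesn't exist would follow the links between these five records forever, which is essentially the behavior described in the anecdote above.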
  The key/value database, as used in the MUMPS and PICK systems, was somewhat similar to the navigational database model, but the data was stored in the indexes themselves. This meant that data could be retrieved in sorted order without an explicit sort process. Small files containing the index data would be developed that pointed back to where the rest of the "record" could be retrieved. This database model was very good for applications that stored and retrieved individual data items. It wasn't very good for applications in which the entire data store needed to be traversed to create a report.
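  The idea of keeping the data in a sorted index, so that it comes back in order without a separate sort, can be illustrated as follows. This is a simplified sketch rather than how MUMPS or PICK were implemented; the class name and the sample keys are assumptions made for the example.

```python
import bisect

class SortedKeyValueStore:
    """Hypothetical key/value store that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []     # always kept sorted
        self._values = {}   # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert at the sorted position
        self._values[key] = value

    def get(self, key, default=None):
        """Fetch one item directly by its key."""
        return self._values.get(key, default)

    def items(self):
        """Yield pairs in key order with no separate sort step."""
        for key in self._keys:
            yield key, self._values[key]

store = SortedKeyValueStore()
store.put("patient:1007", "Okafor")
store.put("patient:1002", "Lindqvist")
store.put("patient:1010", "Abbasi")
store.put("patient:1002", "Lindqvist-Berg")  # update an existing key
```

  Single-item access through get is cheap, which matches the strength noted above. A report over everything still has to walk every key through items.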
  The relational database model is extremely popular today,  and can be found supporting applications in everything from embedded numerical  control applications, to applications in mobile phones and tablets, to PCs, to  every type of server.
Relational Databases
  Proposed by E.F. Codd in 1970, the relational model organizes data into one or more "tables" or "relations." The columns in these tables are known as "attributes." Rows are known as "records." Each record has a unique key value. Database developers break the data down into a normalized form that places related data for one product or customer in different tables. So, a customer database might be made up of an address table, an order table, and a credit card table.
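  As a small illustration of data for one customer being split across tables, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout, column names and sample rows are assumptions made for the example (the order table is named orders because ORDER is a reserved word in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one customer's data lives in three tables.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address  (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           city TEXT NOT NULL);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           item TEXT NOT NULL);
    INSERT INTO customer VALUES (1, 'Example Corp');
    INSERT INTO address  VALUES (1, 1, 'Springfield');
    INSERT INTO orders   VALUES (1, 1, 'widget'), (2, 1, 'gasket');
""")

# A join reassembles the related rows using the shared key values.
rows = conn.execute("""
    SELECT c.name, a.city, o.item
    FROM customer AS c
    JOIN address AS a ON a.customer_id = c.id
    JOIN orders  AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
```

  Even this tiny lookup needs a three-way join, which hints at why the model can feel cumbersome for simple search and retrieval.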
  This approach has proven to be very flexible, but at times  it can be cumbersome for simple search and retrieval applications or  applications that need to process non-structured data, such as documents, maps,  graphs, presentations or even operational log files.
  The NoSQL database can be seen as a reaction to the  limitations of relational databases. The data is not kept in structured tables,  but still can be searched to find relevant records.
  Multi-model databases maintain data in a format that's easy to create, search and update, and then offer mechanisms that support relational, non-relational and key/value store access. Object-relational database systems can be seen as early forms of multi-model databases. There are a number of suppliers offering databases in this category, including ArangoDB, Cosmos DB, Couchbase, CrateDB, DataStax, EnterpriseDB, MarkLogic and even Oracle. There are a few others, but I think you get the point.
  One prominent proponent was FoundationDB, which was  acquired by Apple after several impressive wins. Apple pulled FoundationDB from  the market shortly after acquiring it.
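  The general multi-model idea, one store reached through key/value, document and graph access, can be sketched as a toy class. This is emphatically not ArangoDB's API or AQL, and it says nothing about how any of the products named above are built; the class, method names and sample data are all hypothetical.

```python
class TinyMultiModelStore:
    """Toy store: one set of documents, three styles of access."""

    def __init__(self):
        self._docs = {}    # key -> document (a dict)
        self._edges = []   # (from_key, to_key, label)

    # Key/value access: store and fetch by key.
    def put(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)

    # Document access: find keys whose documents match field values.
    def find(self, **criteria):
        return [key for key, doc in self._docs.items()
                if all(doc.get(f) == v for f, v in criteria.items())]

    # Graph access: labeled links between stored documents.
    def link(self, src, dst, label):
        self._edges.append((src, dst, label))

    def neighbors(self, key, label=None):
        return [dst for src, dst, lbl in self._edges
                if src == key and (label is None or lbl == label)]

store = TinyMultiModelStore()
store.put("alice", {"type": "person", "city": "Oslo"})
store.put("bob", {"type": "person", "city": "Lima"})
store.put("acme", {"type": "company", "city": "Oslo"})
store.link("alice", "acme", "works_at")
store.link("alice", "bob", "knows")
```

  Here store.get("bob") is a key/value read, store.find(city="Oslo") is a document-style query, and store.neighbors("alice", label="knows") follows a graph edge, all against the same underlying data.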
Dan's Take: Intriguing Technology, but Questions Need to  be Asked
  As I read through ArangoDB's materials, I saw that the ideas being presented by the company make sense. If it were possible for an enterprise to standardize on a single database engine, it would be possible to simplify development and support. If a single database engine could access data using many different access mechanisms, the data could be moved from all the separate databases in current enterprise use without having to change a huge portfolio of applications.
  The key questions about this "centralization of data" are whether the multi-model database would be as efficient as, or more efficient than, the database engines in current use, and whether the performance of database-based applications would be the same or better.
  ArangoDB says that "You can store your data as key/value  pairs, graphs or documents and access any or all of your data using a single  declarative query language." Does that mean that all established applications  must be changed to use this query language? If so, it's unlikely that  enterprises will change what they're doing to move from one database to  another. Any savings realized through the use of a single database would be  more than consumed by the effort to change everything.
  Another concern is how transportable this database  technology is. Enterprises currently have mainframes, midrange systems running  a number of single-vendor operating systems, midrange systems running UNIX and  many industry standard x86-based systems running Windows, Linux and UNIX. It  isn't clear if ArangoDB supports all of these platforms. If not, database  unification isn't really possible.
  I hope to learn more about the company and its products and, perhaps, speak with users of this technology. While I think this is an interesting idea, as usual, the devil is in the details.

About the Author
  Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. He has been a business unit manager at a hardware company and head of corporate marketing and strategy at a software company.