Earlier today we published a paper, Total Cost of (Non) Ownership (TCO) of a NoSQL Database Cloud Service. TCO is an important consideration when choosing a NoSQL database, and customers often overlook many of the factors that affect it. In the paper we compare the TCO of running NoSQL databases in the following scenarios:
- OSS NoSQL database like Cassandra or MongoDB hosted on-premises
- OSS NoSQL database hosted on Azure Virtual Machines
- Managed NoSQL database as a service, such as Azure DocumentDB
To minimize our bias, we leveraged scenarios from other publications whenever possible.
In part 1 of our TCO paper, we explore an end-to-end gaming scenario from a similar paper, NoSQL TCO analysis, published by Amazon. We kept the scenario parameters and assumptions unchanged and used the same methodology for computing the TCO of OSS NoSQL databases on-premises and on virtual machines. In our paper, of course, we used Azure Virtual Machines. The scenario explores an online game that is based on a movie and involves three different levels of game popularity: the time before the movie is released (low usage), the first month after the movie is released (high usage), and subsequent usage (medium usage), with a different volume of transactions and data stored during each stage, as listed in the chart below.
The results of our analysis are fairly consistent with the AWS paper. Once all the relevant TCO considerations are taken into account, managed cloud services like DocumentDB and DynamoDB can be five to ten times more cost effective than their OSS counterparts running on-premises or on virtual machines.
The following factors make managed NoSQL cloud services like DocumentDB more cost effective than their OSS counterparts running on-premises or on virtual machines:
- No NoSQL administration dev/ops required. Because DocumentDB is a managed cloud service, you do not need to employ a dev/ops team to handle deployments, maintenance, scale, patching and other day-to-day tasks required with an OSS NoSQL cluster hosted on-premises or on cloud infrastructure.
- Superior elasticity. DocumentDB throughput can be scaled up and down within seconds, allowing you to reduce the cost of ownership during non-peak times. OSS NoSQL clusters deployed on cloud infrastructure offer limited elasticity, and on-premises deployments are not elastic.
- Economy of scale. Managed services like DocumentDB operate a very large number of nodes and are able to pass the resulting savings on to the customer.
- Cloud optimized. Managed services like DocumentDB take full advantage of the cloud. OSS NoSQL databases at the moment are not optimized for specific cloud providers. For example, OSS NoSQL software is unaware of the differences between a node going down vs a routine image upgrade, or the fact that premium disk is already three-way replicated.
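To make the elasticity point concrete, here is a minimal sketch of how provisioning for the peak compares with capacity that follows demand across the three phases of the gaming scenario. The phase durations, capacity units, and unit price are placeholder values chosen for illustration; they are not taken from our paper or from Amazon's, and the `Phase` class and both functions are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: int
    required_units: int  # capacity needed in this phase (placeholder value)


def fixed_capacity_cost(phases, price_per_unit_hour):
    """Cost when capacity is sized for the busiest phase and never scaled down."""
    peak_units = max(p.required_units for p in phases)
    total_hours = sum(p.hours for p in phases)
    return peak_units * total_hours * price_per_unit_hour


def elastic_cost(phases, price_per_unit_hour):
    """Cost when provisioned capacity follows the demand of each phase."""
    return sum(p.required_units * p.hours * price_per_unit_hour for p in phases)


# Placeholder inputs: three 30-day phases with low, high, and medium usage.
phases = [
    Phase("before release (low)", hours=720, required_units=10),
    Phase("first month (high)", hours=720, required_units=100),
    Phase("afterwards (medium)", hours=720, required_units=40),
]
PRICE_PER_UNIT_HOUR = 1.0  # arbitrary currency unit, not a real price

fixed = fixed_capacity_cost(phases, PRICE_PER_UNIT_HOUR)
elastic = elastic_cost(phases, PRICE_PER_UNIT_HOUR)
print(f"fixed: {fixed:.0f}, elastic: {elastic:.0f}, ratio: {fixed / elastic:.1f}x")
```

With these placeholder inputs the peak-sized deployment costs twice as much as the elastic one. The real ratio depends entirely on how uneven the workload is and on actual prices.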
In this moderate scenario, the TCO of Azure DocumentDB and that of AWS DynamoDB were comparable, with Azure DocumentDB slightly (~10%) cheaper due to lower costs for write requests.
One challenge with the approach taken in Amazon’s whitepaper is the number of assumptions (often not explicitly articulated) made about the cost of running an OSS NoSQL database. To start with, the paper does not mention which OSS NoSQL database is being used for comparison. It is difficult to imagine that the TCO of running two very different NoSQL database engines, such as Cassandra and MongoDB, for the same scenario would be exactly the same. However, we think Amazon’s methodology retains important qualitative merit, this concern notwithstanding.
In the second section of our whitepaper we attempt to address this concern and provide a more precise quantitative comparison for more specific scenarios. We examine three scenarios:
- Ingesting one million records/second
- A balanced 50/50 read/write workload
- Ingesting one million records/second in regular bursts
We compare the TCO for these micro-scenarios when using the following NoSQL databases: Azure DocumentDB, Amazon DynamoDB, and OSS Cassandra, a popular NoSQL choice for high data volume scenarios, running on Azure D14v2 Linux Virtual Machines. To run the tests with Cassandra, we use the open source cassandra-stress command included in the open source PerfKit Benchmarker.
Hourly TCO results depicted in the chart above are consistent with the observations in Part 1, with a few additional quantitative findings:
- DocumentDB TCO is comparable to that of OSS Cassandra running on Azure D14v2 VMs for scenarios involving high, sustained, predominantly write workloads with low storage needs (i.e. local SSD on the Cassandra nodes is sufficient). Examples include 1M writes per second with a time to live (TTL) of less than three hours, or workloads in which most writes are updates. Cassandra is famous for its good performance in such scenarios and, for this reason, is often seen as very attractive in the early stages of product development. However, the non-trivial dev/ops cost component raises the total cost of ownership of a Cassandra deployment.
- If more storage is needed, or the workload involves a balanced read/write mix, or the workload is bursty, DocumentDB TCO can be up to four times lower than that of OSS Cassandra running on Azure VMs. Cassandra's TCO is higher in these scenarios due to the non-trivial dev/ops cost of administering Cassandra clusters and Cassandra's lack of awareness of the underlying cloud platform. DocumentDB TCO is lower thanks to superior elasticity and to the lower cost of reads and queries that comes from low-overhead automatic indexing.
- DocumentDB is two to three times cheaper than DynamoDB for the high-volume workloads we examined. Because both offerings guarantee predictable performance, these numbers can be verified simply by comparing the public retail price pages. DocumentDB offers write-optimized, low-overhead indexing by default, which makes queries more efficient without the need to worry about secondary indexes. DocumentDB writes are significantly less expensive for high-throughput workloads.
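The role of the dev/ops component in the findings above can be sketched as a simple hourly cost model: infrastructure cost plus administrator salaries spread over the hours in a year. The function `hourly_tco` and every number below are illustrative assumptions, not figures from the paper; the point is only that a deployment with comparable infrastructure cost can end up with a noticeably higher total once administration is included.

```python
HOURS_PER_YEAR = 8760


def hourly_tco(infra_per_hour, admins=0.0, salary_per_year=0.0):
    """Hourly TCO = infrastructure cost + administrator cost amortized per hour."""
    ops_per_hour = admins * salary_per_year / HOURS_PER_YEAR
    return infra_per_hour + ops_per_hour


# Placeholder inputs in an arbitrary currency unit (not real prices or salaries).
self_managed = hourly_tco(infra_per_hour=10.0, admins=2, salary_per_year=87600.0)
managed = hourly_tco(infra_per_hour=12.0)  # no cluster administrators assumed

print(f"self-managed: {self_managed:.2f}/hour, managed: {managed:.2f}/hour")
```

In this made-up case the self-managed cluster has the cheaper infrastructure, yet its hourly total is higher because two thirds of it is administration. A fuller model would also separate storage, reads, and writes, which this sketch deliberately leaves out.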
In conclusion, we’d like to add that TCO is only one consideration (albeit an important one) when choosing a NoSQL database. Each of the products compared shines in its own way. Product capabilities, ease of development, support, community and other factors need to be taken into account when making a decision. The paper includes a brief overview of DocumentDB functionality.
On the community front, we applaud the MongoDB and Cassandra projects for building significant communities around their offerings. To make Azure a better place for these communities, we recently added protocol-level support for the MongoDB API as part of the DocumentDB offering, and we are encouraged by the feedback received to date from MongoDB developers. DocumentDB customers can now take advantage of the MongoDB API community's expertise without worrying about being locked into proprietary APIs, a common concern with PaaS services.