A fintech startup pivots to Azure Cosmos DB

Published on December 17, 2018

Software Architect, Microsoft Azure

The right technology choices can accelerate success for a cloud-born business. This is true for the fintech start-up clearTREND Research. Their solution architecture team knew one of the most important decisions would be the choice of database: SQL or NoSQL. After research, experimentation, and many design iterations, the team was thrilled with their decision to deploy on Microsoft Azure Cosmos DB. This blog post is about how that decision was made.

Data and AI are driving a surge of cloud business opportunities, and one technology decision that deserves careful evaluation is the choice of a cloud database. Relational databases continue to be popular and drive significant demand for cloud-based solutions, but NoSQL databases are well suited to globally distributed solutions at scale.

For our partner clearTREND, the plan was to commercialize a financial trend engine and provide a subscription investment service to individuals and professionals. The team responsible for clearTREND’s SaaS solution is a veteran group of software developers and architects who have been implementing cloud-based solutions for years. They understood the business opportunity and wanted to better understand the database technology options. Through their due diligence, the architecture morphed as business priorities and data sets were refined. After a lot of research and hands-on experimentation, the architectural team decided on Azure Cosmos DB as the best fit for the solution.

Especially in the financial industry, business models are under attack. Cosmos DB is a technology that can adapt, evolve and allow a business to innovate faster and turn opportunities into strategic advantages. 

Six reasons to choose Azure Cosmos DB

Below are reasons the team at clearTREND selected Azure Cosmos DB:

  1. Schema design is much easier and more flexible. With an agile development methodology, schemas change frequently, and the ability to quickly and safely implement changes is a big advantage. Azure Cosmos DB is schema-agnostic, so there is massive flexibility around how the data can be consumed.
  2. Database reads and writes are really fast. Azure Cosmos DB can provide reads and writes in less than 10 milliseconds at the 99th percentile, backed by a service level agreement (SLA).
  3. Queries run lightning fast and autoindexing is a game-changer. Reads and writes based on a primary or partition key are fast, but for many NoSQL implementations, queries executed against non-keyed document attributes may perform poorly. Secondary indexing can be a management and maintenance burden. By default, Azure Cosmos DB automatically indexes all the attributes in a document, so query performance is optimized as soon as data is loaded. Another benefit of auto-indexing is that the schema and indexes are fully synchronized so schema changes can be implemented quickly without downtime or management needed for secondary indexes.
  4. With thoughtful design, Azure Cosmos DB can be very cost-effective. The Azure Cosmos DB cost model depends on how the database is designed: the number of collections, the partition key, the index strategy, document size, and the number of documents. Pricing for Azure Cosmos DB is based on reserved resources called request units (RUs), which are described in the “Request Units in Azure Cosmos DB” documentation. The clearTREND schema design is implemented as a single document collection, and the entire cost of the solution on Azure, including Azure Cosmos DB, comes to an affordable monthly price. Keep in mind this is a managed database service, so the monthly cost includes support, 99.999 percent high availability, an SLA for read and write performance, automatic partitioning, data encrypted by default, and automatic backups.
  5. Programmatically re-size capacity for workload bursts. The clearTREND workload has a predictable daily burst pattern and RUs can be programmatically adjusted. When additional compute resources are needed for complex processing or to meet higher throughput requirements, RUs can be increased. Once the processing completes, RUs are adjusted back down. This elasticity means Azure Cosmos DB can be re-sized in order to cost-effectively adapt to workload demands.
  6. Push-button globally distributed data. Designing for the future scalability of a solution can be tricky, because technology and design choices can become inefficient as a solution grows beyond the initial vision. The advantage with Azure Cosmos DB is that it can become a globally configured, massively scaled-out solution with just a few clicks. There are none of the operational complications of setting up and managing a cloud-scale, distributed NoSQL database.
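
As a rough illustration of the fifth reason, here is a minimal sketch of how a predictable daily burst could be mapped to a target RU setting. The class name, RU values, and burst window are all hypothetical and are not clearTREND's actual configuration. The call that applies the value to a collection is deliberately left out, since it depends on which SDK or management API is used.

```csharp
using System;

// Hypothetical schedule: every name and number here is an illustrative assumption.
public static class ThroughputSchedule
{
    public const int BaselineRus = 400;  // assumed RU/s outside the burst window
    public const int BurstRus = 2000;    // assumed RU/s during the burst window

    // Assumed daily burst window, expressed in UTC.
    public static readonly TimeSpan BurstStart = TimeSpan.FromHours(21);
    public static readonly TimeSpan BurstEnd = TimeSpan.FromHours(23);

    // Returns the RU/s that should be provisioned at the given moment.
    public static int TargetRus(DateTime utcNow)
    {
        TimeSpan timeOfDay = utcNow.TimeOfDay;
        bool inBurst = timeOfDay >= BurstStart && timeOfDay < BurstEnd;
        return inBurst ? BurstRus : BaselineRus;
    }
}

public static class Program
{
    public static void Main()
    {
        var quiet = new DateTime(2018, 12, 17, 9, 0, 0, DateTimeKind.Utc);
        var burst = new DateTime(2018, 12, 17, 22, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(ThroughputSchedule.TargetRus(quiet)); // prints 400
        Console.WriteLine(ThroughputSchedule.TargetRus(burst)); // prints 2000
    }
}
```

A timer-driven job could call a function like TargetRus and pass the result to whatever mechanism updates the collection's provisioned throughput, so that capacity is raised before the burst and lowered once processing completes.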

Design and implementation tips for Azure Cosmos DB

If you are new to Azure Cosmos DB, here are some tips from the clearTREND team to consider when designing and implementing a solution:  

  • Design the schema around query and API optimization. Schema design for a NoSQL database is just as important as it is for a relational database management system (RDBMS), but it’s different. While a NoSQL database doesn’t require pre-defined table structures, you do have to be intentional about organizing and defining the document schema while also being aware of where and how relationships will be represented and embedded. To guide the schema design, the clearTREND team tends to group data based on the data elements that are written and retrieved by the solution’s APIs.
  • Implement a flexible partition key. Azure Cosmos DB requires a partition key to be specified when creating a document collection over 10 GB. Deciding on a partition key can be tricky because initially it may not be clear what the optimal choice is: should it be a data category, a geographical region, an identifier, or a timescale (like monthly or yearly)? A poorly designed partition key can create a performance bottleneck called a ‘hot spot’, which concentrates read and write activity on a single partition rather than distributing activity evenly across partitions. When the partition key for a database changes, it requires a re-indexing operation that can impact application availability as the underlying data is copied to a new collection and re-indexed.
     
    The clearTREND team built some flexibility into the design of the partition key to mitigate the need for database re-indexing operations. For their scenario, a common field in their document collection is type, and each type has its own schema. During design they realized an optimal partition key might be different depending upon the type of document, so type became one of the partition key values. For a second value in the partition key, the team defined a logical field as a string and named it PartitionID. The idea behind PartitionID is that it can initially be set to one value (a client identifier, for example) and later, when a more efficient key value is determined, programmatically replaced with a new value. With this approach, the logical definition of the partition key does not change but the partition key value can change. Azure Cosmos DB will have to re-hash the partition key and relocate the items to the correct logical partition, but this approach can avert a database-wide re-indexing operation when only a subset of the documents in the collection is impacted.
  • Consider a schema design based on a single collection. A common design strategy is to use one document type per collection, but there are benefits to storing multiple document types in a single collection. Collections are the basis for partitioning and indexing, so it may not seem intuitive to store multiple document types in a single collection. But it can maximize functionality, because no cross-collection operations are needed, and minimize overall cost, because a single collection is less expensive than multiple collections. The clearTREND solution has seven different document types, all stored in a single collection. The approach is implemented with an enumerated field called doc type on the base class from which all documents are derived. Every document has a doc type property that corresponds to one of the seven document types.
  • Tune schema design by understanding the RU costs of complex queries and stored procedure operations. It can be difficult to anticipate the costs for complex queries and stored procedures, especially if you don’t know in advance how many reads or writes Azure Cosmos DB will need to execute the operation. Capture the metrics and costs (RUs) for complex operations and use the information to streamline schema design. One way to capture these metrics is to execute the query or stored procedure from the Azure Cosmos DB dashboard on the Azure portal.
  • Consider embedding a simple or calculated expression as a document property. If there are requirements to calculate a simple aggregation like a count, sum, minimum, or maximum, or there is a need to evaluate a simple Boolean logic expression, it may make sense to define the expression as a property of the base document class. For instance, in a logging application there is likely logic to evaluate conditions and determine whether an operation has been successful or not. If the logic is a simple Boolean expression like the one below, consider including it in the class definition:
public class LogStatus
{
    // C# example of a Boolean expression embedded in a class definition
    public bool Failed => !((WasReadSuccessful && WasOptimizationSuccessful && StatusMsg == "Success") ||
                            (WasReadSuccessful && !IsDataCurrent));
    public string StatusMsg { get; set; }
    public bool WasReadSuccessful { get; set; }
    public bool WasOptimizationSuccessful { get; set; }
    public bool IsDataCurrent { get; set; }
}

The Failed field is defined as a read-only calculated property. If database usage is primarily read-intensive, this approach has the potential to reduce overall RU cost, because the expression is evaluated and stored once, when the document is written, rather than being evaluated each time the document is queried.
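
To see why a calculated property ends up stored with the document, the sketch below serializes a LogStatus instance to JSON. It uses System.Text.Json from the .NET base class library purely for illustration. The serializer and settings used by a given Azure Cosmos DB SDK version may differ, so treat the behavior as an assumption to verify in your own environment.

```csharp
using System;
using System.Text.Json;

public class LogStatus
{
    // Read-only calculated property: evaluated whenever the object is serialized.
    public bool Failed => !((WasReadSuccessful && WasOptimizationSuccessful && StatusMsg == "Success") ||
                            (WasReadSuccessful && !IsDataCurrent));
    public string StatusMsg { get; set; } = string.Empty;
    public bool WasReadSuccessful { get; set; }
    public bool WasOptimizationSuccessful { get; set; }
    public bool IsDataCurrent { get; set; }
}

public static class Program
{
    // Serializes the status the way a document would be serialized before a write.
    public static string ToJson(LogStatus status) => JsonSerializer.Serialize(status);

    public static void Main()
    {
        var status = new LogStatus
        {
            StatusMsg = "Success",
            WasReadSuccessful = true,
            WasOptimizationSuccessful = true,
            IsDataCurrent = true
        };

        // The JSON text carries the computed Failed value as an ordinary field.
        Console.WriteLine(ToJson(status));
    }
}
```

Because the value is written as a plain field, a query can filter on it directly instead of re-deriving the result from the other properties.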

  • Remember, referential integrity is implemented in the application layer. Referential integrity ensures that relationships between data elements are preserved, and with an RDBMS referential integrity is enforced through keys. For example, an RDBMS uses primary and foreign keys to ensure a product exists before an order for it can be created. If referential integrity is a requirement and data dependencies need to be monitored and enforced, it needs to be done at the application layer. Be rigorous about testing for referential and data integrity.
  • Use Application Insights to monitor Azure Cosmos DB activity. Application Insights is a telemetry service, and for this solution it was used to collect and report detailed performance, availability, and usage information about Azure Cosmos DB activities. Azure Functions provided the integration between Azure Cosmos DB and Application Insights using Metrics Explorer and the capability to capture custom events using TelemetryClient.GetMetric().

“Integration with AppInsights is fantastic….” Tim Miller, Principal Consultant – Skyline Technologies
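
The flexible partition key and single-collection tips above can be sketched together as a small base class. This is a hypothetical illustration: the post does not name the seven document types or say how the two key values are combined, so the enum members, property names, and the concatenated key format below are all assumptions.

```csharp
using System;

// Hypothetical document types; the real solution has seven, which the post does not name.
public enum DocType { Client, Portfolio, Trend }

// Assumed base class from which every document in the single collection derives.
public class BaseDocument
{
    public string Id { get; set; } = Guid.NewGuid().ToString();

    // Identifies which of the document schemas this item follows.
    public DocType Type { get; set; }

    // Logical field: may hold a client identifier today and another value later.
    public string PartitionID { get; set; } = string.Empty;

    // Assumed synthetic key that combines both values into one string.
    public string PartitionKey => $"{Type}|{PartitionID}";
}

public static class Program
{
    public static void Main()
    {
        var doc = new BaseDocument { Type = DocType.Client, PartitionID = "client-42" };
        Console.WriteLine(doc.PartitionKey); // prints Client|client-42

        // Later, a better value is chosen for this document type only.
        doc.PartitionID = "region-eu";
        Console.WriteLine(doc.PartitionKey); // prints Client|region-eu
    }
}
```

The definition of the key stays fixed while its value can be reassigned per document type, which is the property the clearTREND team relied on to avoid rebuilding the whole collection.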

Recommended next steps

NoSQL is a paradigm rapidly shifting the way database solutions are implemented in the cloud. Whether you are a developer or database professional, Azure Cosmos DB is an increasingly important player in the cloud database landscape and can be a game changer for your solution. If you haven’t already, get introduced to the advantages and capabilities of Azure Cosmos DB. Take a look at the documentation, dissect the sample GitHub application, and learn more about design patterns.

Thank you to our partners clearTREND and Skyline Technologies!

One of the great things about working for Microsoft is the opportunity to work with customers and partners, and to learn through them about their creative approaches for implementing technology. The team that designed and implemented the clearTREND solution are architects and developers with Skyline Technologies. Passionate about their business clients and solving complex technical challenges, they were very early cloud adopters. We especially appreciate the team members who gave their time to this effort, including Tim Miller, Greg Levenhagen, and Michael Lauer. It’s been a pleasure working with you.