On June 6, Microsoft announced HBase as a preview feature of Azure HDInsight. On August 21, we announced the general availability of HBase (along with the preview of Azure DocumentDB and Search).
Apache HBase is a columnar NoSQL (“not only Structured Query Language”) distributed database project of the Apache Hadoop ecosystem. HBase adds transactional capabilities to the Hadoop ecosystem, allowing customers to do fast record updates and lookups on large datasets in Azure Blobs. As a distributed database, HBase was architected to scale as load and performance demands increase. This makes HBase ideal for customers who want to run transactions on millions to billions of rows of data (at GA, HBase will support up to 500 terabytes in Azure Blobs). However, HBase wasn’t built to replace a standard RDBMS in every scenario: it lacks features such as a query optimizer, secondary indexes, and advanced query languages.
Canonical examples of HBase usage include:
- Internet of Things – Use HBase as the store for the millions of real-time events coming from devices, sensors, equipment/machinery and social media. Hadoop with HDInsight can then perform batch analysis on the data stored in Azure Blobs.
- Web Logs – Use HBase to store and index web logs and clickstream data. Hadoop with HDInsight can then perform batch analysis on this data.
- Social Sentiment – Use HBase to write and store data from the social sentiment firehose (for example, Twitter).
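To make the "fast record updates and lookups" idea more concrete for the Internet of Things case, here is a minimal sketch in Python. It is not the HBase client API: `TinyTable`, `event_key`, the `reading:temp_c` column, and the sample device names are all hypothetical, and the table is an in-memory stand-in. It only illustrates two properties HBase is known for: rows are kept sorted by row key, and each value is addressed by a row key plus a `family:qualifier` column.

```python
from bisect import bisect_left, insort


class TinyTable:
    """In-memory stand-in for an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order, as HBase orders rows by key
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # Insert a new row if needed, then write (or overwrite) a single cell.
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup by row key; returns an empty dict for a missing row.
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix):
        # Range scan: jump to the first key >= prefix, then read while keys match.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


def event_key(device_id, epoch_seconds):
    # Assumed key layout: device id, '#', zero-padded timestamp, so that
    # lexicographic order within one device matches time order.
    return f"{device_id}#{epoch_seconds:010d}"


table = TinyTable()
table.put(event_key("sensor-7", 1408579200), "reading:temp_c", "21.5")
table.put(event_key("sensor-7", 1408579260), "reading:temp_c", "21.9")
table.put(event_key("sensor-9", 1408579200), "reading:temp_c", "18.2")

# Record update: overwrite one cell of one row without touching the others.
table.put(event_key("sensor-7", 1408579260), "reading:temp_c", "22.0")

# Lookup: all events for one device, in time order.
sensor7 = list(table.scan_prefix("sensor-7#"))
for key, cells in sensor7:
    print(key, cells)
```

The key layout is a design choice of this sketch: putting the device id first keeps one device's events adjacent, so reading them is a single contiguous scan. Real HBase also keeps cell versions and distributes rows across region servers, which this stand-in deliberately ignores.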
We invite you to learn more about HBase through our HDInsight documentation and getting started guides:
We also invite you to learn more about Hadoop and HDInsight through the following resources: