A no-limits data lake to power intelligent action
- Store and analyse petabyte-size files and trillions of objects
- Develop massively parallel programs with simplicity
- Debug and optimise your big data programs with ease
- Enterprise-grade security, auditing and support
- Start in seconds, scale instantly and pay per job
- Built on YARN, designed for the cloud
Data Lake Analytics – a no-limits analytics job service to power intelligent action
The first cloud analytics service where you can easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly and only pay per job.
HDInsight – cloud Apache Spark and Hadoop® service for the enterprise
HDInsight is the only fully managed cloud Hadoop offering that provides optimised open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka and R Server, backed by a 99.9% SLA. Each of these big data technologies, as well as ISV applications, is easily deployable as a managed cluster, with enterprise-level security and monitoring.
Data Lake Store – a no-limits data lake that powers big data analytics
The first cloud data lake for enterprises that is secure, massively scalable and built in accordance with the open HDFS standard. With no limits to the size of data and the ability to run massively parallel analytics, you can now unlock value from all of your unstructured, semi-structured and structured data.
Develop, debug and optimise big data programs with ease
Finding the right tools to design and tune your big data queries can be difficult. Data Lake makes this easy through deep integration with Visual Studio, Eclipse and IntelliJ, so that you can use familiar tools to run, debug and tune your code. Visualisations of your U-SQL, Apache Spark, Apache Hive and Apache Storm jobs let you see how your code runs at scale and identify performance bottlenecks and cost optimisations, making it easier to tune your queries. Our execution environment actively analyses your programs as they run and offers recommendations to improve performance and reduce cost. Data engineers, DBAs and data architects can use existing skills, such as SQL, Apache Hadoop, Apache Spark, R, Python, Java and .NET, to become productive from day one.
Integrates seamlessly with your existing IT investments
One of the top challenges of big data is integration with existing IT investments. Data Lake is a key part of Cortana Intelligence, meaning that it works with Azure Synapse Analytics, Power BI and Data Factory for a complete cloud big data and advanced analytics platform that helps you with everything from data preparation to interactive analytics on large-scale data sets. Data Lake Analytics gives you the power to act on all your data with optimised data virtualisation of your relational sources, such as SQL Server on Azure virtual machines, Azure SQL Database and Azure Synapse Analytics. Queries are automatically optimised by moving processing close to the source data instead of moving the data, thereby maximising performance and minimising latency. Finally, because Data Lake is in Azure, you can connect to any data generated by applications or ingested by devices in Internet of Things (IoT) scenarios.
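The idea of moving processing close to the source data can be illustrated with a minimal sketch. It does not use Data Lake Analytics itself: an in-memory SQLite database stands in for a remote relational source, and the table, column and function names are hypothetical. The sketch compares pulling every row to the caller with letting the source filter and aggregate.

```python
import sqlite3


def build_source():
    """Create a stand-in relational source (hypothetical 'orders' table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.5), (4, "APAC", 300.0)],
    )
    return conn


def total_without_pushdown(conn, region):
    """Fetch every row, then filter and sum on the caller's side."""
    rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _, row_region, amount in rows if row_region == region)
    return total, len(rows)  # (result, rows transferred)


def total_with_pushdown(conn, region):
    """Let the source filter and aggregate; only the result row is returned."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0], len(rows)  # (result, rows transferred)


source = build_source()
naive_total, naive_rows = total_without_pushdown(source, "EU")
pushed_total, pushed_rows = total_with_pushdown(source, "EU")
```

Both functions return the same total, but the second transfers a single row instead of the whole table, which is the effect the paragraph above describes.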
Store and analyse petabyte-size files and trillions of objects
Data Lake was architected from the ground up for cloud scale and performance. With Azure Data Lake Store, your organisation can analyse all of its data in one place, with no artificial constraints. Your Data Lake Store can store trillions of files, and a single file can be greater than a petabyte in size – 200 times larger than other cloud stores. This means that you don’t have to rewrite code as you increase or decrease the size of the data stored or the amount of compute being spun up. This lets you focus on your business logic only and not on how you process and store large datasets. Data Lake also takes away the complexities normally associated with big data in the cloud, ensuring that it can meet your current and future business needs.
Affordable and cost-effective
Data Lake is a cost-effective solution to run big data workloads. You can choose between on-demand clusters or a pay-per-job model when data is processed. In both cases, no hardware, licences or service-specific support agreements are required. The system scales up or down with your business needs, meaning that you never pay for more than you need. It also lets you independently scale storage and compute, enabling more economic flexibility than traditional big data solutions. Finally, it minimises the need to hire specialised operations teams typically associated with running a big data infrastructure. Data Lake minimises your costs while maximising the return on your data investment. A recent study showed that HDInsight delivered a 63% lower TCO compared to deploying Hadoop on premises over five years.
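As a rough illustration of the cost reasoning above, the following sketch applies the quoted 63% TCO reduction to an on-premises figure and compares the two billing models. All prices, job counts and function names are hypothetical placeholders rather than Azure pricing.

```python
def tco_after_reduction(on_prem_tco, reduction_percent=63):
    """Return the TCO implied by a given percentage reduction."""
    return on_prem_tco * (100 - reduction_percent) / 100


def cheaper_model(jobs, cost_per_job, cluster_hours, cluster_hourly_rate):
    """Compare pay-per-job billing with an on-demand cluster for one period."""
    pay_per_job_total = jobs * cost_per_job
    cluster_total = cluster_hours * cluster_hourly_rate
    if pay_per_job_total <= cluster_total:
        return "pay-per-job", pay_per_job_total
    return "on-demand cluster", cluster_total


# Hypothetical inputs: a 1,000,000 on-premises TCO over five years,
# and a month with either 40 or 400 jobs against 100 cluster hours.
cloud_tco = tco_after_reduction(1_000_000)
light_month = cheaper_model(40, 2.0, 100, 1.5)
heavy_month = cheaper_model(400, 2.0, 100, 1.5)
```

With these made-up numbers, a light month favours the pay-per-job model and a heavy month favours the cluster; the crossover depends entirely on the actual rates that apply to your workload.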
Enterprise-grade security, auditing and support
Data Lake is fully managed and supported by Microsoft, and backed by an enterprise-grade SLA. With 24/7 customer support, you can contact us to address any challenges that you’re facing with your entire big data solution. Our team monitors your deployment so that you don’t have to, guaranteeing that it will run continuously. Data Lake protects your data assets and easily extends your on-premises security and governance controls to the cloud. Data is always encrypted – in motion using SSL, and at rest using service or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities are built in with Azure Active Directory. You can authorise users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
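To make the POSIX-based ACL model more concrete, here is a minimal sketch of the general POSIX ACL access check (owner, then named users, then groups, then others, with a mask limiting named and group entries). It is an illustration of the generic algorithm, not the Data Lake Store implementation, and the `Acl` class, `is_allowed` function and example identities are hypothetical.

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # standard rwx permission bits


@dataclass
class Acl:
    owner: str
    owning_group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    mask: int = 7  # upper bound for named-user and group entries
    named_users: dict = field(default_factory=dict)
    named_groups: dict = field(default_factory=dict)


def is_allowed(acl, user, groups, requested):
    """Evaluate a request using the generic POSIX ACL order of checks."""
    if user == acl.owner:
        return (acl.owner_perms & requested) == requested
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & requested) == requested
    # Collect every group entry that matches one of the user's groups.
    matching = []
    if acl.owning_group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # A matching group entry decides the outcome; "other" is not consulted.
        return any((p & acl.mask & requested) == requested for p in matching)
    return (acl.other_perms & requested) == requested


# Hypothetical ACL on a folder: the mask (r-x) strips write from named entries.
acl = Acl(
    owner="alice", owning_group="finance",
    owner_perms=7, group_perms=5, other_perms=0, mask=5,
    named_users={"bob": 7}, named_groups={"auditors": 4},
)
```

In this example `bob` is granted `rwx` by name but can only read and traverse, because the mask removes write; a user in no matching group falls through to the empty "other" entry and is denied.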