
Paxata launches Self-Service Data Preparation on Azure HDInsight to accelerate Data Prep

This post announces the availability of Paxata on Azure HDInsight.

We are pleased to announce the expansion of the HDInsight Application Platform to include Paxata, a leading self-service data preparation offering. You can get this offering now in the Azure Marketplace and read more in the press announcement by Paxata.

Azure HDInsight is the industry-leading, fully managed cloud Apache Hadoop and Spark offering. It gives you optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server, backed by a 99.9% SLA. Paxata’s Adaptive Information Platform empowers business consumers to turn raw data into ready information, instantly and automatically, shortening time to insight and accelerating time to value for customers using HDInsight. The combined offering of Paxata on Azure HDInsight enables customers to gain insights faster while running on an enterprise-ready platform.

Microsoft Azure HDInsight – Open Source Big Data Analytics at Enterprise grade & scale

Each of Azure HDInsight's big data technologies is easily deployable as a managed cluster with enterprise-level security and monitoring. The ecosystem of productivity applications for big data has grown with the goal of helping customers solve their big data and analytical problems faster. Today, however, customers often find it challenging to discover these applications, and then struggle to install and configure them.

To address this gap, the HDInsight Application Platform provides a unique experience in which Independent Software Vendors (ISVs) can offer their applications directly to customers, and customers can discover, install, and use applications built for the big data ecosystem with a single click.

The largest and most time-consuming challenge in analytics is simply getting the data ready. Roughly 80% of analytics time is spent bringing together data from diverse sources and then cleansing, shaping, and preparing it. As part of this integration, Paxata has optimized its Spark-based Adaptive Information Platform on Azure HDInsight to simplify information management for business consumers.

Paxata Self-Service Data Preparation – Accelerates analytics and time to insight

The Paxata Self-Service Data Preparation application, built on the Adaptive Information Platform, provides an intuitive solution that enables any business consumer to turn raw data into trustworthy information and gain insights from that data faster. You can combine unstructured and structured data from various sources, then cleanse, shape, and publish the data to any destination. Business consumers work in an interactive, visual experience, with complete governance and the reliable performance provided by HDInsight. This enables a true self-service model for big data, in which non-technical users can harness big data to accelerate insights.
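To make the combine, cleanse, and shape steps concrete, here is a minimal sketch in plain Python. It is only an illustration of the kind of work that data preparation automates, not Paxata's implementation or API; the sample data, column names, and helper functions are all hypothetical.

```python
import csv
import io
import re

# Hypothetical inputs: a structured CSV export and unstructured free-text notes.
CRM_CSV = """customer_id,name,city
1, alice SMITH ,seattle
2,Bob Jones,REDMOND
2,Bob Jones,REDMOND
"""

SUPPORT_NOTES = """\
ticket for customer 1: login failure
ticket for customer 2: billing question
ticket for customer 1: password reset
"""


def cleanse(row):
    """Trim stray whitespace and normalize capitalization of text fields."""
    return {
        "customer_id": int(row["customer_id"]),
        "name": " ".join(row["name"].split()).title(),
        "city": row["city"].strip().title(),
    }


def load_customers(text):
    """Parse the CSV, cleanse each row, and drop exact duplicates."""
    seen, customers = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        clean = cleanse(row)
        key = tuple(clean.values())
        if key not in seen:
            seen.add(key)
            customers.append(clean)
    return customers


def count_tickets(notes):
    """Extract customer ids from free text and count tickets per customer."""
    counts = {}
    for match in re.finditer(r"customer (\d+)", notes):
        cid = int(match.group(1))
        counts[cid] = counts.get(cid, 0) + 1
    return counts


def prepare(csv_text, notes):
    """Combine both sources into one shaped record per customer."""
    tickets = count_tickets(notes)
    return [
        dict(customer, tickets=tickets.get(customer["customer_id"], 0))
        for customer in load_customers(csv_text)
    ]


prepared = prepare(CRM_CSV, SUPPORT_NOTES)
```

Each record in `prepared` carries the cleansed customer fields plus a ticket count derived from the free text. A self-service tool lets a business consumer reach a similar result visually, without writing this code.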

The following are the highlights of the Adaptive Information Platform:

  • Easy to use: Business consumers work with data through a familiar, Excel-like, interactive visual experience, with no coding required.
  • Smart: Algorithmic intelligence is used to recommend how to join and append datasets and normalize values.
  • Unified Information Platform: Paxata provides a unified solution for data integration, data quality, enrichment, collaboration and governance.
  • Built-in governance and security:  Paxata provides self-documenting data lineage and support for authentication, authorization, encryption, auditing and usage tracking.
  • Built for scale: Powered by the Apache Spark™-based engine for in-memory high-performance, parallel, pipelined, distributed processing.
  • Built for the cloud: Elastic scalability to support variable workloads.
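As an illustration of the "Smart" item above, a join recommendation can be approximated by scoring how much the values of two columns overlap. The sketch below uses Jaccard similarity over lightly normalized values. This heuristic is an assumption chosen for illustration; the post does not describe Paxata's actual algorithms, and the function names and sample data are hypothetical.

```python
from itertools import product


def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(str(value).lower().split())


def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def recommend_join(left, right):
    """Return (left_column, right_column, score) for the best-overlapping pair.

    Both arguments are dicts mapping a column name to its list of values.
    """
    best = None
    for left_col, right_col in product(left, right):
        score = jaccard(
            map(normalize, left[left_col]),
            map(normalize, right[right_col]),
        )
        if best is None or score > best[2]:
            best = (left_col, right_col, score)
    return best


# Hypothetical datasets with inconsistently formatted keys.
orders = {"order_id": [100, 101, 102], "cust": ["A-1", "a-2 ", "A-3"]}
customers = {
    "id": ["a-1", "a-2", "a-3", "a-4"],
    "region": ["west", "east", "west", "north"],
}

suggestion = recommend_join(orders, customers)
```

Here `suggestion` names `cust` and `id` as the most promising join pair, because three of the four distinct normalized values are shared. A production system would also have to weigh data types, cardinality, and scale, none of which this sketch attempts.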


The following image shows how the Paxata Adaptive Information Platform delivers comprehensive information management with governance, scalability, and extensibility.

[Image: Paxata Adaptive Information Platform architecture]

Paxata on Azure HDInsight: Simplified information management at enterprise scale.

Customers can install Paxata on HDInsight using the one-click deploy experience of the HDInsight Application Platform. Paxata’s Adaptive Information Platform is deployed as an application in a secure and compliant manner and doesn’t require customers to open any ports. All requests are routed through the secure gateway on HDInsight, and users are additionally authenticated by Paxata’s own authentication system.

Once the application is provisioned, business consumers can securely access data through the self-service interface and analyze large data volumes interactively. This is possible because Paxata’s Adaptive Information Platform leverages Apache Spark™ running on Azure HDInsight, a managed service backed by an enterprise-grade SLA of 99.9%. As a result, the end user, in this case a business consumer, can use Paxata’s Adaptive Information Platform and focus on turning raw data into information without worrying about the underlying platform.

Getting started with Paxata’s Adaptive Information Platform on HDInsight

To install Paxata’s Adaptive Information Platform on HDInsight, you need an HDInsight 3.6 cluster with Apache Spark 2.1. You can choose Paxata as an application when creating a new cluster, or add Paxata to an existing cluster. If you don’t have a license key, you can get one at the Paxata Azure data prep page.
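If you script your deployments, you may want to check these prerequisites before attempting the install. The sketch below is a hypothetical helper, not part of any Azure SDK or Paxata tooling. It assumes the versions must match exactly as stated above (HDInsight 3.6 with Spark 2.1), since this post does not say whether other versions are supported.

```python
# Versions named in this post; exact matching is an assumption of this sketch.
REQUIRED_HDINSIGHT = (3, 6)
REQUIRED_SPARK = (2, 1)


def parse_version(text):
    """Return (major, minor) from a version string such as '2.1' or '2.1.0'."""
    parts = text.strip().split(".")
    return int(parts[0]), int(parts[1])


def meets_prerequisites(hdinsight_version, spark_version):
    """Check a cluster's versions against the stated Paxata prerequisites."""
    return (
        parse_version(hdinsight_version) == REQUIRED_HDINSIGHT
        and parse_version(spark_version) == REQUIRED_SPARK
    )
```

For example, `meets_prerequisites("3.6", "2.1.0")` returns `True`, while a cluster reporting Spark 2.0 would be rejected.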

The following screenshot shows how to install Paxata on an HDInsight Spark cluster.

[Screenshot: installing Paxata on an HDInsight Spark cluster]

Once Paxata’s Adaptive Information Platform is installed, you can launch it by browsing to the applications blade inside the HDInsight cluster.

[Screenshot: launching Paxata from the applications blade]


Summary

We are pleased to announce the expansion of the HDInsight Application Platform to include Paxata. Paxata’s Adaptive Information Platform empowers business consumers to turn raw data into ready information instantly, shortening time to insight and accelerating time to value for customers using HDInsight. The combined offering of Paxata on Azure HDInsight enables customers to gain insights faster while running on an enterprise-ready platform.