Get started in the Hadoop ecosystem with a Hadoop sandbox on a virtual machine
Learn how to install the Hortonworks Hadoop sandbox on a virtual machine. The sandbox provides a local development environment for exploring Hadoop, the Hadoop Distributed File System (HDFS), and job submission.
Once you are familiar with Hadoop, you can start using Hadoop on Azure by creating an HDInsight cluster. For more information on how to get started, see Get started with Hadoop on HDInsight.

Download and install the virtual machine
Go to http://hortonworks.com/downloads/#sandbox and select the DOWNLOAD FOR VIRTUALBOX item for HDP 2.4 on Hortonworks Sandbox. You are prompted to register with Hortonworks before the download begins.
From the same web page, select the VirtualBox Install Guide for HDP 2.4 on Hortonworks Sandbox. This downloads a PDF containing installation instructions for the virtual machine.

Start the virtual machine
Start VirtualBox, select the Hortonworks Sandbox, select Start, and then select Normal Start.
Once the virtual machine has finished the boot process, it will display login instructions. Open a web browser and navigate to the URL displayed (usually http://127.0.0.1:8888).
On the get started step of the Hortonworks Sandbox page, select View Advanced Options. Use the information on this page to log in to the sandbox using SSH, with the name and password provided. Note:
If you do not have an SSH client installed, you can use the web-based SSH client provided by the virtual machine.
The first time you connect using SSH, you are prompted to change the password for the root account. Enter a new password, which is used when you log in using SSH in the future.
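From the host machine's terminal, the connection typically looks like the following. The root user and forwarded port 2222 are common sandbox defaults, not confirmed by this article, so verify them against the install guide:

```shell
# Assumed defaults: VirtualBox usually forwards the sandbox's SSH service to
# host port 2222, and the initial login account is root. Confirm both in the
# VirtualBox Install Guide for your sandbox version before connecting.
ssh root@127.0.0.1 -p 2222
```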
Once logged in, enter the following command:
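The command itself is not shown in this copy of the article. On HDP sandboxes, the Ambari admin password is typically reset with the helper script below; the script name is an assumption, so verify it on your sandbox image:

```shell
# Assumed helper script shipped on the sandbox; it prompts for a new
# password for the Ambari admin account.
ambari-admin-password-reset
```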
When prompted, provide a password for the Ambari admin account. This will be used when you access the Ambari Web UI.
Use the hive command
From an SSH connection to the sandbox, use the following command to start the Hive shell:
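The command is not shown in this copy of the article; assuming the standard Hive CLI is on the sandbox PATH, starting the shell looks like this:

```shell
# Launch the interactive Hive command-line shell.
# Type `exit;` inside the shell to return to the Linux prompt.
hive
```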
Once the shell has started, use the following to view the tables that are provided with the sandbox:
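The statement itself is missing from this copy; assuming the standard HiveQL command, listing the tables looks like this:

```sql
-- List the tables in the current (default) database
show tables;
```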
Use the following to retrieve 10 rows from the sample_07 table:
select * from sample_07 limit 10;
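To inspect a table's structure before querying it, HiveQL's describe statement shows its columns and types. This is a standard HiveQL statement, shown here as a sketch against the sample_07 table shipped with the sandbox:

```sql
-- Show column names and types for the sample_07 table
describe sample_07;
```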