{"id":1369,"date":"2019-04-29T00:00:00","date_gmt":"2019-04-29T00:00:00","guid":{"rendered":""},"modified":"2025-06-21T05:30:04","modified_gmt":"2025-06-21T12:30:04","slug":"understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/","title":{"rendered":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To address these challenges, we are adding cutting edge job execution and visualization experiences into the HDInsight Spark in-cluster Jupyter Notebook. Today, we are delighted to share the release of the real time <strong>Spark job progress indicator<\/strong>, <strong>native matplotlib support for PySpark DataFrame<\/strong>, and the <strong>cell execution status indicator<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"spark-job-progress-indicator\">Spark job progress indicator<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">When you run an interactive Spark job inside the notebook, a Spark job progress indicator with a real time progress bar appears to help you understand the job execution status. 
You can also switch tabs to see a resource utilization view for active tasks and allocated cores, or a Gantt chart of jobs, stages, and tasks for the overall workload.<\/p>\n\n\n\n<figure class=\"wp-block-image has-custom-border\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\" alt=\"Spark job progress indicator_thumb[2]\" style=\"border-radius:0px\" title=\"Spark job progress indicator_thumb[2]\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"native-matplotlib-support-for-pyspark-dataframe\">Native matplotlib support for PySpark DataFrame<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Previously, PySpark did not support matplotlib. To plot something, you first had to export the PySpark DataFrame out of the Spark context, convert it into a local Python session, and plot from there. This release provides native matplotlib support for PySpark DataFrames. You can use matplotlib directly on a PySpark DataFrame, just as you would in a local session, with no need to transfer data back and forth between the cluster Spark context and the local Python session.<\/p>\n\n\n\n<figure class=\"wp-block-image has-custom-border\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/88f0e5ea-deff-4d60-b1e5-b65ec5b7adaf.webp\" alt=\"Native matplotlib support for PySpark DataFrame_thumb[2]\" style=\"border-radius:0px\" title=\"Native matplotlib support for PySpark DataFrame_thumb[2]\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"cell-execution-status-indicator\">Cell execution status indicator<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Step-by-step cell execution status is displayed beneath the cell to help you see its current progress. 
Once the cell run is complete, an execution summary with the total duration and end time will be shown and kept there for future reference.<\/p>\n\n\n\n<figure class=\"wp-block-image has-custom-border\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/48f8bf95-01f0-4a1d-a07f-9e767f9fc253.webp\" alt=\"Cell execution status indicator_thumb[4]\" style=\"border-radius:0px\" title=\"Cell execution status indicator_thumb[4]\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"getting-started\">Getting started<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These features have been built into the HDInsight Spark Jupyter Notebook. To get started, access HDInsight from the <a href=\"https:\/\/portal.azure.com\/\" target=\"_blank\" rel=\"noopener\">Azure portal<\/a>. Open the Spark cluster and select <strong>Jupyter Notebook<\/strong> from the quick links.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"feedback\">Feedback<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We look forward to your comments and feedback. If you have any feature requests, asks, or suggestions, please send us a note to <a href=\"mailto:cosctcs@microsoft.com\">cosctcs@microsoft.com<\/a>. 
For bug submissions, please <a href=\"https:\/\/adsdevtool.visualstudio.com\/AdsNotebook\/_workitems\/create\/Bug?templateId=823ea6ab-c4bf-49a7-901c-992c28a3dfae&amp;ownerId=c1de5965-8400-47ec-9e5f-6018369d1f30\" target=\"_blank\" rel=\"noopener\">open a new ticket<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For more information, check out the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/hdinsight\/spark\/apache-spark-jupyter-notebook-kernels\" target=\"_blank\" rel=\"noopener\">Kernels for Jupyter notebook on Spark clusters in Azure HDInsight<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1474],"tags":[],"audience":[3054,3057,3053],"content-type":[1511],"product":[2895],"tech-community":[],"topic":[],"coauthors":[543],"class_list":["post-1369","post","type-post","status-publish","format-standard","hentry","category-analytics","audience-business-decision-makers","audience-data-professionals","audience-it-decision-makers","content-type-best-practices","product-azure-hdinsight-on-azure-kubernetes-service-aks"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Understanding 
HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. 
Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2019-04-29T00:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-21T12:30:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\" \/>\n<meta name=\"author\" content=\"Ruixin Xu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ruixin Xu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/ruixin-xu\/\",\"@type\":\"Person\",\"@name\":\"Ruixin Xu\"}],\"headline\":\"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook\",\"datePublished\":\"2019-04-29T00:00:00+00:00\",\"dateModified\":\"2025-06-21T12:30:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\"},\"wordCount\":401,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\",\"articleSection\":[\"Analytics\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\
/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\",\"name\":\"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft Azure Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\",\"datePublished\":\"2019-04-29T00:00:00+00:00\",\"dateModified\":\"2025-06-21T12:30:04+00:00\",\"description\":\"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. 
Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif\",\"width\":1037,\"height\":632,\"caption\":\"graphical user interface, text, application, email\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Analytics\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/analytics\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the 
latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure 
Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117\",\"name\":\"shakir\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"caption\":\"shakir\"},\"sameAs\":[\"https:\/\/azure.microsoft.com\"],\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft Azure Blog","description":"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. 
Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/","og_locale":"en_US","og_type":"article","og_title":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft Azure Blog","og_description":"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/","og_site_name":"Microsoft Azure Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2019-04-29T00:00:00+00:00","article_modified_time":"2025-06-21T12:30:04+00:00","og_image":[{"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif","type":"","width":"","height":""}],"author":"Ruixin Xu","twitter_card":"summary_large_image","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Ruixin Xu","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/ruixin-xu\/","@type":"Person","@name":"Ruixin Xu"}],"headline":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook","datePublished":"2019-04-29T00:00:00+00:00","dateModified":"2025-06-21T12:30:04+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/"},"wordCount":401,"commentCount":0,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif","articleSection":["Analytics"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/","name":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook | Microsoft 
Azure Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif","datePublished":"2019-04-29T00:00:00+00:00","dateModified":"2025-06-21T12:30:04+00:00","description":"The Jupyter Notebook on HDInsight Spark clusters is useful when you need to quickly explore data sets, perform trend analysis, or try different machine learning models. Not being able to track the status of Spark jobs and intermediate data can make it difficult for data scientists to monitor and optimize what they are doing inside the Jupyter Notebook.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/04\/c87efc15-c26e-47d0-8f0e-07568d50ab49.gif","width":1037,"height":632,"caption":"graphical user interface, text, application, 
email"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/understanding-hdinsight-spark-jobs-and-data-through-visualizations-in-the-jupyter-notebook\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"Analytics","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/analytics\/"},{"@type":"ListItem","position":3,"name":"Understanding HDInsight Spark jobs and data through visualizations in the Jupyter Notebook"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure 
Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117","name":"shakir","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4","url":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","caption":"shakir"},"sameAs":["https:\/\/azure.microsoft.com"],"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure 
Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1369","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=1369"}],"version-history":[{"count":2,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1369\/revisions"}],"predecessor-version":[{"id":42750,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1369\/revisions\/42750"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=1369"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=1369"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=1369"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=1369"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=1369"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=1369"},{"taxonomy":"tech-community","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=1369"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=1369"},{"taxonomy":"author","embeddable":tr
ue,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=1369"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}