How to restrict access to Azure Blob storage from HDInsight by using shared access signatures. This sample spans HDInsight and Azure Storage, and samples are provided for .NET (C#) and Python.
You can use either the SASExample solution (C#) or SASToken.py (Python) to retrieve a Shared Access Signature (SAS) for an existing Azure Blob Storage account.
Open the project in Visual Studio. It's contained in the CSharp directory of this repository.
Right-click the project in Solution Explorer, then select Properties.
In Properties, select Settings.
In Settings, populate the following entries:
myaccount is the name of your storage account and mykey is the key for the storage account.
There is a sample.log file in the sampledata folder of this project that can be used.
Run the project. It will open a console window and display the SAS token created using the policy. This can be used to provide read and list access to the container. Save the token for later use.
Note: This currently requires version 0.32.0 of the Azure Storage SDK for Python.
Open the SASToken.py file (in the Python directory of this repository) and change the following values:
Run the script. It will display the SAS token created using the policy. This can be used to provide read and list access to the container. Save the token for later use.
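Before saving the token, it can help to look at what it contains. The following is a minimal sketch that uses only the Python standard library and assumes the token is an ordinary query string made of the usual SAS fields (sv, sr, si, sp, st, se, sig). The function names and the example token are hypothetical; the policy name and signature are placeholders, not output from SASToken.py.

```python
from urllib.parse import parse_qs

# Meaning of the SAS query parameters this sketch looks at.
SAS_FIELDS = {
    "sv": "storage service version",
    "sr": "resource type (c = container, b = blob)",
    "si": "stored access policy identifier",
    "sp": "permissions (for example rl = read + list)",
    "st": "start time",
    "se": "expiry time",
    "sig": "signature",
}


def describe_sas(token):
    """Split a SAS token into its named fields; a leading '?' is ignored.

    Values are URL-decoded for display only. Always paste the original,
    unmodified token string into the cluster configuration.
    """
    parsed = parse_qs(token.lstrip("?"), keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def inline_permissions(fields):
    """Return the 'sp' permissions as a set, or None when the token
    carries no 'sp' field (permissions then come from a stored policy)."""
    if "sp" not in fields:
        return None
    return set(fields["sp"])


# Hypothetical token: the policy name and signature are placeholders.
example = "sv=2015-04-05&sr=c&si=mypolicy&sig=PLACEHOLDER"
fields = describe_sas(example)
for name, value in fields.items():
    print(f"{name:>3} = {value}  ({SAS_FIELDS.get(name, 'other')})")
```

A token created from a stored policy, as in this sample, would be expected to carry an si field and no sp field, because the read and list permissions live in the policy on the container.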
Open HDInsightSAS.ps1 from the CreateCluster directory of this repository.
Replace the following values:
Save the file after you have made changes.
Open a PowerShell prompt and authenticate to your Azure subscription:
Run the script from the PowerShell prompt.
It will take around 15 minutes to complete the cluster creation process.
If you have an existing Linux-based HDInsight cluster, you can update it to use the SAS-secured storage.
From the left side of the Ambari web UI, select HDFS, and then select the Configs tab in the middle of the page.
Expand the Custom core-site section, then scroll to the end and select the Add property... link. Use the following values for the Key and Value fields:
Key: fs.azure.sas.CONTAINERNAME.STORAGEACCOUNTNAME.blob.core.windows.net
Value: The SAS returned by the C# or Python application you ran previously
Replace CONTAINERNAME with the container name you used with the C# or Python application. Replace STORAGEACCOUNTNAME with the storage account name you used.
Click the Add button to save this key and value, then click the Save button to save the configuration changes. When prompted, add a description of the change (for example, "adding SAS storage access"), and then click Save.
Click OK when the changes have been completed.
This saves the configuration changes, but you must restart several services before the change takes effect.
In the Ambari web UI, select HDFS from the list on the left, and then select Restart All from the Service Actions drop-down list on the right. When prompted, select Turn on maintenance mode and then select Confirm Restart All.
Repeat this process for the MapReduce2 and YARN entries from the list on the left of the page.
Once these services have restarted, select each one and disable maintenance mode from the Service Actions drop-down list.
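The Custom core-site key used in the steps above follows a fixed pattern, and swapping the container and account names is an easy mistake to make. The following sketch builds the key from the two names and rejects values that do not look like valid Azure names. The function name and the example names (sascontainer, mysasaccount) are hypothetical, and the checks cover only the basic naming rules (lowercase letters, digits, and, for containers, single hyphens).

```python
import re

# Basic Azure naming rules, used here only as a sanity check.
CONTAINER_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")  # 3-63 chars
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")                      # 3-24 chars


def sas_property_name(container, account):
    """Build the core-site key for a SAS-secured container."""
    if not CONTAINER_RE.fullmatch(container) or "--" in container:
        raise ValueError(f"not a valid container name: {container!r}")
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"not a valid storage account name: {account!r}")
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


# Hypothetical names; substitute the ones used when the SAS was created.
print(sas_property_name("sascontainer", "mysasaccount"))
```

The printed string is what goes in the Key field; the Value field takes the saved SAS token unchanged.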
To verify that you have restricted access, use the following methods:
For Windows-based HDInsight clusters, use Remote Desktop to connect to the cluster. See Connect to HDInsight using RDP for more information.
Once connected, use the Hadoop Command Line icon on the desktop to open a command prompt.
For Linux-based HDInsight clusters, use SSH to connect to the cluster. See one of the following for information on using SSH with Linux-based clusters:
Once connected to the cluster, use the following steps to verify that you can only read and list items on the SAS storage account:
From the prompt, type the following. Replace SASCONTAINER with the name of the container created for the SAS storage account. Replace SASACCOUNTNAME with the name of the storage account used for the SAS:
hdfs dfs -ls wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/
This will list the contents of the container, which should include the file that was uploaded when the container and SAS were created.
Use the following to verify that you can read the contents of the file. Replace the SASCONTAINER and SASACCOUNTNAME as in the previous step. Replace FILENAME with the name of the file displayed in the previous command:
hdfs dfs -text wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/FILENAME
This will display the contents of the file.
Use the following to download the file to the local file system:
hdfs dfs -get wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/FILENAME testfile.txt
This will download the file to a local file named testfile.txt.
Use the following to upload the local file to a new file named testupload.txt on the SAS storage:
hdfs dfs -put testfile.txt wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/testupload.txt
You will receive a message similar to the following:
This error occurs because the storage location is read+list only. Use the following to put the data on the default storage for the cluster, which is writable:
hdfs dfs -put testfile.txt wasb:///testupload.txt
This time, the operation should complete successfully.
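The manual checks above can be collected into a small script. The following is a sketch, not part of this repository: it builds the same hdfs dfs commands and records which ones are expected to succeed (list, read, download, upload to default storage) and which one is expected to fail (upload to the SAS storage). The helper names are hypothetical, and run_plan assumes that hdfs dfs exits with a nonzero status when a command fails.

```python
import subprocess


def wasb_uri(container, account, path=""):
    """Build a wasb:// URI for a path inside a specific container."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def verification_plan(container, account, filename):
    """Return (command, should_succeed) pairs mirroring the manual steps."""
    remote_file = wasb_uri(container, account, filename)
    upload_target = wasb_uri(container, account, "testupload.txt")
    return [
        (["hdfs", "dfs", "-ls", wasb_uri(container, account)], True),
        (["hdfs", "dfs", "-text", remote_file], True),
        (["hdfs", "dfs", "-get", remote_file, "testfile.txt"], True),
        # Writing to the SAS storage should be rejected (read + list only).
        (["hdfs", "dfs", "-put", "testfile.txt", upload_target], False),
        # Writing to the cluster's default storage should work.
        (["hdfs", "dfs", "-put", "testfile.txt", "wasb:///testupload.txt"], True),
    ]


def run_plan(plan):
    """Run each command and report whether its outcome matched expectations.

    Assumes a nonzero exit status means the command failed.
    """
    results = []
    for command, should_succeed in plan:
        completed = subprocess.run(command, capture_output=True, text=True)
        succeeded = completed.returncode == 0
        results.append((" ".join(command), succeeded == should_succeed))
    return results


# Placeholders as in the steps above; replace them before running.
plan = verification_plan("SASCONTAINER", "SASACCOUNTNAME", "FILENAME")
for command, should_succeed in plan:
    expectation = "should succeed" if should_succeed else "should fail"
    print(" ".join(command), "->", expectation)
```

On the cluster, calling run_plan(plan) after replacing the placeholders would execute the commands in order. Note that the download and the final upload can fail on a second run if testfile.txt or testupload.txt already exists.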