Microsoft Academic Graph PySpark Samples
PySpark examples that run on Azure Databricks and analyze sample Microsoft Academic Graph data in Azure Storage.
Prerequisites
Before running these examples, complete the following setup steps:
Set up provisioning of Microsoft Academic Graph to an Azure Blob Storage account. See Get Microsoft Academic Graph on Azure storage.
Set up an Azure Databricks service. See Set up Azure Databricks.
Gather the information that you need
Before you begin, you should have these items of information:
✔️ The name of your Azure Storage (AS) account containing the MAG dataset from Get Microsoft Academic Graph on Azure storage.
✔️ The access key of your Azure Storage (AS) account from Get Microsoft Academic Graph on Azure storage.
✔️ The name of the container in your Azure Storage (AS) account containing the MAG dataset.
✔️ The name of the output container in your Azure Storage (AS) account.
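As a rough illustration of where these four values end up, the sketch below builds the Spark configuration key and the `wasbs://` URLs that a Databricks notebook typically uses to reach Blob Storage. The helper names (`storage_conf_key`, `wasbs_url`) and all placeholder values are hypothetical and are not taken from this repository; the scripts here may wire things up differently.

```python
# Minimal sketch, assuming Blob Storage is accessed through the wasbs:// scheme.
# Helper names and placeholder values are illustrative, not part of this repo.

def storage_conf_key(account_name: str) -> str:
    """Spark config key under which the storage account access key is set."""
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"


def wasbs_url(account_name: str, container: str, path: str = "") -> str:
    """Build a wasbs:// URL for a blob path inside a container."""
    return (
        f"wasbs://{container}@{account_name}.blob.core.windows.net/"
        f"{path.lstrip('/')}"
    )


# Placeholders: replace with the values gathered above.
ACCOUNT_NAME = "<AzureStorageAccount>"
ACCOUNT_KEY = "<AzureStorageAccessKey>"
MAG_CONTAINER = "<MagContainer>"
OUTPUT_CONTAINER = "<OutputContainer>"

# In a Databricks notebook, where a `spark` session already exists, these
# values could be applied roughly as follows (left commented out so the
# sketch runs without Spark):
#
#   spark.conf.set(storage_conf_key(ACCOUNT_NAME), ACCOUNT_KEY)
#   input_root = wasbs_url(ACCOUNT_NAME, MAG_CONTAINER)
#   output_root = wasbs_url(ACCOUNT_NAME, OUTPUT_CONTAINER)
```

Keeping the account key in a Databricks secret scope rather than in notebook source is generally the safer choice.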
Quickstart
git clone https://github.com/Azure-Samples/microsoft-academic-graph-pyspark-samples.git
Follow the instructions in PySpark analytics samples for Microsoft Academic Graph to run the PySpark scripts in this repository.