Data Pipeline pricing

Hybrid data integration at enterprise scale, made easy

Explore a range of data integration capabilities to fit your scale, infrastructure, compatibility, performance, and budget needs—from managed SQL Server Integration Services for seamless migration of SQL Server projects to the cloud, to large-scale, serverless data pipelines for integrating data of all shapes and sizes.

Explore pricing options


Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered with Microsoft, date of purchase, and the currency exchange rate. Prices are calculated based on US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the upcoming month. Sign in to the Azure pricing calculator to see pricing based on your current program/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.

Pricing for Data Pipeline is calculated based on:

  • Pipeline orchestration and execution
  • Data flow execution and debugging
  • Number of Data Factory operations, such as creating pipelines and monitoring pipeline runs

Data Factory Pipeline Orchestration and Execution

Pipelines are control flows of discrete steps referred to as activities. You pay for data pipeline orchestration per activity run and for activity execution per integration runtime hour. The integration runtime, which is serverless in Azure and self-hosted in hybrid scenarios, provides the compute resources used to execute the activities in a pipeline. Integration runtime charges are prorated by the minute and rounded up.
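
As an illustration of the proration rule, here is a minimal sketch that rounds execution time up to the whole minute; the hourly rate is a placeholder, since published rates vary by integration runtime type, region, and agreement:

```python
import math

def runtime_charge(execution_seconds: float, hourly_rate: float) -> float:
    """Prorate an integration runtime charge by the minute, rounded up.

    `hourly_rate` is a placeholder; substitute the published rate for
    your region and integration runtime type.
    """
    billed_minutes = math.ceil(execution_seconds / 60)
    return billed_minutes * (hourly_rate / 60)

# A 2-minute-20-second execution (140 s) bills as 3 minutes.
print(runtime_charge(140, hourly_rate=1.00))  # 3 * (1.00 / 60) = 0.05
```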

For example, the Azure Data Factory copy activity can move data across various data stores in a secure, reliable, performant, and scalable way. As data volume or throughput needs grow, the integration runtime can scale out to meet those needs.

Type                            Azure Integration Runtime Price   Azure Managed VNET Integration Runtime Price          Self-Hosted Integration Runtime Price
Orchestration (1)               $- per 1,000 runs                 $- per 1,000 runs                                     $- per 1,000 runs
Data Movement Activity (2)      $-/DIU-hour                       $-/DIU-hour                                           $-/hour
Pipeline Activity (3)           $-/hour                           $-/hour (up to 50 concurrent pipeline activities)     $-/hour
External Pipeline Activity (4)  $-/hour                           $-/hour (up to 800 concurrent pipeline activities)    $-/hour
(1) Orchestration refers to activity runs, trigger executions, and debug runs.
(2) Using the copy activity to egress data out of an Azure datacenter incurs additional network bandwidth charges, which appear as a separate outbound data transfer line item on your bill. Learn more about outbound data transfer pricing.
(3) Pipeline activities execute on the integration runtime. They include Lookup, Get Metadata, Delete, and schema operations during authoring (test connection, browse folder list and table list, get schema, and preview data).
(4) External pipeline activities are managed on the integration runtime but execute on linked services. They include Databricks, stored procedure, and HDInsight activities, among others. Refer to the documentation for a complete list of external activities. For the Mapping Data Flow activity, see the "Data Flow Execution and Debugging" section below.

Data Flow Execution and Debugging

Data flows are visually designed components inside Data Factory that enable data transformations at scale. You pay for Data Flow cluster execution and debugging time per vCore-hour. The minimum cluster size to run a Data Flow is 8 vCores. Execution and debugging charges are prorated by the minute and rounded up. During the public preview of Change Data Capture (CDC), CDC artifacts are billed at General Purpose rates for 4-vCore clusters.
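
The following minimal sketch applies those rules (8-vCore minimum, minute-level proration); the per-vCore-hour rate is a placeholder assumption:

```python
import math

def data_flow_charge(vcores: int, execution_seconds: float,
                     rate_per_vcore_hour: float) -> float:
    """Estimate a Data Flow execution charge; the rate is a placeholder."""
    vcores = max(vcores, 8)                             # 8-vCore minimum cluster size
    billed_minutes = math.ceil(execution_seconds / 60)  # prorated by the minute, rounded up
    vcore_hours = vcores * billed_minutes / 60
    return vcore_hours * rate_per_vcore_hour

# An 8-vCore cluster running 10 minutes 5 seconds bills as 11 minutes:
# 8 * 11 / 60 ≈ 1.47 vCore-hours.
print(data_flow_charge(8, 605, rate_per_vcore_hour=0.30))
```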

Change Data Capture (CDC) objects run on the same data flow compute infrastructure, using a single-node, 4-vCore machine. The Data Flow Reserved Instance pricing discount also applies to CDC resources.

Type               Price               One Year Reserved (% Savings)   Three Year Reserved (% Savings)
General Purpose    $- per vCore-hour   $- per vCore-hour               $- per vCore-hour
Memory Optimized   $- per vCore-hour   $- per vCore-hour               $- per vCore-hour

Note: Data Factory Data Flows will also bill for the managed disk and blob storage required for Data Flow execution and debugging.

Azure Data Factory Workflow Orchestration Manager

Size           Workflow Capacity   Scheduler vCPU   Worker vCPU   Web Server vCPU   Price Per Hour
Small (D2v4)   Up to 50 DAGs       2                2             2                 $-
Large (D4v4)   Up to 1,000 DAGs    4                4             4                 $-

Additional node   Worker vCPU   Price Per Hour
Small (D2v4)      2             $-
Large (D4v4)      4             $-

Data Factory Operations

Type          Price                                        Examples
Read/Write*   $- per 50,000 modified/referenced entities   Read/write of entities in Azure Data Factory*
Monitoring    $- per 50,000 run records retrieved          Monitoring of pipeline, activity, trigger, and debug runs**
*Read/write operations for Azure Data Factory entities include create, read, update, and delete. Entities include datasets, linked services, pipelines, integration runtime, and triggers.
**Monitoring operations include get and list for pipeline, activity, trigger, and debug runs.
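
As a rough sketch of how these two meters combine, assuming the per-50,000 rates scale linearly with usage (the rates themselves are placeholder assumptions):

```python
def operations_charge(entities_rw: int, run_records: int,
                      rw_rate_per_50k: float,
                      monitoring_rate_per_50k: float) -> float:
    """Estimate Data Factory operations charges, assuming the
    per-50,000 rates scale linearly (rates are placeholders)."""
    return (entities_rw / 50_000) * rw_rate_per_50k + \
           (run_records / 50_000) * monitoring_rate_per_50k

# 200,000 entity reads/writes plus 75,000 run records retrieved.
print(operations_charge(200_000, 75_000,
                        rw_rate_per_50k=0.50, monitoring_rate_per_50k=0.25))
```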

The pricing for Data Factory usage is calculated based on the following factors:

  • The frequency of activities (high or low). A low-frequency activity executes no more than once a day (for example, daily, weekly, or monthly); a high-frequency activity executes more than once a day (for example, hourly, or every 15 minutes). See the Orchestration of activities section below for details.
  • Where the activities run (cloud or on-premises). See Data Movement section below.
  • Whether a pipeline is active or not. See Inactive Pipelines section below.
  • Whether you are re-running an activity. See Re-running activities section below.

Orchestration of activities

Scenario: Activities running in the cloud
(examples: a copy activity moving data from an Azure blob to an Azure SQL database; a Hive activity running a Hive script on an Azure HDInsight cluster)
Low frequency: $- per activity per month
High frequency: $- per activity per month

Scenario: Activities running on-premises and involving a self-hosted integration runtime
(examples: a copy activity moving data from an on-premises SQL Server database to Azure blob storage; a stored procedure activity running a stored procedure in an on-premises SQL Server database)
Low frequency: $- per activity per month
High frequency: $- per activity per month

Notes:

  • Usage beyond 100 activities per month receives a 20% discount in both low-frequency and high-frequency scenarios (see the sketch after these notes).
  • The first five low-frequency activities in a month are free in both the cloud and on-premises variants.
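
A minimal sketch of how these two rules interact for a single frequency tier, assuming the free allowance applies before the discount and the 20% discount applies only to activities beyond 100 (the original wording is ambiguous on this point); the per-activity rate is a placeholder:

```python
def orchestration_charge(activity_count: int, rate_per_activity: float,
                         free_count: int = 5) -> float:
    """Estimate a monthly orchestration charge for one frequency tier.

    Assumes the first `free_count` activities are free (low frequency
    only) and that the 20% discount applies to activities beyond 100;
    the per-activity rate is a placeholder.
    """
    billable = max(activity_count - free_count, 0)
    at_full_price = min(billable, 100)
    at_discount = max(billable - 100, 0)
    return (at_full_price + 0.8 * at_discount) * rate_per_activity

# 130 low-frequency activities: 5 free, 100 at full price, 25 at 20% off.
print(orchestration_charge(130, rate_per_activity=0.60))
```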

Data Movement

Azure Data Factory can copy data between various data stores in a secure, reliable, performant and scalable way. As your volume of data or data movement throughput needs grow, Azure Data Factory can scale out to meet those needs. Refer to the Copy Activity Performance Guide to learn about leveraging data movement units to boost your data movement performance.

Data movement between cloud data stores: $- per hour
Data movement when an on-premises store is involved: $- per hour

Note: You may incur data transfer charges, which will show up as a separate outbound data transfer line item on your bill. Outbound data transfer charges are applied when data goes out of Azure data centers. See Data Transfers Pricing Details for more information.

Inactive Pipelines

You must specify an active data processing period using a date/time range (start and end times) for each pipeline you deploy to Azure Data Factory. The pipeline is considered active for the specified period, even if its activities are not actually running. It is considered inactive at all other times.

An inactive pipeline is charged at $- per month.

Pipelines that are inactive for an entire month are billed at the applicable "inactive pipeline" rate for the month. Pipelines that are inactive for a portion of a month are billed for their inactive periods on a prorated basis for the number of hours they are inactive in that month. For example, if a pipeline has a starting date and time of January 1, 2016 at 12:00 AM and an ending date and time of January 20, 2016 at 12:00 AM, the pipeline is considered active for those 20 days and inactive for 11 days. The charge for inactive pipeline ($-) is prorated for 11 days.
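
A minimal sketch of that proration at day granularity (the page prorates by hours, but the worked example counts days; the monthly rate below is a placeholder):

```python
from datetime import date
import calendar

def prorated_inactive_charge(active_start: date, active_end: date,
                             year: int, month: int,
                             monthly_rate: float) -> float:
    """Prorate the inactive-pipeline charge at day granularity, counting
    both the start and end dates as active (as in the example above).
    The monthly rate is a placeholder."""
    days_in_month = calendar.monthrange(year, month)[1]
    overlap_start = max(active_start, date(year, month, 1))
    overlap_end = min(active_end, date(year, month, days_in_month))
    active_days = max((overlap_end - overlap_start).days + 1, 0)
    inactive_days = days_in_month - active_days
    return monthly_rate * inactive_days / days_in_month

# Active January 1-20, 2016: 20 active days, 11 inactive days in January.
print(prorated_inactive_charge(date(2016, 1, 1), date(2016, 1, 20),
                               2016, 1, monthly_rate=0.80))
```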

If a pipeline does not have an active data processing period (a start and end time) specified, it is considered inactive.

Re-running activities

Activities can be re-run if needed (for example, if the data source was unavailable during the scheduled run). The cost of re-running activities varies based on the location where the activity is run. The cost of re-running activities in the cloud is $- per 1,000 re-runs. The cost of re-running activities on-premises is $- per 1,000 re-runs.
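
A small sketch of the re-run arithmetic, assuming the per-1,000 rates scale linearly (both rates are placeholders):

```python
def rerun_charge(cloud_reruns: int, onprem_reruns: int,
                 cloud_rate_per_1k: float, onprem_rate_per_1k: float) -> float:
    """Estimate re-run charges; per-1,000 rates are placeholders,
    assumed to scale linearly with the re-run count."""
    return (cloud_reruns / 1_000) * cloud_rate_per_1k + \
           (onprem_reruns / 1_000) * onprem_rate_per_1k

# 2,500 cloud re-runs and 500 on-premises re-runs in a month.
print(rerun_charge(2_500, 500, cloud_rate_per_1k=1.00, onprem_rate_per_1k=1.50))
```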

Example

Suppose you have a data pipeline with the following two activities that run once a day (low-frequency):

  1. A Copy activity that copies data from an on-premises SQL Server database to an Azure blob.
  2. A Hive activity that runs a hive script on an Azure HDInsight cluster.

Assume that it takes 2 hours a day to move data from the on-premises SQL Server database to Azure Blob storage. The following table shows the costs associated with this pipeline:

First activity (copying data from on-premises to Azure)
  Data movement cost (per month): 30 days per month × 2 hours per day × $- per hour = $-
  Orchestration of activities cost (per month): $-
  Subtotal (per month): $-

Second activity (a Hive script running on Azure HDInsight)
  Data movement cost (per month): $-
  Orchestration of activities cost (per month): $-
  Subtotal (per month): $-

Total for both activities (per month): $-

You can also use the Data Factory Pricing Calculator to calculate charges for this scenario.
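
For readers who prefer to script the estimate, here is a minimal sketch of the same arithmetic; every rate is a placeholder assumption, since the page shows $- where actual prices would appear:

```python
# Placeholder rates where the page shows $-; all three are assumptions.
ONPREM_MOVEMENT_RATE = 0.25   # $/hour, self-hosted data movement
LOW_FREQ_ONPREM_RATE = 1.50   # $/activity/month, on-premises orchestration
LOW_FREQ_CLOUD_RATE = 0.60    # $/activity/month, cloud orchestration

# First activity: copy runs 2 hours/day for 30 days over the
# self-hosted integration runtime.
copy_subtotal = 30 * 2 * ONPREM_MOVEMENT_RATE + LOW_FREQ_ONPREM_RATE

# Second activity: Hive script on HDInsight (no data movement meter;
# HDInsight itself is billed separately).
hive_subtotal = LOW_FREQ_CLOUD_RATE

print(copy_subtotal + hive_subtotal)  # estimated monthly pipeline total
```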

Notes:

  • There is no charge for the first five cloud and on-premises activities. The charges mentioned above assume that you have already used five cloud and five on-premises activities per month (in other pipelines).
  • Azure Storage and HDInsight services are billed separately at their per service rates.

Azure pricing and purchasing options

Connect with us directly

Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimization and request a custom proposal.

Talk to a sales specialist

See ways to purchase

Purchase Azure services through the Azure website, a Microsoft representative, or an Azure partner.

Explore your options

Additional resources

Azure Data Factory

Learn more about Azure Data Factory features and capabilities.

Pricing calculator

Estimate your expected monthly costs for using any combination of Azure products.

SLA

Review the Service Level Agreement for Azure Data Factory.

Documentation

Review technical tutorials, videos, and more Azure Data Factory resources.

Azure Data Factory V2

  • Read/write operations include create, read, update, and delete Azure Data Factory entities. Entities include datasets, linked services, pipelines, integration runtime, and triggers.
  • Monitoring operations include get and list for pipeline, activity, trigger, and debug runs.
  • An activity is a step within a pipeline. The execution of each activity is called a run.
  • An integration runtime is the compute infrastructure used by Azure Data Factory to provide the following data integration capabilities across different network environments:

    • Data movement: Transfer of data between data stores in public and private (on-premises or virtual private) networks, providing support for built-in connectors, format conversion, column mapping, and performant and scalable data transfer.
    • Activity dispatch: Dispatching and monitoring of transformation activities running on a variety of compute services, such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and others.
    • SQL Server Integration Services package execution: Native execution of SQL Server Integration Services packages in a managed Azure compute environment.
  • A trigger is a unit of processing that determines when a pipeline execution needs to be initiated. A trigger run is the execution of a trigger, which may produce an activity run if the conditions are satisfied.
  • A debug run is a test run that a user can perform during iterative development to ensure the steps in the pipeline are working as intended before changes are published to the data factory.
  • An inactive pipeline is one that’s not associated with a trigger and that has zero runs within a month. A charge is incurred after one month of zero runs.
  • Pipeline execution activities (data movement, pipeline activities, and external activities, on both the Azure integration runtime and the self-hosted integration runtime) are billed at the hourly rates shown above. Pipeline execution charges are prorated by the minute and rounded up.

    For example: If you run an operation that takes 2 minutes and 20 seconds, you will be billed for 3 minutes.

  • Find scenario-based pricing examples on the Azure Data Factory Documentation page.
  • Check out guidance on how to plan and manage ADF costs on the Azure Data Factory documentation page.

Azure Data Factory V1

  • Activities define the actions to perform on your data. Each activity takes zero or more datasets as inputs and produces one or more datasets as output. An activity is a unit of orchestration in Azure Data Factory.

    For example, you may use a Copy activity to orchestrate copying data from one dataset to another. Similarly, you may use a Hive activity to run a Hive query on an Azure HDInsight cluster to transform or analyze your data. Azure Data Factory provides a wide range of data transformation and data movement activities. You may also choose to create a custom .NET activity to run your own code.

  • A pipeline is a logical grouping of activities. Pipelines can be active for a user-specified period of time (start and end times). Pipelines are inactive at all other times.
  • If an activity uses other Azure services, such as HDInsight, those services are billed separately at their per-service rates.

  • There are two sets of costs incurred when you perform a data copy. First, the compute resources that are used for performing the copy are represented by the data movement meter. There are cloud and on-premises versions of the data movement meter, and on-premises data movement is less expensive because a portion of the compute associated with the copy is performed by your own on-premises resources. Data movement charges are prorated by the minute and rounded up. (For example, a data copy taking 41 minutes 23 seconds of compute time will result in a charge for 42 minutes).

    Second, you may incur data transfer charges, which will show up as a separate outbound data transfer line item on your bill. Outbound data transfer charges are applied when data goes out of Azure data centers. See Data Transfers Pricing Details for more information.

