This week I’ve been talking about Azure Data Factory. In today’s post I’d like to talk about the much-awaited Azure Data Factory Integration Runtime. The compute infrastructure provides data movement, connectors, data conversions and data transfers, as well as activity dispatching, so you can dispatch and monitor activities running HDInsight, SQL DB or DW and more.
A big part of V2 is that you can now lift and shift your SSIS packages up into Azure and run them from your Azure data portal inside of Data Factory. There are 3 integration runtime types:
1. Azure Integration Runtime – This is clearly set up in Azure and you would use this if you’re going to be copying between two cloud data sources.
2. Self-Hosted Integration Runtime – This can run a copy of transformation activities between cloud and on-premises, including moving data from an IaaS virtual machine. Use this if you’re going to be copying between a cloud source and an on-prem, private network source.
So, if you’re environment is behind a firewall, not in the public cloud and you want to move data from your environment to Azure and the gateway will not work for you. Also, because that IaaS virtual machine is isolated, and you can’t get into that data storage, you would set up that integration between sites.
3. SSIS Integration Runtime – Use this when you’re lifting and shifting your SSIS packages into Azure Data Factory. But a key thing to mention is that this does not yet support third party tools for SSIS, but that support will be added eventually.
Where are these located? Azure Data Factories are located in limited regions at this time, but they can access data stores and compute services globally. With Azure Integration Runtime, the location of that runtime will define the backend compute resources where those are being used. This is optimized for data compliance, efficiency and reduced network egress costs, to ensure that they’re using the best services available in the region that is needed.
The Self-Hosted Runtime is installed in the environment in that private network. The SSIS Integration Runtime is determined based on where the SQL DB or managed instance is hosting that SSIS DB catalog. It is currently limited where it can be located, but it does not have to be in the same place as the Data Factory. It will be as close to the data sources as possible and will run as optimally as it can.