Are you new to Azure and not sure what Azure Data Factory is? Azure Data Factory is Microsoft’s cloud-based ETL/ELT tool: it helps you move your data from one place to another and transform it along the way. Today, I’d like to tell you about the high-level components within Azure Data Factory. Together, these components make up a data factory that moves your data from its source to an end product that is ready for consumption.
- Pipeline – A pipeline is a logical grouping of activities that together perform a unit of work. For example, you might copy on-premises data to the cloud (Azure Data Lake, for instance), then run it through an HDInsight (HDI) Hadoop cluster for further processing and analysis, and finally put it into a reporting area. Each of these steps is an activity; the activities are contained inside the pipeline and chained together to create a sequence of events, depending on your specific requirements.
- Linked Service – A linked service is very similar to a connection string in SQL Server: it holds the connection information that tells the data factory where the source or destination of your data lives and how to reach it.
- Trigger – A trigger is a unit of processing that determines when a pipeline needs to be run. A trigger can fire on a schedule or be set off by an event.
- Parameter – Essentially, a parameter is a named value you define on a pipeline and pass in as an argument at run time, for example to fill in which dataset or linked service the pipeline should use.
- Control Flow – The control flow in a data factory is what orchestrates how the activities in a pipeline are sequenced. This includes chaining activities in sequence, branching and looping.
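
To make these components a little more concrete, here is a minimal Python sketch of how they relate to each other. It is a toy model, not the Data Factory SDK: the dictionaries only loosely mirror the JSON shape Data Factory uses for pipeline definitions, and every name in it (`CopyAndProcess`, `sourceFolder`, `DailyRun`, `run_order`, `resolve_parameters`) is made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: three chained activities and one parameter.
# The "dependsOn" entries are what express the control flow (sequencing).
pipeline = {
    "name": "CopyAndProcess",
    "parameters": {"sourceFolder": {"type": "String", "defaultValue": "sales"}},
    "activities": [
        {"name": "CopyToLake", "type": "Copy", "dependsOn": []},
        {"name": "ProcessOnCluster", "type": "HDInsightHive",
         "dependsOn": [{"activity": "CopyToLake",
                        "dependencyConditions": ["Succeeded"]}]},
        {"name": "LoadReporting", "type": "Copy",
         "dependsOn": [{"activity": "ProcessOnCluster",
                        "dependencyConditions": ["Succeeded"]}]},
    ],
}

# Hypothetical, simplified trigger: says *when* the pipeline runs and
# which argument to pass in for the pipeline's parameter.
trigger = {
    "name": "DailyRun",
    "schedule": {"frequency": "Day", "interval": 1},
    "pipeline": "CopyAndProcess",
    "arguments": {"sourceFolder": "sales-2024"},
}


def run_order(pipeline):
    """Return activity names in an order that respects every dependsOn link."""
    graph = {
        activity["name"]: {dep["activity"] for dep in activity["dependsOn"]}
        for activity in pipeline["activities"]
    }
    return list(TopologicalSorter(graph).static_order())


def resolve_parameters(pipeline, arguments=None):
    """Start from each parameter's default, then apply the supplied arguments."""
    values = {
        name: spec.get("defaultValue")
        for name, spec in pipeline["parameters"].items()
    }
    values.update(arguments or {})
    return values


if __name__ == "__main__":
    print(run_order(pipeline))
    print(resolve_parameters(pipeline, trigger["arguments"]))
```

In this sketch the pipeline groups the activities, the `dependsOn` links play the role of the control flow, and the trigger supplies the arguments that fill in the pipeline's parameters. A linked service would sit alongside these as the connection information each copy activity uses; it is left out here to keep the example short.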