Most Important Components of Azure Data Factory

Are you new to Azure and not sure what Azure Data Factory is? Azure Data Factory is Microsoft’s cloud-based ETL/ELT tool; it helps you move your data from one place to another and transform it along the way. Today, I’d like to tell you about the high-level components within Azure Data Factory. Together, these components make up a data factory that moves your data from its source to an end product ready for consumption.

  • Pipeline – A pipeline is a logical grouping of activities that together perform a unit of work. For example, you might copy on-premises data to the cloud (Azure Data Lake, for instance), then run it through an HDInsight Hadoop cluster for further processing and analysis, and finally put it into a reporting area. These activities are contained inside the pipeline and chained together to create a sequence of events, depending on your specific requirements.
  • Linked Service – This is very similar to the concept of a connection string in SQL Server: it defines how to connect to the source and destination of your data.
  • Trigger – A trigger is a unit of processing that determines when a pipeline needs to be run. Triggers can run on a schedule or be set off (triggered) by an event.
  • Parameter – A parameter is a value defined on a pipeline and passed in as an argument at run time, for example to fill in which dataset or linked service the pipeline should use.
  • Control Flow – The control flow in a data factory orchestrates how the activities in a pipeline are run. This includes sequencing, branching and looping.
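
To make these components more concrete, here is a minimal sketch in plain Python. It is a toy model, not the Azure Data Factory SDK or its JSON format: the class names, the `depends_on` field, the connection strings and the `on_schedule_trigger` function are all made up for illustration. It only shows how a pipeline groups activities, how a parameter is filled in at run time, and how the control flow sequences activities by their dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LinkedService:
    """Connection information for a data store (like a connection string)."""
    name: str
    connection_string: str  # placeholder value, not a real endpoint


@dataclass
class Activity:
    """One step of work; depends_on lists activities that must finish first."""
    name: str
    run: Callable[[Dict[str, str]], str]
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """A logical grouping of activities plus default parameter values."""
    name: str
    activities: List[Activity]
    parameters: Dict[str, str] = field(default_factory=dict)

    def execute(self, **overrides: str) -> List[str]:
        # Parameters: run-time arguments override the pipeline defaults.
        params = {**self.parameters, **overrides}
        done: List[str] = []
        log: List[str] = []
        pending = list(self.activities)
        # Control flow: run each activity once everything it depends on is done.
        while pending:
            ready = [a for a in pending if all(d in done for d in a.depends_on)]
            if not ready:
                raise ValueError("circular or missing dependency")
            for activity in ready:
                log.append(activity.run(params))
                done.append(activity.name)
                pending.remove(activity)
        return log


# Hypothetical linked services for the source and destination.
source = LinkedService("OnPremSql", "Server=onprem;Database=sales")
lake = LinkedService("DataLake", "adl://example-lake")

pipeline = Pipeline(
    name="CopyAndProcess",
    parameters={"table": "orders"},
    activities=[
        # Declared out of order on purpose; depends_on decides the sequence.
        Activity("Process", lambda p: f"process {p['table']} on cluster",
                 depends_on=["Copy"]),
        Activity("Copy",
                 lambda p: f"copy {p['table']} from {source.name} to {lake.name}"),
    ],
)


def on_schedule_trigger(p: Pipeline) -> List[str]:
    """Stand-in for a trigger: decides when the pipeline runs and with what arguments."""
    return p.execute(table="customers")


log = on_schedule_trigger(pipeline)
print(log)
```

In this sketch the trigger passes `table="customers"`, so the copy activity runs first with that value and the processing activity runs after it, even though it was declared first.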

Top 5 Takeaways from the Microsoft ICA Boot Camp

I was a recent attendee at the Microsoft International Cloud Architect Boot Camp, where I had the opportunity to participate in hands-on sessions, working closely with Microsoft teams and specialists, as well as other Microsoft Partners. This boot camp contained exclusive content that Pragmatic Works gets access to as a partner and as a preferred service within the Microsoft stack.

Here, I’d like to share my top 5 takeaways from this event:

1. Commitment to Security – As a cloud solution architect, I’m asked many questions around security and Microsoft Azure. One thing that amazed me was the commitment that Microsoft has made to security. They spend over a billion dollars each year on security to ensure they are protected from all threats. Microsoft is also the #1 attack surface in the world. They are truly committed to making sure that your data and services are secure.

2. Security Certifications – Microsoft has passed over 70 regulatory and government certifications when it comes to security and standardized processes. Their second-place competitor, AWS, has only completed 44 of these certifications. Getting these certifications and adhering to certain security and regulatory standards can be expensive, but there is a significant benefit for enterprise, government and small/medium-sized businesses.

3. Right-sizing Their Environment – This can be a challenge for many companies. Microsoft’s internal teams have moved completely to Azure and are managing their platforms within Azure for SQL databases, virtual machines and all other services Azure offers. By doing some specific right-sizing and keeping watch on what’s offered, they lowered their workloads and kept their CPU at the 95th percentile. More importantly, they were able to cut down on spending for their internal needs, to the tune of over 3 million dollars a month!

4. Differentiators from AWS – AWS is currently the #1 cloud platform in terms of revenue and volume. But Microsoft is quickly catching up, and they’ve identified several differentiators from AWS. Some key differentiators, such as Azure Recovery Zones and other such services, have been slow to arrive but will be released to general audiences by the end of 2018. Microsoft does not see any other differentiators that will allow AWS to continue to hold that lead.

5. Connections/Partnerships – Office 365, Dynamics 365, Skype and LinkedIn connections, along with its commitments to partners and ISVs, give Microsoft a competitive advantage over AWS in terms of its ecosystem. A common complaint is that AWS doesn’t work well with, or cater to, partners, leaving them to figure things out themselves.

Introduction to Azure Data Factory

Are you new to Azure, or looking to make the move and curious about what Azure Data Factory is? Azure Data Factory is Microsoft’s cloud-based data integration and orchestration tool. It allows you to move your data from one place to another and transform it along the way. Here, I’d like to introduce the 4 main steps, or components, of how Azure Data Factory works.

Step 1 – Connect and Collect

Connect and collect is where you define where you’re pulling your data from, such as SQL databases, web applications, SSAS, etc. You collect that data into one centralized location like Azure Data Lake or Azure Blob Storage.

Step 2 – Transform and Enrich

In this step, you take the data from your centralized storage and transform and enrich it using compute services such as HDInsight, Spark or Data Lake Analytics.

Step 3 – Publish

The next step is to publish the data to a place where end users can more easily consume it. BI tools such as Power BI or Reporting Services are great choices.

Step 4 – Monitor

The last step is to monitor the data to be sure jobs are running and data is flowing properly. It’s also important to monitor data quality. Monitoring can be done with tools like PowerShell, Microsoft Operations Manager or Azure Monitor, which allow you to monitor from inside the Azure portal.
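
The four steps above can be sketched as a chain of small Python functions. This is only an illustration of the flow under simple assumptions: the in-memory `sources` dictionary stands in for real data stores, the function names and the `quantity` and `unit_price` columns are invented for the example, and none of it calls Azure Data Factory itself.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def connect_and_collect(sources: Dict[str, List[Row]]) -> List[Row]:
    """Step 1: pull rows from every source into one central staging list."""
    staged: List[Row] = []
    for source_name, rows in sources.items():
        for row in rows:
            # Tag each row with where it came from.
            staged.append({**row, "source": source_name})
    return staged


def transform_and_enrich(staged: List[Row]) -> List[Row]:
    """Step 2: add a derived column, total = quantity * unit_price."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in staged]


def publish(enriched: List[Row]) -> Dict[str, float]:
    """Step 3: shape the data for consumption, here as a total per source."""
    totals: Dict[str, float] = {}
    for row in enriched:
        totals[row["source"]] = totals.get(row["source"], 0.0) + row["total"]
    return totals


def monitor(staged: List[Row], enriched: List[Row]) -> Dict[str, Any]:
    """Step 4: basic checks that no rows were lost and totals look sane."""
    healthy = len(staged) == len(enriched) and all(r["total"] >= 0 for r in enriched)
    return {
        "rows_collected": len(staged),
        "rows_enriched": len(enriched),
        "healthy": healthy,
    }


# Made-up sample data standing in for a SQL database and a web application.
sources = {
    "sql": [{"quantity": 2, "unit_price": 5.0}, {"quantity": 1, "unit_price": 3.0}],
    "web": [{"quantity": 4, "unit_price": 2.5}],
}

staged = connect_and_collect(sources)
enriched = transform_and_enrich(staged)
published = publish(enriched)
report = monitor(staged, enriched)

print(published)
print(report)
```

In a real data factory, each function would correspond to one or more activities in a pipeline, and the monitoring step would rely on the tools mentioned above rather than on hand-written checks.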