Category Archives: Azure

Azure Data Factory Integration Runtimes

This week I’ve been talking about Azure Data Factory. In today’s post I’d like to talk about the much-awaited Azure Data Factory Integration Runtime. This compute infrastructure provides data movement, connectors, data conversion and activity dispatching, so you can dispatch and monitor activities running on HDInsight, Azure SQL Database, SQL Data Warehouse and more.

A big part of V2 is that you can now lift and shift your SSIS packages up into Azure and run them from your Azure data portal inside of Data Factory. There are 3 integration runtime types:

1. Azure Integration Runtime – This is set up entirely in Azure, and you would use it if you’re going to be copying between two cloud data sources.

2. Self-Hosted Integration Runtime – This can run copy and transformation activities between cloud and on-premises sources, including moving data from an IaaS virtual machine. Use this if you’re going to be copying between a cloud source and an on-prem, private network source.

Use it, for example, if your environment is behind a firewall rather than in the public cloud, you want to move data from your environment to Azure, and the gateway will not work for you. Also, because an IaaS virtual machine is isolated and you can’t get into its data storage directly, you would set up this runtime to integrate between sites.

3. SSIS Integration Runtime – Use this when you’re lifting and shifting your SSIS packages into Azure Data Factory. A key thing to mention is that it does not yet support third-party components for SSIS, but that support will be added eventually.

Where are these located? Azure Data Factory is available in a limited set of regions at this time, but it can access data stores and compute services globally. With the Azure Integration Runtime, the runtime’s location determines where the back-end compute resources run. This is optimized for data compliance, efficiency and reduced network egress costs, to ensure you’re using the best services available in the region you need.

The Self-Hosted Runtime is installed in the environment, inside that private network. The SSIS Integration Runtime’s location is determined by where the SQL Database or Managed Instance hosting the SSISDB catalog lives. Its available regions are currently limited, but it does not have to be in the same region as the Data Factory; it should be as close to the data sources as possible so it runs as optimally as it can.
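To summarize the three types, here’s a minimal sketch of the decision; the function and its labels are illustrative, not part of any Azure SDK:

```python
# Illustrative decision helper: pick an integration runtime type from
# where the source and destination live. Hypothetical function, for
# explanation only -- not an Azure SDK call.
def pick_integration_runtime(source: str, sink: str, lifting_ssis: bool = False) -> str:
    """source and sink are either 'cloud' or 'on-prem'."""
    if lifting_ssis:
        return "SSIS Integration Runtime"        # lift-and-shift SSIS packages
    if source == "cloud" and sink == "cloud":
        return "Azure Integration Runtime"       # cloud-to-cloud copies
    return "Self-Hosted Integration Runtime"     # one end is in a private network

print(pick_integration_runtime("cloud", "cloud"))    # Azure Integration Runtime
print(pick_integration_runtime("on-prem", "cloud"))  # Self-Hosted Integration Runtime
```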

Azure Data Factory – Data Sets, Linked Services and Pipeline Executions

In my posts this week, I’ve been going a little deeper into Azure Data Factory (be sure to check out my others from Tuesday and Wednesday). I’m excited today to dig into getting data from one source into Azure.

First let’s talk about data sets. Consider a data set anything that you’re using as your input or output. An Azure Blob data set is one example; it’s defined by the blob container itself, the folder, the file and so on. In this case you can have that as your source or your destination.

There are many data set options, such as Azure Services (other data services or blobs, databases, data warehouses or Data Lake), as well as on premises databases like SQL Server, MySQL or Oracle. You also have your NoSQL components like Cassandra and MongoDB and you can get your files from an FTP server, Amazon S3 or internal file systems.

Lastly, you have your SaaS offerings like MS Dynamics, Salesforce or Marketo. There is a grid available in the Microsoft documentation which outlines what can be a source or a destination or both, in some cases.

A Linked Service is the way you link from your source to your destination. It defines the connection to the data store, whether it’s the input or the output. Think of it like a connection string in SQL Server: you’re connecting to a specific place, whether you’re using it to read data from a source or write it to a destination.
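As a rough sketch of how the two fit together, here’s the general shape of a linked service and a data set definition in Data Factory JSON; the names, paths and connection string are placeholders, and the exact properties vary by data store:

```python
# Sketch of the JSON shapes; every name and value here is a placeholder.
linked_service = {
    "name": "MyBlobLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # the "connection string" part: where and how to connect
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        },
    },
}

dataset = {
    "name": "MyInputBlob",
    "properties": {
        "type": "AzureBlob",
        # the data set points at the linked service for its connection
        "linkedServiceName": {
            "referenceName": "MyBlobLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"folderPath": "container/input", "fileName": "data.csv"},
    },
}
```

The same data set can then serve as the source or the sink of a copy activity, while the linked service carries the connection details.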

Now, let’s look at Pipeline Executions. A pipeline is a collection of activities that you’ve built, and the executions run that pipeline moving the data from one place to another or to do some transformation with that data. There are 2 defined Pipeline Executions:

1. Manual (or On-Demand) Executions. These can be triggered through the Azure portal, a REST API, a PowerShell script, or your .NET application.

2. Setting Up Triggers. With this execution, you set up a trigger as part of your Data Factory. This was an exciting new change in Azure Data Factory V2. Triggers can be scheduled, so you can set a job to run at a particular time each day or you can set a tumbling window trigger. Using a tumbling window, you can set up your hours of operation (let’s say you want your data to run from 8-5, Monday through Friday, every hour on the hour). The tumbling window trigger runs continuously for the times/hours you’ve specified.

Doing this lessens your cost to run the job in Azure by using compute time only when you need it, and not during downtimes like outside of business hours.
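To make the tumbling window idea concrete, here’s a small sketch that enumerates the one-hour windows such a trigger would cover for an 8-to-5, Monday-through-Friday schedule; it’s illustrative only, not how Data Factory computes windows internally:

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, hours=range(8, 17)):
    """Return (window_start, window_end) pairs for business hours, Mon-Fri."""
    windows = []
    t = start.replace(minute=0, second=0, microsecond=0)
    while t < end:
        if t.weekday() < 5 and t.hour in hours:   # Mon=0 .. Fri=4; starts 8:00-16:00
            windows.append((t, t + timedelta(hours=1)))
        t += timedelta(hours=1)
    return windows

# A single Monday yields nine one-hour windows: 8-9 through 16-17.
wins = tumbling_windows(datetime(2018, 3, 5), datetime(2018, 3, 6))
print(len(wins))   # 9
```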

Azure Data Factory Pipelines and Activities

Yesterday’s Azure Every Day post covered how Azure Data Factory pricing works. In today’s post I’d like to go a bit deeper into Azure Data Factory Version 2 and review pipelines and activities. In essence, a pipeline is a logical grouping of activities. If you’re familiar with SSIS, think of an SSIS package being a grouping of activities that are happening with the data.

An example of a pipeline: you want to pull data from a website, file server or database up into Azure, do some kind of transformation on that data, then report from it. Within the pipeline, multiple activities can be defined. If there’s no dependency between a set of activities (one activity running with no dependency on the next), they can run in parallel.

This is good to keep in mind as you’re building these activities, because you may need to schedule them or figure out a way to keep them from running in parallel, or to make one run after another.
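The dependency idea can be sketched in a few lines: group a pipeline’s activities into “waves,” where every activity in a wave has all of its dependencies satisfied and could run in parallel. The activity names here are hypothetical:

```python
# Group activities into waves of parallel execution based on their
# dependencies; raises if the dependency graph has a cycle.
def execution_waves(depends_on):
    done, waves = set(), []
    while len(done) < len(depends_on):
        wave = [a for a, deps in depends_on.items()
                if a not in done and all(d in done for d in deps)]
        if not wave:
            raise ValueError("circular dependency")
        waves.append(sorted(wave))
        done.update(wave)
    return waves

pipeline = {
    "CopySales": [], "CopyInventory": [],             # no dependencies: parallel
    "TransformData": ["CopySales", "CopyInventory"],  # waits for both copies
    "RefreshReport": ["TransformData"],
}
print(execution_waves(pipeline))
# [['CopyInventory', 'CopySales'], ['TransformData'], ['RefreshReport']]
```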

There are 3 main types of activities:

1. Data Movement Activities – These are the sources you’re pulling data from, such as Azure Blob Storage, Azure Data Lake, Azure SQL Database and SQL Data Warehouse. You can also set up an on-premises gateway and pull in commonly used databases such as DB2, MySQL, Oracle, SAP, Sybase and Teradata, as well as NoSQL databases like Cassandra and MongoDB.

I also mentioned files; you can pull from Amazon S3, file systems, FTP, HTTP, etc. You also have the Software as a Service (SaaS) options: Dynamics, HubSpot, Marketo, QuickBooks and Salesforce, to name a few. You can check the complete list in the Azure online documentation.

2. Data Transformation Activities – Here is where you take your data after it’s ingested into Azure and do something with it. Some common ones are HDInsight Hive, Pig, MapReduce, Hadoop Streaming and Spark transformations. These allow you to transform your big data in your Azure environment and stage it for your reporting.

Other common uses would be machine learning on an Azure VM, as well as stored procedures. You can have your SQL Server stored procedures defined in Azure and run them from the pipeline, and you can use U-SQL for Data Lake Analytics.

3. Control Activities – In these activities you can do things like execute other pipelines or run ForEach and Lookup activities; these are the activities where you control how the pipeline works and interacts with the data.
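As a rough illustration of a control activity, here’s the general shape of a ForEach definition; the parameter name and the inner copy activity are hypothetical, and real definitions carry more properties:

```python
# Sketch of a ForEach control activity (names are placeholders).
foreach_activity = {
    "name": "CopyEachFile",
    "type": "ForEach",
    "typeProperties": {
        # iterate over a pipeline parameter holding a list of files
        "items": {"value": "@pipeline().parameters.fileList", "type": "Expression"},
        # the inner activities run once per item
        "activities": [{"name": "CopyOneFile", "type": "Copy"}],
    },
}
print(foreach_activity["type"])   # ForEach
```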

 

The 5 Stages of Cloud Adoption

So, still not in the cloud and the thought of doing so feels like you’re taking a huge jump into unknown waters? We’re seeing more enterprises starting to dip their toe in the water with Microsoft Azure. Microsoft has shown their commitment to where they’re going with their cloud infrastructure and the growth has been tremendous.

Let’s look at the 5 stages or steps of cloud adoption to ease the fear of taking the leap:

Step 1 – Chaos – In most cases, there’s some chaotic event that makes businesses start looking at alternative ways to service their customers or their business. Maybe a server dies, or software comes to the end of life or support. The cloud then becomes a viable option and people start to consider it.

Step 2 – Awareness – Once the cloud is on the plate, people start by ramping up their cloud knowledge. They may start with training, hackathons, POCs, or try some hands-on opportunities, like setting up Azure Active Directory to sync with their on-premises AD. Building knowledge leads to Step 3.

Step 3 – Security – Most companies get hung up on security concerns around the cloud. I can tell you that Microsoft has spent more money than any other company worldwide on security: over a billion dollars in 2017. They are committed to making their customers’ security a top priority, and through that commitment they hold 72 government and standards certifications; their closest competitor, AWS, has only 44.

To overcome your fear, you need to realize that with Azure, you have an entire team of security experts watching your data and servers, as well as implementing best practices and creating new ways and policies to help companies avoid any kind of breach.

Step 4 – Governance – So, you’ve gotten over security concerns and have put your trust in the Azure public cloud, now you must develop best practices, policies and procedures around governance. The good news is when you start looking at service offerings, whether it’s PaaS, SaaS or IaaS, Microsoft has the best in class offerings and they’re managing a good portion of that security for you.

Step 5 – Optimization – Once you’ve got your environment in the cloud, how do you optimize it for performance and cost effectiveness? Take the time to choose the best services for your business and optimize your servers to minimize cost and run those servers in the best way; this can become a differentiator against your competitors.

Top 6 Business Drivers for Moving to Azure

If you haven’t made the move to the Azure cloud, what’s stopping you? Many businesses either have moved or are starting to, and I can tell you the top 6 reasons driving businesses to the cloud.

1. Business Growth – Are you experiencing new growth in your current sectors or looking into new sectors for your business? Those already in the cloud saw the reasons for adopting new cloud technologies and are taking advantage of the business drivers discussed here. Think about how the cloud can help you, but don’t jump in with two feet; plan or outline how the cloud can help your business, since planning helps avoid cost overruns.

2. Efficiency – You can streamline processes, increase productivity, as well as deliver, both internally and externally, much faster.

3. Experience – The cloud gives you the opportunity to improve your customer and employee experience. You can deploy new technologies quickly and it gives you more of a landscape to offer new areas of engagement with existing clientele, or for new clients.

4. Agility – The flexibility of Azure public cloud will give you improved responsiveness from your internal departments, IT folks and internal consultants you may work with. You’ll also get Software as a Service (SaaS) technologies and you can spin up quicker solutions to help enable your workforce to get things done faster. And if you make a mistake, you’re not stuck with a software license purchase for years; simply swap out for a technology that’s a better fit.

5. Cost – Many have concerns about overspending. One misconception is that you’ll automatically cut costs; this is often not the case. Oftentimes, you’ll need to restructure your spending budgets. The good news is you’ll be spreading those costs out over time. You’re moving to a subscription model, so you’re only paying for what you’re using: buy the services you need, turn them off when you’re not using them, and use them in the way that works most effectively for your business.

6. Assurance – You’ll know you have a ‘best in class’ infrastructure, and much of the server maintenance your IT staff had to take care of will be drastically reduced. Security takes a different approach: everything is secure and on by default, like data and storage encryption, so you’d have to go in manually to turn those features off.

You can also rest assured knowing that Microsoft has already met most regulatory and policy requirements, allowing you to continue to move forward and offer more to your customers, regardless of what space they’re in.

Hybrid Identity Management with Azure Active Directory

With all the things organizations need to manage identity for (on-premises environments, mobile devices, laptops and other managed devices, plus internal Active Directory systems), identity is becoming increasingly hard to manage. We are in a new mobile-first, cloud-first reality.

Here are a few stats to think about:

  • 63% of confirmed data breaches involve weak, default or stolen passwords
  • More than 80% of employees admit to using non-approved SaaS applications in their jobs
  • As we are trying to manage all this, IT budgets are barely growing – we’re seeing less than 1% growth year over year

In reality, those Software as a Service (SaaS) apps integrate nicely and enable users to be more efficient, but we must be able to manage all those identities. When a user comes into your environment, using all kinds of web applications with user accounts for each, and possibly access to a corporate credit card, then that person leaves the company or gets let go, it’s difficult to track all those if they are individually managed.

With Azure Active Directory, you can manage thousands of apps with one identity, enable business without borders, and manage access at scale, plus you get cloud-powered protection. With Azure AD at the core of your business, you are enabling identity as a control plane.

So, how does this look?

    • With Azure AD on your current on premises environment, you’ll want to link up with all those cloud applications (Azure, SaaS, Office 365, any public cloud).
    • In between, you’ve got Azure Active Directory, where you can easily sync that back with your on premises and then tie that into all those SaaS applications.
    • This allows you to offer self-service, single sign on to your users for all of those apps, plus any internal on premises areas you use with user names and passwords.
    • Everything will be synchronized across the landscapes and you can extend that out to your customers and partners as well.
    • This is a powerful way to enable your workforce, as well as sync with your customers and partners when you want them to have access to certain areas.

Simply put: thousands of apps with one identity, using single sign-on to any app with Microsoft Azure Active Directory. And to take it one step further, if you want to move any of your VMs into Azure, or any of your services into a PaaS solution, you already have that integration; using Azure AD Domain Services, you can set up that lift and shift much more easily.

 

An Overview of Azure File Sync

I have a question… Who is still using a file server? No need to answer, I know that most of us still are and need to use them for various reasons. We love them—well, we also hate them, as they are a pain to manage.

The pains with Windows File Server:

  • They never seem to have enough storage.
  • They never seem to be properly cleaned up; users don’t delete the files they’re supposed to.
  • The data never seems accessible when and where you need it.

In this blog, I’d like to walk you through Azure File Sync, so you can see for yourself how much better it is.

    • Let’s say I’m setting up a file server in my Seattle headquarters and that file server begins having problems, maybe I’m running out of space for example.
    • I decide to hook this up in a file share in Azure space.
    • I can set up cloud tiering with a free-space threshold (say 50%); once the volume crosses that threshold, files start moving up into Azure.
    • When I set this threshold, it will start taking the oldest files and graying them out as far as users are concerned. The files are still visible there, but they’ve been pushed off to the cloud, so that space has now been freed up on the file server.
    • If users ever need those files, they can click on them and redownload.
    • Now, let’s say I want to bring on another server at a branch office. I can simply bring up that server and synchronize it with the files in Azure.
    • From here, I can hook up my SMB and NFS shares for my users and applications, as well as my work folders using multi-site technology. I have all my files synchronized, and it gives me direct cloud access to these files.
    • I can hook up my IaaS and PaaS solutions with my REST API or my SMB shares to be able to access these files.
    • With everything synchronized, I’m able to have a rapid file server disaster/data recovery. If my server in Seattle goes down, I simply remove it; my files are already up in Azure.
    • I bring on a new server, sync it back to Azure. My folders start to populate, and as they get used, people will download the files back and the rules that were set up will maintain.
    • The great thing is it can be used with Windows Server 2012 R2, as well as Windows Server 2016.
    • Now I have an all-encompassing solution (with integrated cloud backup within Azure) with better availability, better DR capability and essentially bottomless storage. Backups run automatically to an Azure Backup vault, and storage is super cheap.
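The cloud tiering behavior above can be sketched as a simple policy: when used space pushes past the free-space threshold, the least recently used files are tiered to Azure until the volume is back under the line. This is an illustration of the idea, not the actual File Sync algorithm:

```python
# Tier the coldest files until the free-space threshold is met.
def files_to_tier(files, volume_size, free_threshold=0.5):
    """files: list of (name, size, last_access); returns names to tier."""
    used = sum(size for _, size, _ in files)
    target_used = volume_size * (1 - free_threshold)
    tiered = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):  # oldest first
        if used <= target_used:
            break
        tiered.append(name)
        used -= size   # a tiered file leaves only a pointer behind
    return tiered

demo = [("old.docx", 40, 1), ("mid.xlsx", 30, 2), ("new.pptx", 20, 3)]
print(files_to_tier(demo, volume_size=100))   # ['old.docx']
```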

With Azure File Sync I get:

1. A centralized file service in Azure storage.

2. A cache in multiple locations for fast, local performance.

3. Cloud-based backup and fast data/disaster recovery.

Overview and Benefits of Azure Cognitive Services

With Artificial Intelligence and Machine Learning, the possibilities for your applications are endless. Would you like to be able to infuse your apps, websites and bots with intelligent algorithms to see, hear, speak, understand and interpret your user needs through natural methods of communication, all without having any data science expertise?


What is Azure Cosmos DB?

Are you familiar with Azure Cosmos DB? Cosmos DB is Microsoft’s globally distributed, multi-model database. With the click of a button, it allows you to elastically and independently scale throughput and storage across any number of Azure’s geographic regions, so you can put the data where your customers are.

Cosmos DB has custom-built APIs that support a multitude of data models, like SQL, MongoDB and Azure Tables, and it offers 5 consistency models. It also offers comprehensive Service Level Agreements (SLAs) with money-back guarantees for availability (99.99% to be exact), latency, consistency and throughput; a big deal when you need to serve your customers at optimum performance.
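The five consistency models, from strongest to weakest, are strong, bounded staleness, session, consistent prefix and eventual. Here’s a tiny sketch of how you might reason about them; the helper is illustrative, not an SDK call:

```python
# Strongest to weakest consistency levels offered by Cosmos DB.
CONSISTENCY_LEVELS = ["Strong", "BoundedStaleness", "Session",
                      "ConsistentPrefix", "Eventual"]

def is_at_least(chosen, required):
    """True if the chosen level is at least as strong as the required one."""
    return CONSISTENCY_LEVELS.index(chosen) <= CONSISTENCY_LEVELS.index(required)

print(is_at_least("Session", "Eventual"))  # True: Session is stronger
print(is_at_least("Eventual", "Strong"))   # False
```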

Cosmos DB is a great option for many different use cases:

  • Companies doing IoT and telematics. Cosmos DB can ingest huge bursts of data, and process and analyze that data in near real-time. Then it will automatically archive all the data it ingests.
  • Retail and Marketing. Take an auto parts product catalog, for example, with tons of parts within the catalog, each with its own properties (some unique and some shared across parts). The next year, new vehicles or new parts model come out, with some similar and different properties. All that data adds up very quickly. Cosmos DB offers a very flexible schema in a hierarchical structure that can easily change the data around as things change.
  • Gaming Industry. Games like Microsoft’s Halo 5 are built on Cosmos DB because they need performance that is quickly and dynamically scalable. You get things like millisecond read times, which avoid any lag in game play. You can index player-related data, and a social graph database is easily implemented with flexible schema for all the social aspects of gaming.

Azure Cosmos DB ensures that your data gets there and gets there fast, with a wealth of features and benefits to make your life easier. And it’s easy to set up and manage.