Sharing Integration Runtimes Among Azure Data Factories

In this post I’ll talk about self-hosted integration runtimes and a newly announced capability in the Azure Data Factory space: the ability to share them across Data Factories.

The self-hosted integration runtime is essentially the connector that lets Azure Data Factory reach back into your on-premises environment and move data safely and securely between it and Azure. It’s a dedicated application for Azure Data Factory, similar to the On-premises Data Gateway.

Here’s where this new feature helps. Until now, Data Factories could not share integration runtimes, so if you had multiple Data Factories (or pipelines spread across several Data Factories) connecting back to on-premises databases, flat files and so on, you had to set up a separate integration runtime for each one.

With this newly announced feature comes some new terminology:

  • Shared integration runtime – the standard self-hosted integration runtime you’re used to, except that it can now be shared with other Data Factories.
  • Linked integration runtime – a runtime defined in another Data Factory that doesn’t install anything of its own; instead, it links back to a shared integration runtime.

So, you’ll have your main shared integration runtime, and on top of that you’ll have linked integration runtimes that reference its infrastructure rather than running their own. Each link points back to the shared IR, and that’s what allows you to share it among multiple Data Factories.

The process is straightforward: you install the integration runtime in your environment, set up your linked service within your Azure Data Factory and connect through that linked service. Then you’re ready to pull the data you need into the cloud, do your transformations and push the results out to Azure SQL Data Warehouse, Azure Databricks, etc.
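To make that a bit more concrete, here’s a minimal sketch of what creating a linked runtime can look like against the Data Factory REST API. Everything in it is illustrative: the subscription, resource group, factory and runtime names are placeholders, the bearer token acquisition is elided, and the payload follows the documented SelfHosted/linkedInfo pattern, so check it against the current API version before relying on it.

```python
import requests

# All identifiers below are hypothetical placeholders.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
consumer_factory = "<factory-that-will-borrow-the-runtime>"
linked_ir_name = "linked-selfhosted-ir"

# ARM resource ID of the shared self-hosted IR living in the "host" factory.
shared_ir_id = (
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    "/providers/Microsoft.DataFactory/factories/<host-factory>"
    "/integrationruntimes/<shared-ir-name>"
)

url = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
    f"/factories/{consumer_factory}/integrationRuntimes/{linked_ir_name}"
    "?api-version=2018-06-01"
)

body = {
    "properties": {
        "type": "SelfHosted",
        "description": "Linked IR that references a shared self-hosted IR",
        "typeProperties": {
            # linkedInfo is what makes this a *linked* runtime rather than
            # a brand-new installation of its own.
            "linkedInfo": {
                "resourceId": shared_ir_id,
                "authorizationType": "Rbac",
            }
        },
    }
}

token = "<bearer token from Azure AD, e.g. via an az CLI login>"
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
print(resp.json())
```

Note that the host factory also has to grant the borrowing factory permission on the shared IR (that’s the sharing step) before a link like this will authorize.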

This cool new technology lets you get your data to the cloud much more easily and efficiently, and I highly recommend giving it a try!

What is Azure Data Box and Data Box Disk?

Are you looking to move large amounts of data into Azure? How does doing it for free, with an easier process, sound? Today I’m here to tell you how to do just that with Azure Data Box.

Picture this: you have a ton of data, let’s say 50 terabytes on-prem, and you need to get it into Azure because, for instance, you’re going to start doing incremental backups of a SQL database. You have two options to get this done.

The first option is to move that data manually, which means you have to chunk it up, push it into blob storage using AzCopy or a similar Azure data transfer tool, then extract it and continue with the process. Sounds pretty painful, right?
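For a sense of what that manual route looks like in code, here’s a minimal sketch using the azure-storage-blob Python SDK. The connection string, container name and local folder are placeholder assumptions, and for anything close to 50 TB you’d want AzCopy or parallel uploads rather than a single loop like this.

```python
from pathlib import Path

from azure.storage.blob import BlobServiceClient

# Placeholder values - substitute your own storage account and paths.
CONNECTION_STRING = "<storage-account-connection-string>"
CONTAINER_NAME = "onprem-backups"   # assumed to already exist
LOCAL_FOLDER = Path("/data/sql-backups")

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER_NAME)

# Upload every file under the folder as a block blob, keeping the relative
# path as the blob name so the container layout mirrors the one on disk.
for file_path in LOCAL_FOLDER.rglob("*"):
    if file_path.is_file():
        blob_name = str(file_path.relative_to(LOCAL_FOLDER))
        with open(file_path, "rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
        print(f"uploaded {blob_name}")
```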

Your second option is to use Azure Data Box, which lets you move large chunks of data up into Azure. Here’s how simple it is:

  • You order the Data Box through Azure (currently available in the US and EU)
  • Once it arrives, you connect it to your environment wherever the data you plan to move lives
  • It uses standard protocols like SMB and CIFS
  • You copy the data you want to move, ship the Data Box back to Microsoft, and they upload the data into your storage container(s)
  • Once the data is uploaded, they will securely erase that Data Box

With the Data Box you get:

  • 256-bit encryption
  • A super tough, hardened box that can withstand drops or water, etc.
  • Data can be pushed into Azure Blob storage
  • You can copy data to up to 10 storage accounts
  • There are two 1 gigabit/second and two 10 gigabit/second connections to allow quick movement of data off your network onto the box

In addition, Microsoft has recently announced the Data Box Disk: small 8-terabyte disks, which you can order up to five of per order.

With Data Box Disk you get:

  • 35 terabytes of usable capacity per order
  • Supports Azure Blobs
  • A USB/SATA 2 and 3 interface
  • Uses 128-bit encryption
  • Like Data Box, it’s a simple process: connect it, unlock it, copy your data onto the disk and send it back, and the data is copied into a single storage account for you

Here comes the best part—while Azure Data Box and Data Box Disk are in Preview, this is a free service. Yes, you heard it right, Microsoft will send you the Data Box or Data Box Disk for free and you can move your data up into Azure for no cost.

Sure, it will cost you money when you buy your storage account and start storing large amounts of data, but storage is cheap in Azure, so that won’t break the bank.

 

What is Azure Virtual WAN?

In today’s post I’d like to talk about a site-to-site networking service. Azure already has a site-to-site VPN service, but Azure Virtual WAN is a newer service currently in Preview. This networking service is optimized for branch connectivity to Azure and gives you the choice of using devices from preferred partners (currently Riverbed and Cisco) or configuring the connectivity manually in your environment.

Azure Virtual WAN has some big differences to consider:

  • Automated setup and configuration of the preferred partner devices makes them much easier to configure. You simply export the connection configuration directly from the device into Azure, and Azure sets it up for you automatically.
  • It is designed for greater scalability and more throughput. The site-to-site VPN service is great for smaller workloads, but this new service opens up the pipe and lets data move through much faster.
  • It’s designed as a hub-and-spoke model, with Azure as the hub and your branch offices as the spokes – all managed within Azure.

Let’s look at the 4 main components of this service:

  • The Virtual WAN service itself – This asset is where the resources are collected, and it represents a virtual overlay of your Azure network. Think of it as a top-down view of the connectivity between all the components in Azure and in your offices.
  • A site represents the on-premises VPN device and its settings. I mentioned those preferred devices from Riverbed and Cisco (with more to come); if you’re using a supported device, you can easily drop its configuration into Azure.
  • The hub is the connection point in Azure for those sites. Sites connect to the hub, and the virtual WAN sits over all of these components.
  • The hub virtual network connection is what connects your hub to your virtual networks.

So, your hub and your virtual networks are connected through that hub virtual network connection, which allows communication between the virtual networks in Azure and your site-to-site virtual WAN.
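To tie those four pieces together, here’s a conceptual sketch (not a deployable template) of how they relate as Azure resources. The resource types are the real ARM types; the names, addresses and property shapes are illustrative assumptions, so check the current API reference before building on them.

```python
# Conceptual sketch of the Virtual WAN hierarchy - names and property values
# are illustrative placeholders, not a working deployment.

# 1. The Virtual WAN itself: the top-level overlay everything else hangs off.
virtual_wan = {
    "type": "Microsoft.Network/virtualWans",
    "name": "contoso-vwan",
}

# 2. A site: represents one on-premises VPN device and its settings.
vpn_site = {
    "type": "Microsoft.Network/vpnSites",
    "name": "branch-office-chicago",
    "properties": {
        "virtualWan": {"id": ".../virtualWans/contoso-vwan"},
        "ipAddress": "203.0.113.10",  # public IP of the branch device
        "addressSpace": {"addressPrefixes": ["10.20.0.0/16"]},
    },
}

# 3. A hub: the Azure-side connection point the sites attach to.
virtual_hub = {
    "type": "Microsoft.Network/virtualHubs",
    "name": "eastus-hub",
    "properties": {
        "virtualWan": {"id": ".../virtualWans/contoso-vwan"},
        "addressPrefix": "10.10.0.0/24",
    },
}

# 4. A hub virtual network connection: links the hub to one of your VNets.
hub_vnet_connection = {
    "type": "Microsoft.Network/virtualHubs/hubVirtualNetworkConnections",
    "name": "eastus-hub/to-app-vnet",
    "properties": {
        "remoteVirtualNetwork": {"id": ".../virtualNetworks/app-vnet"},
    },
}
```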

This offering changes the landscape of how people connect their remote offices into Azure: it consolidates what that network looks like, and the preferred partner devices make it easier to set up.

Again, this is still in Preview but definitely something I would suggest checking out.

Informatica Enterprise Data Catalog in Azure

If you’re like many Azure customers, you’ve been on the lookout for a data catalog and data lineage tool with all the key capabilities you need. Today, I’d like to tell you more about the Informatica Enterprise Data Catalog, which was discussed briefly in a previous Azure Every Day post.

The Informatica tool helps you to analyze, consolidate and understand large volumes of metadata in your enterprise. It allows you to extract both physical and business metadata for objects and organize it based on business concepts, as well as view data lineage and relationships for each of those objects.

Sources include databases, data warehouses, business glossaries, data integration and Business Intelligence reports and more – anything data related. The catalog maintains an indexed inventory of all the data objects or ‘assets’ in your enterprise, such as tables, columns, reports, views and schemas.

Metadata and statistical information in the catalog include things like profile results, as well as info about data domains and data relationships. It’s really the who, what, when, where and how of the data in your enterprise.

Informatica Data Catalog can be used for tasks such as:

  • Discover assets at scale by scouring your network or cloud space for data that isn’t yet cataloged.
  • View lineage for those assets, as well as relationships between assets.
  • Enrich assets by tagging them with additional attributes – for example, tagging a specific report as a critical item.

There are lots of useful features in the Data Catalog. Some key ones are:

  • Data Discovery – Perform semantic search and dynamic filtering, and view data lineage and relationships for assets across your enterprise.
  • Data Classification – Automatically or manually annotate data with classifications to help with governance and discovery – who should have access to what data and what the data contains.
  • Resource Administration – Resource, schedule and attribute management, as well as connection and profile configuration management: all the items around the data that help you manage it and the metadata that surrounds it.
  • Create and edit reusable profile definition settings.
  • Monitor resources and tasks within your environment.
  • Data domain management, where you can create and edit domains and group like data and reports together.
  • Assign logical data domains to data groups.
  • Build composite data domains for management purposes.
  • Monitor the status of tasks in progress and look at some transformation logic for assets.

On top of this, you can see how frequently the data is accessed and how valuable it is to your business users. Surfacing this kind of information about your data lets you, for instance, trim reports that aren’t being used.

When we talk about modern data warehousing in the Azure cloud, this is something we’ve been looking for. It’s a useful and valuable tool for anyone who wants data governance and lineage capabilities.

Simplified Managed Disk Migration in Azure


In the past, migrating to managed disks could be a bit of a challenge. Today I’d like to talk about how Azure has simplified the process. Microsoft recently added the ability to migrate disks through the portal instead of having to use the command line or a PowerShell script (though if you’d like to automate it at scale, there’s a scripted sketch after the walkthrough below).

First off, why would you want a managed disk over an unmanaged one?

  • Greater scalability due to much higher IOPS and storage limits. There’s no longer a need to add extra storage accounts as you add disk space, which had been a challenge for users running large virtual machines that need a lot of storage.
  • Better availability and reliability, since disks are now isolated from each other in different storage scale units.
  • Managed disks offer better than 99.99% uptime and are always stored with three replicas of the data.
  • More granular access control through role-based access control (RBAC) security, so you can assign access to specific people in your organization (there’s a sketch of scoping a role assignment to a single disk right after this list).
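As a quick illustration of that last point, here’s a minimal sketch of creating a role assignment scoped to a single managed disk via the Azure REST API. The scope, principal ID and role definition GUID are placeholders and the token acquisition is elided; treat it as a sketch of the idea rather than a production script.

```python
import uuid

import requests

# Placeholder identifiers - substitute your own values.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
disk_name = "<managed-disk-name>"

# The scope is the disk itself, which is what makes the access granular.
scope = (
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Compute/disks/{disk_name}"
)

assignment_id = str(uuid.uuid4())  # role assignment names must be GUIDs
url = (
    f"https://management.azure.com{scope}"
    f"/providers/Microsoft.Authorization/roleAssignments/{assignment_id}"
    "?api-version=2022-04-01"
)

body = {
    "properties": {
        # Built-in role definition GUID (e.g. Reader); left as a placeholder.
        "roleDefinitionId": (
            f"/subscriptions/{subscription}/providers"
            "/Microsoft.Authorization/roleDefinitions/<role-definition-guid>"
        ),
        "principalId": "<azure-ad-object-id-of-the-user-or-group>",
    }
}

token = "<bearer token>"
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
```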

Here’s how it works:

    • When looking at an overview of your VM, if you’re using unmanaged disks you’ll see a ribbon or banner at the top alerting you that you’re not using managed disks and that you should. Sure, they cost a bit more, but the payback is better resiliency and reliability.
    • When you click on that banner, it will give you a wizard to walk you through how to perform that migration. It will also remind you that when you migrate, you can’t go back. Your virtual machine will remain unchanged, but you’ll want to take that into account.
    • It will reboot your VM once complete, so keep this in mind so you can plan to do this during off hours.
    • Another note: if your VM is in an availability set, you’ll be prompted to migrate that availability set first, then the VM itself.
    • Once you’re done and back up and running, you’ll see the new managed disks alongside the old unmanaged disks, even though the old ones can no longer be mounted. You can clean those up and delete them later.
    • You’ll have a managed disk for the OS and for each data disk in that resource group and you’re ready to go, with better availability plus the comfort of knowing you’re running in a more resilient configuration.
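If you’d rather script the conversion across many machines (the pre-portal route mentioned earlier), here’s a minimal sketch against the Azure REST API. The names are placeholders and the token acquisition is elided; the same operations are also exposed through the Azure CLI and PowerShell.

```python
import requests

# Placeholder values.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
vm_name = "<vm-name>"
token = "<bearer token>"

base = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.Compute"
    f"/virtualMachines/{vm_name}"
)
headers = {"Authorization": f"Bearer {token}"}
api = {"api-version": "2018-06-01"}

# 1. Deallocate the VM - the conversion requires it to be stopped
#    (deallocated), which is why the portal warns about downtime.
requests.post(f"{base}/deallocate", params=api, headers=headers).raise_for_status()

# 2. Convert the unmanaged disks to managed disks (a one-way operation).
#    These calls are asynchronous; a real script would poll the operation
#    status before moving to the next step.
requests.post(f"{base}/convertToManagedDisks", params=api, headers=headers).raise_for_status()

# 3. Start the VM back up once the conversion has completed.
requests.post(f"{base}/start", params=api, headers=headers).raise_for_status()
```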

So, look at your virtual machines and do that migration when you have a chance. This great wizard-based feature makes it much easier. The reliability benefits will greatly outweigh the added cost.

 

Introducing Microsoft Azure Sphere

As we continually see more and more consumer-grade internet devices, like appliances, baby monitors, thermostats, etc., we need a more robust way to manage and secure them. Approximately 9 billion of these devices are built every year, and each has a tiny chip, or microcontroller (MCU), that hosts the compute, the storage and the memory, as well as the operating system.

I’m excited to tell you about a newer offering called Microsoft Azure Sphere. As these consumer-grade devices grow exponentially, the problem is that some of these devices are not built in a secure way, making them easily susceptible to hacking. There have been plenty of news stories about devices that have been hacked and then used for malicious purposes.

Microsoft is not alone in recognizing this issue, but they jumped on it back in 2015 and began developing what they believed was a good approach to securing these devices, which became Microsoft Azure Sphere. It’s a solution for creating highly secure, internet-connected microcontroller devices, with three main components:

1. Azure Sphere certified MCUs – Manufacturers architect a solution that combines real-time and application processors on the MCU, using built-in Microsoft security technology and connectivity capabilities.

Microsoft took the experience, processes and lessons learned from the Xbox consoles built over the past 15 years and put them into the design of these chips, so third-party companies can build chips using those processes and have them certified.

2. Azure Sphere Operating System – Once an MCU is certified, it can run this operating system, which is intended to be highly secure and agile enough for MCU purposes, with layers of security from Windows, Linux and dedicated security monitoring software all built into the OS.

3. Azure Sphere Security Service – This allows you to protect each device, but also allows secure communication from device to device or device to the cloud.

This is similar to what we talk about with IoT and IoT Hub, but here there’s a certified way of doing it, ensuring you’re using an architecture that will remain secure and supported by Microsoft for years to come. It will apply to the tens of thousands of companies that are going to build devices for all the areas I’ve mentioned and more, giving them a secure platform to build on.

Azure Sphere was just announced earlier this year and there’s a long way to go. But we’ll soon start to see things like software and hardware development kits (which you can pre-order now from Microsoft) and really get a chance to understand how it works and how it’s going to offer better solutions in the coming years for these connected devices.

It’s a cool technology, and however they brand it, whether it’s Azure Sphere Certified MCU or something else, it will give us assurance that we’re getting a quality product without the danger of being exposed.