All posts by cseferlis

Azure Common Data Services

June 19, 2018 cseferlis

What do you know about Azure Common Data Services? Today I’d like to talk about this application platform, which Microsoft recently redid to expand upon the product’s vision. Common Data Services is an Azure-based business application platform that enables you to easily build and extend applications with your customer’s business data.

Common Data Services helps you bring together your data from across the Dynamics 365 Suite (CRM, AX, Nav, GP) and use this common data service to more easily extract data rather than having to get into the core of those applications. It also allows you to focus on building and delivering the apps that you want and insights and process automation that will help you run more efficiently. Plus it integrates nicely with PowerApps, Power BI and Microsoft Flow.

Some other key things:

If you want to build Power BI reports from your Dynamics 365 CRM data, there are pre-canned entities provided by Microsoft.
Data within the Common Data Services (CDS) is stored within a set of entities. An entity is just a set of records that’s used to store data, similar to how a table stores data within a database.
CDS should be thought of as a managed database service. You don’t have to create indexes or do any kind of database tuning; you’re not managing a database server as you would with SQL Server or a data warehouse. It’s designed to be somewhat of a centralized data repository from which you can report or do further things with.
PowerApps is quickly becoming a good replacement for things like Microsoft Access as it comes along in functionality and feature sets. A common use for PowerApps is extending that data rather than having to dig into the background platform.
This technology is easy to use, to share and to secure. You set up your user account as you would with Azure Services, giving specific permissions/access based on the user.
It gives you the metadata you need based on that data and you can specify what kind of field or column you’re working with within that entity.
It gives you logic and validation capabilities; you can create business process rules and data flows from entity to entity, or from app to entity to PowerApps.
You can create workflows that automate business processes, such as data cleansing or record updating; these workflows can run in the background without you having to manage them manually.
It has good connectivity with Excel, which makes it user friendly for people comfortable with that platform.
For power users and developers, there’s an SDK available, which allows you to extend the product and build some cool custom apps.
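To make the entity idea a bit more concrete, here’s a minimal Python sketch of an entity as a set of records with field metadata and a validation rule. This is not the CDS SDK; the `Entity` class, the `Account` entity, and the `credit_limit` field are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Entity:
    """Hypothetical stand-in for an entity: typed fields, rules and records."""
    name: str
    fields: dict[str, type]  # field metadata: column name -> expected type
    rules: list[Callable[[dict[str, Any]], bool]] = field(default_factory=list)
    records: list[dict[str, Any]] = field(default_factory=list)

    def add(self, record: dict[str, Any]) -> None:
        # Check the record against the field metadata first.
        for column, expected in self.fields.items():
            if not isinstance(record.get(column), expected):
                raise ValueError(f"{column} must be of type {expected.__name__}")
        # Then apply every business rule before the record is stored.
        for rule in self.rules:
            if not rule(record):
                raise ValueError("record failed a business rule")
        self.records.append(record)


# Made-up entity with one validation rule: the credit limit cannot be negative.
accounts = Entity(
    name="Account",
    fields={"name": str, "credit_limit": int},
    rules=[lambda r: r["credit_limit"] >= 0],
)
accounts.add({"name": "Contoso", "credit_limit": 5000})
```

A record that breaks the rule (a negative credit limit, say) is rejected before it is stored, which is the kind of validation you would otherwise have to write into every app that touches the data.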

I don’t think of this as a replacement for Azure SQL DW or DB but it does give you the capability to have table-based data in the cloud that has some nice hooks into the Dynamics 365 space, as well as outputting to PowerApps and Power BI.

Azure

Overview of Azure Data Catalog

June 13, 2018 cseferlis

In today’s post, I’ll give you an overview of Azure Data Catalog and an example of how you may use it in your organization. Azure Data Catalog is used to discover Azure data sources in your environment, as well as tell what those data sources are and describe the data sources that you’ve already found.

It provides the ability to add metadata and annotations around all Azure data. So, if you want to describe a column, a data source, or apply documentation or a schema, you can do all this in the Azure Data Catalog. It also provides a cloud-based service in which a data source can be registered.

The data remains in its existing location, but a copy of its metadata is added to the data catalog, as well as a reference to the data source location, so you’ll know where to find it when you need it. The metadata is also indexed, ensuring that each data source is easily discoverable through a search, and that it’s understandable to the users who discover it.
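As a rough illustration of that register, annotate and search flow, here’s a small Python sketch of an in-memory catalog. It is not the Azure Data Catalog API; the `DataCatalog` and `CatalogEntry` classes, the `SalesOrders` source, and its location string are hypothetical. The point is only that the catalog holds metadata and a pointer to the data, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str  # reference to where the data actually lives
    description: str = ""
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog; it stores metadata only."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}
        self._index: dict[str, set[str]] = {}  # search term -> entry names

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry
        self._reindex(entry)

    def annotate(self, name: str, tag: str) -> None:
        entry = self._entries[name]
        entry.tags.add(tag)
        self._reindex(entry)

    def search(self, term: str) -> list[CatalogEntry]:
        names = self._index.get(term.lower(), set())
        return [self._entries[n] for n in sorted(names)]

    def _reindex(self, entry: CatalogEntry) -> None:
        # Index every word of the name, description and tags, case-insensitively.
        words = entry.name.split() + entry.description.split() + list(entry.tags)
        for word in words:
            self._index.setdefault(word.lower(), set()).add(entry.name)


catalog = DataCatalog()
catalog.register(
    CatalogEntry("SalesOrders", "sqlserver://example-host/sales", "Daily sales orders")
)
catalog.annotate("SalesOrders", "finance")
```

A search for `finance` now finds the entry and hands back its location, so the user can open the source in the tool of their choice.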

The primary purpose of registering data sources in the data catalog is the discovery and understanding of them. Enterprise users may need data for Business Intelligence, application development, data science or other tests where the right data is required. The Data Catalog discovery experience can be used to quickly find data that matches their needs.

Users can also understand the data to evaluate if it serves their purpose. The data is consumed by opening the data source in their tool of choice. At the same time, users can contribute to the catalog by adding metadata or annotations. They can register new data sources as well, which can then be discovered, understood and consumed by other users who have permission to do so. This is locked down by permission and can be secured with Active Directory.

Here’s a basic example of using Azure Data Catalog:

Let’s say we’re moving towards a self-service BI idea, whether it’s a data team or IT team setting up the data, so users can create their own dashboards in Power BI. The IT or data team has already secured the data by making sure users only have access to what they need/should have. Now the information workers and analysts can create their own reports, workbooks and dashboards without having any restrictions from IT.

As the new data gets created by workers and analysts, it can be challenging to provide information about the data, where it is for instance. Let’s say I save it into a SharePoint repository. I may not remember to tell everyone about it and even if I did, I’ll probably have to remind them 6 months from now. Obviously, this is ineffective and a big waste of time.

This is where Data Catalog comes in, as it gives the data creators the ability to catalog and tag data, making it easier for all the users with permissions to find it. The data can be registered in a centralized data catalog while staying where it came from, and users can go in and add any annotations, tags or metadata that apply.

Azure Data Catalog is a great tool that we highly recommend for all your data projects in Azure!

ADFv2, Azure, Azure Data Factory, Review, Strategy, Training

Continuous Integration and Deployment Using Azure Data Factory

June 11, 2018 cseferlis

Today I’m excited to talk about one of the new releases in Azure that gives you continuous integration and deployment using Azure Data Factory. This new release is an Azure Data Factory visual interface that allows you to export any of your Data Factory components as an Azure Resource Manager (ARM) template.

When you do these exports from your Data Factory, it will generate 2 files: the template file, which will contain all the Data Factory metadata for the pipelines, data sets, etc., and a configuration file, which will contain the environment parameters that will be different for each of your environments. So, if you’re going to create development, test and production environments, each one will be different.

You also can specify things like storage containers, Databricks clusters, etc. After you’ve deployed this, you’re going to create a new factory for your environment. You’re also going to associate your Visual Studio Team Services (VSTS) Git repository with that Data Factory, enabling source control, versioning and collaboration.

Next, you’ll set up your Data Factory with VSTS. This is where all the developers can author data factory resources, such as pipelines, data sets and other components. Once you have this development area set up, developers can modify the resources and debug them right in the interface, along with checking performance. They’ll also have the option to create a PR from their branch to master or create a collaborative branch to get the changes reviewed by peers.

Once they are satisfied with the changes and are ready to go to production, they merge them into the master branch and can then publish to the development Data Factory. Or they can promote to each of those environments by exporting those ARM templates, when they’re ready, from the master branch or any other branch.

So, you export the template and it gets deployed with different environment parameters to test and production environments. From there, you can also set up VSTS release definitions to automate the deployment of your Data Factory to multiple environments.
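Here’s a minimal Python sketch of the “one template, one parameter file per environment” idea. The template below is a heavily trimmed, hypothetical stand-in for a real export, and the parameter names (`factoryName`, `storageAccountName`) and their values are made up; a real parameter file also carries a `$schema` entry, which I’ve left out here.

```python
import json

# Trimmed, hypothetical stand-in for an exported template; real exports are much larger.
TEMPLATE = {
    "parameters": {
        "factoryName": {"type": "string"},
        "storageAccountName": {"type": "string"},
    },
}

# Made-up values for each environment.
ENVIRONMENTS = {
    "dev": {"factoryName": "adf-dev", "storageAccountName": "storagedev"},
    "test": {"factoryName": "adf-test", "storageAccountName": "storagetest"},
    "prod": {"factoryName": "adf-prod", "storageAccountName": "storageprod"},
}


def build_parameter_file(environment: str) -> dict:
    """Build the parameter file content for one environment."""
    values = ENVIRONMENTS[environment]
    # Fail early if an environment forgets a parameter the template declares.
    missing = sorted(set(TEMPLATE["parameters"]) - set(values))
    if missing:
        raise KeyError(f"{environment} is missing parameters: {missing}")
    return {
        "contentVersion": "1.0.0.0",
        "parameters": {
            name: {"value": values[name]} for name in TEMPLATE["parameters"]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_parameter_file("test"), indent=2))
```

The template itself never changes between environments; only the small parameter file does, which is what makes the same export safe to promote from dev to test to production.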

The benefit of this is that it opens the opportunity to bring true dev, test and production environments, like the ones you’re used to in your local environment using SSIS or other ETL tools, to Azure. This tool offers a tremendous amount of power and it’s getting better all the time.

Azure, IoT, Training

Device Management with Azure IoT Hub

June 9, 2018 cseferlis

Yesterday’s post covered what Azure IoT Hub is and what it brings. Today I’m going a bit deeper and talking about how the devices you’re bringing to the table get managed. IoT Hub provides the features and extensibility that give devices, as well as the people who program those devices and their architectures, a robust device management solution.

Devices are all over the place; they are sensors, microcontrollers and Raspberry Pi computers. They’re also the gateways that route the communications for groups of devices. They’re installed on a local network and can work in peer-to-peer networks or have a router that passes information back and forth.

Azure IoT Hub offers a flexible platform that supports many different uses across many different industries, so the devices stay compatible no matter the industry you’re in. No matter what you’re using the devices for, a significant part of the work lies in planning how the devices and gateways will work together in the IoT Hub.

Let’s look at some things to be aware of:

1. Device Management Principles – Here you’ve got your scale and automation. You need to have simple tools to automate routine tasks. And you need the ability to manage millions of devices simply, as well as remotely and in bulk, so you can make sweeping changes across a whole suite of devices.

In addition, you don’t need to be alerted for every change or notification, but you do need to be alerted when there’s a problem. There are many different devices, protocols and patterns. IoT Hub needs to accommodate all those changes; with the wide range of devices from single process chips to fully functional computers, we need to have the flexibility to accommodate those systems.

Other things you need to know are:

Context awareness to accommodate the SLA and maintenance windows for when there’s downtime.
The network and power states.
The in-use conditions – What are the expectations while the devices themselves are working?
Where the device is – Is it in a building or out in the field on a utility pole?

These devices serve many roles and must work within the IT operations of your group. They need to be easily managed from that group or an extension from that group, as well as be able to surface alerts when it’s required. Most importantly, this all needs to work within your internal IT ecosystem to keep that continuity and consistency inside the business.

2. Device Lifecycle – So, we start with a plan – how will we use the devices; how will they be managed; and what will the devices be for our specific instance? Next, we need to provision them by adding them into the IoT Hub identity registry, so when we get to the next step they are being acknowledged in the system. Our next step is to configure them. We want to maintain the health of the device, even when we’re doing these updates and configurations, and we can send and confirm updates securely.

Also, we need to monitor the device’s health to be aware if it’s beginning to fail. Many are small, simple devices that have a certain lifespan. We also monitor the status of the device and we need the ability to get alerts when the device begins to have issues. Then, ultimately, we need to remove old devices that are no longer effective so they’re not showing up or cluttering up the space of that IoT Hub interface.
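The lifecycle above reads naturally as a small state machine. Here’s a rough Python sketch of it; the stage names and allowed transitions are my own simplification of the steps in this post, not an IoT Hub API, and the device ID is made up.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    PROVISION = "provision"
    CONFIGURE = "configure"
    MONITOR = "monitor"
    RETIRE = "retire"


# Assumed transitions: a monitored device can be reconfigured or retired.
ALLOWED = {
    Stage.PLAN: {Stage.PROVISION},
    Stage.PROVISION: {Stage.CONFIGURE},
    Stage.CONFIGURE: {Stage.MONITOR},
    Stage.MONITOR: {Stage.CONFIGURE, Stage.RETIRE},
    Stage.RETIRE: set(),
}


class Device:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.stage = Stage.PLAN
        self.history = [Stage.PLAN]

    def advance(self, target: Stage) -> None:
        # Refuse any move the lifecycle does not allow, e.g. skipping provisioning.
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"cannot go from {self.stage.value} to {target.value}")
        self.stage = target
        self.history.append(target)


sensor = Device("pole-sensor-001")  # hypothetical device ID
for step in (Stage.PROVISION, Stage.CONFIGURE, Stage.MONITOR, Stage.RETIRE):
    sensor.advance(step)
```

Modelling it this way makes the rule explicit that a device has to be provisioned before it can be configured, and that a retired device does not come back.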

3. Device Management Patterns – How are we interacting with devices after they’ve been deployed? So, if you’re going to reboot, factory reset or redeploy a device, you’ll need to reconfigure it so that it can be brought back up in the system. You’ll need to do simple configurations to change how the devices behave, and these need to be done in bulk.

To ensure you’re staying on top of bug fixes and new functionality and features for your devices, you’ll need to send firmware updates. Lastly, you need to report the progress and status of the devices themselves. It’s important that you have visibility into how the devices are performing and know if there are any problems.

This has been a high-level overview of device management with Azure IoT Hub. I hope you found it informative and helpful.

Azure, IoT, Training

How Does Azure IoT Hub Work?

June 7, 2018 cseferlis

Today I’d like to talk about Internet of Things (IoT) and the Azure IoT Hub. IoT devices are not your typical devices like mobile phones, tablets or laptops. IoT devices are designed to respond to sensor activity that the device is being used for, like a glass break sensor for instance.

These devices are meant to be used for specific communications, whereas the typical device acts more like a server waiting to receive information from everywhere. This can cause some security threats if they are deployed in that manner. We can use firewalls and software to protect our equipment, but the whole idea with IoT is that these low power, no frills devices are what’s being deployed, so you don’t have a lot of that capability.

Also, the traditional PKI trust model is inefficient and ineffective for the IoT model; the certificate TTLs (time to live) are too long and don’t make sense for these devices. On top of that, promiscuous mode is turned on by default, which defeats the purpose of trying to have a secure environment.

Azure IoT Hub implements a service-assisted communication methodology, which mediates interaction between backend systems and devices. With this you have bi-directional, trustworthy communication set up, and security is the number one priority of this configuration.

Devices will not accept unsolicited information; they must regularly check in for instructions, and authorization is based on per-device identity. For devices in areas where there are network coverage or power issues, IoT Hub provides queues for the messages that are set up for communication with the devices. Essentially, it will hold the message and validate the device before anything is sent/received; it will send the necessary data after it’s validated.
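To show the shape of that pattern, here’s a simplified Python sketch: the hub holds messages per device, and a device only receives them when it checks in and its identity checks out. The `Hub` class and the shared-key check are assumptions made for illustration; the real service has its own authentication mechanisms, which this does not try to reproduce.

```python
from collections import defaultdict, deque


class Hub:
    """Hypothetical hub: holds messages per device until that device checks in."""

    def __init__(self) -> None:
        self._keys: dict[str, str] = {}  # per-device identity
        self._queues: dict[str, deque] = defaultdict(deque)

    def register(self, device_id: str, key: str) -> None:
        self._keys[device_id] = key

    def send(self, device_id: str, message: str) -> None:
        # The backend never pushes to the device; it only queues the message.
        if device_id not in self._keys:
            raise KeyError(f"unknown device: {device_id}")
        self._queues[device_id].append(message)

    def check_in(self, device_id: str, key: str) -> list[str]:
        # Validate the device first; nothing is handed over otherwise.
        if device_id not in self._keys or self._keys[device_id] != key:
            raise PermissionError("device failed validation")
        queue = self._queues[device_id]
        messages = list(queue)
        queue.clear()
        return messages


hub = Hub()
hub.register("glass-break-7", "secret-key")  # made-up device and key
hub.send("glass-break-7", "set sensitivity=high")
hub.send("glass-break-7", "report battery")
```

If the device is offline for a while, the messages simply wait in its queue; when it next checks in with a valid identity, it gets them in the order they were sent.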

This also sets up application payload data, which is secured separately, so any data that’s flowing through is going to be secured for protected transit through the gateways. The data is wrapped prior to sending and receiving between devices. Devices can be configured to work peer to peer before they get to a gateway, to extend the range. That gateway is what communicates with your Azure IoT Hub.

All that traffic is designed to flow to and from the gateway and then communicate with the IoT Hub, which you can use to collect the data for big data uses, setting up Power BI reports or many other ways to use that data.

Azure, Review, Strategy, Training

Overview of Azure Stream Analytics

June 5, 2018 cseferlis

Analytics is the key to making your data useful and supporting decision making. Today I’m excited to talk about Azure Stream Analytics. Azure Stream Analytics is an event processing engine that allows you to capture and examine high volumes of data from all kinds of connections, like devices, websites and social media feeds.

You can examine those data streams and it allows you to trigger things like alerts, as well as take action with reporting or storage. So, whether you want to report on it with Power BI or store the data for down the road, you have these options. Stream analytics is used a lot with IoT or streaming feeds through social media, where people want to keep an eye on what’s happening with the data.

Here’s how it works. It starts with a data source such as Event Hub, IoT Hub or Azure Blob Storage, and it uses a SQL-like query language that allows transformation on the fly. It helps you process operations like filtering, sorting, aggregating and joining the data together to make it more usable, turning data into information.
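Stream Analytics itself is driven by its SQL-like query language, but the filter-and-aggregate idea can be sketched in a few lines of Python. The function below is purely illustrative (the event layout, the `tumbling_average` name, and the sample readings are my own): it filters out low readings and averages the rest per device in fixed, non-overlapping time windows.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds, min_value=None):
    """Average values per (window start, device) over fixed-size windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    """
    buckets = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, device, value in events:
        # Filtering: drop readings below the threshold, if one is given.
        if min_value is not None and value < min_value:
            continue
        # Each event belongs to exactly one window, identified by its start time.
        window_start = timestamp - timestamp % window_seconds
        bucket = buckets[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    # Aggregating: turn totals into averages, ordered by window and device.
    return {key: total / count for key, (total, count) in sorted(buckets.items())}


readings = [(0, "a", 10.0), (30, "a", 20.0), (65, "a", 30.0), (70, "b", 1.0)]
averages = tumbling_average(readings, window_seconds=60, min_value=5.0)
```

With these sample readings, device `a` averages 15.0 in the first minute and 30.0 in the second, and device `b` drops out because its only reading is below the threshold.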

From there, when you identify the data that you want/need to use, you can then send that data downstream to be sent to a queue for triggering workflows or further processing of the data. You can also send that data to Power BI for real-time visualization. For example, let’s say you’re looking at a data quality stream and you want to pull certain key words out of Twitter to see how they’re used and watch how that’s being done. By connecting to the Twitter API, you can capture that data, stream it, and then report from it with a Power BI report.

Of course, the other option is to archive it for further processing down the road if you want to do something with that data.

This was designed to be easy to use and spin up. It has source and sink integration and an easy-to-use, SQL-like declarative query language. Also, it’s a managed service so it’s pay-as-you-use, as with many Azure services. There’s no need to buy hardware or software up front. And it has an enterprise-grade service level agreement, so it’s robust and reliable and you can run it in multiple locations.

Another big positive is that its in-memory processing with multi-node capabilities offers tremendous scalability and performance benefits. Plus, unlike on-prem solutions, it can be fairly elastic, so you can buy nodes as you need them to process more data and you can bring them back down when you’re not using them.

There are a lot of cool things being done with stream analytics and IoT; it’s an exciting time to be in this arena.

ADFv2, Azure, Azure Data Factory, Review, Training

Azure Data Factory Integration Runtimes

June 3, 2018 cseferlis

This week I’ve been talking about Azure Data Factory. In today’s post I’d like to talk about the much-awaited Azure Data Factory Integration Runtime. The compute infrastructure provides data movement, connectors, data conversions and data transfers, as well as activity dispatching, so you can dispatch and monitor activities running HDInsight, SQL DB or DW and more.

A big part of V2 is that you can now lift and shift your SSIS packages up into Azure and run them from your Azure data portal inside of Data Factory. There are 3 integration runtime types:

1. Azure Integration Runtime – This is clearly set up in Azure and you would use this if you’re going to be copying between two cloud data sources.

2. Self-Hosted Integration Runtime – This can run copy and transformation activities between cloud and on-premises, including moving data from an IaaS virtual machine. Use this if you’re going to be copying between a cloud source and an on-prem, private network source.

So, use this if your environment is behind a firewall, not in the public cloud, and you want to move data from your environment to Azure but the gateway will not work for you. Also, because that IaaS virtual machine is isolated and you can’t get into that data storage directly, you would set up that integration between sites.

3. SSIS Integration Runtime – Use this when you’re lifting and shifting your SSIS packages into Azure Data Factory. But a key thing to mention is that this does not yet support third party tools for SSIS, but that support will be added eventually.

Where are these located? Azure Data Factories are located in limited regions at this time, but they can access data stores and compute services globally. With Azure Integration Runtime, the location of that runtime will define the backend compute resources where those are being used. This is optimized for data compliance, efficiency and reduced network egress costs, to ensure that they’re using the best services available in the region that is needed.

The Self-Hosted Runtime is installed in the environment in that private network. The SSIS Integration Runtime is determined based on where the SQL DB or managed instance is hosting that SSIS DB catalog. It is currently limited where it can be located, but it does not have to be in the same place as the Data Factory. It will be as close to the data sources as possible and will run as optimally as it can.
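As a quick summary of when to pick which runtime, here’s a tiny Python sketch of the decision as I’ve described it in this post. The function and its two flags are my own shorthand, not part of any Data Factory SDK.

```python
def choose_integration_runtime(runs_ssis_packages: bool, touches_private_network: bool) -> str:
    """Pick a runtime type using the rules of thumb from this post."""
    if runs_ssis_packages:
        # Lifting and shifting SSIS packages into Data Factory.
        return "SSIS"
    if touches_private_network:
        # At least one side is on-prem or otherwise behind a firewall.
        return "Self-Hosted"
    # Both sides are cloud data sources.
    return "Azure"


choice = choose_integration_runtime(runs_ssis_packages=False, touches_private_network=True)
```

A copy from an on-prem SQL Server up to blob storage, for example, lands on the Self-Hosted runtime.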

ADFv2, Azure, Azure Data Factory, Training

Azure Data Factory – Data Sets, Linked Services and Pipeline Executions

June 1, 2018 cseferlis

In my posts this week, I’ve been going a little deeper into Azure Data Factory (be sure to check out my others from Tuesday and Wednesday). I’m excited today to dig into getting data from one source into Azure.

First let’s talk about data sets. Consider a data set anything that you’re using as your input or output. An Azure blob data set is one example and is defined by the blob container itself, the folder, the file, the documentation, etc. In this case you can have that as your source or your destination.

There are many data set options, such as Azure Services (other data services or blobs, databases, data warehouses or Data Lake), as well as on premises databases like SQL Server, MySQL or Oracle. You also have your NoSQL components like Cassandra and MongoDB and you can get your files from an FTP server, Amazon S3 or internal file systems.

Lastly, you have your SaaS offerings like MS Dynamics, Salesforce or Marketo. There is a grid available in the Microsoft documentation which outlines what can be a source or a destination or both, in some cases.

Using Linked Services is the way that you link from your source to your destination. This defines the connection to the data source, whether it be the input or the output. Think of it like your connection string in SQL Server; you’re connecting to a specific place, whether you’re using that to output data from a source or input it to a destination.

Now, let’s look at Pipeline Executions. A pipeline is a collection of activities that you’ve built, and the executions run that pipeline moving the data from one place to another or to do some transformation with that data. There are 2 defined Pipeline Executions:

1. Manual (or On Demand) Executions. These can be triggered through the Azure portal, a REST API, as part of a PowerShell script, or as part of your .NET application.

2. Setting Up Triggers. With this execution, you set up a trigger as part of your Data Factory. This was an exciting new change in Azure Data Factory V2. Triggers can be scheduled, so you can set a job to run at a particular time each day or you can set a tumbling window trigger. Using a tumbling window, you can set up your hours of operation (let’s say you want your data to run from 8-5, Monday through Friday, every hour on the hour). The tumbling window trigger runs continuously for the times/hours you’ve specified.

Doing this allows us to lessen our costs to run the job in Azure by using the compute time only when you need it, and not using compute time during downtimes like outside of business hours.
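To picture the “8-5, Monday through Friday, every hour on the hour” schedule, here’s a small Python sketch that lists the run times in a date range. It doesn’t use the Data Factory trigger definitions at all; the function name is hypothetical, and I’m assuming the 5:00 PM run is included.

```python
from datetime import datetime, timedelta


def business_hour_runs(start: datetime, end: datetime,
                       first_hour: int = 8, last_hour: int = 17) -> list[datetime]:
    """Return every on-the-hour run time in [start, end) within business hours."""
    # Begin at the first whole hour at or after the start time.
    current = start.replace(minute=0, second=0, microsecond=0)
    if current < start:
        current += timedelta(hours=1)
    runs = []
    while current < end:
        is_weekday = current.weekday() < 5  # Monday is 0, Friday is 4
        if is_weekday and first_hour <= current.hour <= last_hour:
            runs.append(current)
        current += timedelta(hours=1)
    return runs


# Monday, June 4, 2018 through the following Monday.
week = business_hour_runs(datetime(2018, 6, 4), datetime(2018, 6, 11))
```

That week has 50 runs (10 per weekday) instead of the 168 you would get running every hour around the clock, which is where the compute savings come from.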

ADFv2, Azure, Azure Data Factory, Training

Azure Data Factory Pipelines and Activities

May 30, 2018 cseferlis

Yesterday’s Azure Every Day post covered how Azure Data Factory pricing works. In today’s post I’d like to go a bit deeper into Azure Data Factory Version 2 and review pipelines and activities. In essence, a pipeline is a logical grouping of activities. If you’re familiar with SSIS, think of an SSIS package being a grouping of activities that are happening with the data.

An example of a pipeline would look like this: you want to pull data from a website, file server or database up into Azure and do some kind of transformation on that data, then report from it. Within the pipeline, multiple activities can be defined. If there’s no activity dependency on a set of activities (so you have one activity running and there’s no dependency on the next activity), then they can run in parallel.

This is good to keep in mind as you’re building these activities, because you may need to schedule them or add dependencies so they don’t run in parallel or so that one runs after another.
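Here’s a minimal Python sketch of how dependencies decide what can run in parallel. It groups activities into “waves”: everything in a wave has all of its dependencies finished, so the activities in that wave could run side by side. The function and the activity names are hypothetical, and this is only an illustration of the idea, not how Data Factory schedules work internally.

```python
def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group activities into waves; each wave only depends on earlier waves."""
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    done: set[str] = set()
    waves = []
    while remaining:
        # An activity is ready once everything it depends on has finished.
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


pipeline = {
    "copy_sales": set(),
    "copy_customers": set(),
    "transform": {"copy_sales", "copy_customers"},
    "report": {"transform"},
}
waves = execution_waves(pipeline)
```

The two copy activities have no dependencies, so they land in the first wave together; the transform waits for both, and the report waits for the transform.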

There are 3 main types of activities:

1. Data Movement Activities – These are the sources you’re pulling data in from, such as Azure Blob Storage, Azure Data Lake, Azure DB and DW. You can also set up an on-premises gateway and pull in databases, such as the commonly used DB2, MySQL, Oracle, SAP, Sybase and Teradata, as well as NoSQL databases like Cassandra and MongoDB.

I also mentioned files; you can pull from Amazon S3, file systems, FTP, HTTP, etc. You also have the Software as a Service (SaaS) options: Dynamics, HubSpot, Marketo, QuickBooks, and Salesforce, to name a few. You can check a complete list in the Azure online documentation.

2. Data Transformation Activities – Here is where you’re taking your data after it’s ingested into Azure and doing something with it. Some common ones are HDInsight Hive, Pig, MapReduce, Hadoop Streaming and Spark transformations. These allow you to transform your big data in your Azure environment and stage it for your reporting.

Other common uses would be machine learning into an Azure VM, as well as stored procedures. You can have your stored procedures in SQL Server defined in Azure, and then run that stored procedure, and also use U-SQL for your Data Lake Analytics.

3. Control Activities – In these activities you can do things like execute your pipelines or run ForEach or Lookup activities, the types of things where you’re controlling how the pipeline is working and interacting with the data.

ADFv2, Azure Data Factory, Strategy, Training

How Azure Data Factory Pricing Works

May 28, 2018 cseferlis

In today’s post I’d like to discuss how Azure Data Factory pricing works with the Version 2 model which was just released. The pricing is broken down into four ways that you’re paying for this service. I hope that by pointing these out, you can gain an understanding of not only how it works, but how you can keep an eye on your spending.

1. Azure activity runs vs self-hosted activity runs – there are different pricing models for these. For Azure activity runs, think of a copy activity moving data from an Azure Blob to an Azure SQL database, or a Hive activity running a Hive script on an Azure HDInsight cluster.

With self-hosted, think of a copy activity moving data from an on-premises SQL Server to Azure Blob Storage, or a stored procedure activity running a stored procedure on an on-premises SQL Server.

2. Volume of data moved – this is measured in DMUs (data movement units). This is one you should be aware of, as it defaults to auto, which basically uses all the DMUs it can, and it’s paid for by the hour. Let’s say you specify 2 DMUs and it takes an hour to move that data. Alternatively, you could use 8 DMUs and it takes 15 minutes; the price is going to end up the same. You’re using 4X the DMUs but it’s happening in a quarter of the time.

This is good to look at and do some comparisons on, since how many DMUs you’re using is where the bulk of your spend is going to be.
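The arithmetic behind that comparison is simple enough to sketch in Python. The rate below is a made-up placeholder, not a real Azure price; what matters is that cost scales with DMUs multiplied by time.

```python
def copy_cost(dmus: int, hours: float, price_per_dmu_hour: float) -> float:
    """Cost of a copy run: DMUs used x hours run x price per DMU-hour."""
    return dmus * hours * price_per_dmu_hour


RATE = 0.25  # hypothetical price per DMU-hour, for illustration only

slow = copy_cost(dmus=2, hours=1.0, price_per_dmu_hour=RATE)   # 2 DMU-hours
fast = copy_cost(dmus=8, hours=0.25, price_per_dmu_hour=RATE)  # also 2 DMU-hours
```

Both runs consume 2 DMU-hours, so they cost the same. The spend only changes when adding DMUs stops shortening the run proportionally, which is why it’s worth testing a few settings rather than leaving it on auto.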

3. SSIS integration runtimes – here you’re using A-series and D-series compute levels. Which one you pick depends on what the compute needs are to invoke the process (how much CPU, how much RAM, how much temp storage you need).

4. The inactive pipeline – you’re paying a small amount for inactive pipelines (about 40 cents currently). A pipeline is considered inactive if it’s not associated with a trigger and hasn’t been run for over a week. Yes, it’s a minimal charge, but these do add up, and when you start to wonder where some of those charges come from it’s good to keep this in mind.

Also, each of the components inside the Azure Data Factory, whether it’s blob storage, SQL Server, HDInsight or any kind of storage or compute resources you’re using as part of your pipeline, will also incur charges. These are billed separately based specifically around what those resources are.

Something to keep in mind as you start to build workloads: if you spin up an HDInsight cluster or a SQL data warehouse as part of a pipeline, make sure you shut down, pause or destroy it afterwards. That way, you can get your data moved but also keep the cost down by not keeping it running all the time.

BizDataViz

Azure and SQL Data Blog