Category Archives: Azure

Getting started with Spark Pools in Azure Synapse

In my latest video blog I discuss getting started on the newly Generally Available Spark Pools as a part of Azure Synapse, another great option for Data Engineering/Preparation, Data Exploration, and Machine learning workloads

Without going too deep into the history of Apache Spark, I’ll start with the basics. Essentially, in the early days of Big Data workloads, a basis for machine learning and deep learning for advanced analytics and AI, we would use a Hadoop cluster and move all these datasets across disks, but the disks were always the bottleneck in the process. So, the creators of Spark said hey, why don’t we do this in memory and remove that bottleneck. So they developed Apache Spark as an in memory data processing engine as a faster way to process these massive datasets.

When the Azure Synapse team wanted to make sure that they were offering the best possible data solution for all different kinds of workloads, Spark gave the ability to have an option for their customers that were already familiar with the Spark environment, and included this feature as part of the complete Azure Synapse Analytics offering.

Behind the scenes, the Synapse team is managing many of the components you’d find in Open-Sourced Spark such as:

  • Apache Hadoop Yarn – for the management of the clusters where the data is being processed
  • Apache Livy – for the job orchestration
  • Anaconda – a package manager, environment manager, Python/R data science distribution and a collection of over 7500 open source packages for increasing the capabilities of the Spark clusters

I hope you enjoy the post. Let me know your thoughts or questions!

Connecting to External Data with Azure Synapse

In my latest video blog I discuss and demonstrate some of the ways to connect to external data in Azure Synapse if there isn’t a need to import the data to the database or you want to do some ad-hoc analysis. I also talk about using COPY and CTAS statements if the requirement is to import the data after all. Check it out here

The Modern Data Warehouse in Azure Part 4: The Serving Layer

In this video blog post I covered the serving layer step of building your Modern Data Warehouse in Azure. There are certainly some decisions to be made around how you want to structure your schema as you get it ready for presentation with whatever your business intelligence tool of choice, for this example I used Power BI, so I discuss some of the areas you should focus on:

  • What is your schema type? Snowflake or Star, or something else?
  • Where should you serve up the data? SQL Server, Synapse, ADLS, Databricks, or Something Else?
  • What are your Service level agreements for the business? What are your data processing times?
  • Can you save cost by using an option that’s less compute heavy?

Microsoft Reimagines Traditional SIEMs with Azure Sentinel

If you’re like most, security is at the forefront of your mind for your organization. You need the right tools and the right team to keep up with the balance of increasing number of sophisticated threats and with security teams being inundated with requests and alerts.

Today I’d like to tell you about Microsoft’s reimagined SIEM tool Azure Sentinel. Over the past 10 to 15 years, Security Information and Event Management (SIEM) has become extremely popular as an aggregation solution for security and events that happen in our network.

There are also software tools, hardware appliances and managed service providers that can help support your corporate needs to better understand the level of risks in real-time and over a span of time. They do things such as log aggregation, event correlation and forensic analysis and offer features for alerting, dashboarding and compliance checks.

These are great resources to help secure our environment, our users and devices. But unfortunately, the reality is security teams are being inundated with requests and alerts. Compile this with the noteworthy shortage of security professionals in the world – an estimated 3.5 million unfilled security jobs by 2021 – this is a major concern.

Microsoft decided to take a different approach with Azure Sentinel. Azure Sentinel provides intelligent security analytics at cloud scale for your entire enterprise. It makes it easy to collect data across your entire hybrid organization on any cloud, from devices to users to applications to servers. Azure Sentinel uses the power of AI to ensure you’re quickly identifying real threats.

With this tool:

  • You’ll eliminate the burden of traditional SIEMs as you’re eliminating the need to spend time on setting up, maintaining and having to scale the infrastructure to support other SIEMs.
  • Since it’s built on Azure, it offers virtually limitless cloud scale while addressing all your security needs.

Now let’s talk cost. Traditional SIEMs have proven to be expensive to own and operate, often requiring you to commit upfront and incur high cost for infrastructure maintenance and data ingestion. With Sentinel, you pay for what you use with no up-front costs. Even better, because of Microsoft’s relationships with so many enterprise vendors (and more partners being added) it easily connects to popular solutions, including Palo Alto networks, F5 networks, Symantec and Checkpoint offerings.

Azure Sentinel integrates with Microsoft Graph Security API, enabling you to import your own threat intelligence feeds and to customize threat detection and alert rules. There are custom dashboards that give you a view to allow you to optimize whatever your specific use case is.

Lastly, if you’d like to try this out for free, Microsoft is allowing you to connect to your Office 365 tenant to do some testing and check it out in greater detail. This product is currently in preview, so there may be some kinks but I’m looking forward to seeing how it develops in the future, as a true enterprise-class security solution for your environment, whether in the cloud, on premises, in data centers or remote users or devices.

Microsoft Announces Windows Virtual Desktop in Azure

Today I’m here with some exciting news out of Microsoft with the public preview of Microsoft Virtual Desktop. Virtual desktops are not a new invention and they are currently offered by multiple vendors.

Windows Virtual Desktop is comprised of the Windows desktops themselves and the application that you would pass out to users and the management solution for these are hosted in Azure. Inside public preview desktops and apps can be deployed on virtual machines in any Azure region in the US, with the management solution in data for these virtual machines residing in the US as well.

As the service moves closer to general availability, Microsoft will start to scale out the management solution and data localization to all Azure regions. Virtual desktops can be deployed with Windows 7 (with extended support) or Windows 10 for the workstation modes, and for the server versions you can run 2012 through 2019.

It will provide a full desktop virtualization environment inside your Azure subscription without having to run any additional gateway servers as you would if you were deploying this on other vendors’ platforms or in your on premises.

With Windows Virtual Desktop you can build custom images or pick some of the canned images provided in the Azure gallery. Images can be personalized and remain static with persistent desktops. There are also many configuration options, for instance; you want to run a single application for a connectivity for setting up a server/client type deployment of an application or you want to deploy pooled multi-session resources.

Another key point is you’re deploying this with significantly reduced overhead as you no longer need to manage the remote desktop roles like you would with remote desktop services on prem or with some of the other providers. You just have to manage the virtual machines inside of your Azure subscription.

There are many great use cases historically for virtual desktops in things like education and medical along with many others.

Accelerate Your AI with Machine Learning on Azure Data Box Edge

In some past blogs I’ve discussed Azure Data Box and how the Data Box family has expanded. Today I’ll talk about Azure Data Box Edge (in preview) and elaborate on the machine learning service that it provides in your premises with the power of Azure behind it.

If you don’t know, Azure Data Box Edge is a physical hardware device that sits in your environment and collects data from environment sources like IOT data and other sources where you might take advantage of the AI features offered by the device. It then takes the data and sends it to Azure for more processing, storage or reporting purposes.

Microsoft recently announced Azure Machine Learning hardware accelerated models provided by Project Brain Wave on the Data Box Edge. Because most of our data is in real world applications and used at the edge of our networks – like image and videos collected from factories, retail stores or hospitals – it can now be used for things such as manufacturing defect analysis or inventory out of stock detection in diagnostics.

By applying machine learning models to the data on Data Box Edge, it provides lower latency (and savings on bandwidth cost) as we don’t have to send all the data to Azure for analysis. But it still offers that real time insight and speed to action for critical business decisions.

You can enable data scientists to simplify and accelerate the building, training and deployment of machine learning models using the Azure Machine Learning Service which is already generally available. They can access all these capabilities in their favorite Python environment, using the latest open source frameworks such as PyTorch, TensorFlow and sci-kit-learn.

These models can run on CPUs and GPUs, but this preview expands that out to field programmable gate array processes (FPGA), which is the processor on the Data Box Edge.

The preview is currently a bit limited but, in this case, you’re able to enhance the Azure Machine Learning Service by training a TensorFlow model for image classification scenarios. So, you would containerize that model in a docker container and then deploy it to the Data Box Edge IOT hub.

A good use case for this is if you’re using AI models for quality control purposes. Let’s say you know what a finished product should look like and what the quality specs are, and you build a model defining those parameters. Then you take an image of that product as it comes off the assembly line; now you can send those images to the Data Box Edge in your environment and more quickly capture defects.

Now you’re finding the root cause of defects quicker and throwing away fewer defective products and therefore, saving money. I’m looking forward to seeing how enterprises are going to leverage this awesome technology.

What Power BI XMLA Endpoints means for you

Are you taking advantage of Power BI’s modeling capability? This is a terrific built in capability and many users build models as they begin to design their reports.  

However, as we transition Power BI into being our enterprise visualization and reporting tool, some of the legacy applications just aren’t going to go away. Many of those applications connect to XMLA endpoints from other semantic model providers such as analysis services on your SQL on prem.

So, the Power BI team decided to give you the ability to do the same. There is a newly announced feature in public preview called XMLA Endpoints for Power BI. Beyond the ability to connect to the XMLA endpoints with other analytical dashboarding tools (like Tableau), you can also connect to it from other toolsets that support XMLA.

For example, many of the Microsoft development tools give you the ability to connect as well, things like SQL Server Management Studio (SSMS), SQL Server Profiler, DAX Studio and even from Excel pivot tables. Just keep in mind it’s only currently available for read access, so some of your capabilities will be limited. Microsoft documentation on this feature states that they will be planning to offer a read/write option soon.

From a licensing perspective, access to XMLA Endpoints is available for datasets in Power BI Premium only, but any user can connect to the endpoints regardless of whether they have a Pro license. The feature itself is turned on within the settings of the Power BI Premium tenant. This is only available in preview at this time, so it’s not supported in production.

In this scenario, other tools such as SQL Server data tools or types where you can play with the model give you the ability to change your model with Visual Studio. Now you can start to see a scenario where you can start to use Git Hub repositories to manage your source control on the models among other endless uses.

I must say I’m impressed with the modular approach of the team to roll out Power BI as a 100% enterprise class tool. First, we saw the Data Flows feature which extracts the ELT/ELT into its own lane. Then things like composite models was introduced where you can have offline and online data sources. Now we’ve got XMLA Endpoints which clearly defines 3 separate layers of development within the Power BI ecosystem.

These recently added features have opened some great avenues and spread user adoption and I know there are more great things to come. The commitment of the team to evolve and improve this product has been excellent and I look forward to what they’ll roll out next.

What is Azure Cost Management?

If you’re using Azure, one key piece is effectively planning and controlling the costs involved in running your business. In this post, I’ll discuss what I believe to be one of the most critical functions in Azure that will help you properly deploy and manage your environment.

Azure Cost Management helps you with this planning and cost control. Don’t confuse cost management with billing. To clarify, cost management and billing are two different things; billing is simply retrieving the bill, auditing it for accuracy and paying it. Cost management shows organization cost and usage patterns with advanced analytics.

Reports in cost management show the usage-based costs consumed by Azure services in third party marketplace offerings. Costs are based on negotiated prices and they factor in things like reservation and Azure hybrid benefit discounts for example.

These reports show all your internal and external costs for usage and some of the Azure Marketplace charges. One thing to be aware of when looking at the report is that some other charges such as reservation purchase, support and taxes are not yet shown.

They’ll also help you understand your spending and resource usage, as well as to find any spending anomalies. Some application of predictive analytics is also available to help with future budgeting needs based on your previous usage trends. Using this, Azure Management can deploy groups, budgets and recommendations to show you how your organization expenses are organized and how you can reduce costs going forward.

In addition, in 2018 Microsoft purchased a company called Cloudyn. Cloudyn was a third-party tool that allowed you to manage other cloud providers, as well as your Azure services. Since its purchase, Microsoft has begun to move some of those services over to Azure Cost Managementand pulled them out of or stopped supporting them on Cloudyn. The Azure Cost Management page gives you guidance as to which service is best for you.

Besides the service offering, Microsoft also offers some best practices around cost management including some key principles such as:

  • Planning ahead of deploying any resources so that the appropriate sizing and services are configured before you ‘jump in the pond’.
  • Visibility which gives you some alerting and reporting availability so you can ensure all the effected parties are notified appropriately of the services they are using and the costs they’re incurring.
  • Accountability allows you to ensure that those groups are accountable for the services they are using, and they understand the implications of the services they are deploying. And they are getting this information in a timely manner to know what their cost spending is.
  • Optimization is the process of deploying those compute resources and making sure you’re looking at where you can save some money like the appropriate licensing structures, using hybrid benefits, bringing your own license (BYOL) and looking at reserved instances for instance. And also looking at what usage your compute is consuming so you can see where you may be able to cut costs.

Azure Cost Management is a continuous, iterative process and the key to managing expenses appropriately is being sure you are keeping an eye on these things I’ve discussed and continually tweaking the services that you’re using.

What is Azure Active Directory B2C?

How important is secure identity management to you? If you’re like most, it is a top priority. In today’s post I’ll talk about Azure Active Directory B2C which is an identity management service that enables you to customize and control how users securely interact with your web, desktop, mobile or even single applications.

Using Azure AD B2C, users can sign up, sign in, reset passwords and edit profiles for the various applications they’re using.

When implementing these policies, we’ll have two choices:

  • Using common identity user flows within the Azure portal or,
  • For the more skilled developer or if the templates in the portal don’t support your use case, you can use XML based custom policies.

Once you make that decision, your choice will define the path of authentication, commonly referred to as the user journey. User journeys allow you to control behaviors by configuring some settings; things like social accounts (like Facebook) that the user uses to sign up for the application.

Data collected from the user as a first name or postal code would be used for authentication. You also have multi-factor authentication options, as well as the look and feel of how users interact with pages and information returned to the application.

Azure Active Directory B2C supports the open ID connect and the OAuth 2 protocols for these user journeys. These protocols will help ultimately receive a token that will allow for you to be authenticated. The interaction of every application follows a similar high-level pattern shown in the graphic below:

AAD B2C Flow

The steps here are:

1. The application directs the user to run a policy.

2. The user completes the policy according to the policy definition.

3. Then the application receives a token.

4. And then uses that token to try to a resource.

5. The resource server then validates the token to verify that access can be granted.

6. And the application will periodically refresh in the background ( there really are 5 steps but this 6th step is happening over and over).

Azure AD B2C can also work with additional identity providers such as Amazon, Facebook and Google that will create, maintain and manage identity information while providing authentication services to their (and other) applications.

Typically, you would only use one identity provider in your application but there are no restrictions for using more if your use case calls for it.

The main value for this service is the ability to lessen the need for username and password management for so many applications, thus improving the user experience. Our lives have been made a bit easier since we now have many applications, both web and desktop based, that allow that single sign on or no sign on experience because they are already pre-authenticated with a service like this.

What is Azure Active Directory Seamless Single Sign On?

We’re all dealing with many usernames and passwords in our everyday life, right? Today I’d like to talk about an authentication feature within Azure Active Directory that can help you with easier, faster access.

Azure Active Directory Seamless Single Sign-on (Azure AD Seamless SSO) automatically signs users in when they are on their corporate devices connected to their corporate network. When this is enabled, users don’t have to type their passwords, or even their username, to sign in to Azure Active Directory.

This feature provides users with easy access to cloud-based applications without needing any additional on premises components.

First let’s discuss how this is set up:

  • SSO is enabled used Azure AD Connect. The following steps will occur while enabling this feature:
    • A computer account representing Azure AD is created in your on premises Active Directory in each AD forest.
    • The computer account Kerberos decryption key is shared securely with Azure AD and then 2 Kerberos service principal names (SPNs) are created to represent 2 URLs that are used during Azure AD sign-on.

Authenticating in Browser

  • When doing authentication from a web browser for a web app, essentially a user navigates to a website and signs into Azure AD (see below).
AADSSSO-Image 1
  • Azure AD sends a Kerberos requests to on premises AD and on premises AD looks for an account related to the device you’re signing in on and a user account. If authorized, you get access.

Authenticating with Native Application

  • For a native client, like Outlook for instance, the process is a bit different (see below).
AADSSSO-Image 2
  • Here, the request is made from the device you’re using and authenticated off Azure AD, issuing a Kerberos ticket when it is successful.
  • When that ticket is authenticated off Azure AD and approved, a SAML token is sent to the app. Then it gets sent back to AAD for OAuth-2 authentication.
  • Once all that checks out, access is granted.

Now let’s talk about the benefits.

  • First, it’s a much better user experience. Users are automatically signed in both on premises and cloud-based applications using their built-in authentication, so there’s no need for users to repeatedly reenter their passwords.
  • It’s also easy to deploy and administer. There are no additional components needed on premises; it synchronizes your Azure Active Directory to your AD. Plus it works with any method of cloud authentication using password hash synchronization or pass through authentication.
  • Additionally, it can be rolled out to only some or all of your users by using group policy.

So, this is a great way to allow users to have multiple authentications into multiple websites and applications using only one authentication tool. This will minimize the amount of administration required to set up those users once it’s in place. And it should reduce the number of password resets for your help desk team or whomever oversees that.