
A Look at Some of Azure SQL Database’s Intelligence Features

Today I’d like to tell you about some very cool intelligence features within Azure SQL Database, which delivers intelligent capabilities through a range of built-in machine learning and adaptive technologies that monitor and manage performance and security for you.

Using telemetry from millions of databases running in Azure over the years, Microsoft has trained a truly intelligent and autonomous database that learns and adapts to your workload. This intelligent performance gives you deeper insight into database performance. Plus, it eliminates the hassle of making ongoing improvements, allowing you to focus more on driving your business and less on “chores”.

Features like Query Performance Insight and automatic tuning continuously monitor database usage, detect disruptive events, and then take steps to improve performance.
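As a quick illustration, one of the automatic tuning options, FORCE_LAST_GOOD_PLAN, can be switched on with a single T-SQL statement. Here's a minimal sketch run from Python with pyodbc; the server, database and credential values are placeholders you'd replace with your own.

```python
import pyodbc

# Placeholder connection details for an Azure SQL Database.
conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;UID=<user>;PWD=<password>;Encrypt=yes;"
)

# ALTER DATABASE can't run inside an implicit transaction, so use autocommit.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(
        "ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);"
    )
```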

Three examples of intelligent query processing features that collectively optimize your memory usage and improve overall query performance are:

  • Row mode memory grant feedback – this expands on batch-mode memory grant feedback by adjusting memory grant sizes for both batch-mode and row-mode operators.
  • Approximate query processing – this is designed to provide aggregations across large datasets where responsiveness is more critical than absolute precision, returning an approximate value with the focus on performance (see the sketch after this list).
  • Table variable deferred compilation – this improves plan quality and overall performance for queries that reference table variables by propagating cardinality estimates based on actual table variable row counts. In turn, this optimizes your downstream plan operations.
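To make the approximate query processing idea concrete, here's a hedged sketch of the APPROX_COUNT_DISTINCT aggregate run from Python with pyodbc. The dbo.Sales table and order_id column are hypothetical examples, not objects from a real workload.

```python
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;UID=<user>;PWD=<password>;Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    # Approximate distinct count: trades a little precision for much lower
    # memory usage and faster response on very large tables.
    cursor.execute("SELECT APPROX_COUNT_DISTINCT(order_id) FROM dbo.Sales;")
    print(cursor.fetchone()[0])
```

Compared to COUNT(DISTINCT ...), the approximate version is aimed at dashboards and exploratory queries where a fast, close-enough answer beats an exact one.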

Along with those features, Azure SQL Database intelligent protection allows you to efficiently and productively meet your data’s security and compliance requirements by proactively monitoring for potential threats and vulnerabilities. You can flag things such as PII or a cross-site scripting attack, for example. There are detection mechanisms in there that can help you avoid these.

Through features like information protection, vulnerability assessment and threat detection, you can proactively discover and protect sensitive data, as well as uncover potential vulnerabilities and detect anomalous activities that could indicate a threat to your data.

In short, Microsoft has built these intelligent features over years of machine learning and is applying them to all their Platform as a Service offerings, as well as some of their on-premises offerings. These are really cool features and we’ve had a great response about them and how well they work.

I recommend you give these features a try, but remember, always try them out in your test or dev environments prior to bringing them into production.

What is Azure Data Box Heavy?

You may have seen my previous Azure Every Day post on Azure Data Box and Azure Data Box Disk. These are a great option for getting smaller workloads, up to 80 terabytes of data, quickly into Azure. Rather than moving it over the wire, you copy your data onto a physical box and ship it to Microsoft, who load it into Azure for you.

Data Box Heavy works the same way but handles much larger amounts of data, with up to a petabyte of capacity.

Let’s review the Data Box process:

  • You order the box through the Azure Portal and specify the region that you’re going to use.
  • Once you receive it, connect it to your network, set up network shares and copy your data over. It has fast performance, with transfer rates up to 40 gigabits per second.
  • Then you return the box to Microsoft and they will load the data directly into your Azure tenant.
  • Lastly, they will securely erase the disk as per the National Institute of Standards and Technology (NIST) guidelines.

Data Box Heavy is ideally suited to transferring data sizes larger than 500 terabytes. If you used a Data Box with its 80 terabytes, you’d need at least seven of those in place of one Heavy. When you have those larger data sizes, it makes more sense to have everything on one device.

The data movement can be a one-time or periodic thing, depending on the use case. So, if you want to do an initial bulk data load, you can move that over and then follow it up with periodic transfers.

Some scenarios or use cases would be:

  • You have a huge amount of data on prem and you want to move it up into Azure – maybe a media library of offline tapes or tape backups that you want to turn into an online library.
  • You’re migrating an entire cabinet to Azure – you have a ton of data in there with your virtual machine farm, your SQL Servers and applications. You can move that over into your tenant, migrate your virtual machines first, then do an incremental restore of data from there.
  • Moving historical data to Azure for deeper analysis using Databricks or HDInsight, etc.
  • A scenario where you have a massive amount of data and you want to do an initial bulk load to push it up, then from there do incremental loads of additional data over the wire as it gets generated.
  • You have an organization that’s using IoT or video data with a drone – inspecting rail lines or power lines for instance. They are capturing tremendous amounts of data (video and graphic files can be huge) and they want to be able to move that up in batches. Data Box Heavy would be a great solution to quickly move these up rather than moving the files individually or over the wire.

This is a very cool technology and an exceptional solution for moving data up in a more efficient manner when you have huge, terabyte-scale amounts of data to push to Azure.

What is Azure Network Watcher?

Most of us are starting to deploy more and more cloud assets. When you think about how you deploy assets in Azure, you basically build out a virtual network; you can tie it in with your on-premises network through ExpressRoute or VPN, or you can run it independently in the cloud. The question is, how do you monitor and manage that virtual network, its components and how the virtual machines interact? Here’s where Azure Network Watcher comes in.

Azure Network Watcher allows you to monitor, diagnose and gain insight into your network performance between various points in your network infrastructure.

Here’s a breakdown of some of the elements:

1. The Monitoring Element – You can monitor from one endpoint to another with Connection Monitor to ensure connectivity between two points, like a web application and a database for instance. You’ll be alerted to potential issues such as a disconnect between those two services.

It also monitors latency times for evaluation. When you look at those latency times over a period, you’ll know the average, maximum and minimum latency. Then you can consider whether you might get better service in a different Azure region.

2. The Network Performance Monitor – Allows monitoring between Azure and on-premises resources for hybrid scenarios using VPN or ExpressRoute. It also has advanced detection of traffic blackholing and routing errors – in other words, some advanced intelligence when it comes to these network issues.

Best of all, as you add more endpoints it will develop a visual diagram of your network with a topology tool. It looks like a Visio diagram, showing IP addresses, host names, etc.

3. Diagnostic Tools – There are several tools that give you better insight into your virtual network by diagnosing possible causes of traffic issues.

IP Flow Verify – Tells you which security rule allowed or denied traffic to or from a virtual machine in your virtual network, so you can inspect or remediate further.
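As a rough sketch of how you might run this check programmatically, here's the IP flow verify operation called through the azure-mgmt-network Python SDK. The resource group, watcher name, VM ID and addresses are placeholders, and method names vary a bit between SDK versions (older releases use verify_ip_flow without the begin_ prefix).

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import VerificationIPFlowParameters

subscription_id = "<subscription-id>"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Ask Network Watcher whether outbound TCP 443 from this VM would be allowed.
poller = client.network_watchers.begin_verify_ip_flow(
    "NetworkWatcherRG",           # resource group that holds the watcher
    "NetworkWatcher_eastus",      # regional Network Watcher instance
    VerificationIPFlowParameters(
        target_resource_id="<vm-resource-id>",
        direction="Outbound",
        protocol="TCP",
        local_ip_address="10.0.0.4",
        local_port="49152",
        remote_ip_address="13.107.21.200",
        remote_port="443",
    ),
)
result = poller.result()
print(result.access, result.rule_name)  # e.g. "Allow" and the matching NSG rule
```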

The Next Hop tool tests routing rules by letting you enter a source and destination IP address and then showing the result of that route, again so you can investigate further or remediate.

The Connection Troubleshooting Tool – Enables you to test a connection between two VMs, or from a VM to an FQDN, URI or IPv4 address, and returns information similar to Connection Monitor, but only point-in-time latency rather than latency over a span of time.

The Packet Capture Tool – Allows traffic to and from a virtual machine to be captured with fine-grained filtering, stored in Azure Storage, and analyzed further with network capture tools like Wireshark, for instance.

4. Metrics Tools – There are limits on how many resources you can deploy within an Azure network, which can be based on subscriptions or regions. The Metrics tool gives you the visibility you need to understand exactly where you are against those limits. It shows you how many of those resources you’ve deployed and how many you can still deploy – so it helps you plan for the future as you deploy more and more resources.

5. Logging – We’ve done some interesting things with Log Analytics. Log Analytics provides the ability to capture data about a number of Azure networking components, like network security groups, public IP addresses, load balancers, virtual networks and application gateways, to name a few.

All these logs can be captured and stored in Azure Storage and analyzed further. Many can be fed into Operations Management Suite (OMS). This gives you a single-pane-of-glass experience when you want to look at your environment at that “50,000-foot level”.

So, as you begin to deploy more and more assets into your Azure environment, this is a helpful service to monitor and manage your virtual network. You get a high-level overview of what that network looks like.

Improve Your Security Posture with Azure Secure Score

Security is a top priority for every business and we can never have enough of it, right? But at what point does it become too much to administer and prioritize security threats? I’m excited to tell you about a newly announced offering called Azure Secure Score which is part of the Azure Security Center.

If you’re unfamiliar, Azure Security Center is a centralized place where you can get security recommendations based on the workloads you’ve deployed. In September at Ignite, Microsoft announced Secure Score, a security analytics tool that provides visibility into your organization’s security posture and helps you understand how secure your workloads are by assigning them a score.

The new Secure Score helps you prioritize and triage your response to security recommendations. It takes into consideration the severity and impact of the recommendation and based on that info it assigns a numerical value to show how fixing the recommendation can improve your security posture.

Once you implement a recommendation, the score and the overall Secure Score updates.

The main goals of Secure Score are:

  • To give you visibility into your security posture.
  • To help you quickly triage and suggest impactful actions that improve your security posture.
  • To measure your workload security over time.

So, how does Azure Security Center and Secure Score work?

  • Azure Security Center constantly reviews your active recommendations and calculates your Secure Score based on these.
  • The score of a recommendation is derived from its severity and security best practices that will affect your workload security over time.
  • It looks at your security posture over a period of time. The score isn’t an immediate result and won’t change instantly, but it builds up as you implement recommendations, and you can then silence the ones that don’t apply.
  • The Secure Score is calculated based on the ratio between your healthy resources and your total resources. If the number of healthy resources equals your total resources, you get the highest score value (a simplified version of this calculation is sketched after this list).
  • The overall score is an accumulation of all your recommendations. You can view your overall Secure Score across your subscriptions or management groups depending on the scope you select. The score will also vary based on the subscriptions selected and the active recommendations on them.

Remember, this is a marathon, not a sprint. It takes time to do the remediation, whether it be patching machines or closing ports or shutting off services. There are so many remedies offered that will make you more secure down the road. With this offering, you get a ‘scorecard’ for yourself and a look at what’s most imperative to implement first.

Be sure to check out the Azure Security Center. There are a lot of free options there as well as options to add additional services at a cost.

New Development Feature for Azure Stream Analytics

Gaining insights from our data, especially in real time, is an important part of any business. Today I’d like to talk about some new development options for Azure Stream Analytics. If you’re not clear on what Azure Stream Analytics is, it’s a fully managed cloud service in Azure that allows you to rapidly develop and deploy low-cost solutions to gain real-time insights from devices, sensors, infrastructure and applications.

Stream Analytics is part of the Azure IoT Suite, which brings IoT to life and allows you to easily connect your devices, analyze previously untapped data and integrate business systems. The IoT space is expanding, as it offers so much capability and information for things like production floors, jet engines and automobiles, to name a few. I did another blog on some of these features here.

Today my focus is a new feature that allows you to test query logic locally within Visual Studio against live data, without needing to run in the cloud. You can test your queries locally while using live data streams from sources such as Event Hubs, IoT Hub or Blob Storage. Also, you can use Stream Analytics time policies and start and stop queries in a matter of seconds.

This offers a big improvement in development productivity, as you can save a lot of time on the inner loop of query logic testing.

Some major benefits are:

  • Query behavior consistency, so you get the same experience whether you’re using Visual Studio or the cloud interface.
  • Much shorter test cycles. You can normally expect a lag in cloud development; now, testing queries directly in Visual Studio in your local environment shows you the shape of the incoming data, helping you easily adjust the query and see immediate results.

A few caveats with deployment of this new feature:

  • The local testing feature should only be used for functional testing purposes. It doesn’t replace the performance or scalability tests that you would do inside the cloud.
  • It really should not be used for production purposes since it doesn’t guarantee any kind of SLA.
  • Also note, that when you’re on your machine, you can rely on local resources but when you deploy to the cloud, you can scale out to multiple nodes which allows you to add more streams and additional resources in order to process those.
  • Cloud deployment ensures things like checkpointing, upgrades and other features that you need for production deployments, as well as provides the infrastructure to run your jobs 24/7.

So, remember, this new enhancement is just for testing purposes, to help shorten the query development cycle and avoid the lag of other testing and development approaches. But it’s a cool, time-saving feature to investigate, and Microsoft is adding more features to Azure Stream Analytics.

Azure Data Factory – Data Flow

I’m excited to announce that Azure Data Factory Data Flow is now in public preview and I’ll give you a look at it here. Data Flow is a new feature of Azure Data Factory (ADF) that allows you to develop graphical data transformation logic that can be executed as activities within ADF pipelines.

The intent of ADF Data Flows is to provide a fully visual experience with no coding required. Your Data Flow will execute on your own Azure Databricks cluster for scaled-out data processing using Spark. ADF handles all the code translation, Spark optimization and execution of transformations in Data Flows; it can handle massive amounts of data in very rapid succession.

In the current public preview, the Data Flow activities available are:

  • Joins – where you can join data from 2 streams based on a condition
  • Conditional Splits – allow you to route data to different streams based on conditions
  • Union – collecting data from multiple data streams
  • Lookups – looking up data from another stream
  • Derived Columns – create new columns based on existing ones
  • Aggregates – calculating aggregations on the stream
  • Surrogate Keys – adds a surrogate key column to the output stream, starting from a specific value
  • Exists – check to see if data exists in another stream
  • Select – choose columns to flow into the next stream that you’re running
  • Filter – you can filter streams based on a condition
  • Sort – order data in the stream based on columns

Getting Started:

To get started with Data Flow, you’ll need to sign up for the Preview by emailing adfdataflowext@microsoft.com with your ID from the subscription you want to do your development in. You’ll receive a reply when it’s been added and then you’ll be able to go in and add new Data Flow activities.

At this point, when you go in and create a Data Factory, you’ll now have 3 options: Version 1, Version 2 and Version 2 with Data Flow.

Next, go to aka.ms/adfdataflowdocs. This will give you all the documentation you need for building your first Data Flows, as well as some samples that are already built for you to work and play around with. You can then create your own Data Flows and add a Data Flow activity to your pipeline to execute and test it in debug mode, or use Trigger Now in the pipeline to test your Data Flow from a pipeline activity.

Ultimately, you can operationalize your Data Flow by scheduling and monitoring your Data Factory pipeline that is executing the Data Flow activity.
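If you script that operationalization step, a rough sketch with the azure-mgmt-datafactory Python SDK looks like the following. The resource group, factory and pipeline names are placeholders, and the pipeline is assumed to already contain your Data Flow activity.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off the pipeline that wraps the Data Flow activity.
run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-data-factory",
    pipeline_name="DataFlowPipeline",
)

# Poll the run to monitor the Data Flow execution.
status = adf_client.pipeline_runs.get("my-rg", "my-data-factory", run.run_id)
print(status.status)  # e.g. InProgress, Succeeded, Failed
```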

With Data Flow we have the data orchestration and transformation piece we’ve been missing. It gives us a complete picture for the ETL/ELT scenarios we want to do in cloud or hybrid environments, whether on prem to cloud or cloud to cloud.

With Data Flow, Azure Data Factory has become the true cloud replacement for SSIS and this should be in GA by year’s end. It is well designed and has some neat features, especially how you build your expressions which works better than SSIS in my opinion.

When you get a chance, check out Azure Data Factory and its Data Flow features and let me know if you have any questions!

Intro to Azure Databricks Delta

If you know about or are already using Databricks, I’m excited to tell you about Databricks Delta. As most of you know, Apache Spark is the underlying technology for Databricks, so about 75-80% of all the code in Databricks is still Apache Spark. You get that super-fast, in-memory processing of both streaming and batch data; after all, some of the founders of Spark built Databricks.

The ability to offer Databricks Delta is one big difference between Spark and Databricks, aside from the workspaces and the collaboration options that come native to Databricks. Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Spark and Databricks DBFS.

The core abstraction of Databricks Delta is an optimized Spark table that stores data as Parquet files in DBFS and maintains a transaction log that efficiently tracks changes to the table. So, you can read and write data stored in the Delta format using the same Spark SQL batch and streaming APIs that you use to work with Hive tables and DBFS directories.
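Here's a short PySpark sketch of that workflow, as you might run it in a Databricks notebook (where the spark session is already available). The paths and table name are illustrative.

```python
# Load any batch DataFrame; the source path is a hypothetical DBFS mount.
events = spark.read.json("/mnt/raw/events/")

# Write it out in Delta format. The Parquet files and the transaction log
# both land under the target DBFS path.
events.write.format("delta").mode("overwrite").save("/mnt/delta/events")

# Read it back with the same batch API you would use for Parquet or Hive tables.
delta_events = spark.read.format("delta").load("/mnt/delta/events")

# Or register it as a table and query it with Spark SQL.
spark.sql("CREATE TABLE IF NOT EXISTS events USING DELTA LOCATION '/mnt/delta/events'")
spark.sql("SELECT COUNT(*) FROM events").show()
```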

With the addition of the transaction log, as well as other enhancements, Databricks Delta offers some significant benefits:

ACID Transactions – a big one for consistency. Multiple writers can simultaneously modify a dataset and see consistent views. Also, writers can modify a dataset without interfering with jobs reading the dataset.

Faster Read Access – automatic file management organizes data into large files that can be read efficiently. Plus, there are statistics that speed up reads by 10-100x, and data skipping avoids reading irrelevant information. This is not available in Apache Spark, only in Databricks.

Databricks Delta is another great feature of Azure Databricks that is not available in traditional Spark, further separating the capabilities of the two products and providing a great platform for your big data, data science and data engineering needs.

Azure Database for MariaDB in Preview

Microsoft has recently announced another Platform as a Service (PaaS) offering with the release of Azure Database for MariaDB in Preview. I’d like to tell you more about that offering and some of its advantages.

First, a little history on MariaDB. MariaDB is a community-developed fork of MySQL. Essentially, when Oracle acquired MySQL through its purchase of Sun, some of the MySQL developers were concerned that the acquisition would change the project or lead down a road where it would no longer be open source.

So, they forked off to create MariaDB, with the intent of maintaining high compatibility with MySQL. Also, contributors are required to share their copyright with the MariaDB Foundation, which in a nutshell means the project is intended to always remain open source.

Now take the open source technology of MariaDB, which has proven its value for many companies, and combine it with the fact that it’s available in the Azure platform as a managed offering. Consumers of this offering get to take advantage of some of the standard capabilities of the PaaS model, such as:

  • Built-in high availability with no extra cost
  • 3 tiers (Basic, General Purpose and Memory Optimized) that you can choose from depending on your workload, whether transactional or analytical processing.
  • 99.99% availability SLA
  • Predictable performance through built-in monitoring and alerting, letting you quickly assess the effects of scaling vCores up or down based on current or projected performance needs – through automation or manually, in seconds.
  • Secure protection of sensitive data at rest and in motion. It uses 256-bit encryption on secured disks in the Azure data centers and enforces an SSL connection for data in transit (see the connection sketch after this list). Note: you can turn off the SSL requirement if your application doesn’t support it – I don’t recommend it, but it can be done.
  • Automatic backups so you can have point-in-time restore for up to 35 days
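As a quick, hedged sketch of what connecting over SSL looks like from Python with PyMySQL: the server name, login format and certificate path below are illustrative placeholders (Azure Database for MariaDB servers typically use a <server>.mariadb.database.azure.com host name and a <user>@<server> login).

```python
import pymysql

conn = pymysql.connect(
    host="mydemoserver.mariadb.database.azure.com",  # placeholder server name
    user="myadmin@mydemoserver",                     # placeholder admin login
    password="<password>",
    database="inventory",
    # Verify the server certificate so data stays encrypted in transit.
    ssl={"ca": "/path/to/ca-certificate.crt.pem"},
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION();")
        print(cur.fetchone())
finally:
    conn.close()
```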

These are all standard advantages you get with Azure’s database Platform as a Service offerings, like Azure SQL Database or Cosmos DB. Azure Database for MariaDB is just another option Microsoft has added to its database portfolio, and a great one to check out if you use MariaDB. It’s in Preview now, but I’m sure it will be made generally available pretty soon.

How to Gain Up to 9X Speed on Apache Spark Jobs

Are you looking to gain speed on your Apache Spark jobs? How does 9X performance speed sound? Today I’m excited to tell you about how engineers at Microsoft were able to gain that speed on HDInsight Apache Spark Clusters.

If you’re unfamiliar with HDInsight, it’s Microsoft’s premium managed offering for running open source workloads on Azure. You can run things like Spark, Hadoop, Hive and LLAP, among others. You create clusters, spin them up, and spin them down when you’re not using them.

The big news here is the recently released preview of HDInsight IO Cache, which is a new transparent data caching feature that provides customers with up to 9X performance improvement for Spark jobs, without an increase in costs.

There are many open source caching products in the ecosystem: Alluxio, Ignite and RubiX, to name a few big ones. IO Cache is based on RubiX, and what differentiates RubiX from other comparable caching products is its approach of using SSDs, which eliminates the need for explicit memory management, while other comparable caching products reserve operating memory for caching the data.

Because SSDs typically provide more than a gigabyte per second of bandwidth, and the cache also leverages the operating system’s in-memory file cache, this gives enough bandwidth to feed big data compute engines like Spark. This allows Spark to run optimally, handle bigger memory workloads and deliver better overall performance by speeding up jobs that read data from remote cloud storage, the dominant architecture pattern in the cloud.
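Because the cache is transparent, the Spark code itself doesn't change. A typical remote-storage read like the PySpark sketch below (with a hypothetical storage account and dataset path) is exactly the kind of job that benefits once IO Cache is enabled on the cluster.

```python
# Ordinary Spark job reading from remote Azure Blob Storage; the account,
# container and path are placeholders. With IO Cache enabled, repeated reads
# of this data are served from local SSD instead of going back to storage.
sales = spark.read.parquet(
    "wasbs://data@mystorageaccount.blob.core.windows.net/tpcds/store_sales/"
)
sales.createOrReplaceTempView("store_sales")

spark.sql("""
    SELECT ss_store_sk, SUM(ss_net_paid) AS total_paid
    FROM store_sales
    GROUP BY ss_store_sk
""").show()
```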

In benchmark tests comparing a Spark cluster with and without the IO Cache running, they performed 99 SQL queries against a 1 terabyte dataset and got as much as 9X performance improvement with IO Cache turned on.

Let’s face it, data is growing all over and the requirement for processing that data is increasing more and more every day. And we want to get faster and closer to real time results. To do this, we need to think more creatively about how we can improve performance in other ways, without the age-old recipe of throwing hardware at it instead of tuning it or trying a new approach.

This is a great approach to leverage some existing hardware and help it run more efficiently. So, if you’re running HDInsight, try this out in a test environment. It’s as simple as a check box (that’s off by default); go in, spin up your cluster and hit the checkbox to include IO Cache and see what performance gains you can achieve with your HDInsight Spark clusters.

Using Azure to Drive Security in Banking Using Biometrics

In the digital world we live in today, it’s getting harder to verify identity in industries such as banking. We now do fewer and fewer transactions in person. No longer do we go into banks with passbook in hand and make deposits or withdrawals face to face with a bank teller. Many of us have moved from ATM transactions to digital banking.

With this move, banks have tried many approaches to two-factor authentication, some better than others, and obviously the need is there for secure forms of user authentication. Let me tell you how Azure is driving identity security in banking using biometric identification. By combining biometrics with artificial intelligence, banks are now able to take new approaches to verifying the digital identity of their customers and prospects.

If you don’t know, biometrics is the process of uniquely identifying a person by their physical and personal traits. These traits are recorded into a database, and the images or features captured by an electronic device are used as a unique form of identification. Some ways we use biometrics are fingerprint and facial recognition, hand geometry, iris or eye scans, and even odor or scent.

Because of their uniqueness, these are much more reliable in confirming a person’s identity than a password or access card. So, how do you verify a person is who they say they are if they’re not in person? Microsoft partners are now leveraging some of the Azure platform offerings to do this – things such as the Cognitive Services Vision APIs and Azure Machine Learning tools – to perform multi-factor authentication in the banking industry.

The way this works is the user provides a government-issued ID (a license or passport, for example) and it’s validated against standards provided by the ID issuer; they’re building an algorithm for verification of that ID and putting it into a database. So, when someone submits an ID from a particular state, we know what that ID is supposed to look like and we look for all the distinguishing features of that ID.

To take this a step further, the second factor uses facial recognition software on things like your phone or computer, like Face ID for the iPhone. It will take your photo, but it will also take a video of you and make you move your head in certain motions in order to validate that it is really you – not someone wearing a mask – and that you’re alive.

It takes a picture of your ID and matches it to your facial construction, comparing them side by side; this becomes your digital signature. This is considered extremely secure, as you now have two forms of verification and you’re using biometrics. Crazy stuff when you think about it, but in the digital world we live in, you must go to these lengths to verify someone’s identity when they are not right in front of you.
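For a rough idea of what the face-matching step can look like against the Face REST API (v1.0), here's a hedged Python sketch using requests. The endpoint, key and image file names are placeholders, and the exact paths and parameters may differ by API version and the access your resource has been granted.

```python
import requests

ENDPOINT = "https://<your-face-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

def detect_face_id(image_path: str) -> str:
    """Detect a single face in an image and return its temporary faceId."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{ENDPOINT}/face/v1.0/detect",
            params={"returnFaceId": "true"},
            headers={"Ocp-Apim-Subscription-Key": KEY,
                     "Content-Type": "application/octet-stream"},
            data=f.read(),
        )
    resp.raise_for_status()
    return resp.json()[0]["faceId"]

id_photo_face = detect_face_id("license_photo.jpg")  # face from the government ID
live_face = detect_face_id("live_capture.jpg")       # frame from the liveness check

# Compare the two faces; the service returns a match flag and a confidence score.
verify = requests.post(
    f"{ENDPOINT}/face/v1.0/verify",
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={"faceId1": id_photo_face, "faceId2": live_face},
)
print(verify.json())  # e.g. {"isIdentical": true, "confidence": 0.87}
```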

This is still in the early phase of what we’ll see but it’s cool to see how it’s being used and will be interesting to see how it progresses in the future. We’ve got great consultants working with Cognitive Services and Machine Learning. Anything data or Azure related, we’re doing it.