All posts by cseferlis

Improve Your Security Posture with Azure Secure Score

Security is a top priority for every business, and we can never have enough of it, right? But at what point does administering and prioritizing security threats become too much to handle? I’m excited to tell you about a newly announced offering called Azure Secure Score, which is part of Azure Security Center.

If you’re unfamiliar, Azure Security Center is a centralized place where you can get security recommendations based on the workloads you’ve deployed. In September at Ignite, Microsoft announced Secure Score, a security analytics tool that provides visibility into your organization’s security posture and helps you understand how secure your workloads are by assigning them a score.

The new Secure Score helps you prioritize and triage your response to security recommendations. It takes into consideration the severity and impact of the recommendation and based on that info it assigns a numerical value to show how fixing the recommendation can improve your security posture.

Once you implement a recommendation, its score and the overall Secure Score update.

The main goals of Secure Score are:

  • To provide visibility into your security posture.
  • To help you quickly triage and suggest impactful actions that improve your security posture.
  • To measure your workload security over time.

So, how do Azure Security Center and Secure Score work?

  • Azure Security Center constantly reviews your active recommendations and calculates your Secure Score based on these.
  • The score of a recommendation is derived from its severity and security best practices that will affect your workload security over time.
  • It looks at where your security posture sits over a period of time. It’s not an immediate result and it won’t change overnight, but your score builds up as you implement recommendations, and you can then dismiss the ones you’ve addressed.
  • The Secure Score is calculated based on the ratio between your healthy resources and your total resources: if the number of healthy resources equals your total resources, you get the highest score value (a toy example of this calculation follows this list).
  • The overall score is an accumulation of all your recommendations. You can view your overall Secure Score across your subscriptions or management groups depending on the scope you select. The score will also vary based on the subscriptions selected and the active recommendations on them.
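To make that ratio concrete, here’s a toy Python sketch of the calculation described above. The recommendation names and maximum score values are made up for illustration, and Microsoft doesn’t publish its exact weighting here, so treat this as a rough model rather than the actual Security Center algorithm.

```python
# Toy model of the Secure Score ratio described above.
# Recommendation names and max_score values are hypothetical examples.
recommendations = [
    {"name": "Apply disk encryption", "max_score": 30, "healthy": 8, "total": 10},
    {"name": "Apply system updates", "max_score": 40, "healthy": 5, "total": 20},
]

def recommendation_score(rec):
    """Score earned = max score scaled by the ratio of healthy to total resources."""
    if rec["total"] == 0:
        return 0.0
    return rec["max_score"] * rec["healthy"] / rec["total"]

overall = sum(recommendation_score(r) for r in recommendations)
for r in recommendations:
    print("{}: {:.1f} of {}".format(r["name"], recommendation_score(r), r["max_score"]))
print("Overall Secure Score: {:.1f}".format(overall))
```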

Remember, this is a marathon, not a sprint. It takes time to do the remediation, whether it be patching machines or closing ports or shutting off services. There are so many remedies offered that will make you more secure down the road. With this offering, you get a ‘scorecard’ for yourself and a look at what’s most imperative to implement first.

Be sure to check out the Azure Security Center. There are a lot of free options there as well as options to add additional services at a cost.

New Development Feature for Azure Stream Analytics

Gaining insights from our data, especially in real time, is an important part of any business. Today I’d like to talk about some new development options for Azure Stream Analytics. If you’re not clear on what Azure Stream Analytics is, it’s a fully managed cloud service in Azure that allows you to rapidly develop and deploy low-cost solutions to gain real-time insights from devices, sensors, infrastructure and applications.

Stream Analytics is part of the Azure IoT suite, which brings IoT to life by letting you easily connect your devices, analyze previously untapped data and integrate business systems. The IoT space keeps expanding as it offers so much capability and information for things like production floors, jet engines and automobiles, just to name a few. I covered some of those features in another blog post.

Today my focus is a new feature that allows you to test your query logic locally within Visual Studio against live data, without needing to run in the cloud. You can test your queries locally while using live data streams from sources such as Event Hub, IoT Hub or Blob Storage. Also, you can use Stream Analytics time policies to start and stop queries in a matter of seconds.

This offers a big improvement in development productivity, as you can save a lot of time on the inner loop of query logic testing.

Some major benefits are:

  • Query behavior consistency, so you get the same experience whether you’re using Visual Studio or the cloud interface.
  • Much shorter test cycles. You can normally expect a lag in cloud development; testing queries directly in Visual Studio in your local environment shows you the shape of the data coming in, helping you easily adjust the query and see immediate results.

A couple of caveats with deployment in this new feature:

  • The local testing feature should only be used for functional testing purposes. It doesn’t replace the performance or scalability tests that you would do inside the cloud.
  • It really should not be used for production purposes since it doesn’t guarantee any kind of SLA.
  • Also note that when you test on your machine, you rely on local resources, but when you deploy to the cloud, you can scale out to multiple nodes, which allows you to add more streams and additional resources to process them.
  • Cloud deployment ensures things like check pointing, upgrades and other features that you need for production deployments, as well as provides the infrastructure to run your jobs 24/7.

So, remember, this new enhancement is just for testing purposes to help shorten the query development cycle and avoid the lag of other testing and development approaches. But it’s a cool, time-saving feature to investigate, and Microsoft keeps adding more features to Azure Stream Analytics.

Azure Data Factory – Data Flow

I’m excited to announce that Azure Data Factory Data Flow is now in public preview and I’ll give you a look at it here. Data Flow is a new feature of Azure Data Factory (ADF) that allows you to develop graphical data transformation logic that can be executed as activities within ADF pipelines.

The intent of ADF Data Flows is to provide a fully visual experience with no coding required. Your Data Flows execute on your own Azure Databricks cluster for scaled-out data processing using Spark. ADF handles all the code translation, Spark optimization and execution of the transformations in Data Flows; it can handle massive amounts of data in very rapid succession.

In the current public preview, the Data Flow activities available are:

  • Joins – where you can join data from 2 streams based on a condition
  • Conditional Splits – allow you to route data to different streams based on conditions
  • Union – collecting data from multiple data streams
  • Lookups – looking up data from another stream
  • Derived Columns – create new columns based on existing ones
  • Aggregates – calculating aggregations on the stream
  • Surrogate Keys – add a surrogate key column to the output stream, starting from a value you specify
  • Exists – check to see if data exists in another stream
  • Select – choose columns to flow into the next stream that you’re running
  • Filter – you can filter streams based on a condition
  • Sort – order data in the stream based on columns

Getting Started:

To get started with Data Flow, you’ll need to sign up for the preview by emailing adfdataflowext@microsoft.com with the ID of the subscription you want to do your development in. You’ll receive a reply when it’s been added, and then you’ll be able to go in and add new Data Flow activities.

At this point, when you go in and create a Data Factory, you’ll now have 3 options: Version 1, Version 2 and Version 2 with Data Flow.

Next, go to aka.ms/adfdataflowdocs; this will give you all the documentation you need for building your first Data Flows, as well as some prebuilt samples to work and play around with. You can then create your own Data Flows and add a Data Flow activity to your pipeline to execute and test them in debug mode, or use Trigger Now in the pipeline to test your Data Flow from a pipeline activity.

Ultimately, you can operationalize your Data Flow by scheduling and monitoring your Data Factory pipeline that is executing the Data Flow activity.

With Data Flow we have the data orchestration and transformation piece we’ve been missing. It gives us a complete picture for the ETL/ELT scenarios that we want to do in the cloud or in hybrid environments, whether on-premises to cloud or cloud to cloud.

With Data Flow, Azure Data Factory has become the true cloud replacement for SSIS and this should be in GA by year’s end. It is well designed and has some neat features, especially how you build your expressions which works better than SSIS in my opinion.

When you get a chance, check out Azure Data Factory and its Data Flow features and let me know if you have any questions!

Intro to Azure Databricks Delta

If you know about or are already using Databricks, I’m excited to tell you about Databricks Delta. As most of you know, Apache Spark is the underlying technology for Databricks, so about 75-80% of all the code in Databricks is still Apache Spark. You get that super-fast, in-memory processing of both streaming and batch data, which makes sense given that Databricks was built by some of the founders of Spark.

The ability to offer Databricks Delta is one big difference between Spark and Databricks, aside from the workspaces and the collaboration options that come native to Databricks. Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Spark and Databricks DBFS.

The core abstraction of Databricks Delta is an optimized Spark table that stores data as Parquet files in DBFS and maintains a transaction log that efficiently tracks changes to the table. You can read and write data stored in the Delta format using the same Spark SQL batch and streaming APIs that you use to work with Hive tables and DBFS directories.
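To make that concrete, here’s a minimal PySpark sketch of writing a table in the Delta format to DBFS and reading it back with both the batch and streaming APIs. It assumes it’s running in a Databricks notebook, where the `spark` session is already defined and Delta is available; the DBFS path and sample data are made up for illustration.

```python
# Minimal sketch: write a DataFrame in Delta format to DBFS, then read it back.
# Assumes a Databricks notebook, where `spark` already exists and Delta is available.
# The DBFS path and sample rows below are hypothetical.
events = spark.createDataFrame(
    [(1, "login"), (2, "purchase"), (3, "logout")],
    ["user_id", "action"],
)

delta_path = "/delta/events"  # hypothetical DBFS location
events.write.format("delta").mode("overwrite").save(delta_path)

# Batch read goes through the same transaction log that tracks changes to the table
events_back = spark.read.format("delta").load(delta_path)
events_back.show()

# The same table can also be consumed as a stream, since Delta supports both APIs
events_stream = spark.readStream.format("delta").load(delta_path)
```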

With the addition of the transaction log, as well as other enhancements, Databricks Delta offers some significant benefits:

ACID Transactions – a big one for consistency. Multiple writers can simultaneously modify a dataset and see consistent views. Also, writers can modify a dataset without interfering with jobs reading the dataset.

Faster Read Access – automatic file management organizes data into large files that can be read efficiently. Plus, statistics enable speeding up reads by 10-100x, and data skipping avoids reading irrelevant information. This is not available in Apache Spark, only in Databricks.

Databricks Delta is another great feature of Azure Databricks that is not available in traditional Spark, further separating the capabilities of the two products and providing a great platform for your big data, data science and data engineering needs.

Azure Database for MariaDB in Preview

Microsoft has recently announced another Platform as a Service (PaaS) offering with the release of MariaDB in Preview on Azure. I’d like to tell you more about that offering and some of its advantages.

First, a little history on MariaDB. MariaDB is a community-developed fork of MySQL. Essentially, when Oracle acquired Sun (and MySQL along with it), some of the MySQL developers were concerned that the acquisition would lead to changes or take the project down a road where it would no longer be open source.

So, they forked off to MariaDB with the intent of maintaining high compatibility with MySQL. Also, contributors are required to share their copyright with the MariaDB Foundation, which in a nutshell means the intent is for the project to always remain open source.

Now take the open source technology of MariaDB, proven and valuable for many companies, and combine it with the fact that it’s available on the Azure platform as a managed offering. Consumers using this offering get to take advantage of the standard capabilities of the PaaS model, such as:

  • Built-in high availability with no extra cost
  • 3 tiers (Basic, General Purpose and Memory Optimized) that you can choose from depending on whether your workload is transactional or analytical
  • A 99.99% availability SLA
  • Predictable performance through built-in monitoring and alerting, so you can quickly assess the effects of scaling vCores up or down based on current or projected performance needs – through automation or manually, in seconds
  • Secure protection of sensitive data at rest and in motion. The service uses 256-bit encryption on secured disks in the Azure data centers and enforces an SSL connection for data in transit (a connection sketch follows this list). Note: you can turn off the SSL requirement if your application doesn’t support it – I don’t recommend it, but it can be done.
  • Automatic backups so you can have point-in-time restore for up to 35 days
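As a quick example of what the SSL enforcement looks like from the client side, here’s a hedged Python sketch that connects to an Azure Database for MariaDB server using the mysql-connector-python driver (MariaDB speaks the MySQL protocol). The server name, credentials, database and CA certificate path are all placeholders; Azure’s documentation tells you which CA certificate to download for the SSL connection.

```python
# Sketch: connect to Azure Database for MariaDB with SSL enforced.
# Requires: pip install mysql-connector-python
# Server name, credentials, database and CA path are hypothetical placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="mydemoserver.mariadb.database.azure.com",  # placeholder server name
    user="myadmin@mydemoserver",                     # Azure uses the user@servername format
    password="<your-password>",
    database="quickstartdb",
    ssl_ca="/path/to/BaltimoreCyberTrustRoot.crt.pem",  # CA cert referenced in the Azure docs
)

cursor = conn.cursor()
cursor.execute("SELECT VERSION();")
print(cursor.fetchone())
conn.close()
```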

These are all standard advantages you get when you turn on Azure database Platform as a Service offerings, like SQL DB or Mongo DB. Azure Database for MariaDB is just another option Microsoft has added to its database portfolio, and a great one to check out if you use MariaDB. It’s in Preview now, but I’m sure it will be made generally available pretty soon.

How to Gain Up to 9X Speed on Apache Spark Jobs

Are you looking to gain speed on your Apache Spark jobs? How does 9X performance speed sound? Today I’m excited to tell you about how engineers at Microsoft were able to gain that speed on HDInsight Apache Spark Clusters.

If you’re unfamiliar with HDInsight, it’s Microsoft’s premium managed offering for running open source workloads on Azure. You can run things like Spark, Hadoop, Hive and LLAP, among others. You create clusters, spin them up when you need them and spin them down when you’re not using them.

The big news here is the recently released preview of HDInsight IO Cache, which is a new transparent data caching feature that provides customers with up to 9X performance improvement for Spark jobs, without an increase in costs.

There are many open source caching products in the ecosystem: Alluxio, Ignite and RubiX, to name a few big ones. IO Cache is based on RubiX, and what differentiates RubiX from other comparable caching products is its use of SSDs, which eliminates the need for explicit memory management; other comparable caching products reserve operating memory for caching the data.

Because the SSDs typically provide more than 1 gigabit per second of bandwidth, and because the operating system’s in-memory file cache is leveraged on top of that, there’s enough bandwidth to feed big data processing engines like Spark. This lets Spark run optimally, handle bigger memory workloads and deliver better overall performance by speeding up jobs that read data from remote cloud storage, the dominant architecture pattern in the cloud.
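Because the cache is transparent, your Spark code doesn’t change at all. A job like the following hypothetical PySpark read against remote Azure storage is exactly the kind of workload that benefits once IO Cache is enabled on the cluster; the storage account, container and path are placeholders.

```python
# Typical HDInsight Spark job reading from remote cloud storage.
# No code changes are needed for IO Cache; it transparently caches these remote
# reads on the cluster's local SSDs. The storage path below is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("remote-read-demo").getOrCreate()

sales = spark.read.parquet(
    "wasbs://data@mystorageaccount.blob.core.windows.net/sales/2018/"
)

sales.groupBy("region").sum("amount").show()
```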

In benchmark tests comparing a Spark cluster with and without the IO Cache running, they performed 99 SQL queries against a 1 terabyte dataset and got as much as 9X performance improvement with IO Cache turned on.

Let’s face it, data is growing everywhere and the requirement for processing that data increases every day, and we want results faster and closer to real time. To do this, we need to think more creatively about how we can improve performance in other ways, rather than falling back on the age-old recipe of throwing hardware at the problem instead of tuning it or trying a new approach.

This is a great approach to leverage some existing hardware and help it run more efficiently. So, if you’re running HDInsight, try this out in a test environment. It’s as simple as a check box (that’s off by default); go in, spin up your cluster and hit the checkbox to include IO Cache and see what performance gains you can achieve with your HDInsight Spark clusters.

Using Azure to Drive Security in Banking Using Biometrics

In the digital world we live in today, it’s getting harder to verify identity in industries such as banking. We now do fewer and fewer transactions in person. No longer do we go into banks with passbook in hand and make deposits or withdrawals face to face with a bank teller. Many of us have moved from ATM transactions to digital banking.

With this move, banks have tried many approaches to 2-factor authentication, some better than others, and the need for secure forms of user authentication is obvious. Let me tell you how Azure is driving identity security in banking using biometric identification. By combining biometrics with artificial intelligence, banks are now able to take new approaches to verifying the digital identity of their customers and prospects.

If you don’t know, biometrics is the process of uniquely identifying a person by their physical and personal traits. Those images or features are captured by an electronic device and recorded in a database, where they are used as a unique form of identification. Some ways we use biometrics are fingerprint and facial recognition, hand geometry, iris or eye scans, and even odor or scent.

Because of their uniqueness, these are much more reliable in confirming a person’s identity than a password or access card. So, how do you verify a person is who they say they are if they’re not in person? Microsoft partners are now leveraging some of the Azure platform offerings to do this: things such as the Cognitive Services Vision API and Azure Machine Learning tools for performing multi-factor authentication in the banking industry.

The way this works is that the user provides a government-issued ID (a license or passport, for example), and it is validated against standards provided by the ID issuer; an algorithm for verifying that type of ID is built and stored in a database. So, when someone submits an ID from a particular state, the system knows what that ID is supposed to look like and checks for all the distinguishing features of that ID.

To take this a step further, the second factor uses facial recognition software on things like your phone or computer, like Face ID for the iPhone. It will take your photo, but it will also take a video of you and have you move your head in certain motions in order to validate that it is you – that you’re not wearing a mask or something – and that you’re alive.

It takes a picture of your ID and matches it to your facial features, comparing them side by side; this becomes your digital signature. This is considered extremely secure, as now you have two forms of verification and you’re using biometrics. Crazy stuff when you think about it, but in the digital world we live in, you must go to these lengths to verify someone’s identity when they are not right in front of you.
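As a rough illustration of the face-matching step, here’s a hedged Python sketch using the Cognitive Services Face API to detect a face in the ID photo and in the live capture, then verify that they belong to the same person. The region, key and image file names are placeholders, and the liveness checks (the head movements described above) happen in the client app, not in these calls.

```python
# Sketch: match the face on an ID photo against a live capture with the Face API.
# The subscription key, region and image files are hypothetical placeholders.
import requests

KEY = "<cognitive-services-key>"
ENDPOINT = "https://westus.api.cognitive.microsoft.com/face/v1.0"  # placeholder region

def detect_face_id(image_path):
    """Return the faceId the service assigns to the first face found in the image."""
    headers = {"Ocp-Apim-Subscription-Key": KEY,
               "Content-Type": "application/octet-stream"}
    with open(image_path, "rb") as f:
        resp = requests.post(ENDPOINT + "/detect", headers=headers, data=f.read())
    resp.raise_for_status()
    return resp.json()[0]["faceId"]

id_face = detect_face_id("drivers_license.jpg")  # photo extracted from the ID
live_face = detect_face_id("selfie.jpg")         # live capture from the phone

verify = requests.post(
    ENDPOINT + "/verify",
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={"faceId1": id_face, "faceId2": live_face},
)
print(verify.json())  # e.g. {"isIdentical": true, "confidence": 0.82}
```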

This is still in the early phase of what we’ll see but it’s cool to see how it’s being used and will be interesting to see how it progresses in the future. We’ve got great consultants working with Cognitive Services and Machine Learning. Anything data or Azure related, we’re doing it.

Introducing Azure SQL Database Hyperscale Service Tier

If your current SQL Database service tier is not well suited to your needs, I’m excited to tell you about a newly created service tier in Azure called Hyperscale. Hyperscale is a highly scalable storage and compute performance tier that leverages the Azure architecture to scale out resources for Azure SQL Database beyond the current limitations of general purpose and business critical service tiers.

The Hyperscale service tier provides the following capabilities:

  • Support for up to 100 terabytes of database size (and this will grow over time)
  • Faster large database backups which are based on file snapshots
  • Faster database restores (also based on file snapshots)
  • Higher overall performance due to higher log throughput and faster transaction commit time regardless of the data volumes
  • The ability to rapidly scale out. You can provision one or more read-only nodes for offloading your read workload or for use as hot standbys (see the connection sketch after this list).
  • You can rapidly scale up your compute resources (in constant time) to accommodate heavy workloads, so you can scale compute up and down as needed just like Azure Data Warehouse
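For example, pointing reporting queries at one of those read-only nodes is just a connection-string change. Here’s a hedged pyodbc sketch; the server, database, credentials and query are placeholders, and `ApplicationIntent=ReadOnly` is the setting that routes the session to a read replica.

```python
# Sketch: offload read-only work to a Hyperscale secondary replica.
# Requires: pip install pyodbc plus an ODBC Driver for SQL Server.
# Server, database, credentials and the query are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;"
    "Uid=myadmin;Pwd=<your-password>;"
    "Encrypt=yes;"
    "ApplicationIntent=ReadOnly;"  # routes this session to a read-only replica
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Orders;")
print(cursor.fetchone()[0])
conn.close()
```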

Who should consider moving over to the Hyperscale tier? This is not an inexpensive tier, but it’s a great choice for companies who have large databases and have not been able to use Azure databases in the past due to its 4-terabyte limit, as well as for customers who see performance and scalability limitations with the other 2 service tiers.

It is primarily designed for transactional (OLTP) workloads. However, it does support hybrid and OLAP workloads; that’s just something to keep in mind when designing your databases and services. It’s also important to note that elastic pools do not support the Hyperscale service tier.

How does it work?

  • Compute and storage are separated out into 4 types of nodes, similar to Azure Data Warehouse.
  • The compute node is where the relational engine lives or where the querying process happens.
  • The page server nodes are where the scaled-out storage engine resides; database pages are served out to the compute nodes on demand and kept up to date as transactions update data, so these nodes move the data around for you.
  • The log service node receives log records from the compute node and keeps them in a durable cache, then forwards them along to the additional compute nodes and caches to ensure consistency. Once everything has been consistently propagated across the compute nodes, the log records are stored in Azure storage for long-term retention.
  • Lastly, the Azure storage node is where all the data is pushed from the page servers. So, all the data that eventually lands in the database gets pushed over to Azure storage and this is also the storage that gets used for backups, as well as where the replication between availability groups happens.

This Hyperscale tier is an exciting opportunity for those customers whose requirements weren’t fulfilled by prior service tiers. It’s another great Microsoft offering that’s worth checking out if you have run into these service tier issues up to now. And it helps draw a line of distinction between Azure Data Warehouse and Azure SQL Database: you can now scale out/up and handle tons of data, but it’s still built for transactional processing, as opposed to Azure Data Warehouse, which is geared toward analytical, massively parallel processing.

What is Azure Automation?

So, what do you know about Azure Automation? In this post, I’ll fill you in on this cool, cloud-based automation service that provides you the ability to configure process automation, update management and system configuration, which is managed across your on-premises resources, as well as your Azure cloud-based resources.

Azure Automation provides complete control over the deployment, operation and decommissioning of workloads and resources in your hybrid environment, so you can have a single pane of glass for managing all your resources through automation.

Some features I’d like to point out are:

  • It allows you to automate those mundane, error-prone activities that you perform as part of your system configuration and maintenance.
  • You can create runbooks in PowerShell or Python that help you reduce the chance for misconfiguration errors. They will also help lower operational costs for maintaining those systems, as you can script maintenance to run when you need it instead of doing it manually.
  • Runbooks can be developed for on-premises or Azure resources, and they support webhooks that allow you to trigger automation from things such as ITSM, DevOps and monitoring systems. So, you can run them remotely and trigger them from wherever you need to.
  • On the configuration management side, you can build desired state configurations for your enterprise environment. These set a baseline for how your systems should operate and identify when there’s a variance from the initial system configuration, alerting you to any anomalies that could be problematic.
  • It has a rich reporting back end and alerting interface for full visibility into what’s happening in your Windows and Linux systems – on-premises and in Azure.
  • It gives you update management (for Windows and Linux) to help you define how updates are applied. It helps administrators specify which updates will be deployed, see successful or unsuccessful deployments and specify which updates should not be deployed to certain systems, all done through PowerShell or Python scripts.
  • It lets you share capabilities, so when you’re using multiple resources or building runbooks for automation, you can share resources to simplify management. You can build multiple scripts but use the same resources over and over as references for things like role-based access control, variables, credentials, certificates, connections, schedules and access to source control and PowerShell modules (the runbook sketch after this list shows how shared assets get used). You can check these in and out of source control like any code-based project.
  • Lastly, and one of the coolest features in my opinion: since these are templates you’re deploying to your systems, and everyone has similar challenges, there’s a community gallery where you can download templates others have created or upload ones you’ve created to share. With a few basic configuration tweaks and a review to make sure they’re secure, this is a great way to speed things up by finding an existing script, cleaning it up and deploying it in your systems and environment.
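To give a feel for how those shared resources get used, here’s a minimal sketch of a Python runbook that pulls a shared variable and credential before doing its work. It assumes the `automationassets` helper module that Azure Automation exposes to Python runbooks; the asset names are placeholders, and the real maintenance logic is left out.

```python
# Minimal Python runbook sketch: read shared assets, then do the actual work.
# Assumes the automationassets module Azure Automation provides to Python runbooks;
# the variable and credential names below are hypothetical.
import automationassets

# Shared assets defined once in the Automation account and reused across runbooks
environment = automationassets.get_automation_variable("TargetEnvironment")
svc_account = automationassets.get_automation_credential("MaintenanceAccount")

print("Running maintenance against the {} environment".format(environment))
print("Using service account: {}".format(svc_account["username"]))

# ... the actual maintenance logic (patching, cleanup, reporting) would go here ...
```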

So, there’s a lot you can do with this service and I think it’s worth checking out as it can make your maintenance and management much simpler.

What is Azure Firewall?

I’d like to discuss the recently announced Azure Firewall service, which has just been released to general availability (GA). Azure Firewall is a managed, cloud-based network security service that protects your Azure Virtual Network resources. It is a fully stateful PaaS firewall with built-in high availability and unrestricted cloud scalability.

It’s in the cloud and Azure ecosystem and it has some of that built-in capability. With Azure Firewall you can centrally create, enforce and log application and network connectivity policies across subscriptions and virtual networks, giving you a lot of flexibility.

It is also fully integrated with Azure Monitor for log analytics. That’s big, because many firewalls are not integrated with Log Analytics, which means you can’t centralize their logs in OMS, for instance, which would give you a great platform and a single pane of glass for monitoring many of the technologies being used in Azure.

Some of the features within:

  • Built-in high availability, so there are no additional load balancers that need to be built and nothing to configure.
  • Unrestricted cloud scalability. It can scale up as much as you need to accommodate changing network traffic flows – no need to budget for your peak traffic, it will accommodate any peaks or valleys automatically.
  • It has application FQDN filtering rules. You can limit outbound HTTP/S traffic to specified lists of fully qualified domain names, including wildcards, and the feature does not require SSL termination (a sketch of such a rule follows this list).
  • There are network traffic filtering rules, so you can create allow or deny network filtering rules by source and destination IP address, port and protocol. Those rules are enforced and logged across multiple subscriptions and virtual networks. This is another great example of the availability and elasticity to manage many components at one time.
  • It has fully qualified domain name tagging. If you’re running Windows Update across multiple servers, for example, you can tag that service as allowed to come through, and it then becomes a set standard for all your services behind that firewall.
  • Outbound SNAT and inbound DNAT support: traffic originating from your virtual network to remote Internet destinations can be identified and allowed (Source Network Address Translation), and inbound network traffic to your firewall’s public IP address is translated (Destination Network Address Translation) and filtered to the private IP addresses on your virtual networks.
  • The Azure Monitor integration I mentioned: all events are integrated with Azure Monitor, allowing you to archive logs to a storage account, stream events to your Event Hub, or send them to Log Analytics.
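To make the FQDN-filtering idea concrete, here’s a rough sketch of what one application rule collection looks like, written as a Python dict shaped like the firewall’s ARM resource properties. The collection name, source addresses and target FQDNs are placeholders, and the property names follow my reading of the template schema, so check them against the current Azure Firewall reference before using them.

```python
# Sketch of an Azure Firewall application rule collection (ARM-style properties).
# Names, addresses and FQDNs are placeholders; verify property names against the
# current Azure Firewall ARM template reference before deploying anything.
application_rule_collection = {
    "name": "Allow-Microsoft-FQDNs",
    "properties": {
        "priority": 100,
        "action": {"type": "Allow"},
        "rules": [
            {
                "name": "allow-microsoft-com",
                "sourceAddresses": ["10.0.1.0/24"],  # workload subnet (placeholder)
                "protocols": [{"protocolType": "Https", "port": 443}],
                "targetFqdns": ["*.microsoft.com"],  # wildcard FQDN filtering
            }
        ],
    },
}

print(application_rule_collection["properties"]["rules"][0]["targetFqdns"])
```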

Another nice thing to note: when you set up an ExpressRoute circuit or a VPN from your on-premises environment to Azure, you can use this as the single firewall for all those virtual networks, allow traffic in and out from there and monitor it all from that single place.

This was just released to GA, so there are a few hiccups, but if none of the service challenges affect you, I suggest you give it a try. It will only continue to improve, as with all Azure services, and I think it’s going to be a great firewall service option for many.