Category Archives: Review

What is Azure Firewall?

I’d like to discuss the recently announced Azure Firewall service that is now just released in GA. Azure Firewall is a managed, cloud-based network security service that protects your Azure Virtual Network resources. It is a fully stateful PaaS firewall with built-in high availability and unrestricted cloud scalability.

It’s in the cloud and Azure ecosystem and it has some of that built-in capability. With Azure Firewall you can centrally create, enforce and log application and network connectivity policies across subscriptions and virtual networks, giving you a lot of flexibility.

It is also fully integrated with Azure Monitor for log analytics. That’s big because a lot of firewalls are not fully integrated with log analytics which means you can’t centralize these logs in OMS, for instance, which would give you a great platform in a single pane of glass for monitoring many of the technologies being used in Azure.

Some of the features within:

  • Built in high availability, so there’s no additional load balances that need to be built and nothing to configure.
  • Unrestricted cloud scalability. It can scale up as much as you need to accommodate changing network traffic flows – no need to budget for your peak traffic, it will accommodate any peaks or valleys automatically.
  • It has application FQDN filtering rules. You can limit outbound HTTP/S traffic to specified lists of fully qualified domain names including wildcards. And the feature does not require SSL termination.
  • There are network traffic filtering rules, so you can create, allow or deny network filtering rules by source and destination IP address, port and protocol. Those rules are enforced and logged across multiple subscriptions and virtual networks. This is another great example of having availability and elasticity to be able to manage many components at one time.
  • It has fully qualified domain name tagging. If you’re running Windows updates across multiple servers, you can tag that service as an allowed service to come through and then it becomes a set standard for all your services behind that firewall.
  • Outbound SNAT and inbound DNAT support, so you can identify and allow traffic originating from your virtual network to remote Internet destinations, as well as inbound network traffic to your firewall public IP address is translated (Destination Network Address Translation) and filtered to the private IP addresses on your virtual networks.
  • That integration with Azure Monitor that I mentioned in which all events are integrated with Azure Monitor, allowing you to archive logs to a storage account, stream events to your Event Hub, or send them to Log Analytics.

Another nice thing to note is when you set up an express route or a VPN from your on premises environment to Azure, you can use this as your single firewall for all those virtual networks and allow traffic in and out from there and monitor it all from that single place.

This was just released in GA so there are a few hiccups, but if none of the service challenges effect you, I suggest you give it a try. It will only continue to come along and get better as with all the Azure services. I think it’s going to be a great firewall service option for many.

What is Azure Data Box and Data Box Disk?

Are you looking to move large amounts of data into Azure? How does doing it for free sound and with an easier process? Today I’m here to tell you how to do just that with the Azure Data Box.

Picture this: you have a ton of data, let’s say 50 terabytes on-prem, and you need to get that into Azure because you’re going to start doing incremental back ups of a SQL Database, for instance. You have two options to get this done.

First option is to move that data manually. Which means you have to chunk it, set it up using AZ copy or a similar Azure data tool, put it up in a blob storage, then extract it and continue with the process. Sounds pretty painful, right?

Your second option is to use Azure Data Box which allows you to move large chunks of data up into Azure. Here’s how simple it is:

  • You order the Data Box through Azure (currently available in the US and EU)
  • Once received, you connect it to your environment however you plan to move that data
  • It uses standard protocols like SMB and CIFS
  • You copy the data you want to move and return the Data Box back to Azure and then they will upload the data into your storage container(s)
  • Once the data is uploaded, they will securely erase that Data Box

With the Data Box you get:

  • 256-bit encryption
  • A super tough, hardened box that can withstand drops or water, etc.
  • It can be pushed into Azure Blob
  • You can copy data up to 10 storage accounts
  • There are two 1 gigabit/second and two 10 gigabit/second connections to allow quick movement of data off your network onto the box

In addition, Microsoft has recently announced the Data Box Disk, which is a small 8 terabyte disk that you can order up to five of as part of the Data Box Disk.

With Data Box Disc you get:

  • 35 terabytes of usable capacity per order
  • Supports Azure Blobs
  • A USB SATA 2 and 3 interface
  • Uses 128-bit encryption
  • Like Data Box, it’s a simple process to connect it, unlock it, copy the data onto the disk and it send it back to copy those into a single storage account for you

Here comes the best part—while Azure Data Box and Data Box Disk are in Preview, this is a free service. Yes, you heard it right, Microsoft will send you the Data Box or Data Box Disk for free and you can move your data up into Azure for no cost.

Sure, it will cost you money when you buy your storage account and start storing large sums of data, but storage is cheap in Azure, so that won’t break the bank.

 

Overview of Azure Reserve VM Instances

We’re all looking for ways to save money within our Azure subscriptions and resources. How does a savings of up to 72% sound? Today I’d like to give you an Overview of  Azure Reserve Virtual Machine Instances, a payment option which allows you to get that savings off the standard pay as you go plan by pre-committing to a 1 or 3-year term for the compute of virtual machine usage.

If you know you’re going to use Azure virtual machines for an extended period for your cloud workloads, then this is worth looking at. Just keep in mind that this only covers the virtual machine compute; the networking, other software, Azure services or storage, as well as Windows and SQL Server licensing does not get applied to the reserve.

Although, people who have purchased on-prem licensing for their servers can use their Azure hybrid benefit which allows you to bring your own on-prem Windows and SQL licenses to Azure. If you’re currently using an enterprise agreement or pay as you go plan, if you choose to go with Azure Reserve VM Instances, your cost would be reduced against your enterprise agreement or the credit card that you use for your pay as you go plan would be billed according to what you’re using.

When you purchase your Reserve Instances, it’s instantaneous; you just go in and specify your machine type and the term (1 or 3 years). It will detect those machine types in your current subscriptions or if you’re adding new machine types, it will apply that savings to those machine types.

So, if you know you’re going to use a particular machine type for the next year, say for migration, you’ll experience a good savings by pre-committing up front. And the scope of the Reserved Instance can go across multiple subscriptions and apply the discount to each of them.

Gotchas

A couple things to note; first, when the term expires, it does not auto renew and your discount ends. You can renew your contract and choose your hardware that you need; you’re not stuck using the same hardware you originally specified. And second, Reserved Instances cannot be used for enterprise dev test subscriptions or virtual machines in Preview.

Overview of Azure Operations Management Suite (OMS)

In this post I’d like to give an overview of what Azure Operations Management Suite is and what it can be used for. First, Operations Management Suite, or OMS, is a collection of management services designed for Azure cloud. As new services are added to Azure, more capabilities are being built into OMS to allow for integration.

OMS allows you to collect things in one central place like the many Azure services that need deeper insight and manageability, all from one portal, as well as being able to set up different groups and different ways of viewing your data. OMS can also be used with on prem resources with Window and Linux Agent, so you can collect logs or backup your servers or files to Azure, for example.

The key Operations Management Suite services are:

  • Log analytics allows you to monitor and analyze the availability and performance of different resources including physical and virtual machines, Azure Data Factory and other Azure services.
  • Proactive alerting for when an issue or problem in your environment is detected, so you can either take corrective action or have a preprogrammed corrective action.
  • Ability automate manual processes and enforce configuration for physical and virtual machines, like automating clean-up operations you do on servers for instance. You can do this through Runbooks which are based on PowerShell scripts or PowerShell workloads where you can programmatically do what you need to do within the OMS.
  • Integrate backups so the agent and integration allow for backing up a service, a file level; whatever you need to do for critical data and run those stores, whether they are on-prem or cloud-based resources.
  • Azure Site Recovery runs through OMS and helps you provide high availability for apps and servers that you’re running.
  • Orchestrate running your replication up into Azure. This allows you to do it from physical servers, Hyper Vs or VMware servers using Windows or Linux.

Mainly, it provides management solutions. These are prepackaged sets of templates provided by Microsoft and/or partners that help implement multiple OMS services at one time. One example is the Update Management Solution which creates a log search, dashboard and alerting inside log analytics, but at the same time creates an automation runbook for installing updates on the server. This will tell you when updates are available, when they’re needed and then let you automate the install of those updates.

There is a lot of power and capability that comes with the Operations Management Suite. It’s a great centralized management solution within Azure that is quick to configure and start using.

 

Business Continuity Strategies in Azure

Keeping businesses online and operational is a key concern, no matter the nature of your downtime. Most companies don’t focus on business continuity until it’s too late or have incomplete, untested barebones recovery plans. High Availability, Disaster Recovery and Backup are all critical to a complete business continuity solution. In a recent webinar, Senior Principal Architect Chris Seferlis discussed how leveraging Azure for disaster recovery and business continuity is the most effective way to ensure you’re protected.

If your business’s data is in the cloud, there is nothing is more pivotal than your cloud backup, recovery and migration procedures. Only 18% of decision makers feel fully prepared to recover their data center in the event of a site failure or disaster. The issues are out-of-date recovery plans and limited back-up and recovery testing.

Most disaster situations are caused by system failures, power failures, natural disasters and cyber-attacks. The challenges businesses face in disaster recovery are significant, including cost, complexity and reliability. To have a successful business continuity strategy, organizations must prioritize high availability, disaster recovery and data back-ups.

Disaster recovery is important; there is always a risk of failure with your data, including software bugs, hardware failure and human error. Important factors to consider are Recovery Time Objective (RTO); the targeted duration of time and a service level within which a business process must be restored after a disaster; and Recovery Point Objective (RPO), the maximum targeted period in which data might be lost from an IT service due to a major incident. Both RTO and RPO are business decisions.

Azure can protect against planned and unplanned events by distributing the placement of VMs across the infrastructure. Azure also helps with Disaster Recovery through consistent backup for Windows Azure VMs and file-system backup for Linux Azure VMs. Additionally, it provides efficient and reliable backups to the cloud with no infrastructure maintenance.

Watch the full webinar here to gain more insight into:

  • Azure Backup Server
  • VMWare VM Backup
  • Azure Site recovery
  • Disaster Recovery for Hyper-V
  • Azure Migration

Click here to view my slides from this presentation. If you’d like to learn more about business continuity using Azure or need help with any Azure project from discussions and planning to implementation, click the link below and talk to us today. We can help no matter where you are on your cloud journey.

Azure Blob Storage Lifecycle Management

When we talk about blob storage, we talk about the three different tiers – hot, cool or archive – for delegating the importance of data and how accessible it is. The challenge has been that when we picked the tier that was pretty much the end of story.

What we want is have our data accessible when and where we need it as it can take some time to pull from cool and archive tiers, as well as be costlier to retrieve. Also, with the more expensive hot tier, data can sit there unnecessarily, and we need a way to move it out after it becomes static or stale.

Here’s some good news! Microsoft recently introduced the public preview of Blob Storage Lifecycle Management. This now makes it easier to manage and automate that movement of data by offering a rule-based policy which you can use to transition your data to the best access tier, as well as expire data at the end of its lifecycle.

This great new toolset allows capability and flexibility to define rules for transitioning blobs to a cooler storage. You can also delete blobs by defining how long a blob should live there, define rules to be executed daily or apply rules to storage containers or subsets of blobs, thus allowing you to access certain blob containers and delete others that you specify based on how you’re moving that data around.

So, you can set up a scenario where data hasn’t been accessed in 3 months and it’s set to be transitioned from hot storage to cool, but then it sits there for 6 more months. You then want to be able to move that data off to archive. These are settings you can change based on the last modification date of the file.

You also can delete blob snapshots that have become stale after a defined period of time. Maybe you set it to delete after 120 days or maybe blobs that haven’t been accessed for a several year period—seven years being the magic number for audits and such.

Microsoft is great at listening to what users have to say and to keep evolving and adding more capability to the technology. If you love data, Azure and Azure Blob Storage as much as I do, let me know by sharing this video.

 

Overview of HDInsight R Server

Today I’ll wrap up my series on HDInsight with R Server. What R Server does is when you create an HDInsight cluster, you can select it as an option and it will provide data scientists, statisticians and R Programmers with on demand access to scalable and distributed methods of analytics on HDInsight.

Where it is open source, R allows you to leverage any of the 8,000+ open source packages. Because it falls in Microsoft’s big data analytics package, it includes the scale R routines. These routines provide things such as descriptive statistics, generalized linear models, logistic regression, classification and regression trees, as well as decision forests.

You can run an edge node outside of a cluster that provides a great place to connect on the cluster. You can also run your R scripts which gives the option of running parallel distributed functions. The models that are built can be downloaded for on prem use and can also be sent to Azure Machine Learning Studio for further processing and scoring.

So, why would you choose the Microsoft R Server over other options?

  • Microsoft is putting a lot behind AI and R Server and this big data offering as part of the HDInsight suite.
  • It provides an internally built set of algorithms and when you combine that with the open source community offerings, you create a bridge for cutting edge AI, machine and deep learning applications.
  • As with other Azure offerings, you’re getting a simplified, secure, highly scalable environment, so instead of wasting time building those clusters in-house, you can focus on the capabilities of the platform itself by quickly and easily spinning up a cluster.

Many of these topics have been discussed throughout this series about the capabilities of HDInsight and what each has to offer. Looking at R, some key features are:

    • R enabled for the R programming language with runtime infrastructure for script execution.
    • Also, Python enabled with runtime infrastructure for Python scripting.
    • Pre-trained models to help with visual analytics and text statement analysis that is ready to score the data you provide.
    • You can put the server into operations and deploy solutions as a web service very quickly; so you spin up your cluster, turn everything on, hook it into your domain, use your domain credentials and start training your models.
    • Remote web execution allows us to work from our work station and train models, rather than having to log directly into the server or use SSH or other means. It allows you to build your scripts locally and then execute them remotely, giving you more flexibility with the way you’re operating.

R Server fits within the Azure and HDInsight ecosystems, so you can use and easily integrate these technologies together, such as integrating with Azure Data Factory or Azure Data Bricks, etc.

Overview of HDInsight Interactive Query

Last week I began a series on HDInsight. Today I’m continuing that series with a focus on Interactive Query. Interactive Query leverages Hive which uses LLAP (Long Live and Process), also known as low latency analytical processing. This allows for interactivity with complex data warehouse-style queries on big data, that is stored in commodity storage, such as a blob or Data Lake Store.

This stand-alone cluster is separate from HDI Hadoop clusters; it only contains the Hive service. The LLAP replaces the direct interaction with the HDFS data node, allowing for caching, prefetching, some light query processing and access control. Heavier query processing workloads are still happening at the yarn container with text orchestration, and that helps with the overall execution.

Obviously, it’s much more efficient to be able to query the data interactively where the data is prepared, rather than needing to move the data from one storage location to another, as we normally would with data warehousing. It allows for faster insight and resiliency, as well as reduced effort and simplified architecture – less components meets more simplicity.

There are several ways to execute Hive queries from Interactive Query:

  • Power BI, so you can tap right into it with your Power BI reports
  • Zeppelin notebooks
  • Visual Studio
  • Ambari with Hive View
  • Beeline from head node or an empty edge node
  • ODBC

You can also leverage existing workloads, so if you’re running batch or ETL workloads using HDInsight, you can attach your Interactive Query cluster to an existing metastore and data storage without any additional overhead.

There may be a need to convert CSV or JSON files into ORC, Parquet or Avro field as they can be more efficient for Hadoop processing. But with Interactive Query, that need is either lessened or eliminated because they can load that data into memory. The queries now determine what is cached and what can just run quickly since it’s running in memory instead of running from a storage area.

It also uses the Enterprise Security Package and Azure Log Analytics. These two features get wrapped into more of a true enterprise offering and allows your users to use their simplified Active Directory domain log in. Users can connect using Interactive Query and run their workloads without having to have a separate set of credentials, plus you can monitor your nodes from the Log Analytics piece. This helps you bring that data into OMS for a top down view and an understanding of what the whole environment looks like.

Interactive Query offers some great opportunities to run things more efficiently and smaller workloads can be run very quickly.

 

Overview of HDInsight Kafka

Continuing with my HDInsight series, today I’ll be talking about Kafka. HDInsight Kafka will sound much like Storm but as I get into the nuts the bolts you’ll see the differences. Kafka is an open source distributed stream platform that can be used to build real time data streaming pipelines and applications with a message broker functionality, like a message cue.

Some specific Kafka improvements with HDInsight:

  • 99.9% uptime from HDInsight
  • You get 16 terabyte managed discs which increases the scale and reduces the number of required nodes for traditional Kafka clusters, which would have a limit of 1 terabyte.
  • Kafka takes a single rack view, but Azure is designed in 2 dimensions for update and fault domains. Thus, Microsoft designed special tools to rebalance the partitions and replicas. Once you scale out, you would repartition your data and then you’d be able to take advantage of the additional nodes, as well as when you scale down.
  • Kafka allows you to change the number of worker nodes for scaling up/down, depending on the workload and this can be done through the portal or PowerShell or any automation tool within Azure.
  • Direct integration with Azure log analytics. This looks at the virtual machine level information like the disc and the network. The importance of this is it allows you to roll that up into the Microsoft OMS suite for global log analytics. So, when you’re looking at all your resources in Azure through OMS, it helps you to see it at a high level and also drill in for more details.
  • The Zookeeper manages the state of the cluster which helps the concurrency, resiliency and the low latency transactions, as well as the orchestration of the data through the nodes and clusters.
  • Records are stored in topics which is produced by a producer and consumed by consumers. The producers send records to Kafka brokers and each worker node in the cluster is considered a broker. These brokers are what is helping the data move around inside the clusters.

Again, Kafka and Storm sound relatively similar, here’s some major differences:

    • Storm was invented by Twitter; Kafka by LinkedIn. But these are all using the Hadoop platform and it’s an open source, so they can build their own iterations.
    • Storm is meant more for real time message processing; Kafka is for distributed messaging processing.
    • Storm can take data from Kafka and other database system and process the data; Kafka is taking in those streams from things like Facebook, Twitter and LinkedIn.
    • Kafka is a message broker; Storm’s primary use is stream processing.
    • In Storm there is no data storage, you can only stream data through it; Kafka stores the data on the file system. As those streams are processed, Storm can do it much faster, on a micro-batch processing level. Kafka is doing small batches, larger than micro.
    • As far as dependency, Kafka requires Zookeeper for all the orchestration; Storm does not depend on anything externally.
    • Storm has a latency of milliseconds; with Kafka it depends on the source of the data, but typically takes slightly less than 1-2 seconds. So, you’re keeping the data local in Kafka, processing it, then pushing it somewhere else. Whereas with Storm, you’re processing the data in motion as you’re pushing it somewhere else.

Basically, two different ways to solve similar problems depending on the use case. It apparently worked better for LinkedIn to design it this way as opposed to the way that Twitter handles their data.

 

Overview of HDInsight Storm

Next in my series on HDInsight, today I’ll be talking about Storm. HDInsight Storm is a distributed stream processing computational framework. It uses spouts which define information sources and bolts which are manipulations in processing to allow batch distributed processing of streaming data.

Think of it’s apology in the shape of a direct acyclic graph. It’s a DHE where the edges are named streams and direct the data from node to node. When you put it all together, it creates the data transformation pipeline.

When you break it down, it’s topology is like that of map/reduce jobs; the difference being that map/reduce jobs run in individual batches and Storm is processed continuously in real time.

The Storm cluster has 2 different types of nodes. There’s a Master node which executes a Nimbus which assigns tasks to machines and monitors their performance. The Worker node runs Supervisor which assigns tasks to the other worker nodes and operates them as needed.

The Storm cluster can’t monitor its own state and health, so it deploys a Zookeeper node to connect to the Nimbus and Supervisor to keep an eye on things.

The 3 main components of Storm are:

1. The topology which is basically a network for the stream and spout.

2. The stream which is an unbounded pipeline of tuples.

3. The spout which is the source of the data which converts the data to the tuple of streams and then sends the bolts to be processed.

What makes this effective is that the data processing engine is guaranteed as far as every tuple will be fully processed and delivered, giving it a 99.9% uptime SLA from Microsoft. It does this by tracking the lineage of the tuple as it makes its way through the typology. It works like a query system as the messages can be replayed if there’s a failure in delivery.

Some use cases for Storm:

    • Writing the data after it gets processed into an Azure Data Lake Store.
    • As a source for Azure Event Hubs, as well as processing events from here. It can take a vehicle sensor, for instance, and can process it in Event Hubs, then send the data to Cosmos DB or an Azure Storage Blob.
    • Twitter is using Storm in a variety of ways. They use it for discovery on their data, running real time analytics and personalization in real time, so when you log into Twitter it knows your preferences based on past visits. It also works for real time Search and for their own internal revenue optimization.

As with other HDInsight components, it’s used among various typologies to solve and satisfy big data requirements and workloads. For example, if you were doing a customer churn analysis in real time based on a Twitter feed, this would be a technology you would use along side Hadoop.