In today’s post I’d like to talk about the Enterprise Security Package for Azure HDInsight. HDInsight is a managed cloud Platform as a Service offering built on the Hadoop framework. It allows you to build big data solutions using Hadoop, Spark, Hive, LLAP and R, among others.
Let’s think about the traditional deployment of Hadoop. In traditional deployment, you would deploy a cluster, give local admin access to users with SSH access to that cluster. Then you would hand it over to the data scientists, so they could do what they needed to run those data science workloads; train the models, run scripts and such.
With the adoption of these types of big data workloads into the enterprise, it became much more reliant on enterprise security. There was a need for role-based access control with Active Directory permissions. Admins wanted to get greater visibility into who was accessing the data and when, as well as what they tried to get into and were they successful in their attempts or not – basically all those audit requirements when we’re working with large data sets.
Who is the leader in enterprise security? Microsoft, of course, for Active Directory. The Enterprise Security Package allows you to add the cluster to the domain within the creative process, as a sort of ‘add-on’ to your Azure portal. Other things it allows you to do are:
- Add an HDI cluster with Active Directory Domain Services.
- Role based access control for HIVE, Spark and Interactive HIVE using Apache Ranger.
- Specific file and folder permissions for the data inside of an Azure Data Lakes Store.
- Auditing of logs to see who has access to what and when.
Currently, these features are only available for Spark, Hadoop and Interactive Query workloads, but more workloads will be adopted soon.