Today I’ll wrap up my series on HDInsight with R Server. When you create an HDInsight cluster, you can select R Server as an option, and it gives data scientists, statisticians and R programmers on-demand access to scalable, distributed analytics methods on HDInsight.
Because R is open source, you can leverage any of the 8,000+ open source packages. And because R Server is part of Microsoft’s big data analytics offering, it also includes the ScaleR routines. These routines cover descriptive statistics, generalized linear models, logistic regression, classification and regression trees, and decision forests.
The cluster’s edge node provides a convenient place to connect to the cluster and run your R scripts, which gives you the option of running parallelized, distributed functions. The models you build can be downloaded for on-premises use and can also be sent to Azure Machine Learning Studio for further processing and scoring.
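To make the idea of a parallel, distributed routine a little more concrete, here is a minimal Python sketch of how descriptive statistics can be computed chunk by chunk and then combined. This is not the ScaleR implementation; the function names (`summarize`, `merge`, `describe`) are hypothetical and the chunks stand in for the data each worker would hold. It only illustrates the general pattern of computing partial results independently and merging them.

```python
from functools import reduce
from typing import Dict, Iterable, Tuple

# A partial summary: (count, mean, sum of squared deviations from the mean).
Summary = Tuple[int, float, float]


def summarize(chunk: Iterable[float]) -> Summary:
    """Summarize one chunk of data, as a single worker might do."""
    values = list(chunk)
    n = len(values)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return (n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)


def describe(chunks: Iterable[Iterable[float]]) -> Dict[str, float]:
    """Reduce the per-chunk summaries into overall statistics."""
    n, mean, m2 = reduce(merge, map(summarize, chunks), (0, 0.0, 0.0))
    variance = m2 / (n - 1) if n > 1 else float("nan")  # sample variance
    return {"n": n, "mean": mean, "variance": variance}


if __name__ == "__main__":
    # Three made-up chunks, as if the data were split across three workers.
    print(describe([[1, 2, 3], [4, 5], [6, 7, 8, 9]]))
```

Because each chunk is summarized independently, the `summarize` step could run on separate cores or nodes, and only the small summaries need to travel back to be merged.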
So, why would you choose the Microsoft R Server over other options?
- Microsoft is investing heavily in AI, in R Server and in this big data offering as part of the HDInsight suite.
- It provides an internally built set of algorithms, and when you combine those with the open source community’s offerings, you get a bridge to cutting-edge AI, machine learning and deep learning applications.
- As with other Azure offerings, you’re getting a simplified, secure, highly scalable environment, so instead of spending time building clusters in-house, you can quickly spin one up and focus on the capabilities of the platform itself.
I’ve discussed many of these points throughout this series on the capabilities of HDInsight and what each offering brings. Looking at R Server, some key features are:
- It is R-enabled, with runtime infrastructure for executing R scripts.
- It is also Python-enabled, with runtime infrastructure for Python scripting.
- Pre-trained models for visual analysis and text sentiment analysis that are ready to score the data you provide.
- You can operationalize the server and deploy solutions as a web service very quickly: spin up your cluster, turn everything on, hook it into your domain, use your domain credentials and start training your models.
- Remote web execution lets you train models from your workstation, rather than having to log directly into the server or use SSH or other means. You build your scripts locally and then execute them remotely, which gives you more flexibility in how you work.
R Server fits within the Azure and HDInsight ecosystems, so you can easily integrate it with other technologies such as Azure Data Factory or Azure Databricks.