Tag Archives: Azure Synapse

Overview of Azure Synapse Link featuring CosmosDB

January 30, 2021 cseferlis

Azure Synapse Link allows you to connect directly to your transactional system and run analytical and machine learning workloads, eliminating the need for ETL/ELT, batch processing, and reload wait times.

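In this vLog, I explain how to enable Synapse Link in Azure Cosmos DB and what's happening under the covers (Cosmos DB maintains a separate, column-oriented analytical store that is automatically synced from the transactional store) to give your analytical workloads access to the data without impacting the performance of your transactional processing system.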

Check it out here and let me know what you think!

Azure, Azure Synapse

Getting started with Spark Pools in Azure Synapse

January 1, 2021 cseferlis

In my latest video blog, I discuss getting started with the newly generally available Spark Pools in Azure Synapse, another great option for data engineering and preparation, data exploration, and machine learning workloads.

Without going too deep into the history of Apache Spark, I'll start with the basics. In the early days of Big Data workloads, which laid the groundwork for machine learning, deep learning, advanced analytics, and AI, we used Hadoop clusters that read and wrote these datasets to and from disk at each processing step, and the disks were always the bottleneck. So the creators of Spark asked: why not do this in memory and remove that bottleneck? They developed Apache Spark as an in-memory data processing engine, a faster way to process these massive datasets.

When the Azure Synapse team wanted to make sure they were offering the best possible data solution for all kinds of workloads, Spark gave them an option for customers who were already familiar with the Spark environment, so they included it as part of the complete Azure Synapse Analytics offering.

Behind the scenes, the Synapse team manages many of the components you'd find in open-source Spark, such as:

Apache Hadoop YARN – for managing the clusters where the data is processed
Apache Livy – for job orchestration
Anaconda – a package manager, environment manager, and Python/R data science distribution, with a collection of over 7,500 open-source packages that extend the capabilities of the Spark clusters

I hope you enjoy the post. Let me know your thoughts or questions!

Azure, Azure Synapse, Training

Connecting to External Data with Azure Synapse

November 30, 2020 cseferlis

In my latest video blog, I discuss and demonstrate some of the ways to connect to external data in Azure Synapse when you don't need to import the data into the database or you want to do some ad-hoc analysis. I also talk about using COPY and CTAS (CREATE TABLE AS SELECT) statements if you do need to import the data after all. Check it out here.

Uncategorized

Comparing Azure Synapse, Snowflake, and Databricks for common data workloads

November 6, 2020 cseferlis

In this vLog post, I discuss how Azure Synapse, Databricks, and Snowflake compare when it comes to common data workloads:

Data Science
Business Intelligence
Ad-hoc Data Analysis
Data Warehousing
and more!

Uncategorized

Should I Choose Azure Data Factory or Synapse Studio

August 23, 2020 cseferlis

In this vLog, I cover the reasons you might choose Azure Data Factory, a mature cloud service for orchestrating and processing data, over the newly generally available (GA) Azure Synapse Studio.

Synapse has all of the same features as Azure Data Factory, but if you have a large development team working on ELT operations, or only a simple data processing activity, it could make sense to choose the less-cluttered Azure Data Factory.

Take a look at the vLog here and let me know about other scenarios where you'd choose one over the other!

Azure, Database, Databricks, Power BI, SQL Server, Strategy, Training

The Modern Data Warehouse in Azure Part 4: The Serving Layer

May 10, 2020 cseferlis

In this video blog post, I cover the serving layer step of building your Modern Data Warehouse in Azure. There are certainly some decisions to make about how you structure your schema as you get it ready for presentation in your business intelligence tool of choice (for this example I used Power BI), so I discuss some of the areas you should focus on:

What is your schema type? Star, snowflake, or something else?
Where should you serve up the data? SQL Server, Synapse, ADLS, Databricks, or something else?
What are your service level agreements (SLAs) with the business? What are your data processing times?
Can you save cost by using an option that’s less compute heavy?
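To make the star-versus-snowflake question concrete, here is a minimal Python sketch that uses in-memory dictionaries in place of database tables. The table names, columns, and sales figures are hypothetical and only illustrate the idea: in a star schema the fact table reaches every attribute through one join to a denormalized dimension, while a snowflake schema normalizes the category into its own table, so the same query needs an extra hop.

```python
# Hypothetical tables for illustration only; names and values are made up.

# Star schema: the product dimension is denormalized and carries the category.
star_dim_product = {
    1: {"name": "Road Bike", "category": "Bikes"},
    2: {"name": "Helmet", "category": "Accessories"},
}

# Snowflake schema: the category is normalized into its own dimension table,
# and the product dimension only stores a foreign key to it.
snow_dim_category = {10: {"category": "Bikes"}, 20: {"category": "Accessories"}}
snow_dim_product = {
    1: {"name": "Road Bike", "category_id": 10},
    2: {"name": "Helmet", "category_id": 20},
}

# Fact table shared by both designs: one row per sale.
fact_sales = [
    {"product_id": 1, "amount": 1200.0},
    {"product_id": 2, "amount": 60.0},
    {"product_id": 1, "amount": 950.0},
]


def sales_by_category_star(facts, dim_product):
    """Total sales per category with one lookup per fact row (fact -> product)."""
    totals = {}
    for row in facts:
        category = dim_product[row["product_id"]]["category"]
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals


def sales_by_category_snowflake(facts, dim_product, dim_category):
    """Total sales per category with two lookups per row (fact -> product -> category)."""
    totals = {}
    for row in facts:
        category_id = dim_product[row["product_id"]]["category_id"]
        category = dim_category[category_id]["category"]
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(sales_by_category_star(fact_sales, star_dim_product))
    print(sales_by_category_snowflake(fact_sales, snow_dim_product, snow_dim_category))
```

Both functions return the same totals; the difference is the number of joins needed to answer the query. Fewer joins at query time is one reason star schemas are commonly recommended when serving data to BI tools such as Power BI, while snowflake schemas trade that for less duplicated dimension data.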

BizDataViz

Azure and SQL Data Blog
