All posts by cseferlis

Here comes your CoPilot

The new age of large language models (LLMs), with their ability to accelerate various forms of novel thought, is being cast upon us at a rapid pace. Just like an airplane copilot, we are seeing an explosion of tools in various areas to help us do our everyday jobs, making us more productive and freeing up additional time to enhance our creativity… or play Candy Crush.

You have already seen a plethora of announcements from Microsoft about copilot tools being added across their products: copilots in the Office productivity suite to assist the common office worker, GitHub Copilot to help software developers write, analyze, and document code, and copilots in Power BI and Microsoft Fabric to simplify the often tedious analysis and report-building process for data analysts. And this is just the beginning from their standpoint. They’ve also announced a copilot software development kit that allows developers to add a copilot to any number of applications used throughout the business and consumer worlds, assisting people with their everyday tasks and simplifying the process by which they develop and create new pieces.

The real question that comes to mind, however, is “who gets credit for the work that gets created?”. We see situations in the entertainment industry where movie scripts are being written with these tools, thousands of songs were recently removed from Spotify because they were generated with AI, and images and videos are being developed and manipulated with AI, so this question comes up frequently. And this is just the beginning of what I anticipate will be a massive explosion of questions around who really should get the credit for what’s being created. If AI is helping individuals complete their work at a faster pace, and the broader community is benefiting as a result, does it really matter? If I read 10 articles on copilots, retain a significant portion of what I read, then turn around, form an opinion, and write my own article, as I’m doing here, about how I see things happening, is that still my work? Is it my work even though I am writing it based on a whole lot of material that others produced, and I’m summarizing it in a slightly different way? This is the process on which the majority of research has been based for centuries now, and by which many fiction and non-fiction works have been created. Is it really different when we look at the technology that underlies LLMs?

In the world of data science, we can see tremendous opportunity to take advantage of machine learning models that have already been created, reusing their algorithms to replicate findings across any number of datasets and as a starting point for new algorithms and predictive models. Is this somehow “cheating” all of a sudden? Are data scientists who are working, hopefully, toward the greater good, who have the novel inspiration for what they want to build, but who use these tools to produce it more quickly, cheaters? I think these are the questions we need to be asking ourselves, rather than pointing fingers at the people using the tools to produce the work they are producing.

The other major concern coming from all of this is the set of privacy and security implications of training these LLMs with information that ultimately should not be shared. Microsoft is providing excellent options here by allowing customers to create their own instances of the various tools, such as the ChatGPT or DALL-E APIs, isolating the models trained specifically for them within their own Azure subscription, and not using those individual models or the data collected to train any of the other models. Using tools such as Google’s Bard or OpenAI’s ChatGPT interface, you do not have the same luxury, and those models are being retrained with all the data that is fed into them. This very example was made loudly public recently when some engineers from Samsung fed their data into ChatGPT, unknowingly exposing corporate secrets. It is also causing rash decisions by CxOs to broadly ban the tools that are helping their employees be more productive. Clearly more education is needed to help with these decisions and scenarios at every level: corporations, educational institutions, and individuals across the board.

As a writer myself, having recently published my first book, I am still haunted by the days of writer’s block and the challenge of getting started on the various topics within chapters. I see these tools as an opportunity to help us get past that and produce work that much more quickly. Truthfully, the answer here is subjective to each person’s opinion, similar to how a person feels about a painting, song, or written piece. We already base so many of our works, whether in the form of an application, methodology, algorithm, or more traditional artistic styling, on the knowledge we’ve acquired through experience; the notion of “original thought” is now so uncommon, and even when introduced it is often rejected by the greater society, so how is this any different? I say we take advantage of the tools we are given and make the copilot as pervasive as possible, to help gain efficiencies in every aspect of the modern world!

Azure Cognitive Services

AI solutions are exploding, and Azure has the most complete offering of any cloud provider! Watch this video to get started with the API-based Cognitive Services in Azure and a sample architecture showing how to employ them with the Azure Bot Service. Azure Cognitive Services are cloud-based services with REST APIs and client library SDKs available to help you build cognitive intelligence into your applications.

You can add cognitive features to your applications without having artificial intelligence (AI) or data science skills. Azure Cognitive Services comprise various AI services that enable you to build cognitive solutions that can see, hear, speak, understand, and even make decisions. Azure Bot Service enables you to build intelligent, enterprise-grade bots with ownership and control of your data. Begin with a simple Q&A bot or build a sophisticated virtual assistant.
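To make that concrete, here is a minimal sketch of what calling one of these services from code can look like, using the Python SDK for the Text Analytics (Language) service to score sentiment. The endpoint, key, and sample text are placeholders, not values from the video:

```python
# Minimal sketch: sentiment analysis with the Cognitive Services Text Analytics SDK.
# pip install azure-ai-textanalytics
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholders for your own Cognitive Services resource.
endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"
key = "<your-cognitive-services-key>"

client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

documents = ["The bot answered my question right away - great experience!"]
result = client.analyze_sentiment(documents)

for doc in result:
    if not doc.is_error:
        # Prints the overall sentiment label and the positive/neutral/negative scores.
        print(doc.sentiment, doc.confidence_scores)
```

The same pattern (endpoint + key + client SDK or raw REST call) applies across the other Cognitive Services, which is what makes them easy to wire into a bot built on the Azure Bot Service.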

https://docs.microsoft.com/en-us/azure/cognitive-services/what-are-cognitive-services

https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows

https://dev.botframework.com/

#Azure #AI #CognitiveServices #ArtificialIntelligence #Bots #ReferenceArchitecture #MachineLearning #API #Cloud #Data #DataScience

What is HTAP in Azure

Hybrid Transactional and Analytical Processing, or HTAP, is an advanced database capability that allows both transactional and analytical workloads to run against the same data without one impacting the performance of the other.

In this Video Blog, I cover some of the history of HTAP, some of the challenges and benefits of these systems, and where you can find them in Azure.

Overview of Azure Synapse Link featuring Cosmos DB

Azure Synapse Link allows you to connect to your transactional system directly to run analytical and machine learning workloads while eliminating the need for ETL/ELT, batch processing and reload wait times.

In this vLog, I explain how to turn on Synapse Link in Cosmos DB, and what’s happening under the covers to give access to that analytical workload without impacting the performance of your transactional processing system.
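As a rough illustration of what this looks like from code, here is a minimal sketch using the azure-cosmos Python SDK to create a container with the analytical store turned on. It assumes Synapse Link has already been enabled on the Cosmos DB account, and the account, database, and container names are made-up placeholders:

```python
# Minimal sketch: creating a Cosmos DB container with the analytical store enabled,
# assuming Synapse Link is already enabled on the account.
# pip install azure-cosmos
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
database = client.create_database_if_not_exists("retail")  # hypothetical database

container = database.create_container_if_not_exists(
    id="orders",                                   # hypothetical container
    partition_key=PartitionKey(path="/customerId"),
    # -1 keeps records in the analytical store indefinitely; Synapse Link queries
    # read from this column store instead of the transactional store.
    analytical_storage_ttl=-1,
)
```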

Check it out here and let me know what you think!

Getting started with Spark Pools in Azure Synapse

In my latest video blog, I discuss getting started with the newly generally available Spark pools in Azure Synapse, another great option for data engineering/preparation, data exploration, and machine learning workloads.

Without going too deep into the history of Apache Spark, I’ll start with the basics. Essentially, in the early days of big data workloads, the basis for machine learning and deep learning in advanced analytics and AI, we would use a Hadoop cluster and move all these datasets across disks, but the disks were always the bottleneck in the process. So the creators of Spark said, hey, why don’t we do this in memory and remove that bottleneck? They developed Apache Spark as an in-memory data processing engine, a faster way to process these massive datasets.

When the Azure Synapse team wanted to make sure they were offering the best possible data solution for all different kinds of workloads, Spark gave them an option for customers who were already familiar with the Spark environment, so they included it as part of the complete Azure Synapse Analytics offering.

Behind the scenes, the Synapse team manages many of the components you’d find in open-source Spark, such as:

  • Apache Hadoop YARN – for the management of the clusters where the data is being processed
  • Apache Livy – for job orchestration
  • Anaconda – a package manager, environment manager, and Python/R data science distribution with a collection of over 7,500 open-source packages for extending the capabilities of the Spark clusters
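In practice, working against a Synapse Spark pool from a notebook looks like ordinary PySpark. Here is a minimal sketch, with a hypothetical storage path and column names, that reads a dataset, caches it in memory (the whole point of the engine), and aggregates it:

```python
# Minimal PySpark sketch, as you might run it in a Synapse Spark pool notebook.
# The storage path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Synapse notebooks already provide a `spark` session; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

sales = (
    spark.read
    .option("header", "true")
    .csv("abfss://data@<your-storage-account>.dfs.core.windows.net/sales/*.csv")
)

# Cache in memory so repeated exploration avoids re-reading from storage.
sales.cache()

daily_totals = (
    sales.groupBy("order_date")
    .agg(F.sum(F.col("amount").cast("double")).alias("total_amount"))
    .orderBy("order_date")
)
daily_totals.show(10)
```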

I hope you enjoy the post. Let me know your thoughts or questions!

Connecting to External Data with Azure Synapse

In my latest video blog, I discuss and demonstrate some of the ways to connect to external data in Azure Synapse when there isn’t a need to import the data into the database, or when you want to do some ad-hoc analysis. I also talk about using COPY and CTAS statements if the requirement is to import the data after all. Check it out here!
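As a taste of the ad-hoc side, here is a minimal sketch of querying files in the data lake with OPENROWSET through a Synapse serverless SQL pool, driven from Python with pyodbc. The workspace name, storage path, and authentication method are assumptions you would adjust for your own environment:

```python
# Minimal sketch: ad-hoc OPENROWSET query against a Synapse serverless SQL pool.
# Server name, storage path, and auth method are placeholders.
# pip install pyodbc (requires ODBC Driver 17 for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<your-workspace>-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
)

query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<your-storage-account>.dfs.core.windows.net/data/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""

# No import or load step: the files stay in the lake and are read on demand.
for row in conn.cursor().execute(query):
    print(row)
```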

Comparing Azure Synapse, Snowflake, and Databricks for common data workloads

In this vLog post I discuss how Azure Synapse, Databricks and Snowflake compare when it comes to common data workloads:

  • Data Science
  • Business Intelligence
  • Ad-Hoc Data Analysis
  • Data Warehousing
  • and more!

Where does Azure Data Explorer fit in the rest of the Data Platform?

In this vLog I give an overview of Azure Data Explorer and the Kusto Query Language (KQL). Born from analyzing the logs behind Power BI, ADX is a great way to take large datasets, analyze them quickly, and get actionable insights from that data.

Find more details about Azure Data Explorer here: https://azure.microsoft.com/en-us/services/data-explorer/

And get started with these great tutorials: https://docs.microsoft.com/en-us/azure/data-explorer/create-cluster-database-portal
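If you would rather drive ADX from code than from the portal, here is a minimal sketch using the azure-kusto-data Python package. The cluster URI, database, and Events table are hypothetical placeholders, and the KQL just illustrates the summarize-over-time pattern ADX is so good at:

```python
# Minimal sketch: running a KQL query against Azure Data Explorer from Python.
# Cluster URI, database, and table names are hypothetical placeholders.
# pip install azure-kusto-data
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-cluster>.<region>.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Count events per day over the last week from a hypothetical Events table.
query = """
Events
| where Timestamp > ago(7d)
| summarize EventCount = count() by bin(Timestamp, 1d)
| order by Timestamp asc
"""

response = client.execute("SampleDatabase", query)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["EventCount"])
```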

Should I Choose Azure Data Factory or Synapse Studio

In this vLog, I cover the reasons why you might consider using Azure Data Factory, a mature cloud service for orchestration and processing of data, over the newly GA Azure Synapse Studio.

Synapse Studio has all of the same features as Azure Data Factory, but if you have a large development team working on ELT operations, or just a simple data processing activity, the less-cluttered Azure Data Factory could make more sense.

Take a look at the vLog here and let me know your thoughts on other scenarios for you!