Are you new to Azure, or looking to make the move and curious about what Azure Data Factory is? Azure Data Factory is Microsoft’s cloud-based data integration and orchestration service. It allows you to move your data from one place to another and transform it along the way. Here, I’d like to introduce the 4 main steps, or components, of how Azure Data Factory works.
Step 1 – Connect and Collect
Connect and collect is where you define where you’re pulling your data from, such as SQL databases, web applications, SSAS, etc. You then collect that data into one centralized location, like Azure Data Lake or Azure Blob Storage.
Step 2 – Transform and Enrich
In this step, you take the data from your centralized storage and enrich it, further expanding on your data using compute services such as HDInsight, Spark, or Data Lake Analytics, for example.
Step 3 – Publish
Next is to publish the data to a place where it can be better used and consumed by the end users. Any BI tool, such as Power BI or Reporting Services, is a great choice.
Step 4 – Monitor
This last step is to monitor the data to be sure jobs are running and data is flowing properly. It’s also important to monitor to ensure data quality. Monitoring can be done with tools like PowerShell, Microsoft Operations Management Suite, or Azure Monitor, which lets you monitor from inside the Azure portal.
Over the past 7 or 8 years, I’ve gone from “0 to 60” when it comes to database design, development, ETL and BI. Most of the skills I’ve learned were a result of mistakes I’ve made, and I’ve been mostly self-taught, with the exception of some more recent “formalized” learning with programs from Pragmatic Works. As I’ve gained more and more satisfaction from the process, I’ve gone on to start working toward my MCSA (2 of 3 complete) in SQL 2012, as well as speaking at SQL Saturdays and local user groups in New England. It’s become one of the most rewarding, exciting and challenging aspects of my career. As a result, I’ve posted some blog articles about some of the challenges I’ve overcome, though not frequently enough, and attempted to become more active in some SQL forums. The list below is far from complete when it comes to all of the best practices I’ve learned over the years; however, many of these lessons and best practices have really helped me stay organized when it comes to good BI architecture. I hope to provide at least one item that will benefit a newbie or even a seasoned pro. So, without further ado, here are the top 10 best practices I’ve learned the hard way…
Use Source Control
For anyone who was a developer in a past life, or is one now, this is a no-brainer, no-alternative best practice. In my case, because I come from a management and systems background, I’ve had to learn it the hard way. If this is your first foray into development, get ready, because you’re in for some mistakes, and you’re going to delete or change some code you really wish you hadn’t. Whether it’s for reference purposes on something you want to change, or something you do by accident, you’re going to need that code you just got rid of yesterday, and we both know you didn’t back up your Visual Studio projects… Hence, source control. GitHub and Microsoft offer great solutions for Visual Studio, and Redgate offers a great solution for SSMS. I highly recommend checking them out and using the tools! There are other options out there that are free, or that save your code to local storage, but the cloud is there for a reason, and many of us are on the go, so having your code available from any location is very helpful.
Standardize Transform Names
Leaving the default names on the OLE DB connector or the Execute SQL Task seems like a very silly thing to do, because it’s so easy to label them for what they actually are, but I have to admit, I’m guilty of it. I’ve found myself in situations where I’ve thrown together a quick ETL package for the purpose of testing, or further work on the database side, and then forgotten to go back and fix the names. Fast-forward 6 months, and I’m saying to myself, “I think I had a package like that once before,” only to find it, open it, and not have a clue what it’s actually doing. This, of course, requires me to go through each component and refresh my memory on what I did. Truth be told, there is absolutely no reason not to give all the detail needed within the title of the tool being used. Don’t be lazy; you never know when it might bite you!
Document inside your code and ETL workflow
If you’re using the script transforms, they open a pretty standard development window anyone familiar with Visual Studio will recognize. As with any development, good code comes with good documentation. Use it! Your successors, if not you yourself, will be very appreciative down the road. Name your functions appropriately, and explain what you’re doing throughout the code. Further, as you build your workflows, you have the ability to document what each step of the process is doing, so use it! This also goes back to point 2: with standardized names for your transforms alongside documentation of the workflow as you go, it paints a very nice picture of what your workflow is doing.
Setup detailed alerts for failures
The traditional workflows in SSIS allow users to create mail notifications for successful and unsuccessful steps within the workflow. Of course, depending on how your packages are being run, you could get the same type of notification directly from the SQL Server Agent job. However, why use SQL Alerts to tell you that “Job A” failed, with no real useful information, when you can have your package tell you exactly which transform or component failed, and what the error was when it failed? There are some limitations to the canned SSIS Send Mail Task as to the methods you can use to send an email; however, there are also some workarounds, like the one here on Stack Overflow that shows how to use the Script Task to connect to and send mail through Gmail. Either way, there is plenty of functionality that will help you be informed of exactly what is happening inside a job, and where there are warnings and errors for each component. Taking the time to do this is much better than getting a phone call with complaints about data not being updated properly!
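If your packages run as SQL Agent jobs, Database Mail is another route to richer failure messages. As a rough sketch (the profile name, recipient, and package details below are made-up placeholders, not from this post), a job step or event handler could call sp_send_dbmail with the specifics of what broke:

```sql
-- Hypothetical profile and recipient; substitute your own Database Mail setup.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'ETL Alerts',                -- an existing Database Mail profile
    @recipients   = N'bi-team@example.com',
    @subject      = N'SSIS failure: LoadSales.dtsx',
    @body         = N'Component "Load Staging" failed. See the error description captured by the OnError event handler.';
```

Inside the package itself, an OnError event handler exposes system variables such as @[System::SourceName] and @[System::ErrorDescription], which can be wired into the message body via expressions so the email names the exact component and error.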
Standardize File and Folder Names and Locations
Ok, ok, I know this is getting a bit redundant… but remember, these are the mistakes I made as a newb, and I really want to help you out as you start to use the software more and more, and build more and more complex workflows. This one was a really big one for me. Because I do all of my work in Visual Studio, and all of the BI projects look the same at the file level, I really needed to be able to show the separation between my ETL, SSRS and SSAS projects. This also helped me out with my source control structure. I separated SSIS, SSRS, and SSAS projects (you can even go as far as separating Tabular and Multidimensional if necessary) into separate folders, then labeled each type with “ETL” or “reports” as part of the file name. It saves me time when I’m opening a recent project, because typically I’m tweaking the ETL at the same time as developing the reports to get just the right mix.
Have a “Report Request Form”
When you first start writing reports, it’s really exciting. You’re delivering value to the business and really helping people do their jobs, especially when you start transforming and aggregating data… but soon, you become more and more relied upon for those reports, and no two reports are alike, it would seem. A common best practice for people who are spitting out report after report is to have a report requirements request form like the one here from SQL Chick. This request form is pretty in-depth, so tweak as necessary, but it will really help you prioritize and design reports going forward.
Experiment to fine tune and improve performance
So, this best practice item is really a whole blog post unto itself, but it’s something to be aware of. Just as a quick for-instance: the “OLE DB Command” transform is a great tool in theory; however, because it fires its statement once per row, if you’re working with a large dataset it can be significantly slower than using the “Execute SQL Task”. The only way to know this is to compare them side by side, which I had to do, and I found the Execute SQL Task took about 3 minutes while the OLE DB Command took about 45 minutes. Moral of the story: if something seems to take a long time, there may be, and most likely is, a better way to do it, so go out and play!
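To make the comparison concrete: the row-by-row OLE DB Command can usually be replaced by a single set-based statement run from an Execute SQL Task. A sketch of the set-based version, with hypothetical table and column names standing in for your own:

```sql
-- One statement updates every matching row at once,
-- instead of issuing one UPDATE per row flowing through the pipeline.
UPDATE tgt
SET    tgt.UnitPrice = src.UnitPrice
FROM   dbo.DimProduct  AS tgt
JOIN   stg.ProductLoad AS src
       ON src.ProductKey = tgt.ProductKey;
```

The pattern is to land the pipeline rows in a staging table first, then let the database engine do the update in one pass.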
Set your recovery model to Simple or Bulk-Logged on staging databases
Ok, so normally I would stress the importance of backups (always do them automatically, including logs, and before you change anything), but that would be more of a blog post about maintenance and configuration, and this one is focused on ETL and reporting. So let’s talk about the effect loading lots of data will have on your database, or better yet, check out this article that has a very good description of the 3 different recovery models. The basics are just this: if it’s a staging table or database, you can probably get away with the simple model. If it’s a critical database, but you’re doing lots of data loads, the bulk-logged model will minimally log SELECT INTO, bcp, INSERT SELECT and BULK INSERT operations, so your transaction logs don’t get huge, fast.
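Switching recovery models is a one-line change per database. For example (the database names here are placeholders):

```sql
-- Staging database: no point-in-time recovery needed.
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Critical database that takes heavy bulk loads.
ALTER DATABASE WarehouseDB SET RECOVERY BULK_LOGGED;

-- Verify the current model for every database on the instance.
SELECT name, recovery_model_desc FROM sys.databases;
```

Remember that bulk-logged still requires log backups; it just keeps the log from ballooning during the bulk operations themselves.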
Temp Tables will likely make your ETL run faster than staging tables
I can’t really take credit for this little nugget. While on a project with one of my coding buddies, he came into my office one day and said, “Hey, did you know that using temp tables in SQL will allow you to use multiple processors on the server at one time?” I did not know this, and man, did it make a difference. What a huge performance boost it was for my project. Now, like everything else, there are exceptions to the rule. Some guys who are much smarter than me had a discussion on a forum that sheds some more light on the topic, and there are some more scenarios where it might not make sense. I think it’s good information overall, and the more information you have, the better off you’ll be when designing your BI architecture.
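As a quick sketch of the pattern (the table is from AdventureWorksDW; the staging logic is simplified for illustration):

```sql
-- Load the working set into tempdb, where SQL Server maintains statistics
-- on the temp table and can choose a parallel plan for downstream queries.
SELECT OrderDateKey, ProductKey, SalesAmount
INTO   #StgSales
FROM   dbo.FactInternetSales;

-- The heavy aggregation now runs against the temp table.
SELECT OrderDateKey, SUM(SalesAmount) AS DailySales
FROM   #StgSales
GROUP  BY OrderDateKey;
```

The temp table exists only for the session, so there is no permanent staging table to truncate or maintain between runs.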
Have a development and testing platform
As with some of the other best practices listed above, for some, this is a no-brainer. For others… we have no brains when it comes to this stuff, and we need to have it beaten into our heads (I’m the latter, if that wasn’t already clear). I can’t stress enough how much this will save you. You should never, ever be doing development in a production environment. There are just too many things that can go wrong. Even those “quick” or “minor” changes can really cause a calamity and ruin your day quickly. Now, there can be challenges if you don’t have a proper production/dev/test environment at your office or your client’s location; however, with SQL Server Developer Edition now being free, and PCs these days having tons of resources, you should be able to do your testing on even the simplest of computers and get a warm and fuzzy feeling that you’re going to be able to deploy that latest package, report, or code successfully. True performance comparisons against a beefy production server might not be possible, but you should be able to establish a baseline and have a general idea of how performance will be for various configurations.
Well, that’s it for this post. I really hope you were able to learn even the slightest something new here, because that’s always my goal. If you have questions, you can follow me or send me a message on Twitter @bizdataviz, as I’m always happy to hear how I can write better blog posts to help people who are just getting their feet wet.
Recently I was presenting a session on Microsoft’s Power View in what I had intended to be my final time presenting the session before retiring it to the archives. Unfortunately, the presentation didn’t go as planned. For the life of me, I couldn’t find my Power View icon in the MS Excel ribbon, and began to go out of my mind! Note, I’ve discussed this topic at various SQL Saturdays and user groups for the past couple of years, and it had started to become second nature for me, doing demo after demo of the various components and moving on to the next. So, it’s only natural that phrases like “it was just here” and “what did Microsoft do this time!!” came out of my mouth in front of the group, and I rapidly changed my focus and did what components of the demo I could show in Power BI as an alternative. I later found out that, in fact, Microsoft did do something. In what I assume is an effort to get people to use Power BI and move away from Silverlight, Microsoft made the decision to remove the button from the ribbon in both Excel 2013 and Excel 2016, as described in the following article by John P. White:
A couple of lessons learned from this and, in a sense, re-learned: 1) every time you think you know what Microsoft is doing, you don’t; 2) do at least one quick dry run of your presentation immediately before the presentation, or as close to it as possible; 3) Power View appears to be transitioning to a “legacy” product… I guess it was a good time to retire the session.
Now that I’m doing some more development of dashboards in the Power View interface, I thought it might be helpful to post about some of the date “gotchas” I’ve run into with the development interface. Power View is very nice about handling date filters if you are able to specify a date you’d like to use as a start point, an end point, or 2 dates to report between. However, if you’re looking to determine whether a date falls within a range like “This Week”, “This Month”, “This Quarter”, etc., you’ll have quite a bit more trouble. The workaround I’ve found for this is to create a new column that results in a Boolean (true/false) answer. Then, within your Power View filtering, you are able to choose “IsPastWeek” = True, etc. For example, in your formula bar (or whatever tool you’re using: Excel, Power BI, or SSAS Tabular):
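The screenshot of the formula didn’t survive here, but the calculated column was along these lines (treat [Date] as a placeholder for whatever your date field is actually called):

```dax
IsPastWeek = IF ( [Date] > TODAY () - 7, TRUE (), FALSE () )
```

TODAY() returns the current date, so the comparison flags any row from the last 7 days.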
The above formula, simply put, says If the date is later than today minus 7 days, then return true, else, return false. Below is a view of the filters in my Power View Dashboard:
By setting this filter, the ability to look at specific dates or time periods makes it easier to report graphically. Also, the ability to filter on separate objects within the same canvas allows us to easily compare similar time periods side by side.
While rehearsing my demo for my upcoming session at SQL Saturday Boston this weekend, #sqlsat500, where I will be lecturing and demonstrating “Scratching the Surface of Power View”, I noticed a quirky issue where I wasn’t able to drill through my column data, though I was fine with row data. Further, after some playing, I was able to drill down if I changed the order of my column values, or tried using other fields. Everyone hates NULL data, and I think this is just another reason why. As it turns out, there are some NULLs that happen to show up as the first column when you drill down, so the ability to drill “up” is lost, and the only way to get back to your top-level data is to close and re-open the report.
Using the AdventureWorksDW2014 Data, and building a basic example of creating a matrix, then adding the Drill Down properties, my selections look like this:
As you can see, it’s a fairly simple example, just trying to capture the essence of drilling down and back out again. When I click to drill down into the “Black” column, I get the following:
Note, the first column heading has no title, and there isn’t an available “Drill Up” arrow either. At this point, I’m stuck. I can’t go back up to my original report, and now need to close and re-open to start over. If I change the order of the columns so that “Style” is listed first, above “Color”, the drill down and drill up work fine. To work around the issue, I replaced all the NULLs in the table with “NA” and voilà! Works perfectly. I’m not sure if this is by design from Microsoft in order to force the data to have values, but since it’s their dataset to begin with, I’m assuming it’s a bug.
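For anyone wanting to apply the same workaround, the NULL replacement can be done in the source table itself or, less destructively, in the query feeding the model (assuming, as in my case, it’s the Color column of AdventureWorksDW’s DimProduct at fault):

```sql
-- Option 1: replace the NULLs in place.
UPDATE dbo.DimProduct
SET    Color = 'NA'
WHERE  Color IS NULL;

-- Option 2: leave the table alone and substitute at query time.
SELECT ProductKey, ISNULL(Color, 'NA') AS Color, Style
FROM   dbo.DimProduct;
```

Option 2 is the safer choice if other reports depend on the table as-is.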
Hope this helps anyone else running into this issue!
Have you run into issues where your SSRS column headers just won’t scroll with the data? How about losing the column headers when you go to the next page? Well, there seems to be some ambiguity about this topic, and anyone who has searched for an answer to their problem has probably run into a bunch of rudimentary answers on how to fix said issues. I’m living proof that said “fixes” don’t always work consistently, so I have gone ahead and created my own set of steps to address the issue. I can’t promise it will fix your issues, but in every instance where I had the problem, it worked.
For starters, I would read the following page from MSDN to get an understanding of how row and column headings work:
As you can see, this should be pretty simple, but in reality, I can’t get it to work consistently, and it seems many others have tried and failed.
For starters… these settings in the Tablix Properties just seem to cause more issues than help:
If you modify the “Row Headers” and “Column Headers” properties in here (select them), then you will get the following error when you attempt to run your report later… so don’t use them with this method:
This error will ruin your day when you’re using the advanced properties… trust me… I know!
As referenced in the MSDN page, the way to gain better control over the column headers is to use the advanced properties for Row and Column Groups:
After changing to “Advanced Mode”, you’ll see essentially 2 different types of top row. Either way, the top row will contain the word “Static”; the major differentiation will be whether or not the word is in parentheses.
What makes the parentheses critical is that if they’re missing, it means the header row has been deleted. If the header row has been deleted, my advice is to re-create the report. You will spend much less time recreating it than you will chasing your tail trying to fix it!
On the flip side, if that top row does have parentheses, you should be able to tackle this issue.
Select your top row and go to your properties window:
The key fields you want to look at here are “FixedData”, “KeepTogether”, “KeepWithGroup”, and “RepeatOnNewPage”. By default, I typically set FixedData = True; KeepTogether = True; KeepWithGroup = After; and RepeatOnNewPage = True.
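For reference, these designer properties end up as elements on the static TablixMember in the report’s RDL. A rough sketch of what the saved XML looks like for those defaults (element placement varies by report, so treat this as illustrative rather than a drop-in snippet):

```xml
<TablixMember>
  <KeepWithGroup>After</KeepWithGroup>
  <RepeatOnNewPage>true</RepeatOnNewPage>
  <FixedData>true</FixedData>
</TablixMember>
```

If the designer ever misbehaves, inspecting the .rdl in a text editor for these elements is a quick sanity check.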
From there, if it’s not quite right, you can play with those 4 settings to resolve your issue. Also, I would pay close attention to the bottom section of the MSDN article with regard to “Renderer Support” as there are only a few particular types of renderings for which the repeating rows and headers will work.
Recently I came across a challenging issue with calculating a time span within a group inside an SSRS report. I am working on a problem where I need to be able to calculate labor pace (operations per hour), and the time a person works on a particular labor task. My report data looks like the following:
As you can see, this person submitted their first weight at 7:34 AM, with a weight of 13.805 lbs. In SSRS, it’s fairly straightforward to get a total for the weight; however, a total for the shift proved to be a little more challenging. I was hoping I could simply use the SUM() function, in hopes that SSRS would know how to handle the time format and just give me a total time. No such luck. Through a series of trials and errors with the TimeSpan function, I realized that I could use the Min() and Max() functions with a simple math equation. Ultimately, my formula wound up being the following:
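The formula screenshot didn’t make it into this post, but the expression was along these lines (WeighTime is a placeholder for your timestamp field):

```vb
=Max(Fields!WeighTime.Value) - Min(Fields!WeighTime.Value)
```

Subtracting one DateTime from another in an SSRS expression yields a TimeSpan, which is what makes this work where SUM() did not.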
When applying the function, it yielded a total of:
My next challenge was formatting the total so that it could be used in an equation with the total weight in order to determine my average lbs per hour. After several hours of trial and error with various TimeSpan and DateDiff/DatePart functions, I found simple was best and ended up with this formula to get my results:
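Again, the original screenshot is gone, but the final expression was essentially this shape (field names are placeholders): divide the total weight by the elapsed minutes, scaled to hours:

```vb
=Sum(Fields!Weight.Value) /
 (DateDiff("n", Min(Fields!WeighTime.Value), Max(Fields!WeighTime.Value)) / 60.0)
```

DateDiff with "n" returns whole minutes, which avoids wrestling with the TimeSpan formatting entirely.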
A little while back, the boys from Pragmatic Works (www.pragmaticworks.com) came up to Boston for their “Master” level training series on SSRS. I attended the class, and wanted to share my experience so other interested people can get a preview of what to expect from taking a PW class.
So, up front, I want to be honest about the fact that I am somewhat biased about the training services provided by PW due to my own past experiences in taking their classes. They have a plethora of offerings in relation to the SQL Server Stack and Data topics. I have attended several of their virtual and on-site classes in the past, and am always pleased with the course material, humorous injections, and interesting nuggets they always provide.
Ok, now that the free paid advertisement is over, let’s get to the meat and potatoes of the class. Taught by Devin Knight (@knight_devin) and Mike Davis (@MikeDavisSQL), the focus of the class was to look at some of the deeper components of how to really expand the capabilities of SSRS beyond the “canned” options and features. One thing I really like about the presentation style of the courses, beyond the humor, is the fact that they will talk about best practices, give demonstrations and examples of them, then follow up with tips and tricks to get around some of the nuances of the technology being taught. This particular course focused on several areas of particular interest to me, and some others that don’t apply to my situation, so they were just good for informational purposes. The areas within SSRS covered were:
Good Report Design
Reporting from Cubes
Configuration and Security
As well as a section on Power View.
Overall, I thought the delivery of the class and the ability of the presenters to break up the material in order to keep it interesting was very good. Devin and Mike clearly know their stuff, and very obviously love doing it, as well as sharing their insight and the various “Microsoftisms” that can occur. Below is a bit of detail on what worked, and what didn’t work as well, for me about the class specifically.
Useful stuff:
The examples used about displaying numbers in such a way that they become more readable, by highlighting numbers falling in certain ranges, or more specifically, in-report KPIs, were very helpful.
Getting into some of the deeper security and configuration topics offered some different techniques on establishing better security alternatives.
The modules on linking and mapping within reports were good to have in this course, as those areas have provided some headaches to myself and my peers in the past.
Personally, I don’t do a whole lot with SSAS and Cubes, as most of my work surrounds SSIS and SSRS, but there were some interesting nuggets revealed and “ah ha” moments, as this appears to be somewhat of a tricky subject.
Not as useful stuff:
Given the class is at the “Master” level, I’m not sure an entire module and lab needed to be dedicated to good report design. My assumption is that the majority of the people in the class were there because they have a fair amount of experience delivering reports, and they were looking for ways to extend them further. Maybe, instead of a module, poor report design could be used as a “pop quiz” throughout the presentation to break up the monotony and add some humor.
I found the preparation materials for the class to be a bit lacking. Many of the students in the class (I believe there were around 75 of us) did not have their environments set up correctly on the first day, so Mike and Devin spent much of the day running from person to person to make sure the configuration was correct. I especially remember this being the case for the SSAS and cube sections.
I’ve been a big fan of Pragmatic Works products and training for a while now and recommend them to anyone that is looking to brush up on their SQL skills, or might have a need for their suite of tools. I found this class to be helpful and was able to use some of the topics covered almost immediately after taking the class.
I just recently presented at my first SQL Saturday event, SQL Saturday #334 – The Boston BI Session, and wanted to share my experience for future first-timers in the hopes it might help them with their presentations. Special thanks to Mike Hillwig (@mikehillwig or http://mikehillwig.com/) for giving me a shot at this great event. I did a fair amount of preparation leading up to the event in order to not be a total flop, and was able to speak at a local user group a couple of months before, which helped immensely. I’m a member of the SeacoastSQL User Group (http://seacoastsql.org/) out of Portsmouth, NH, and got some great guidance and feedback from Jack Corbett (@unclebiguns) after my first demo. The group is co-run by Mike Walsh (@mike_walsh), and is regularly attended by 10-15 members at our monthly meetings. I’ll take readers through the process I took to try to improve my presentation skills. Also, I will share the finished product and the elements of the presentation I believe I can improve on, and hope to, for any upcoming SQL Saturday experiences.
Pick a technology:
In order to prepare for my session, I took some time to think about what SQL technology I had the most experience with, and wanted to give an overview about. I have seen some phenomenally brilliant people in a specific technology completely flop when trying to explain that technology, or freeze up when getting in front of a crowd, so I really wanted to make sure it was something I was very comfortable with. When choosing my topic, SSRS, I decided it would be good to give an overview of the technology, as well as some “best practice” items for attendees to ponder as they walked away. Also, I wanted to choose something I was very familiar with in case something went wrong with my demo, and I needed to adjust on the fly in order to keep things from becoming awkward.
Brush up on those speaking skills:
I’ve always been relatively comfortable in front of a crowd and have loved the opportunity for good discussion. There seems to be a general mix of people who like speaking and those who don’t. Being in front of a crowd of your peers should be something to get excited about, and in order to build the community and our knowledge, everyone should try it at least once. For those who aren’t aware, talking about tech can get a bit boring and tedious at times, so keeping the overview light and throwing some “softball” questions out to the audience to keep them engaged were items I focused on when building my presentation. For those who aren’t comfortable, start with a small group to get feedback, or even just record a session so you can play back your voice and notice anything you’re doing that might annoy people.
For my presentation, I was showing a slideshow and a demo all in the same hour-long session.
I’ve actually seen 3 different types of speakers:
• All presenting with slides and examples
• Some presenting and some demo
• All demo
It’s really up to you what you want to do, and what you think will deliver an effective session to the audience. My topic required some demonstration, and at the same time, gave me the opportunity to instill some methods and best practices for success.
Create a Script:
Some of the best advice I read and received while preparing for the session was to create a script with some easy-to-reference queries for the necessary coding elements of the presentation. The other piece of advice I picked up was to avoid typing in a demo: copy and paste code wherever possible in order to avoid errors and delays.
Build your presentation:
I started with an overview of my background as well as the topics I wanted to cover. Some people are better at reading from cue cards, but I’m more of an “off the cuff” speaker, so I just jotted down some notes I wanted to highlight in a basic order to go along with a PowerPoint presentation. Successful PowerPoint design rules are posted all around the web about how much content, bullet points, static text, and ways to keep people interested, so do some reading on how to make the presentation flow cleanly.
Practice the presentation:
The old saying is: practice makes perfect, and nothing could be truer. I did a dry run about 6 times to get a sense of how long the whole presentation would take, as well as to commit the order of topics to memory. From there, I recorded a session using Camtasia Studio and sent it to a few friends to critique. I knew I would be presenting for about an hour, including questions, so I made sure to leave time for interruptions, system stalls, and anything that might slow me down a bit. My dry runs were taking about 45 minutes, and the actual presentation took 1 hour and 1 minute, so I was pleased with the timing.
When I reviewed the comments from the session evaluations, there was a mix of people who came to get introduced to the technology, and people who were refreshing their skills from some time ago. Most people felt that they walked away having learned something, which means I succeeded in my mission. On a 1-5 scoring system, 5 being the best, I received many 4’s and 5’s, and a few 3’s, so it would seem people were pretty pleased with the topic and presentation.
Among the items I learned in this presentation was that you can expect a wide range of questions from people, both on topic and off. I found myself spending time on questions that weren’t necessarily relevant to the conversation, so be aware of the audience and do your best to filter without being rude to the questioner if the question is off topic.
SQL Saturday events are for learning and networking, so if you find someone is showing interest in your topic, and/or somewhat jumping in and answering questions directed at you, I would suggest engaging that person after your session is over. This is a good opportunity for you to possibly learn more about the topic, or have a resource to rely on when you might be running into issues with a project.