In our own work with data analytics, in coaching financial & business analysts, and with data conversions, we often coach clients to learn and use Microsoft Access or SQL Server, rather than (or perhaps more accurately, in conjunction with) Microsoft Excel. In this article, we explore some of the reasons why.
"I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow (read about this)
Excel is Immediate & Powerful
First, I’m not knocking Excel. It is an amazing tool. Between rapid sorting, filtering, and de-duplicating data, Excel is an amazing and vital tool for anyone analyzing and cleaning up data. With pivot tables and charts & graphs, you can provide insightful reports as well.
Add the power of VBA (Visual-basic for Applications) and Excel can be used for data conversions and analysis quite nicely. And in a pinch, it works as a data repository. I’m using the term data repository - a place to store and track simplified data - because I’m unwilling to call it a database. More on that later.
You may have used Excel for years and never ventured into the Microsoft Access or SQL Server world. And it may be that you do not need to.
Excel is immediate in a way that is difficult to match. You take some data in the form of comma delimited text, import or paste it into Excel, and voila, you are immediately sorting and filtering with incredible speed and deftness.
Excel is NOT Relational
This is a significant weakness for Excel. Excel does not truly manage or view relational data very effectively. If you are unfamiliar with relational data, you can find any number of tutorials online. It is important to understand - not only for the most accurate and optimal data cleanup but will also help you better utilize and understand source data in Excel.
Relational Data Explained
A great way to understand relational data is the student, class example. And mind you, this is grossly simplified… having built a true, working classroom/student database, there are a number of other critical relationships beyond what I am showing here. But as an example, this makes sense.
Here is a video (not mine) that is simple enough to show you the class/student example.
So relational data is a way to duplicate as little data as possible by referencing a unique identifier for key data elements. In the student/class example, there are two lists. Students and Classes.
In the Excel world, think of one worksheet per list. In order to properly join them - in the relational sense - you would have a third worksheet - perhaps called, StudentClasses. In that sheet you would only have two columns. Student ID and Class ID.
Of course, you can use formulas like VLookup to find and display matching data - or the more complex Match/Index combo. Array formulas also let you aggregate data somewhat.
But from a data management standpoint, Excel cannot truly represent and manage relational data - allowing you to create complex queries in a fashion that touches Microsoft Access and Microsoft Excel.
Regarding the quote at the start of this article. When we work with some organizations, there can be a pretty strong push-back on using tools beyond Excel. Some Excel users are AMAZING! Truly - they can and do accomplish extraordinary things. And they balk at the idea of using Access or SQL Server saying, “I can do that with Excel and NOT go through the learning curve.”
Chances are - they can. No doubt. And for A project (one, singular project), it may not matter. But, and I make no apologies for this, there is simply NO WAY you can do the level of data cleanup and visualization of relational and complex data in Excel that you can in Access or SQL Server.
Add to this, the ability to write complex queries, pulling data from SQL data sources like SQL Server, MySQL, Oracle, etc. can give you FAR CLEANER data to bring into Excel.
This is less of an either/or situation but more of a both/and situation. Meaning, we are NOT saying you should get rid of Excel. It is one of the most basic and powerful skills we recommend you advance. But, if you add true understanding of relational design tools, you dramatically increase your skillset and value in data heavy projects.
It is another tool, particularly well-suited to some applications, to add to your toolkit. That’s a good thing.
Microsoft Access & SQL Server Tools
Setting up relational tables in Microsoft Access or SQL Server takes more time than dropping the data in Excel. This is unavoidable and is often the reason that many Excel experts never move past Excel Worksheets.
However, queries and views are too powerful to overlook. Access forms and sub-forms are also hard to ignore.
Queries (MS Access) and Views (SQL Server)
Queries & Views are similar (not precisely but close enough for our discussion) constructs. They allow the data analysts/programmer to join relational data from multiple tables into a single set of data. Depending upon complexity, tables can be joined to themselves or a join with multiple tables - including custom functions to make visualizing key aspect of data possible.
This would be the same as having a special worksheet in Excel that is built on the fly, connecting 2, 3, or 10 other sheets - connecting those sheets with key data elements, like Student ID or Class ID.
Whether extracting data or cleaning it up, add the ability to query data to your existing Excel skillset and you can do far more with data visualization and preparation.
MS Access Forms
We often use Access as a front-end to SQL Server. But whether you are using native Access data or SQL data, a form and a simple - data connected - subform allows you to easily see and edit relational data.
For instance, you could have a student form with a classes sub-form - allowing you to easily see any classes a student is signed up for and add classes with a simple drop-down list in the sub-form.
Get Started with MS Access & SQL Server
A subscription to Office 365 Professional is an inexpensive way to get Microsoft Access. Jump in with a simple project - use the Student and Class example and learn to make a simple query and a simple form and sub-form.
Don’t over-complicate it. Use as little data as needed to demonstrate and work with relational data. I’ve included an example you can download.
SQL Server Express
Microsoft’s SQL Server express is a free/distributable version of SQL Server. Download it, including the management suite. You will find that creating tables is not much more difficult than MS Access natively.
SQL provides a better method for securely connecting to data as you connect to the SQL Server service, not to a data file as you do with the native Access data. In multi-user environments, Access requires all users have access to the actual data file - which means the file could be deleted inadvertantly or purposely. SQL allows you to determine the level of access of any given user and DO NOT provide direct access to the data files themselves.
Microsoft Access & Excel Example
I’m providing a super-simplified and sparse example of an Excel sheet with data and the same data in Microsoft Access. Contained in the download is a Microsoft Access database and an Excel workbook. Students-and-classes.accdb & students-and-classes.xlsx
This is NOT a tutorial - so I am not providing a lot of details about the setup. However, I provide the following notes. If you have a question, comment or send me a message.
- The Student Worksheet corresponds with the tblStudents table.
- The Classes Worksheet corresponds with the tblClasses table.
- The StudentClasses worksheets corresponds with the tblStudentClasses table.
- In the StudentClasses table, each ID field is setup as a drop-down combo box that reads the ID and the primary data of that table.
2 columns are included, the ID and the student name or class name, respectively.
Column widths are set to [0”; 2”;] - which effectively displays the name, not the ID.
It is Important to note that in the table, what is being stored is the ID, not the data elements name. This is how you display data a user can understand while only storing the relational data element (the ID).
- The query, qryStudentswithClasses, joins all three tables. You can see it visually or you can view it as SQL. This is a good way to learn how to query almost any type of SQL database.
- The form: frmStudentswithSubform is a primary form for students and has a sub-form named, frmStudentClasssubform. The subform is connected using the StudentID field. You can see this represented in the properties of the subform while in design view.
- As you add data to the student or class tables, then open the primary form, you will see be able to add classes to any given student using the subform. Add new classes and they will appear the next time you select from the class drop down in the subform.
If you or your team is using Excel extensively but want to add the power of better data cleanup and presentation, take the time to learn MS Access & SQL Server. As a data professional, this will dramatically increase your value to your organization and allow you to build much more robust tools and solutions.
If you want help learning these tools, contact us.