Spreadsheets ARE databases: A Rebuttal
Dear Katie Redderson-Lear,
We haven't met before. We've never interacted. From your LinkedIn profile I can see that you work at one of my absolutely favorite companies (Zapier), and you have been there for a number of years. The rest of your LinkedIn profile looks equally cool - like you speak Chinese well enough to proofread an encyclopedia? That is SO cool. And it is something we have tangentially in common - I have done proofreading for my work (in English, boring as that is) and I even took an NYU Continuing Ed course in it (if you can believe that is really a thing, or it used to be anyway). We both seem to have some technical skills and a leaning towards sharing those with customers and colleagues. Honestly, I'm surprised we don't seem to have any people connections in common (yet). I really hope we get a chance to connect and become friends and maybe even work together one day!
But today, I want to respectfully disagree with nearly everything that you've written in your blog post for Zapier called Spreadsheets aren't databases - stop using them like one. Nearly everything, but not EVERY thing. Like for example, the first line of your post is spot-on:
"Sometimes a spreadsheet isn't enough."
Yup. Yup. Yup. Absolutely. I oftentimes like to think of spreadsheets as stepping stones and learning experiences on the way to bigger and better systems. And in terms of Constituent Relationship Management, spreadsheets can even be harmful because it is often the case that there isn't great management and oversight of how they are being used across an organization. So people and teams end up keeping information about the same person in different places, and that can end up hurting the relationship with the constituents and causing extra work and troubles.
But let me back up and take issue with the title of your blog post.
"A spreadsheet isn't a database."
I'm sorry, but yes it is. The classic definition of a database is "a systematic collection of data". Sure, spreadsheets can be used poorly as databases, but they can also be used perfectly well for data collection, storing, manipulating, displaying, and analyzing. Spreadsheets, done well, can even serve as relational databases. They can have tables of data that are connected by formulas. I've made them. I work with them. They aren't as sophisticated as a true database system powered by more impressive programming languages, but a lot of people would be surprised at just how far you can press a spreadsheet into being a highly effective database, particularly for smaller data batches (under 500,000 records per table). And as you yourself say "the vast majority of spreadsheets...will never come close" to the kind of data numbers that would cause problems.
"Spreadsheets show you everything, all the time"
Nope. You can hide anything and everything. You can do it in a few different ways. But this isn't really what you are talking about in this section of your post. You go on to talk more about processing power. That the more data you have and the more formulas you have can slow down your spreadsheet. And that is absolutely true - I can confidently say I have tested the limits of Google Sheets and Excel in this area. Excel at least has some fun ways to help manage the processing power issues - you can choose exactly when your calculations run, you can store data in separate spreadsheets and link them, only updating the data at your convenience, and with PowerQuery you can run specific sequences of gathering, manipulating, and outputting data much the way sophisticated database systems process data. You note that databases "store data behind the scenes". Spreadsheets can do this, too. They can and often are called upon to create temporary tables within formulas that act in the same way that other database systems do to "only load what you need when you request it."
"If you're past the point of being able to get anything valuable out of looking at your sheet, you should probably set up a database for the job."
Again, spreadsheets are databases. And if you have a ton of unorganized, illegible data in them, that isn't the spreadsheet's fault. It is a symptom of not knowing how to best set up a database for consumption. Most of my spreadsheets are set up with 3 basic tabs:
Raw Data: This tab is where you input data in its messy entirety.
Transform: This is where a ton of formulas and functions are used to manipulate that data into information.
Display: This is where we take the manipulated data and display it in a way that allows people to consume it, gain insights, ask questions, etc.
"Spreadsheets aren't great at filtering or querying."
I'm sorry, what? Spreadsheets are excellent at filtering and querying. In fact, I think I can make an argument that spreadsheets have filtering capabilities that some sophisticated database systems do not. You link to a post about how to use the most basic "filter" function in Google Sheets. The ability to add that filter function - for a non-technical spreadsheet user to be able to filter their data using the filter function - makes spreadsheets infinitely more accessible than other database systems that require some sort of written command, often in a programming language, to filter data. And Google Sheets even allows you to create "Filter Views" that can provide specific views of data to specific users, which is really cool (just ask my spreadsheet soulmate about it - https://thedataarealright.blog/2020/09/23/steven-universe-presents-filter-views/)
You say that using the filter function like this "requires making changes to the sheet itself" while other systems do querying and reporting "without actually affecting the data." This is misleading. No data changes in any way when you filter. Only the VIEW of the data changes.
(Side note: There are cases where using the filter function can actually change the data - it happens when you use a specific kind of formula, like the FILTER formula or QUERY formula/language and then try to sort data with the filter function (the little down-arrow that should be appearing on the first row of your spreadsheet containing your column headers). In this particular example you can run into trouble. But this is rare, and usually only used by spreadsheet nerds like me, and I always leave a note in the sheet telling users not to sort, only to filter.)
Now that I mentioned the FILTER and QUERY formulas, there are a million-plus formulas and approaches to filter just as well as any other database, without having any impact on the raw data that you put into the spreadsheet. There are seriously so so so many cool ways to do this, and the formulas are arguably more accessible to more people using spreadsheets. I'm not going to load you down with a ton of examples, but my spreadsheet soulmate has gone into it in the past, so that is a great place to start if you are curious: https://thedataarealright.blog/2020/10/18/filter-down-spreadsheet-formulas/.
"Typos are a thing."
Now you know as well as I do that typos and bad data happen in sophisticated database systems ALL. THE. TIME. You are totally correct when you say that by default, spreadsheets don't typically come with data validation in columns. That needs to be set up. But you yourself say that you need to "set it up" in your database as well ("When you add a column to a database, you include the expected data type and limitations along with the column name"). The exact same process can be done in a spreadsheet. But regardless, there is no database with perfect data in it, and there will always need to be checks and balances. A good spreadsheet creator will set these up at the beginning, just like any good database manager would.
"Databases are better at making connections."
The key word here is "better" because as you yourself note, spreadsheets can be set up in such a way as to make them relational, so that if you choose a single-source of truth for data entry/raw data, then updating data in that place can be made to update the same data everywhere else in the spreadsheet. The difference between spreadsheets and more sophisticated databases is that the latter have the idea of relationships baked in and often hidden behind a pretty UI (I'm thinking about CRMs like Salesforce). I frequently argue that some of the reason people have trouble learning Salesforce and other database systems is because they haven't been given the opportunity to really SEE the behind-the-scenes work of those data table relationships in action. So although more sophisticated databases may be built for these connections, I think it can sometimes be a disservice to the vast majority of data and technology users to not get the education and hands-on learning about relational data tables that a spreadsheet provides.
"So if you're working with a lot of data and need to make regular connections, it's a good idea to look into setting up a proper database."
I agree with this statement. But I don't agree that it cuts out spreadsheets as a good option and a proper database. The key is in the setup and implementation of the spreadsheet, just as it would be with any database system. Just because you have MySQL or AirTable (the two you mentioned at the end of the post) doesn't mean they aren't going to be TOTAL DISASTERS if your database manager isn't on point, if you have crazy legacy data, if you have shadow systems passing around additional information, if you don't have good documentation, if you don't have the right levels of access, if you haven't set up processes for data entry and output, if your business and operational teams haven't agreed on how to organize the data coming in from the real world (for example, when an organization and/or program team hasn't agreed on definitions about program categorizations, program participants, etc. - this is fairly commonplace in nonprofits).
As my spreadsheet soulmate reminded me, many people and organizations using sophisticated database systems are still using “hella spreadsheets” to help them “fix, verify, validate, update, insert, heck even delete data.” There might be some magic orgs out there that are doing every single bit of that within their awesome database setup, but more often “what databases do is centralize the spreadsheets, not eradicate them.”
So Katie, I really do hope we can be friends and work together at some point. I love Zapier, you seem like a really cool person, and I know this all sounded a bit harsh. But it is very important to me that people know how to use spreadsheets and how to use them well. I would love to see most people and organizations get to the point where they can ALSO use more sophisticated database systems and understand them well. But I truly believe that the way most people are going to be able to get to that point is to first conquer their spreadsheet fears and learn how they are in total control of their data and technology. Using spreadsheets may be the easiest and most accessible way to do that, and it can set up the knowledge path to being an excellent sophisticated database user.
Sincerely,
Emily Hicks-Rotella
Make Tech Work For You