Yak Shaving and Data Cleaning
In one of his Akimbo podcast episodes, Seth Godin describes the act of Yak Shaving:
Yak Shaving is the last step of a series of steps that occurs when you find something you need to do. “I want to wax the car today.”
“Oops, the hose is still broken from the winter. I’ll need to buy a new one at Home Depot.”
“But Home Depot is on the other side of the Tappan Zee bridge and getting there without my EZPass is miserable because of the tolls.”
“But, wait! I could borrow my neighbor’s EZPass…”
“Bob won’t lend me his EZPass until I return the mooshi pillow my son borrowed, though.”
“And we haven’t returned it because some of the stuffing fell out and we need to get some yak hair to restuff it.”
And the next thing you know, you’re at the zoo, shaving a yak, all so you can wax your car.
I have found that Yak Shaving is everywhere when it comes to data and technology. For example, I am helping one of my clients use their Salesforce instance to capture information about how their students progress through their multi-year program. We wanted to know how many new and returning students we have in the program this year. To find out, I pulled a report based on our recent overhaul of Salesforce and started to see some questionable data. Some students appeared to have duplicate program records for the year (making them count twice on some reports). Others had conflicting information on their records (their grade was listed as middle school, but they were enrolled in a high school program). There were also some clear duplicate contacts that needed to be merged. I would need to work through all these cleanup tasks before I could get the answers I needed. And that is how I found myself at the zoo shaving a yak: staying up until midnight on a weeknight, merging duplicates, creating new fields to help sort out data quality, and tagging staff members on tons of records to ask them to verify the information.
Seth says about Yak Shaving: The minute you start walking down a path toward a yak shaving party, it’s worth making a compromise. Doing it well now is much better than doing it perfectly later.
When it comes to data cleaning and even data reporting, it can be very helpful to take Seth’s advice and choose where you can “compromise” - because data quality is a journey, not a destination. But I’d also like to make a small case that the yak really did need to get shaved - and all those other things might have needed to happen as well. They don’t have to be done all at once and they don’t all need to be done perfectly, but there can be huge satisfaction in completing the chain to get to your original need.
The incomparable Samantha Shain of The Data Are Alright explains, with an empathetic and humanizing touch, that we have options. We can normalize data, meaning we can manipulate it and map it so that it fits well together (for example, using all state abbreviations or all fully spelled-out state names, and keeping a table that shows both). We can also "normalize data cleanup." As Samantha says, we need to take any blaming and shaming out of the data quality journey. So you have messy data...so what? You always, ALWAYS have the chance to clean it up. And because it is a journey, you do NOT have to do it all in one sitting (and probably can't, anyway).
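To make the state example concrete, here is a minimal sketch in Python of what "a table that shows both" might look like in practice. The lookup table is a tiny illustrative subset, and the name normalize_state is my own invention for this example, not something from Salesforce or from Samantha's work.

```python
# Illustrative subset only: a real lookup table would cover every state and territory.
STATE_ABBREVIATIONS = {
    "new york": "NY",
    "new jersey": "NJ",
    "connecticut": "CT",
}


def normalize_state(value):
    """Return the two-letter abbreviation for a state name or abbreviation.

    Returns None when the value is empty or not in the lookup table, so
    unrecognized entries can be flagged for a human to review instead of
    being silently guessed.
    """
    if not value:
        return None
    cleaned = value.strip()
    # Already an abbreviation we know about (in any letter case).
    if cleaned.upper() in STATE_ABBREVIATIONS.values():
        return cleaned.upper()
    # Otherwise try to match the fully spelled-out name.
    return STATE_ABBREVIATIONS.get(cleaned.lower())
```

Returning None for anything unrecognized is a deliberate choice here: it keeps the messy values visible so they can be cleaned up later, in keeping with the idea that this does not all have to happen in one sitting.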
So, how can we normalize data cleanup? We might just have to look a little bit more kindly on Yak Shaving. I can tell you what worked for me as I went through it late into the night last week:
1. Reframing the task at hand.
When I sat down at my computer, it was to answer a question about students. But I quickly saw that to do that accurately I would need to change my expectations about what exactly I'd be spending my time on that night. I shifted data cleaning from the periphery to be the main act, and suddenly I wasn't doing something extra and tediously outside the scope of my work - I was just doing the expected and "normal" work that goes into representing the real world accurately in our database. It might take a little longer than I anticipated, but luckily that was going to be OK.
2. Deciding what success would look like.
In my case, I was able to easily come up with some measures of success, like "no duplicates with the same first and last name" and "no students in middle school with high school program records". When I did the somewhat manual tasks of cleaning up the records for these statements to be true, I gave myself a little prize. I also made some dashboard components to show me when we might have these specific data quality issues again in the future. Having an accurate number of students is absolutely a great outcome - but the real success came in making our whole system better to answer many more questions as they arise.
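If you export your records and want to check those two statements outside of a dashboard, a rough sketch in Python might look like the following. I am assuming the records arrive as simple dictionaries, and the field names (first_name, last_name, grade_band, program_band) are hypothetical stand-ins, not actual Salesforce fields.

```python
from collections import Counter


def find_duplicate_names(contacts):
    """Return (first, last) name pairs that appear on more than one contact.

    Names are compared after trimming whitespace and lowercasing, so
    "Ana Diaz" and " ana diaz" count as the same person for this check.
    """
    counts = Counter(
        (c["first_name"].strip().lower(), c["last_name"].strip().lower())
        for c in contacts
    )
    return sorted(name for name, n in counts.items() if n > 1)


def find_level_mismatches(enrollments):
    """Return enrollments where the student's grade band and the program's
    band disagree, such as a middle school student in a high school program."""
    return [e for e in enrollments if e["grade_band"] != e["program_band"]]


# Made-up sample records, purely for illustration.
contacts = [
    {"first_name": "Ana", "last_name": "Diaz"},
    {"first_name": " ana", "last_name": "DIAZ"},
    {"first_name": "Sam", "last_name": "Lee"},
]
enrollments = [
    {"student": "Ana Diaz", "grade_band": "middle", "program_band": "high"},
    {"student": "Sam Lee", "grade_band": "high", "program_band": "high"},
]

duplicates = find_duplicate_names(contacts)
mismatches = find_level_mismatches(enrollments)
```

Success, in this framing, is both lists coming back empty. A same-name match is only a hint that two contacts might be duplicates, so each hit still deserves a human look before merging.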
3. Not going it alone.
I don't have all the context that the people on my client's program team do. So I brought them into the fun with Chatter, Slack, and email. Communication, collaboration, and transparency - these are the three team-based backbones of data cleanup. Plus, the more you talk about it, the less taboo it can become (I like to use Radical Transparency in these cases).
4. It's not a problem, it's an opportunity.
I've been called out before for being a little bit too optimistic and encouraging on this point, but it really helps me get through any frustrated or overwhelmed feelings I have when doing data cleanup. Maybe I'm not even "cleaning" something that was "dirty" - instead I am getting a better and more intimate look at our data, learning new methods for updating data, learning from others about how they handle similar situations and building relationships through that sharing and learning, and thinking of ways to work with the system and the team to avoid having to do "cleanup" again.
OK, I've really talked myself into it now and I'm just going to come out and say it: I love working on data quality! It has the same kind of feeling that finishing a puzzle can have: oh so satisfying. Let me know if you love it, too, or if you want to talk more about what data quality issues you may be experiencing and how to apply these mindsets to the task at hand.