Having worked with big data since its early days, I’ve had the privilege of watching many organizations work through the opportunities, challenges, and changes it has driven. In that time I’ve seen things work out well and, sadly, sometimes not so well. In the spirit of sharing what works, the team and I thought it would be helpful to write a series of posts bringing together big data best practices so companies can learn from one another’s experience.
In this entry, I’ll identify (and demystify) the top five most common misconceptions about big data that I’ve heard, and provide some tips on how to get past them. So without further ado, here we go:
Misconception #5: Tackling big data is just doing more of exactly what you’re doing today
When I first talk with organizations new to big data, the first thing they usually want to do is make all their existing reports run faster, or keep the same runtimes with more data. This is a well-intentioned ask driven by a classic IT challenge: do more with less. But it misses the value of big data. Can we make a report run faster? Absolutely. Can we meet existing SLAs with orders of magnitude more data? Definitely. Will the business open its purse strings and spend more money on a big data project which promises to deliver the status quo with incremental improvements? Hmm.
The business opportunities for big data can be significant. One of the more straightforward examples, which didn’t involve any exotic new practices or people, is Guess Inc. Guess re-engineered its data pipeline and completely transformed the experience of managing its retail stores. In the old world, store managers had a weekly printed report. In the new world, they have real-time, dynamic information about their store, their customers, and brand & loyalty programs. Guess was able to overhaul how store decisions get made. If they’d just focused on doing more of the same, this wouldn’t have happened.
Misconception #4: Tackling big data means throwing out everything and starting over
At the other extreme are the organizations I speak with who are convinced they have to start from scratch. They might bring in new leadership, and maybe some consultants, and look to create an entirely new data architecture from the ground up. The obvious issue is the high risk of setting off down an unknown path. “But Walt,” the astute reader may observe, “you just told me that big data meant not doing what I’m doing today. Now you’re telling me it’s not about starting from scratch either. So which is it?”
Our most successful customers take a balanced approach. Most businesses weren’t born yesterday: they have accumulated knowledge of their business, and solutions that are keeping the lights on. Throwing everything out and starting over is high risk because the end value of that effort is unknown, but it may be even riskier to be myopic and focus exclusively on incremental improvements to what already exists. Striking a balance between entrepreneurship and predictability is what matters. Increasingly we see companies opening innovation centers – internal incubators where new or existing staff can experiment and find better ways of solving problems. When managed well, these can be engines of change.
Misconception #3: The value is five years away
This belief is often a symptom of companies that have tried ambitious IT transformations before…only to see the projects either fail outright, or become “zombie projects” which never really end, consume resources, and never deliver anything of tangible value. Past experience can often predict future events. Unless it doesn’t. The notion of a balanced approach helps here too. With the right technology in hand, a big data project can show real value in months. The first project isn’t re-inventing the business, but an early win is strategically useful for a number of reasons. It shows that big data is actually worth doing. It builds support among stakeholders. And it gives the technology team crucial experience in the new domain.
I want to stress that technology selection is key to reducing time to value. If the team goes with technology that will take six months to set up, or that requires retraining or hiring staff to obtain needed skills, the project is likely in trouble before it’s even fully launched. Sadly, we’ve seen this happen often. Don’t get bogged down in the tech morass.
Misconception #2: You absolutely, positively need Hadoop
Hadoop solves some problems very well, but it isn’t a cure-all. When it was first built, it was very handy as a batch map/reduce platform, and it can be a cheap place to park lots of data. But, like any young technology, it comes with risks. To a company new to it, it can be very appealing because it appears to do all things big data. Just peruse the Apache Hadoop ecosystem – it’s got everything. Caveat emptor. Is the team excited about that project which describes itself as a data warehouse? The devil is in the details. Is it SQL-complete? Does it support the BI tools you need? Is it robust enough to meet your SLAs? And so on. The point? See Misconception #3.
Some organizations are large enough to bear the cost of being Hadoop experts. Many aren’t. And the degree of expertise required for the care and feeding of Hadoop depends heavily on how it’s being used. There are organizations that happily use no Hadoop at all, yet still deliver value from big data. That said, like any tool, it can be useful when used in the right way. It’s not uncommon to see organizations using it as a platform for data preparation – ETL for big data. Some use it for archiving. And while some companies use it for analytics, they are the sorts of firms who can bear the cost of the expertise required to do so.
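To make the ETL use case concrete, here’s a minimal sketch of a Hadoop Streaming data-preparation job written in Python. The pipe-delimited input, the field layout, and the per-store aggregation are all hypothetical; the point is the shape of the work – cleaning raw records and boiling them down before anyone tries to analyze them.

#!/usr/bin/env python
# mapper.py -- hypothetical Hadoop Streaming mapper for data preparation.
# Reads raw, pipe-delimited sales records from stdin and emits cleaned,
# tab-delimited (store_id, amount) pairs, silently dropping malformed rows.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) < 3:
        continue  # skip truncated records rather than failing the job
    store_id, amount = fields[0], fields[2]
    try:
        amount = float(amount)
    except ValueError:
        continue  # skip rows with unparseable amounts
    print("%s\t%.2f" % (store_id, amount))

#!/usr/bin/env python
# reducer.py -- sums the cleaned amounts per store_id.
# Hadoop Streaming delivers the mapper's output sorted by key.
import sys

current_key, total = None, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print("%s\t%.2f" % (current_key, total))
        current_key, total = key, 0.0
    total += float(value)
if current_key is not None:
    print("%s\t%.2f" % (current_key, total))

One nice property of streaming jobs like this: you can dry-run the same pair of scripts on a laptop with cat sales.txt | python mapper.py | sort | python reducer.py, which makes them a gentle on-ramp to the platform.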
Misconception #1: You absolutely, positively need data scientists
Spoiler alert: heresy ahead! Big data is complicated, so it must require data science, right? Wrong. There are a great many insights to be had which are not the result of “rocket science” analytics. I spent a number of years early in my career practicing the thing we now call data science for business. My big lesson? Roughly 80% of the work was preparing the data, 19% was understanding the data well enough to articulate a story that went with the model, and 1% was generating the model. In the world of NASA, a tiny error in prediction might result in the destruction of a $100 million satellite. Not so in the world of business. Insights are pragmatic things in the business world, and they rarely require six-sigma accuracy.
What this means is that I can take an undergrad with two semesters of stats, train her to use a half-dozen algorithms effectively, and make her a productive analyst…if it’s not too hard to get the data and set it up for analysis. The dirty little secret of data science is that the setup of the data consumes most of the time & effort. And that’s not data science. A longtime colleague refers to this as “data wrangling”. And while it’s complex, it’s not building & launching a satellite.
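Since I can’t resist backing that up, here’s a minimal sketch in Python using pandas and scikit-learn. The column names, the toy records, and the cleanup rules are all hypothetical, but notice the ratio: nearly every line is data wrangling, and the modeling is two lines at the end.

# A hypothetical wrangle-then-model workflow illustrating the ratio of effort.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# --- data wrangling (where the time goes) ---
raw = pd.DataFrame({
    "age":     ["34", "41", "n/a", "29", "52", "45"],    # ages arrive as strings
    "spend":   [120.0, None, 80.5, 45.0, 310.2, 150.0],  # with missing values
    "churned": [0, 1, 0, 0, 1, 1],
})
df = raw.copy()
df["age"] = pd.to_numeric(df["age"], errors="coerce")   # coerce bad entries to NaN
df = df.dropna(subset=["age"])                          # drop rows we can't salvage
df["spend"] = df["spend"].fillna(df["spend"].median())  # impute missing spend

# --- modeling (the famous 1%) ---
model = LogisticRegression()
model.fit(df[["age", "spend"]], df["churned"])

print(model.predict(pd.DataFrame({"age": [40], "spend": [100.0]})))

Real wrangling involves joins, deduplication, unit fixes, and far messier inputs than this toy, which is exactly why technology that streamlines it matters more than exotic algorithms.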
So to be successful with big data, businesses do not absolutely require data scientists. They need people who can approach a problem methodically and articulate a credible story to go along with explanatory or predictive models. And they need technology that makes data wrangling something analysts can do quickly & easily. It just so happens that we make that technology. Click here to have a conversation with one of our gurus about our big data technology and how it can help your big data program.