Engineering Integrated Data Solutions: A Few Considerations

I once had a manager who told me "big data" was a "big scam," over-hyped, and that they wanted nothing to do with it. This same manager adhered almost exclusively to survey and focus group methodologies (except for one particular project which was, in my opinion, our best achievement during my tenure with this organization). Fast-forward to 2017 and I would hope they've changed their tune by now because clearly big data, or whatever you choose to call it, isn't going away -- it's growing rapidly.

Now just about every company in every industry is jumping on board. Although this has all the signs of a fad (a la NPS), big data has seemingly surpassed the typical time span of fads, is continuing to grow, and has even permeated the lexicon of the general population. However, Gartner predicts that 60% of all big data projects will fail, and while I couldn't immediately access the research that reached this conclusion, I'm not in a position to disagree with it. Additionally problematic is the fact that data is to be found nearly everywhere, in every format (or lack thereof), and in such a complete mess that it's high time ISO or some similar governing body stepped in and saved the world from the data morass.

While data itself can be the culprit for poor outcomes, a bigger issue lies in the due diligence and end-to-end design of big data solutions from the beginning stages. There is often the urge to jump in the deep end right from the start. Whether or not this is a "keeping up with the Joneses" decision driven by overblown hype, I'd wager that failure has a lot to do with what happens (and what doesn't) initially.

Careful With That First Step -- It's a Doozy!

There is one main question that needs to be answered before setting off:

  1. Why do we think we need a big data solution?

It sounds simple but it leads to several other tangential questions that can better define your organization's current state. I would bet that more than one business was sold on the benefits of incorporating a big data solution costing hundreds of thousands (maybe millions) of dollars before actually defining what those benefits were supposed to be.

This is really the biggest decision point of the process because once you head down the path of integrating big data solutions you have made a decision that will alter the course of your business for several years to come.

Quick Aside

Opportunity lies within data. It is an asset. Yet it does not necessarily have to be "big" in nature. If you are wondering what most consider "big," it's basically any data, single-source or aggregate ("Variety"), that a single machine cannot store in RAM ("Volume") and/or process efficiently via its CPU/GPU in a timely fashion appropriate for the application ("Velocity"). Your definition may vary, so from now on let's just say "data solutions." Some data solutions can be modeled without the need for massive infrastructure investments.
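
To make that working definition a little more concrete, here is a minimal Python sketch of the "Volume" and "Velocity" tests. The function name, the 16 GiB machine, and the sample workloads are all assumptions invented for illustration; this is a rough rule of thumb, not a standard test.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def exceeds_single_machine(dataset_bytes, ram_bytes, est_seconds, deadline_seconds):
    """Return which informal 'big data' criteria a workload trips.

    'volume'   -> the data does not fit in the machine's RAM
    'velocity' -> processing takes longer than the application allows
    """
    reasons = []
    if dataset_bytes > ram_bytes:
        reasons.append("volume")
    if est_seconds > deadline_seconds:
        reasons.append("velocity")
    return reasons


# Hypothetical workloads on a hypothetical 16 GiB machine.
small = exceeds_single_machine(2 * GIB, 16 * GIB, est_seconds=30, deadline_seconds=3600)
large = exceeds_single_machine(500 * GIB, 16 * GIB, est_seconds=7200, deadline_seconds=3600)

print(small)  # []
print(large)  # ['volume', 'velocity']
```

An empty list suggests a plain "data solution" on one machine may be enough; anything else is a hint, not a verdict, that more infrastructure might be worth discussing.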

If A Data Solution Falls in the Woods and Nobody is Around to Hear it...

Data without context is just white noise on your television without an antenna. This is why it's important to internalize data solutions -- only your business can provide the antenna to unscramble contextually-meaningful insights.

This isn't a simple process so let's get that off the table right away. Delivering an integrated capability to achieve a competitive advantage with data is typically a lengthy process that may need to evolve along the way. It goes beyond the integration of the data/tools/models/etc. More importantly it requires the integration of people.

What does this mean? Well, first it means placing the members of your data solutions team in a position close to the data and close to the resources they may need to access and manipulate that data. If we are talking about an actual big data implementation (see my definition above), then most of the time this means IT folks. From my experience, I'd say that one of the biggest impediments to implementing a data solution is the time it takes to actually get the data to the people who need it. Depending on your business model, data can get old very fast, so if you have to wait weeks -- dare I say months -- to get it, then it may already be of less value.

Although this is skipping ahead a bit, you also need to think about enterprise-wide integration and engagement. Most businesses of sufficient size implement ERP tools and most decision makers are comfortable working within these. There are often ways of integrating data solutions within these tools (via APIs for instance) so that the workflow does not have to be disrupted with yet another tool. Forcing a new tool on people is like handing a baseball player a cricket bat and telling them to "batter up": while it's surely possible to play that way, it isn't something they are used to, they will most likely find it objectionable, and they will go back to their comfort zone. At that point your data solution has failed.

Arguably, the most difficult aspect of integrating data solutions isn't the actual implementation -- it's getting people to understand it, use it, and trust the results.

Just making access to data solutions easier isn't enough in and of itself. We are talking about behavior change on a potentially massive scale! Since people tend to revert to old habits when faced with new and intimidating changes, engaging them early on and throughout implementation is key. While surely an entire article could be written on this topic alone, to simplify: Educate, demonstrate, iterate, and watch it proliferate!

The Easy Winners: Not Just a Scott Joplin Rag

From the get-go, look for small, measurable wins. Do not implement a large, costly data solution within your organization before you can see results! I suggest finding areas within your business that already rely heavily on data, for instance digital marketing. Many businesses stop short with Google Analytics, never realizing the treasure trove of data that both it and similar web metrics tools can provide as inputs to data solutions.

Starting small not only saves resources but also gives businesses the opportunity to build "data networks." Think of this as analogous to a database where data is stored with different relationships linking tables (i.e. one-to-one, one-to-many, many-to-many). Your initial project should help you identify other business areas that also rely on the same or "adjacent" data, and from there you can slowly expand into other areas that may benefit from data solutions. For example, an initial project may focus on market basket optimization, which segues into supply chain/logistics optimization.
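
As a rough illustration of how small such a first project can start, here is a minimal Python sketch of the counting step behind a market basket analysis: tallying how often two items appear in the same transaction. The baskets and the function name are made up for this example, and a real project would go on to measures such as support, confidence, and lift.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so ("bread", "milk") and ("milk", "bread")
        # are tallied under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('bread', 'milk') 3
```

Everything here runs on a laptop with the standard library. The same item-level data is also what a supply chain team would want to see, which is the kind of "adjacent" link described above.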

The goal of this step isn't just to expand the data solutions further into the organization but to also help other business departments adapt to the change slowly.

This incremental method of data solution integration has other benefits too. Some solutions developed in initial projects can be re-engineered to address other projects without having to completely "reinvent the wheel." Since organizations often deal with proprietary internal data, many times the data engineering processes can be used again and again, saving both time and money. Just be sure to adhere to the "Hit By A Bus" principle: document everything and make that documentation available to all those who need it. If some unfortunate (bus) accident were to happen to a data solutions lead, you can't afford to miss a step because of the tacit knowledge they alone hold.

The project should be marketed heavily internally via your organization's email communication, social tools (e.g. Yammer), webinars, etc. The added visibility layered with highlighting progress acts as a catalyst for organizational buy-in at the individual contributor level. You shouldn't only be communicating about the project though. This is also the perfect time to begin training while you have a captive audience wondering to themselves: "Wow, this is kind of neat. I wonder how they are accomplishing these goals?" Black-box analytics isn't always an easy sell, so start a forum for answering employee questions.

Potential Side Effects

There is more going on than may initially meet the eye. Building a data solutions capability internally can completely change organizational culture and practices by breaking down the barriers traditionally obstructing cross-functional collaboration. When the people in an organization begin to realize that they are all "swimming in the same data lake," they begin to think more broadly about the problems they used to cordon off into their functional area. This creates opportunities for more holistic approaches to tackling business issues and can foster new relationships between departments.

Again, this falls back to the need to change the behavior of people which is always a tall order at the enterprise level. Politics will come into play, roles will need to be redefined, and some will adamantly refuse to play ball. Organizations that don't get discouraged and move onward at this stage are those that are slated for success.

Another potential side effect of integrated data solutions is that some roles will be driven to obsolescence. It seems more and more likely that any task that touches data even tangentially will eventually be automated. This makes it all the more necessary to build an organization that educates and trains its employees to be data cognizant and fearless.

Maximum Overdrive: Scaling Your Data Solutions...Not a Movie About Killer Vending Machines

I cannot overemphasize how much I'm simplifying the data solution process here. "Scale" is one of those consulting terms that everyone uses but that doesn't really have a definitive, universally agreed meaning. If you are a small to mid-sized business, your "Easy Winner" project might account for a large portion of your data solution needs, so scaling isn't really applicable. You can do all you need on a single computer.

When the "volume/variety/velocity" of the data you take in reaches a certain threshold, this single-system solution no longer works. For example, on your personal computer you may have a terabyte of storage space that has no chance of ever filling up completely (unless you do something like video editing). If it does, you can just go out and buy another hard drive and carry on. You have the "volume" but not the "velocity." In an enterprise-level data solution environment, however, you might be collecting and storing a terabyte of data every day -- maybe every few hours!
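
A quick back-of-the-envelope calculation shows why that rate changes things. The Python sketch below assumes decimal units (1 TB = 10^12 bytes) and a hypothetical 4 TB drive; both the helper names and the figures are illustrative assumptions, not measurements.

```python
TB = 10 ** 12                    # decimal terabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def sustained_mb_per_second(bytes_per_day):
    """Average write rate (in MB/s) needed to keep up with a daily inflow."""
    return bytes_per_day / SECONDS_PER_DAY / 10 ** 6


def days_until_full(capacity_bytes, bytes_per_day):
    """How long a fixed-size store lasts at a constant daily inflow."""
    return capacity_bytes / bytes_per_day


rate = sustained_mb_per_second(1 * TB)   # one terabyte per day
days = days_until_full(4 * TB, 1 * TB)   # hypothetical 4 TB drive

print(f"{rate:.2f} MB/s sustained")      # 11.57 MB/s sustained
print(f"{days:.0f} days until full")     # 4 days until full
```

Under these assumptions the average write rate is modest, but the drive is full within the week, and "buy another hard drive" quickly stops being a plan.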

The data solution ecosystem is intimidating to say the least. It helps if you think about it like buying a giant, virtual supercomputer (which I'm sure we've all done before, right?). Joking aside, think about what components make up your computer: a hard drive for storage, memory and a processor for performing calculations, a network card for pulling in data/information from the Internet/Intranet, and if you are a researcher/analyst/data junkie then a program for working with data. All the scalable data solution ecosystems that come to mind (like those offered via Google Cloud Platform or AWS) can be likened to these components. The difference is they don't necessarily have an upper bound on their capacity or capability (hence "scalable").

Last and certainly not least, I refer back to the human aspect of scaling integrated data solutions:

While not everyone will be or needs to be well versed in the technology aspects, everyone should understand the benefits of this business evolution and how it improves the way they do their jobs and make decisions.

Scaling isn't just about accommodating inflows of data, but also about the organizational culture that is tasked with making sense of it all. Again, many of these initiatives fall short because of the focus on tech. Few are taking the time to regularly train and educate end users on how to use and understand data solutions. Ultimately that could be the most costly mistake and will surely have some thinking big data is a big scam.

John Sukup