We Live in Exciting Times for Everything Data!

Do we? Of course we do! I have been in BI and data management for about 20 years now, and I have never witnessed so much excitement in the market.

Not only can we do what we used to do with traditional small data much better and much faster, we have also introduced the new categories of Big Data and Data Science. To tell the truth, they aren't really new, but the costs involved in dealing with the two have plummeted thanks to technological and cultural progress. This is making them increasingly common.

Sometimes I feel like I am firefighting, trying to absorb and master, as quickly as possible, all the new stuff being thrown at the market. This is a difficult but also highly enjoyable process; it is fun to learn new things!

However, I often come across people who are not excited at all; people who would prefer that all this progress were much slower or simply went away. I have to admit that, if I put myself in their shoes, they have plenty of good reasons to be exasperated.

Who are these people? Those who pay the bills: the CIOs, CEOs and other executives involved.

In today's confused and ever-shifting scenario, it is difficult to make a choice for a long-term investment. We are facing the same level of complexity and cost that we faced in the past when choosing an ERP: large investments that will take years to pay off and will shape the entire technology landscape of the organization.

Let me say this right away: I wish there were a simple way through it, but there is none, at least not if you are an unbiased market observer. What we can do is table some considerations that may help.


So, let's shift our point of view for a moment and look at the data market through the eyes of, for example, the marketing director of a medium-sized retailer. She is increasingly aware of the potential of big data and data science to engage customers and, ultimately, sell more. She has already run many exploratory projects that returned good results, to the point of realizing that this data-driven approach must be engineered and embedded in the marketing department's operations. She goes to the CIO asking for more of the same things she had already asked for, and the problems begin.

"Well, we did the customer clustering on a one-off set of data; by now it will already have changed. We used a tool named Mahout on top of Hadoop, which is good but very complicated. Today there are other options; for example, something called Spark might do better, but we haven't tried it yet."

"The sentiment analysis, yes, that was cool, but it is based on a new machine learning cloud technology from Microsoft. It is a bit expensive and impractical, at least for now, but we could always redo it on a different technology. Also, that one covered only Twitter; sure, you also need Facebook, Instagram, some blogs, etc. That still has to be implemented."

"You hired half a dozen data scientists and I know they are amazing; so is the level of insight they are producing. They work mainly with R. However, I need to buy them computers with more RAM, because they tell me they can't work with the largest datasets. We also don't really know how to store their results in the data warehouse so that they are readily available."

"I know you would like the output of the predictive analytics in our reports. I understand that comparing the budget with actuals and with the prediction for the end of the period is an insightful piece of information, but we are using a client tool to make the predictions, and we have yet to figure out how to integrate it into the morning data build."

And so on.

Today we are in the middle of a data gold rush, where evolution is constant. Traditional players (Microsoft, Oracle, IBM, SAP) are bringing their solutions to market. New players are becoming big players (Cloudera, Hortonworks, Tableau). Every day a new data-centered startup makes the headlines. A manager who has to decide which technologies to rely on for the next 5 to 10 years is in deep trouble.

At this point you would expect me to make a forecast. In a sense I will, but I think it is more important to establish a framework for evaluating how well these new technologies will stand the test of time.

The first aspect to consider is whether a vendor has a clear, public roadmap and a track record of largely sticking to it. SAP is probably the most accurate in this area: they generally plan many years ahead and steer gently. Oracle and IBM, to a lesser extent, behave similarly. Pretty much all the traditional big players are quite reliable on their roadmaps.

Other players, however, conceal a lack of vision and planning by describing their behavior as "dynamic" or "market oriented". They are less reliable, and you may well expect their technology stack to age quickly, forcing adopters into difficult decisions.

Another key aspect is whether the technology is endorsed by several players. There is little doubt that Hadoop is here to stay: too many players are now working on it or making it part of their offering. The picture is different for the Byzantine stacks built on top of it. Some components are de facto standards (Pig, Hive, Sqoop...), while others are just bets, or stunts, trying to fill a market niche. Here, too, you can distinguish between complements seriously developed to meet a market requirement (Spark, Impala) and other components that nobody felt the need for (Why would I need a thing like Kafka? OK, OK, I know, I know, it is just for fun...).

Finally, the direction in which the market is heading is an important indicator. Keeping up with the polls and surveys that take the pulse of the market is not easy, and many of them may be swayed by a technology's "cool" factor. Others may not be completely unbiased, which makes them of little value. A smart way to track where the market is heading is to analyze the stream of job postings and the technologies they mention. Because they are actual job openings, they show in near real time what other organizations are doing.
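The job-postings approach is easy to automate. What follows is a minimal sketch in Python, assuming you have already collected the posting texts as plain strings (how you scrape or export them from a job board is out of scope). It counts how many postings mention each technology on a watch list. The watch list, the function name `count_mentions` and the sample strings are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical watch list: the technologies you want to track (edit to taste).
TECHNOLOGIES = ["Hadoop", "Spark", "Hive", "Pig", "Sqoop",
                "Impala", "Kafka", "Mahout", "Tableau"]


def count_mentions(postings: Iterable[str],
                   technologies: List[str] = TECHNOLOGIES) -> Counter:
    """Count how many postings mention each technology (at most once per posting)."""
    # Whole-word, case-insensitive patterns, so "Spark" does not match "Sparkle".
    patterns = {
        tech: re.compile(r"\b" + re.escape(tech) + r"\b", re.IGNORECASE)
        for tech in technologies
    }
    counts: Counter = Counter()
    for text in postings:
        for tech, pattern in patterns.items():
            if pattern.search(text):
                counts[tech] += 1
    return counts


if __name__ == "__main__":
    # Illustrative strings only, not real job postings.
    sample = [
        "Data engineer: Hadoop, Spark and Hive experience required",
        "BI developer with Tableau; Spark a plus",
    ]
    print(count_mentions(sample).most_common(3))
```

The matching is deliberately naive: generic words such as "hive" or "pig" will also be counted, and a short name like R would need a smarter matcher. Comparing the counts period over period says more about the market's direction than any single snapshot.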

So there are ways to find a path through this forest of names and technologies and make a choice that will stand the test of time. Waiting for the market to settle is not an option, because you risk being put on the back foot by competitors who are already doing something. So, best of luck with your choices.

After all, we are paid for this, aren't we?