What’s the Point of a Data Hub?

Cloudera Heffalump

My biggest challenge with what Cloudera presented was that they tried to cram too much into a single presentation. It was a grab bag of features, rather than a real explanation of why anyone should care. Part of this is because Apache Hadoop, on which Cloudera is based, is such a large project with multiple components, as I discussed in my Cloudera preview post.

What was missing was focus. A laundry list of features just sounds like a company trying to be all things to all people, instead of saying “this is something you can do with this stuff”.

Big Hype

The hype around Big Data, and data analytics in general, has been running at fever pitch for some years. I’ve written about this extensively, and I’m still not seeing enough people clearly communicate what it’s all for. We get fluffy descriptions of it changing the way decisions will be made, and that we’ll be able to gain new insights or whatever, but nothing concrete. It’s all apparently magic.

Which it most definitely isn’t. There is great stuff you can do with these tools, but I increasingly think the people describing the wondrous things you could do don’t actually have a clear understanding of what they’re selling. There are glimpses of it, such as when Justin Erickson mentioned a ‘massive grep engine’ as something analysts could use to just search through big piles of text logs for company names, etc., but it’s not highlighted enough.

This is a classic trap that people fall into, particularly engineering types: talking about features, not benefits. This is one of the basic things you learn in Marketing class: People don’t buy features, they buy benefits.

People don’t buy a drill because they want to own a drill, they buy it because they need some holes. Yes, there are some exceptions, but they are a very different kind of market to the people who want holes, and you can’t advertise to them in the same way.

Far too much of the discussion of data analysis is focussed either on features (how much data it can store, it’s ‘fast’) or on nebulous benefits (you can get insights into customer behaviour!), with no time spent linking them to why anyone should care. It’s just assumed that having these things is inherently good, and that all potential customers intuitively get why having the feature is good.

Instead of starting with the features, I’d prefer companies to discuss the benefits, in classic marketing style:

For [target segment] that need [the problem we solve] we offer [our product] which, unlike [next best alternative], our thing provides [these quantified benefits]. We do this by [how we do it], as demonstrated by [proof].

Instead, what I keep seeing is more like this:

We have [this product] which has [list of features]. You could do all sorts of interesting things with this, such as [list of theoretical activities]. We have a hunch that [people with thumbs] might like to do some of them.

This is confusing, and not just to the C-level executives who need to sign off on the level of spend required to implement many of these things. It confuses technical folk (like me) because the tool simply does too much. Figuring out what it’s for is left up to the technical folks, and while that might be fine for experimental technologies, if you’re trying to sell it as a product, I shouldn’t be left guessing what your thing is for.

As so often happens with these presentations, the best bit came at the end. Xplain.io is an acquisition Cloudera made earlier this year. Anupam Singh, co-founder of Xplain.io and Head of Data Management for Cloudera, explained what it’s for very quickly, clearly, and concisely. If you’re an enterprise, and a candidate customer for Cloudera, you already have a lot of data. And you’re already using that data, in a data warehouse, with BI tools, data marts, whatever. That means you’re already running queries on that data. Great!

Xplain.io can look at those queries to figure out what data is important, and then you can use that information to build your Hadoop environment with the bits you need most. Because Hadoop is so large, complex, and confusing, this makes it easier to figure out which components of Hadoop you need. And there’s the link to the overall Cloudera value proposition: Hadoop, only easier. Here’s a quick go at putting this in the marketing 101 format I used above:

For people interested in Hadoop who need to figure out what bits they need/whether it’s worth it, we offer Xplain.io which, unlike doing it yourself by hand, provides an easy way to figure out what parts of Hadoop you need. It does this by analysing the queries of your existing data to see what data is important, and recommending the appropriate component (e.g. Impala, Hive, HBase) for your Hadoop environment, as demonstrated by [these customers/this demo].

Tada! And it’s not just a single, opaque report you print out: it’s an interactive, GUI-based display that presents the information visually. These kinds of visualisations are extremely useful when they’re designed well. It also helps IT to explain a fairly detailed and complex technical situation to others without the technical background, such as management, or simply other teams. “Your dashboard report is late all the time because this circle is red. Making it green, and making your report run on time, will cost $x. Yes, or no?”
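To make the "look at your existing queries to see what data is important" idea a bit more concrete, here is a minimal sketch in Python. It is not how Xplain.io works internally (that wasn't covered in the presentation). It is just the simplest version of the idea I can think of: count how many queries in a log touch each table. The query log, the table names, and the `table_usage` helper are all invented for illustration, and the regular expression is deliberately naive.

```python
import re
from collections import Counter

# Matches the identifier following FROM or JOIN. This is a deliberately naive
# pattern: it ignores subqueries, quoted names, and comma-separated table lists.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(queries):
    """Count how many queries in the log reference each table."""
    counts = Counter()
    for query in queries:
        # Use a set so a table mentioned twice in one query counts once.
        tables = {name.lower() for name in TABLE_REF.findall(query)}
        counts.update(tables)
    return counts


# Hypothetical query log, made up for this example.
sample_log = [
    "SELECT * FROM sales JOIN customers ON sales.cust_id = customers.id",
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "SELECT name FROM customers WHERE country = 'AU'",
    "SELECT * FROM sales WHERE amount > 100",
]

usage = table_usage(sample_log)
for table, hits in usage.most_common():
    print(f"{table}: referenced in {hits} queries")
```

A real tool would need a proper SQL parser and would look at far more than table names (joins, filters, run times), but even a crude count like this starts to answer "which data do we actually use?", which is the question that decides where to spend the money.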

Being able to clearly explain something to other people is an important skill, and one that more IT folk are starting to realise they lack. Tools like xplain.io help you to do a better job of figuring out what’s important and then communicating that importance to people in ways they will understand.

Bring it on.


  1. Justin,
    I keep coming back to this post over and over again. I’ve referred a number of tech-marketing colleagues to it, or taken a couple of snippets out of it to prove a point.

    “For [target segment] that need [the problem we solve] we offer [our product] which, unlike [next best alternative], our thing provides [these quantified benefits]. We do this by [how we do it], as demonstrated by [proof].”

    Might be classic marketing, but I don’t see a lot of it about these days, certainly nothing that succinct.

    Only one minor niggle, which appears to be a typo, is the following:

    .. thing is for.Xplain.ioAs so often …

    Regardless of that, it’s an excellent article from which most tech marketers should learn, or perhaps relearn, some of their craft.

  2. Thanks, John.

    I’ve fixed the weird formatting issues. Must be a cut&paste error or something.
