Category Archives: Big Data

The Opportunity Cost of Complexity: Simple is Still Better

When I joined Netezza back in 2006, there was a PowerPoint presentation that included a slide that we all came to refer to as “the no slide”.  It listed all of the things that you didn’t need to worry about when you were running a Netezza appliance — indexes, complex partitioning schemes, tablespace sizing and configuration, storage administration, and so on.  Simplicity was always a core goal of engineering Netezza systems, now the IBM PureData System for Analytics.

In fact, it’s really hard to apply simplicity after the fact, as vendors like Teradata and Oracle have tried to do with their “appliance” offerings.  The inherent complexity of a Teradata system, for example, is entrenched in the core software itself.  Teradata will tell you that having 19 different types of indexes is a good thing, because it lets you tune the system.  Sessions on how best to use Teradata indexes are common at Teradata user conferences, and there are several expensive books available on the topic as well.

Not only do constructs like indexes add overhead to the system itself (more objects, more storage, more maintenance), but additional complexity also translates to more DBAs and staff required to run the system.  It’s a double-edged sword:  you have to do lots of upfront work to get the system running, and then lots of continuing work to keep it happy.
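To make the index-overhead point concrete, here is a toy sketch in Python — not any vendor’s engine, and the table layout and sizes are invented for illustration — showing that keeping even one secondary index means extra work on every single insert:

```python
import time

# Toy "table" of (id, category) rows -- purely illustrative data.
rows = [(i, i % 1000) for i in range(200_000)]

# Plain append-only table: inserts do the minimum possible work.
table = []
t0 = time.perf_counter()
for row in rows:
    table.append(row)
plain_secs = time.perf_counter() - t0

# Same table plus one secondary index on category: every insert
# must also maintain the index structure (more objects, more upkeep).
indexed_table = []
category_index = {}
t0 = time.perf_counter()
for row in rows:
    indexed_table.append(row)
    category_index.setdefault(row[1], []).append(row[0])
indexed_secs = time.perf_counter() - t0

print(f"insert time with one index: {indexed_secs / plain_secs:.1f}x the plain load")
```

Multiply that upkeep by 19 index types, and the “no slide” starts to look very attractive.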

All of this translates to higher cost of ownership and longer deployment times, which is borne out by the ITG Cost/Benefit Analysis study.  Indeed, consider all of the missed opportunities that complexity leads to.  Some highlights:

  • Three-year costs of ownership for Teradata 2750 systems averaged 1.5 times higher than for PureData System for Analytics equivalents.
  • Personnel costs for use of the Teradata 2750 averaged 2.6 times higher than for PureData System for Analytics.
  • Deployment costs, principally for external professional services, averaged 3.8 times higher for Teradata 2750 systems than for use of PureData System for Analytics N200X.

To put it simply (pun intended): simple, easy-to-use systems mean faster time to value at lower cost.  A well-designed system, like the PureData System for Analytics, shouldn’t need to be complex, and can provide outstanding performance without added complexity.  So don’t let vendors tell you that complexity is a friend because it gives you more control and power… With friends like that, who needs enemies?

What can we tell about the health of a company from social media?

Like most technically savvy people today, I have accounts on most of the major social media platforms — Facebook, Twitter, LinkedIn, and others.

I’ve also spent quite a bit of time over the past few years, in my technical marketing and competitive analyst role at IBM, trying to assess the relative strengths, weaknesses, and market penetration of the key companies we compete against in the highly competitive data warehouse, analytics, and “big data” space.

Standard sources of information — websites, press releases, analyst reports, webinars, and the like — provide one view of an organization: a carefully crafted, publicly visible one.  I guarantee that most of this information is heavily reviewed and vetted by these organizations’ senior-level executives, product management, and product marketing folks.  In other words, these sources don’t tell the whole story.

Over the course of my professional career, I’ve both observed and participated in another phenomenon.  High functioning teams of employees and respected peer groups often migrate from company to company in groups.   Often, this starts as a trickle and becomes a flood.   And if there are significant issues with, let’s call it the “source organization,” these early trickles of employee exoduses can serve as an early warning for those tuned in and looking for it.

In the old days, it didn’t matter so much; today, however, lots of people post enthusiastically about starting a new job at a new company, and as our social networks mature, it’s becoming easier to detect patterns.

I was struck by this just this week, when a handful of people I’ve worked with in the past left one organization and landed at another.  Putting on my competitive analyst hat, and correlating this with recent information I have on how one of these organizations has performed in POC activity, I suspect either product issues or company issues at the source organization.  In short, the sales engineers aren’t making money and the support folks are frustrated.

The real question is how to use this in my own competitive marketing efforts.  And the other real question: how can I turn this into a cool “health-o-meter” app?
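For fun, here is a minimal sketch of what such a “health-o-meter” might look like in Python.  Everything in it is hypothetical: the company names, the 90-day window, and the 25-points-per-departure weighting are all invented for illustration, and real event data would have to be harvested from social feeds rather than hand-coded.

```python
from datetime import date

# Hypothetical departure events mined from social media posts:
# (company, date the person announced leaving). Names are made up.
departures = [
    ("AcmeDW", date(2013, 10, 5)),
    ("AcmeDW", date(2013, 11, 2)),
    ("AcmeDW", date(2013, 11, 20)),
    ("AcmeDW", date(2013, 12, 1)),
    ("SteadyCo", date(2013, 6, 15)),
]

def health_score(events, company, window_days=90, as_of=date(2013, 12, 31)):
    """Crude signal: a cluster of recent departures drags the score down."""
    recent = sum(1 for c, d in events
                 if c == company and 0 <= (as_of - d).days <= window_days)
    return max(0, 100 - 25 * recent)  # arbitrary weighting

print(health_score(departures, "AcmeDW"))    # four exits inside the window -> 0
print(health_score(departures, "SteadyCo"))  # quiet feed -> 100
```

The interesting (and hard) parts are upstream of this: reliably extracting “I started a new job” events from noisy feeds, and separating normal attrition from the trickle-becomes-a-flood pattern described above.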

Thoughts on the Big Data Landscape for 2014

As we head into the new year, I thought I would take a moment to reflect on the past year in Big Data and Analytics, while contemplating what the upcoming year may bring.

We’ve seen different vendors continue to build out their big data solutions — either by doubling down on what they already have, partnering with other vendors, or going in new directions.

We’ve seen in-memory computing take the market by storm, largely driven by the marketing arm of SAP — regardless of whether the actual results qualify as true in-memory computing.  We’ve also seen Oracle change their messaging on Exadata with the release of X4 (they no longer refer to X3 as an “in-memory” machine, a claim that was stretching the truth in no uncertain terms).

And, we’ve seen IBM launch DB2 with BLU Acceleration, which takes in-memory computing to the next level, while maintaining “appliance simplicity.”

We’ve also seen vendors like Teradata, Microsoft, and Oracle expand their partnerships for Hadoop, with Hortonworks for the first two and Cloudera for Oracle.  But more importantly, we’ve seen the first workable logical data warehouse architectures emerge that augment traditional data warehousing with Hadoop and stream computing.

So where do we go from here?  Cloud computing is emerging, bringing in a new era of “agile data warehousing,” starting with SAP’s HANA One in late 2012, followed almost immediately by Amazon Redshift, and then IBM’s BLU for Cloud technology preview later in 2013.  And appliances as a delivery model for big data continue to prove capable and cost-effective — for use cases that fit their economics and sweet spot.

My bold (and some not so bold) predictions for 2014 and beyond:

  • In-memory and solid-state data warehousing will emerge as a viable solution for the “less than 50 TB” crowd, while traditional spinning-disk media will remain king for 100+ TB implementations
  • BLU Acceleration from IBM will emerge as the first of the next-generation in-memory systems that solve many of the current problems associated with in-memory (cost, scalability)
  • Cloud analytics and data warehousing will steadily chip away at the low end of the appliance market, relegating appliances to large-scale, highly targeted implementations by the end of 2015, with private and public cloud emerging as the dominant choice for entry-level, SMB, and development environments
  • The company that solves the big data ecosystem seamlessly, providing compatible solutions from the low end all the way up to the high end with tight ecosystem integration, will have the greatest success (IBM, Oracle, and Teradata are all well-positioned to take this role!)

So let’s enjoy the last days and hours of 2013, and look to the future in the highly competitive big data landscape.

And as we look to the future, we must remember the past, and recognize that the problem of big data is not a new one, but one that we have been dealing with for a long, long time.  Indeed, I stumbled upon this lovely quote from Daniel Boorstin, dating back to 1983, when the effort was undertaken to computerize the Library of Congress:

Technology is so much fun but we can drown in our technology. The fog of information can drive out knowledge.

Sound familiar?

Happy New Year to all my Big Data colleagues!