Massive learning at the 2013 Strata Conference

by Orlando Saez
March 8, 2013

Summary: The 2013 Strata conference went deep into the world of Hadoop. We gathered big data use cases across many industries and discovered how companies are driving value and cost savings with predictive analytics, data visualization and more. In case you missed it, the videos and slides are here.

This was CityScan's first time at the Strata Conference, held in Santa Clara, CA this year. Our goal was to dive into the technology and business models to better understand the industry's supply chain of vendors and providers as we quickly ramp up our approach to delighting our customers. Dave and I focused on the business angle while Cory was geeking out with other data scientists.

AMAZING is the best word to describe everything we soaked up over three days.

Here's a list of takeaways and themes from the conference:

(1) The Hadoop distribution competition is hot.

(2) Bringing SQL-like querying capabilities to NoSQL databases is democratizing Big Data.

(3) Big boys like EMC and Intel are validating the Hadoop market.

(4) Small is the new big with regard to common, consistent taxonomies. Extract what's essential and drive (near) real-time results. Look into depth, not breadth.

(5) Streaming and deep data processing of small data in real time trump batch processing. Map then reduce.

(6) Predicting and identifying trends gives a more accurate view of customers anywhere than description and static analysis of data alone.

(7) The Internet is becoming a mirror. Hyper-personalization is creating new problems. Netflix's House of Cards has about a dozen trailers that play based on your previous ratings and viewing habits. Amazon is using predictive models to drive impulse buying. News aggregators show mostly what I personally care about, and hide the rest. Is serendipitous discovery fading?

(8) Data systems are all about humans. To have fault-tolerant systems, we need to architect around human fault tolerance first.

(9) Realizing value is critical, but remains elusive to many. Avoid reliving the movie Groundhog Day: recognize the patterns, or it's going to be déjà vu. If we don't learn from past experience, we'll miss the moment.

(10) The TRUST gap between data scientists and business/government leaders remains the #1 barrier to the widespread adoption and impact of big data.

(11) Smart data is a strategic asset and organizations need to treat it like one.

(12) Algorithms and human judgement are complementary. Algorithms are great at ingesting data to arrive at possible predictions. Humans are best at looking at potential predictions and leveraging unstructured data, applying judgement on curation and fostering relationships.

(13) If software is eating the world, what is big data doing?

Notable projects, vendors and references:

  • Berkeley Stack: Spark, Shark, and Spark Streaming, a strong machine learning tool
  • Platfora, Tableau and Pentaho to visualize and analyze big data
  • Revolution Analytics, a growing enterprise-based statistical and predictive analytics platform
  • Fuzzy Logix claims a modular approach to analytics without lifting your data
  • Mu Sigma - Decision sciences and analytics solutions best left to professionals
  • ArcGIS Online by ESRI - Cloud distribution and analysis of geo-reference assets
  • MongoDB - we already use it and love it. SoftLayer is taking it to BDaaS
  • Real Time Big Data Analytics, great white paper on architecture and tools
  • DiscoverText - unstructured text analytics that can figure out a lot from conversations
  • Skytree - doing interesting things in machine learning with new algorithms, data representation and ingesting big data

Without a doubt, big data and prediction will fundamentally change many sectors, especially government where smarter decision tools are essential in an environment of doing more with less.


Repost from

