Too Much Data? Cleversafe's Got Your Back

Written by Amina Elahi
Published on Jul. 12, 2012

Don’t you hate it when you have petabytes of data and no efficient way to store and analyze it? Well, for most of us, this isn’t a problem. But for those who deal with huge quantities of data (the U.S. government, say), Cleversafe has plans for a Dispersed Compute Storage solution that it says will cut infrastructure costs, reduce required storage capacity and improve data integrity. If that’s not enough, the company also aims to bring computation and storage together, even for those with unimaginable amounts of data.

Cleversafe plans to achieve this by marrying the power of Apache Hadoop’s MapReduce with its own object-based Dispersed Storage System. Traditionally, data storage and analysis are treated as distinct phases of a computing system. When data has to move from place to place, it takes longer to process, which frequently results in bottlenecks. With small quantities of data, that is less of an issue, but for organizations dealing in massive amounts of data, traditional models are a nightmare.

“The key to reducing both cost and complexity is to combine computation with dispersed storage,” said Chris Gladwin, CEO and President of Cleversafe. “Cleversafe’s solution will provide infinitely scalable, reliable, and cost effective storage for data to support massive computation while enhancing the analysis workflow.”

This video from ZDNet does a good job explaining how Cleversafe’s system works.

Hadoop works to bring computation to where the data is, rather than treating the two as separate entities. However, it struggles at very large scale. For example, since it houses file system metadata on a single server, a failure of that server can make data unavailable in the short term or cause permanent data loss. It also protects data by keeping three copies, which triples the raw storage required and becomes costly as data volumes grow. Cleversafe, on the other hand, employs erasure coding, the method that lets it build large-scale clouds like the one it’s currently developing with Lockheed Martin to protect federal agencies’ data.
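To put rough numbers on that comparison, here is a minimal back-of-the-envelope sketch in Python. The erasure coding parameters (split data into 10 slices, expand to 16, any 10 of which can rebuild the data) are an illustrative assumption, not figures from Cleversafe, and the function names are made up for this example.

```python
# Rough storage math: three full copies vs. a k-of-n erasure code.
# The 10-of-16 layout below is a hypothetical example, not Cleversafe's
# actual configuration.

PB = 10**15  # one petabyte, in bytes


def replication_raw_bytes(logical_bytes, copies=3):
    """Raw capacity needed when every byte is stored `copies` times."""
    return logical_bytes * copies


def erasure_raw_bytes(logical_bytes, k, n):
    """Raw capacity for a k-of-n code: data is cut into k slices and
    expanded to n slices, so the expansion factor is n / k."""
    return logical_bytes * n / k


def losses_tolerated_replication(copies=3):
    """Copies that can be lost while one good copy remains."""
    return copies - 1


def losses_tolerated_erasure(k, n):
    """Slices that can be lost while k slices remain."""
    return n - k


logical = 1 * PB
print("3 copies:", replication_raw_bytes(logical) / PB, "PB raw,",
      "survives", losses_tolerated_replication(), "lost copies")
print("10-of-16:", erasure_raw_bytes(logical, k=10, n=16) / PB, "PB raw,",
      "survives", losses_tolerated_erasure(k=10, n=16), "lost slices")
```

Under these assumed parameters, one petabyte of data needs 3 PB of raw disk with triple replication but about 1.6 PB with the erasure code, while tolerating more simultaneous losses. Different choices of k and n shift that trade-off.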

Bottom line: Cleversafe is proposing a system that cuts out the bottleneck by bringing computation (easily mobile) to data (hard to transport), making the process simultaneous rather than sequential.

Is this a foolproof system? Of course not. Nothing is guaranteed. A recent TechCrunch article pointed out that erasure coding “is math heavy and requires considerable system resources to manage.” Erasure coding is certainly not suited to every application, but as the technique matures, it is also becoming more efficient and better able to handle high-performance workloads.
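For a feel of what the “math” involves, here is a toy sketch of the simplest possible erasure code: a single XOR parity slice. It is only an illustration under simplifying assumptions. It can rebuild just one lost slice, whereas production systems typically use more general codes (Reed-Solomon-style, for example) that survive several losses at a higher computational cost. The function names are hypothetical and this is not Cleversafe’s algorithm.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal slices (zero-padded) plus one parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)  # XOR of all data slices
    return slices + [parity]


def recover(slices, missing):
    """Rebuild the slice at index `missing` by XOR-ing all the others."""
    present = [s for i, s in enumerate(slices) if i != missing]
    return reduce(xor_bytes, present)


original = b"petabytes of data"
stored = encode(original, k=4)  # 4 data slices + 1 parity slice

# Simulate losing one slice (say, one failed disk).
lost = 2
damaged = list(stored)
damaged[lost] = None

rebuilt = recover(damaged, lost)
print(rebuilt == stored[lost])
```

Every write has to compute the parity and every repair has to read all surviving slices, which hints at why heavier codes demand real system resources.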

As David Vellante at Wikibon wrote:

“We expect erasure coding techniques to continually move up the performance spectrum, supporting not only archiving but increasingly more business critical applications... It's just a matter of time in our view.”

Cleversafe’s proposal is complicated and highly technical. In layman’s terms, though, it’s a way to manage massive amounts of data more efficiently and cheaply than before. Looks like I’ll finally have a way to store my data and analyze it, too.
