Lean DevOps: Lessons Learned From Innovation-Driven Companies

Written by New Relic
Published on May 19, 2015

By Fredric Paul

These days, lots of people are trying to put together models for DevOps success. One of the most interesting approaches I’ve seen was presented by Xavier Amatriain at IEEE’s DevOps Unleashed symposium in Silicon Valley last month.

Amatriain, who recently became VP of engineering at Quora after spending significant time at Netflix and Telefonica, developed what he calls the CASSSH model of DevOps, supplemented with real-world lessons learned during multiple DevOps implementations. His presentation—Lean DevOps: Lessons Learned from Innovation-Driven Companies—focused on how to boost the speed of innovation:

CASSSH = Cost, Availability, Scalability, Speed, Security, DevOps Happiness

Amatriain’s CASSSH model boils DevOps down to a simple acronym. Let’s look at each component in detail:

Cost: In the world of DevOps, Amatriain said, it’s not so trivial to optimize for cost. You have to consider short-term vs. long-term costs, the differences between in-house and outsourced expenses, and whether it makes more sense to invest in additional servers or in lean DevOps processes that could make the most of those resources.

Availability: When it comes to availability, Amatriain said, the question is, How many “nines” of reliability do you need, and how much are you willing to pay for an extra nine? He invoked the Christmas Eve paradigm, noting that when AWS brought down Netflix on Christmas Eve 2012, the relatively short outage had an outsize effect on availability. You can work for a whole year to hit 99.99% availability, he said, but if your site goes down at the wrong time, everything can go down the drain in just a couple of hours.
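
To make the "nines" arithmetic concrete, here is a minimal sketch of how an availability target translates into a yearly downtime budget. The function names and the two-hour outage figure are illustrative assumptions, not numbers from Amatriain's talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_after_outage(outage_minutes: float) -> float:
    """Yearly availability (percent) if the only downtime is one outage."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% allows about {budget:.1f} minutes of downtime per year")

    # Assumption for illustration: a single outage lasting two hours.
    outage = 120
    print(f"A {outage}-minute outage caps the year at "
          f"{availability_after_outage(outage):.3f}% availability")
```

Under these assumptions, 99.99% leaves roughly 52.6 minutes of downtime for the whole year, so one outage of a couple of hours is enough to miss the target no matter how the other 364 days go.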

Scalability: “Are you ready to be big?” Amatriain asked. And, if so, what is the probability of that happening? How much will that growth cost? Does it make sense to spend for scalability now, or should you outsource that possibility to the cloud so you don’t have to worry about it until it happens?

Security: It’s always better to be safe than sorry, Amatriain said, but security also carries its own costs. How much security can you afford?

DevOps Happiness: Last, but not least, Amatriain noted, in order to optimize innovation and productivity, you need a team that is happy to do their work. And he noted that DevOps Happiness is inversely correlated with the probability of being woken up in the middle of the night with a problem.

Xavier Amatriain presents at IEEE’s DevOps Unleashed symposium.

Tips for solving the DevOps dilemma

When you put it all together, Amatriain said, CASSSH is not so much a model as a dilemma. How do you optimize—or trade off—all these dimensions to find an ideal operating point?

Fortunately, Amatriain also shared a few useful lessons learned:

  1. It’s easy to address a particular cost variable. But it’s more effective to go to the root of the cost and try to fix things at that level.
  2. Quality pays off … eventually. You may have to invest a bit in code quality at the beginning, but it saves money in the long run. If you want to write a throwaway prototype, then why care about code quality? But for most production projects, you need to compute when the up-front cost of investing in quality will be outweighed by the delayed cost of cleaning up the technical debt you created by doing things quick and dirty.
  3. All the CASSSH dimensions can be tested. It might not seem obvious, but testing actually allows you to move faster. Amatriain suggested inserting staff testers into small dev teams where they can be evangelists of testing and code quality. This can put pressure on the testers, he acknowledged, but it’s “better than being in a room with 50 other testers all doing the same things.”
  4. Pair programming code review rocks. “Nothing beats sitting down next to someone and asking questions about every line of code,” Amatriain said. Having one programmer say to another, “Convince me that what you’ve done is doing the right thing” can carry more weight than the same question coming from the outside.
  5. When things break, what you do about them makes a huge difference. It’s important to remember that notifications are not alerts. An alert is something that you can act on, not just something that’s “good to know.” If you can’t take action on it, it’s just a notification.
  6. Metrics, metrics, metrics. You need to be able to measure the right thing, Amatriain said, which means you need to understand what the right thing is.
  7. Everything is production, or “production is the new dev.” If people don’t pay the same attention to quality in a testing environment as they would for a production feature, Amatriain said, that can cause trouble when a feature that began as simply an experiment suddenly gets to production and starts accumulating technical debt because it wasn’t fully tested earlier.
  8. Teams optimizing for competing dimensions is a recipe for disaster, Amatriain said. If one team is optimizing for cost while another is optimizing for speed, for example, you have a big problem. The solution is to make sure goals are shared across dimensions, so the cost team has some understanding of what speeds are acceptable (and vice versa). The idea is that within a particular cost range you can do whatever you want. But if you go beyond that, you’ll get pushback from the cost team.
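
The alert-versus-notification rule in lesson 5 can be sketched in a few lines. This is only an illustration: the `Event` class, its `action` field, and the sample messages are hypothetical and not taken from the talk or from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    message: str
    # Hypothetical field: the concrete step a responder can take, if any.
    action: Optional[str] = None


def classify(event: Event) -> str:
    """An event is an alert only if someone can act on it."""
    return "alert" if event.action else "notification"


def route(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Split events into ones that should page a person and ones for a digest."""
    pages = [e for e in events if classify(e) == "alert"]
    digest = [e for e in events if classify(e) == "notification"]
    return pages, digest


if __name__ == "__main__":
    events = [
        Event("Disk 95% full on db-1", action="Rotate logs or extend the volume"),
        Event("Nightly backup finished"),
    ]
    pages, digest = route(events)
    print("Page on-call:", [e.message for e in pages])
    print("Daily digest:", [e.message for e in digest])
```

Keeping events with nothing to act on out of the paging path also supports the DevOps Happiness dimension, since fewer people get woken up for something that is merely “good to know.”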

Both DevOps teams and entire companies benefit from being lean, Amatriain concluded, although being lean can also add instability and risk. Improving processes with the CASSSH model can encourage innovation and increase its velocity while also lowering risk.

About the Author

Fredric Paul (aka The Freditor) is Editor in Chief for New Relic. He's an award-winning writer, editor, and content strategist who has held senior editorial positions at ReadWrite, AllBusiness.com, InformationWeek, CNET, Electronic Entertainment, PC World, and PC|Computing. His writing has appeared in MIT Technology Review, Omni, Conde Nast Traveler, and Newsweek, among other places.
