Accelerate: The Science of Lean Software and DevOps - Thoughts and Review

"Accelerate: The Science of Lean Software and DevOps" by Nicole Forsgren, Jez Humble, Gene Kim

My Story

During my 15 years as a Software Engineer, I learned certain practices that had a positive impact on software delivery and team morale. Examples include adopting CI/CD, adding automated tests, encouraging smaller items and having engineers involved in the entire workflow from customer to release.

For CI/CD and automated tests, the team agreed they were productive and a good idea, as manual processes are inefficient and always leave room for human error. But the team was split on the benefits of smaller items over large ones. I prefer smaller items: they get better code reviews and more refactoring requests, make it easier to stay focused and are generally completed on time. But for others on the team, splitting items into smaller ones wasn't worth the effort, as a single developer could finish a larger item faster. Additionally, splitting up work introduced more "admin overhead" and reduced time for actual coding. So how do we objectively debate which is better? Or perhaps more importantly, how do we measure which is better?

In the book "Accelerate: The Science of Lean Software and DevOps", the authors not only dove into the best practices that high-performing teams adopt but also discussed the research and theory behind why those practices are effective. And, just as importantly, they provided a way to measure high-performing teams.

Accelerate: The Science of Lean Software and DevOps

Puppet Labs, Dr Nicole Forsgren, Jez Humble, and Gene Kim conducted four years of research into software delivery performance, how to measure it and what actions drive it. They also examined how software delivery impacts company performance, efficiency, profits, customer satisfaction, and engineers' well-being.

You can read more at https://cloud.google.com/architecture/devops or in their book Accelerate: The Science of Lean Software and DevOps.

The book mainly presents the research findings and the science behind them. If you are interested in how to introduce or apply these engineering best practices, there are many better books on Amazon and articles on Google for that. However, I still highly recommend reading the book to better understand why certain practices are good or bad. It is also valuable for those who want data-driven evidence and sound theories of what makes a high-performing team before adopting change in their team or organisation.

The Research

The research was conducted from 2014-2017 and included data collected from the 'State of DevOps' survey.

Participants:

  • Over 2000 unique organisations
  • 23K respondents from all around the world
  • Different organisation sizes, from under 5 to 10K employees
  • All industries, including highly regulated finance, healthcare and government
  • Startups to well-established companies
  • Greenfield to legacy products
  • Waterfall, DevOps and Agile delivery methods

The 4 Key Metrics

One of the core outcomes of the research was four key metrics that distinguish a high-performing team.

  • Lead Time: The time from code committed to running in production.
  • Deployment Frequency: How often deploys happen.
  • Mean Time To Restore (MTTR): How quickly teams can restore service after production outages.
  • Change Fail Rate: The percentage of deploys that result in service impairment or an outage.
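
To make the four definitions above concrete, here is a minimal sketch of how a team might compute them from its own deployment history. The `Deployment` record, its field names and the `four_key_metrics` function are hypothetical and are not taken from the book. The sketch also assumes that restore time is measured from the failed deploy to the moment service is restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it was running in production
    failed: bool = False                     # caused service impairment or an outage
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def four_key_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Summarise the four metrics over a reporting period of period_days."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Assumption: time to restore runs from the failed deploy to restoration.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]

    def average(deltas: list[timedelta]) -> Optional[timedelta]:
        if not deltas:
            return None
        return timedelta(seconds=mean(t.total_seconds() for t in deltas))

    return {
        "lead_time": average(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "mean_time_to_restore": average(restores),
        "change_fail_rate": len(failures) / len(deploys) if deploys else None,
    }
```

In practice the records would probably come from a CI/CD tool or an incident tracker, and a median may be a more robust summary than a mean when a few deploys are outliers.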

The researchers classified organisations into low, medium and high-performing teams based on the four metrics. They found that high-performing teams were twice as likely to exceed organisational business goals such as profitability, productivity and market share. They were also twice as likely to exceed non-commercial goals such as customer satisfaction, quality of product or mission goals. In addition, their employees were twice as likely to recommend the company to others.

As a benchmark, in 2016, high-performing teams had

  • Lead Time: Less than one hour
  • Deployment Frequency: On demand (multiple deploys per day)
  • Mean Time To Restore (MTTR): Less than one hour
  • Change Fail Rate: 0-15%

While low-performing teams had

  • Lead Time: Between one month and six months
  • Deployment Frequency: Between once per month and once every six months
  • Mean Time To Restore (MTTR): Less than one day
  • Change Fail Rate: 16-30%

Source: Accelerate: The Science of Lean Software and DevOps
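
As a rough illustration of how the 2016 high-performer figures above could be used, the sketch below reports which metrics fall short of them. The function name and parameters are hypothetical, and reading "multiple deploys per day" as more than one deploy per day is my own assumption rather than something the book defines.

```python
from datetime import timedelta


def gaps_to_high_benchmark(
    lead_time: timedelta,
    deploys_per_day: float,
    time_to_restore: timedelta,
    change_fail_rate: float,
) -> list[str]:
    """Return the metrics that miss the 2016 high-performer figures."""
    gaps = []
    if lead_time >= timedelta(hours=1):        # benchmark: less than one hour
        gaps.append("lead_time")
    if deploys_per_day <= 1:                   # assumption: "multiple per day" means > 1
        gaps.append("deployment_frequency")
    if time_to_restore >= timedelta(hours=1):  # benchmark: less than one hour
        gaps.append("mean_time_to_restore")
    if change_fail_rate > 0.15:                # benchmark: 0-15%
        gaps.append("change_fail_rate")
    return gaps
```

An empty list means all four figures are met. A non-empty list only shows where the team stands, not what to change.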

An important note is that the four metrics aren't the goal of the business, nor are they "leading indicators" of what to improve. Instead, they describe the overall performance of your engineering team. Think of them like the measures of how good a car is. Speed, fuel efficiency, cargo space and handling are suitable measures of how great a car is, but they don't tell you precisely what needs to be improved or changed to make it better. For example, they don't tell you which car parts or changes would increase fuel efficiency.

24 Key Capabilities for High-Performing Teams

The research investigated what actions lead to delivery performance, identified 24 key capabilities, and grouped them into five categories.

Continuous delivery

  • Use version control for all production artifacts, including application code and application configuration
  • Automate your deployment processes
  • Implement continuous integration (CI)
  • Use trunk-based development with short-lived branches
  • Implement automated testing
  • Support test data management
  • Shift left on security
  • Implement continuous delivery (CD)

Architecture

  • Use a loosely coupled architecture
  • Empower teams to make their own decisions

Product and Process

  • Gather and implement customer feedback
  • Support the entire team to understand the complete workflow from business to customer
  • Work in small batches
  • Encourage experimentation

Lean Management and Monitoring

  • Have a lightweight change approval process
  • Monitor across applications and infrastructure to inform business decisions
  • Proactively monitor system health
  • Implement work-in-progress (WIP) limits to help improve the process
  • Visualise work to monitor quality and communicate throughout the team

Cultural

  • Support a Westrum generative culture
  • Create a culture of learning
  • Support and facilitate collaboration among teams
  • Endeavour to make work meaningful
  • Support transformational leadership

Key Takeaway - Continuous Delivery

Teams that automate builds, tests and deployments perform better in the key metrics. In addition, the research shows that you don't need to trade off between tempo and stability and that adopting Continuous Delivery improves both velocity and quality.

For those unfamiliar with CI/CD, Continuous Integration is a practice where automated unit tests and builds run once code is merged into a central repository. Continuous Delivery extends this by deploying the build to a testing environment where further tests can be run, such as UI testing, load testing, and integration testing. Continuous Deployment is when there is no manual approval before an update goes to production. See AWS Continuous Delivery Explained for more information.

Any manual process is something the team should always look to remove, as it takes people's time away from high-value work such as problem-solving. Manual processes are also slower and prone to errors. In the case of deployment, they can also lead to employee burnout, as manually deploying updates to production can cause fear and anxiety.

"Before implementing the technical practices and discipline of continuous delivery on the Bing team at Microsoft, engineers reported work/life balance satisfaction scores of just 38%. After implementing these technical practices, the scores jumped to 75%" - Deployment Pain, from Accelerate.

I can directly relate to this. When I was an Engineer, I remember dreading being on call to support an upgrade of a client application. There was no Continuous Delivery, and Operations or Engineers would perform all upgrades after hours. In addition, upgrades for premium clients were scheduled over the weekend, so the engineering team could intervene and bring the application back online if something went wrong. There would generally be a handful of people directly involved or on standby support. I always felt anxious about getting late-night calls, and I would regularly check my messages on Saturday morning to confirm everything had run smoothly before feeling OK to go out. While it was essential, I never thought it was sustainable, as it impacted work-life balance.

Key Takeaway - Architecture

The researchers found that building architecture around business outcomes, based on loosely coupled products and services, is better than architecture based on tools, technology or systems.

The main benefits of decoupled architecture are

  • Able to make large-scale changes without permission from another team
  • Able to make large-scale changes without depending on another team
  • Able to make changes without communication with another team
  • Reduce reliance on integration tests
  • Able to be tested and released independently.

Up to now, I've always only looked at improving our team's outcome and members' well-being, but I haven't thought much about higher-level organisation architecture and how to improve it. However, we have experienced the inefficiency and delays associated with relying on another team to make large-scale changes, so it's great to have ideas to discuss with upper management on improving organisational performance by using loosely coupled architecture.

Key Takeaway - Product and Process

Short release cycles, a culture of experimentation, and regularly seeking feedback from clients had a notable impact on team performance. This is because delivering value and getting feedback from users as quickly as possible leads to better ideas.

To build on this further, the entire team, including engineers, should understand the workflow from the initial business idea to the end customer and have visibility of all the item statuses and features.

Those who practice Agile Scrum will probably be doing this already and know the value of it. In particular, focus on delivering value to users as early as possible, getting feedback, learning and adapting.

Key Takeaway - Lean Management and Monitoring

Lean Management, originating from the Toyota Production System (TPS), is based on three fundamental principles: delivering value defined by the customer, eliminating waste, and continuous improvement.

A dashboard that shows progress and includes key productivity metrics, such as items' progress and defects, can help identify waste and blockers in the workflow. The information should be visible and easily accessible to all team members, including Engineers and Leaders.

In addition, using work-in-progress (WIP) limits can help expose obstacles in the overall process and reduce the work burden on the team by encouraging people to work on a smaller number of tasks. However, a WIP limit alone doesn't bring much value. It's only effective when the team combines WIP limits with a visual dashboard and has a continuous improvement culture that actively looks for blockers and addresses them with process improvement.
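
As a small illustration of the WIP limit idea, the sketch below flags board columns that hold more items than their limit allows. The column names, the limits and the `columns_over_wip_limit` function are all hypothetical. A real team would probably read this from its board tool rather than from a hard-coded list.

```python
from collections import Counter


def columns_over_wip_limit(
    items: list[tuple[str, str]],
    wip_limits: dict[str, int],
) -> dict[str, tuple[int, int]]:
    """Map each column over its limit to (current item count, limit).

    items is a list of (item id, column name) pairs.
    """
    counts = Counter(column for _, column in items)
    return {
        column: (counts[column], limit)
        for column, limit in wip_limits.items()
        if counts[column] > limit
    }


# Hypothetical board: three items in progress against a limit of two.
board = [
    ("ITEM-1", "In Progress"),
    ("ITEM-2", "In Progress"),
    ("ITEM-3", "In Progress"),
    ("ITEM-4", "Review"),
]
print(columns_over_wip_limit(board, {"In Progress": 2, "Review": 2}))
```

The flagged column is only a prompt for the team to ask what is blocking the work. The check brings no value unless someone acts on it.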

Finally, Lean Management practices positively impact software delivery and team culture and reduce team member burnout.

Key Takeaway - Cultural

The researchers used Westrum's three organisational cultures (Pathological, Bureaucratic and Generative) as the proxy to research the effects of organisational culture on software delivery performance. They found that organisations with a Generative culture performed better.

Westrum's Generative culture emphasises high trust and encourages information flow and collaboration, leading to high software delivery performance. The reason is that information and collaboration lead to better decisions, and it's easier to undo a wrong decision if the team is open and transparent.

The authors refer to Google's research, "The five keys to a successful Google team", which found similar results. In particular, "Who is on a team matters less than how the team members interact, structure their work, and view their contributions". In addition, one of the five key traits of a high-performing team is psychological safety.

The researchers' finding on culture was interesting to read. As an Engineer, I always found that collaboration in a team that focuses on team outcomes rather than individual output often resulted in better ideas, reduced delivery risk, and increased product quality. And as a bonus, it was also more enjoyable and rewarding.

From my experience, having engineers work in silos on a feature for weeks or months reduces overall team and product performance. While it seems more efficient on the surface and in the short term, it has hidden costs such as less diverse ideas, more rework, less chance of code refactoring, key-man risk and lower morale among Engineers.

So it was great to read that a Generative team culture did result in better-performing teams. However, I do find that small, focused teams are suitable for some project types. An example is a Proof of Concept (POC) feature, where the code will be thrown away after completion or re-implemented correctly afterwards.

Team Culture of Continuous Improvement

The book has some great examples of why each of the 24 key practices leads to a high-performing team. But my biggest lesson from the book was that individual practices alone don't always bring value and may even negatively affect an Engineer's output. For example, a WIP limit on its own might just be extra noise on the board and stop Engineers from picking up more dev items.

A practice only brings better results when it is combined with other practices. For example, the most significant difference between a high-performing and a low-performing team is that high-performing teams always try to improve and focus on team outcomes, not individual developer output. Once your team has this culture of continuous improvement and team outcome focus, it's clearer to see that WIP limits help the team identify blockers in overall team progress, which the team can then act on to improve the system. Likewise, it's clearer why smaller items and fast release cycles allow the team to learn faster and deliver better outcomes.

I am sure in a few years, the 24 key practices will have changed. But so long as your team encourages continuous improvement, is empowered to adapt and acts on it, you'll always be in a good spot.

Thoughts

The book doesn't offer many new things for those who already practice DevOps, Agile and Lean methodologies. However, it does bring the well-established best practices into one place, along with the data and theory behind why they lead to a high-performing team and organisation. For that, it's worth the read. For teams that still do manual deployments and manual tests or are sceptical of DevOps, Agile and Lean practices, the book will provide the evidence you need to understand how and where you can help improve your team.
