Site Reliability Engineering: How Google Runs Production Systems

Name: Site Reliability Engineering: How Google Runs Production Systems
Rating: 5 (2 reviews)

ISBN-10: 1491951176
ISBN-13: 9781491951170
Pages: 552
Language: English
Published: 2016-03-23
Publisher: O'Reilly Media, Inc.
Authors: Chris Jones, Betsy Beyer, Jennifer Petoff

Description

The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient: lessons directly applicable to your organization.

This book is divided into four sections:

- Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices: Understand the theory and practice of an SRE’s day-to-day work of building and operating large distributed computing systems
- Management: Explore Google's best practices for training, communication, and meetings that your organization can use


Other editions

Site Reliability Engineering: How Google Runs Production Systems
- 2016-03-23
- 552 pages
- Paperback
- O'Reilly Media, Inc.

Similar books

Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems
By Betsy Beyer, Heather Adkins, Paul Blankinship
... For example, scientists lost the $125 million Mars Climate Orbiter because two separate engineering teams used different units of measurement (imperial versus metric). As before, strong types are a solution to this issue: they ...
The Site Reliability Workbook: Practical Ways to Implement SRE
By Betsy Beyer, Niall Richard Murphy, David K. Rensin
But not completely evenly distributed, as William Gibson might say. [15] See relevant research at https://devops-research.com/research.html. [16] See http://en.wikipedia.org/wiki/Goodhart%27s_law and ...
Database Reliability Engineering: Designing and Operating Resilient Database Systems
By Laine Campbell, Charity Majors
This book covers: service-level requirements and risk management; building and evolving an architecture for operational visibility; infrastructure engineering and infrastructure management; how to facilitate the release management process; data ...
Hands-on Site Reliability Engineering: Build Capability to Design, Deploy, Monitor, and Sustain Enterprise Software Systems at Scale (English Edition)
By Shamayel M. Farooqui, Vishnu Vardhan Chikoti
This book covers how to track and monitor application performance using Grafana, Prometheus, and Kibana along with how to extend monitoring more effectively by building full-stack observability into the system.
Seeking SRE: Conversations About Running Production Systems at Scale
By David N. Blank-Edelman
Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.
Practical Site Reliability Engineering: Automate the process of designing, developing, and delivering highly reliable apps and services with SRE
By Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh
In this chapter, we are going to focus on the various ways and means of bringing up the reliability assurance factor by ... There are machine learning algorithms besides big and fast data analytics platforms to speed up and simplify the ...
97 Things Every SRE Should Know
By Emil Stolarsky, Jaime Woo
Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE.
Site Reliability Engineering (SRE) Handbook: How SRE Implements DevOps
By Stephen Fleming
Here is your chance to dive into the SRE role and learn what it takes to be an SRE and implement SRE best practices. The DevOps, Continuous Delivery, and SRE movements are here to stay and grow; it's time for you to ride the wave!
Continuous Delivery and Site Reliability Engineering (SRE) Handbook: Non-Programmer's Guide
By Stephen Fleming
This book goes into detail about DevOps culture, microservices architecture, how to automate deployment using Kubernetes, and how Google's SRE and DevOps philosophies overlap.
Real-World SRE: The Survival Guide for Responding to a System Outage and Maximizing Uptime
By Nat Welch
What you will learn: monitor for approaching catastrophic failure; alert your team to an outage emergency; dissect your incident response strategies; test automation tools and build your own software; predict bottlenecks and fight for user ...