In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today, and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment. This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t. Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is. You’ll learn: how to run reliable services in environments you don’t completely control, like cloud; practical applications of how to create, monitor, and run your services via Service Level Objectives; how to convert existing ops teams to SRE, including how to dig out of operational overload; and methods for starting SRE from either greenfield or brownfield.
Expensive. For example, scientists lost the $125 million Mars Climate Orbiter because two separate engineering teams used different units of measurement (imperial versus metric). As before, strong types are a solution to this issue: they ...
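The idea of encoding units in types can be sketched in a few lines. The example below is an illustration only, not code from the excerpted book: the class names `NewtonSeconds` and `PoundForceSeconds` are hypothetical, and impulse is used as the quantity because that is where the Mars Climate Orbiter mismatch occurred. Mixing the two types raises an error instead of silently producing a wrong number.

```python
from dataclasses import dataclass

# Newton-seconds per pound-force second (1 lbf = 4.4482216152605 N).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class NewtonSeconds:
    """Impulse in metric units (hypothetical illustrative type)."""
    value: float

    def __add__(self, other):
        # Only allow addition with the same unit type.
        if not isinstance(other, NewtonSeconds):
            return NotImplemented
        return NewtonSeconds(self.value + other.value)


@dataclass(frozen=True)
class PoundForceSeconds:
    """Impulse in imperial units (hypothetical illustrative type)."""
    value: float

    def to_newton_seconds(self) -> NewtonSeconds:
        # The conversion is explicit, so it cannot be forgotten silently.
        return NewtonSeconds(self.value * LBF_S_TO_N_S)


metric = NewtonSeconds(10.0)
imperial = PoundForceSeconds(1.0)

# Explicit conversion: both operands are now NewtonSeconds.
total = metric + imperial.to_newton_seconds()

# Mixing units directly is rejected at runtime with a TypeError.
try:
    metric + imperial
    mixed_units_rejected = False
except TypeError:
    mixed_units_rejected = True
```

In a statically typed language, or in Python with a type checker, the same mistake could be caught before the program runs; the runtime check here is just the simplest way to show the principle.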
Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.
Practical advice that does exist usually assumes that your team already has the infrastructure, tooling, and culture in place. In this book, recognized SLO expert Alex Hidalgo explains how to build an SLO culture from the ground up.
This book covers how to track and monitor application performance using Grafana, Prometheus, and Kibana along with how to extend monitoring more effectively by building full-stack observability into the system.
Establishing SRE Foundations offers a concise and practical introduction to SRE that focuses specifically on how to drive successful adoption in your own software delivery organization.
This book goes into detail about DevOps culture, microservices architecture, how to automate deployment using Kubernetes, and how Google's SRE and DevOps philosophies overlap.
Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE.
Martin Fowler and James Lewis, who initially proposed the term microservices, define the architecture in their seminal blog post as: ...a particular way of designing software applications as suites of independently deployable services.
Here is your chance to dive into the SRE role and learn what it takes to be an SRE and implement SRE best practices. The DevOps, Continuous Delivery, and SRE movements are here to stay and grow; it's time for you to ride the wave!