gitpiper

SRE at Google: Using load shedding to survive a success disaster | Google Cloud Blog

Learn how to deploy load shedding, a technique that allows your system to serve nominal capacity, regardless of how much traffic is being sent to it, in order to maintain availability.
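
As a rough illustration of the idea described above, the sketch below caps the number of in-flight requests and rejects the excess so that admitted requests still get served. It is a minimal sketch under stated assumptions, not the approach from the linked post: the `LoadShedder` class, the `handle` helper, and the fixed `capacity` value are all hypothetical names chosen for this example.

```python
import threading
from typing import Callable, Tuple


class LoadShedder:
    """Admit at most `capacity` concurrent requests and reject the rest.

    Hypothetical helper for illustration; `capacity` stands in for the
    nominal capacity the service can handle.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot if one is free; return False when at capacity."""
        with self._lock:
            if self._in_flight >= self._capacity:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free a slot previously reserved by try_acquire."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle(shedder: LoadShedder, work: Callable[[], str]) -> Tuple[int, str]:
    """Run `work` if admitted; otherwise shed the request with a 503."""
    if not shedder.try_acquire():
        return 503, "overloaded, retry later"
    try:
        return 200, work()
    finally:
        shedder.release()
```

Rejecting early with a cheap error is the design choice here: a shed request costs almost nothing, so the requests that are admitted keep their normal latency even when incoming traffic exceeds capacity.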

Last Updated: 6 June 2025

64 Projects and apps Similar to "SRE at Google: Using load shedding to survive a success disaster | Google Cloud Blog" in June 2025

  • The Realities of the Job of Delivering Reliability | USENIX

  • Fail at Scale - ACM Queue

  • AWS re:Invent 2014 | (PFC305) Embracing Failure: Fault-Injection and Service Reliability

    Complex distributed systems fail. They fail more frequently, and in different ways, as they scale and evolve over time. In this session, you learn how Netfli...

  • 10 Years of Crashing Google | USENIX

  • How we break things at Twitter: failure testing

  • Reliable Cron across the Planet - ACM Queue

  • Push our limits - reliability testing at Twitter

  • The Verification of a Distributed System - ACM Queue

  • Weathering the Unexpected - ACM Queue

  • SRE Hour: Tech Talks by Box & Yelp

    First up, Demetri Mouratis, Senior Staff SRE at Box, will speak on service discovery at Box. After facing issues with its Puppet-based solution, Box is exper...

  • Simplicity: A Prerequisite for Reliability

    Independent consultant who helps nice companies embrace the good parts of the cloud

  • The Two Sides to Google Infrastructure for Everyone Else

    My talk from Velocity Santa Clara. The format was a debate with myself, looking at the pros and cons of adopting software and practices from other organisations wholesale, using the GIFEE meme as an example.

  • How Embracing Continuous Release Reduced Change Complexity | USENIX

  • Making “Push On Green” a Reality | USENIX

  • BeyondCorp: A New Approach to Enterprise Security | USENIX

  • DevOpsDays Chicago 2015 - Brainstorming Failure by Jeff Smith

  • The Ripple Effect Of Outages And Downtime Cannot Be Underestimated

    Outages and downtime. CDN Performance series, provided by Dyn. The internet is the front door to commerce in today's always-on global environment. An internet performance issue, such as an outage...

  • The infrastructure behind Twitter: efficiency and optimization

  • Dickerson’s Hierarchy of Reliability

    Visibility, observability, incident response, postmortem, root cause analysis, testing, release procedures, capacity planning, product development.

  • The Morning Paper on Operability | the morning paper

  • Production is all that matters - naildrivin5.com - David Bryant Copeland’s Website

    Production is all that matters. June 16, 2013. This is important. It has to do with your treatment and reaction t...

  • SRE at Google: How to avoid a self-inflicted DDoS Attack | Google Cloud Blog

    Learn about one of the most common software architecture design fails, the self-inflicted DDoS, and three methods you can use to avoid it in your own application.

  • Don’t gamble when it comes to reliability

    How do you stay reliable when you can't keep the whole system in your head? Tom Croucher discusses Uber's approach to reliability.

  • Resilience Engineering: Learning to Embrace Failure - ACM Queue

  • The Infrastructure Behind Twitter: Scale

  • Scaling Reliability at Twitter: So You Want to Add a 9

    We choose microservice architectures for many different reasons, often including improving reliability. However, there is a dark side: modular systems have...

  • PRINCIPLES OF CHAOS ENGINEERING - Principles of chaos engineering

  • Chaos Engineering

    Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Many large tech organizations are using experimentation to verify such systems' reliability. Netflix engineers call this approach chaos engineering. They've determined several principles un...

  • SRE at Google: What is availability and what does it mean | Google Cloud Blog

    This post defines what availability means and helps you determine if your system is succeeding

  • How Google Backs Up the Internet Along With Exabytes of Other Data - High Scalability

    Raymond Blum leads a team of Site Reliability Engineers charged with keeping Google's data secr...

  • Performance, Scalability, and High Availability: 3 Key Infrastructure Adaptability Requirements - High Scalability

    This is a guest post by Tony Branson. Performance, scalability, and HA are often used i...

  • SRE at Google: Reliable releases and rollbacks | Google Cloud Blog

    Learn the three basic tasks engineers responsible for the system's reliability should consider.

  • SRE at Google: How release canaries can save your bacon | Google Cloud Blog

    In software, a canary process is usually the first instance that receives live production traffic about a new configuration update, either a binary or configuration rollout.

  • Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites

    TL;DR: For several years I managed the 3rd line site reliability operation for many of the world's busiest gambling sites, working for a little-known company that built and ran the core backen...

  • Intro: Every Day Is Monday in Operations

    Editor's note: This is part of the series "Every Day Is Monday in Operations." All posts in the series will link back to this introduction, and every post will be linked in the index below as they are published.

  • Under the Hood: Ensuring Site Reliability — Squarespace / Engineering

    Squarespace hosts millions of websites on our cloud-based website building platform. The reliability of these sites is a top priority for us. We're consistently implementing the best technologies and safeguards to enable quick load times and prevent outages. The measures we take allow our customers...

© 2025 GitPiper. All rights reserved
