
CIRCUIT BREAKER

DESIGN PATTERN
MADHAV KELKAR
04/09/2020@12:05pm
OBJECTIVE
• Handling service slow downs gracefully
• Understand transient and non-transient failures
• How the Circuit Breaker pattern handles non-transient failures
• How a Circuit Breaker works
AGENDA
• Transient and Non-Transient Failures in Distributed Systems
• What is a Circuit Breaker?
• Circuit breaker for non-transient failures
• How a circuit breaker works
• CloudGuard Usage of circuit breaker & Future work
TRANSIENT FAILURE SCENARIO
RETRY & TRANSIENT FAILURES
[Diagram sequence: Order Service calls Inventory Service over a REST API. (1) Order Service sends a request. (2) Thread 1 receives "429 (Too many requests)". (3) Order Service retries the call on Thread 2. Legend: healthy, slow, down.]
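The retry behaviour in the slides above can be sketched in a few lines of Java. This is a minimal illustration, not code from the deck: the class `SimpleRetry`, the exception `TransientException`, and the exponential backoff policy are all assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of resilience4j or the OCI SDK.
final class SimpleRetry {

    /** Marks a failure that is expected to clear up soon, e.g. an HTTP 429. */
    static final class TransientException extends RuntimeException {
        TransientException(String message) { super(message); }
    }

    /**
     * Calls the operation up to maxAttempts times. Only TransientException is
     * retried; the delay doubles after every failed attempt.
     */
    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (TransientException e) {
                if (attempt == maxAttempts) {
                    throw e; // retries exhausted: surface the failure to the caller
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2; // back off so a struggling service is not hammered
            }
        }
    }
}
```

A caller would wrap the REST call, for example `SimpleRetry.callWithRetry(() -> client.fetchInventory(), 3, 100)`, where `client.fetchInventory()` is a placeholder for the real call. Retrying only helps when the failure clears up within the retry budget.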
NON-TRANSIENT FAILURE SCENARIO
RETRY & NON-TRANSIENT FAILURES
[Diagram sequence: Order Service calls Inventory Service. (1) Thread 1 calls Inventory Service and times out. (2) Order Service retries on Thread 2 while Thread 1 is still shown on both sides. (3) Threads 1, 2 and 3 are all shown on both sides, still ending in a timeout. The remaining slides show only the two services. Legend: healthy, slow, down.]
RETRY & CASCADING FAILURES
[Diagram sequence: Order Service and Inventory Service. Legend: healthy, slow, down.]
LEARNINGS
• Retry helps in case of transient failures
• For non-transient failures, a different strategy is needed – this is where circuit breaker helps
WHAT IS A CIRCUIT BREAKER?
CIRCUIT BREAKER & NON-TRANSIENT FAILURES
[Diagram sequence: a Circuit Breaker sits between Order Service and Inventory Service. (1) Closed state: calls pass through. (2) Closed state: Thread 1 times out at Inventory Service and Order Service sees an exception. (3) Open state: Thread 1 is still shown on both sides; Threads 2 and 3 appear only on the Order Service side. (4) Half-open state: Threads 1 and 2 appear on both sides. (5) Closed state: Threads 1, 2 and 3 appear on both sides. Legend: healthy, slow, down.]
CIRCUIT BREAKER INTERNALS
• State Machine
• State Transition Configuration
• Thresholds
• Metrics
CIRCUIT BREAKER STATES

States:
• Closed: all calls permitted
• Open: no calls permitted
• Half Open: few calls permitted

Transitions:
• Closed → Open: failure rate > threshold
• Open → Half Open: timeout expired
• Half Open → Closed: successful calls
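The state machine can be sketched as a small single-threaded Java class. This is an illustration under stated assumptions, not the resilience4j implementation. The names `TinyCircuitBreaker` and `CallNotPermittedException` and the injectable clock are hypothetical. The sketch also simplifies "failure rate > threshold" to a count of consecutive failures, and it adds a Half Open → Open transition on a failed probe, which the slides do not show.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, single-threaded sketch; not the resilience4j implementation.
final class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final class CallNotPermittedException extends RuntimeException {
        CallNotPermittedException() { super("circuit breaker is OPEN"); }
    }

    private final int failureThreshold;   // consecutive failures that open the breaker (simplification)
    private final long waitInOpenMillis;  // how long to stay OPEN before allowing a probe
    private final LongSupplier clock;     // injectable time source in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    TinyCircuitBreaker(int failureThreshold, long waitInOpenMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.waitInOpenMillis = waitInOpenMillis;
        this.clock = clock;
    }

    State state() {
        // OPEN -> HALF_OPEN once the wait duration has expired
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitInOpenMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> operation) {
        if (state() == State.OPEN) {
            throw new CallNotPermittedException(); // fail fast without touching the remote service
        }
        try {
            T result = operation.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // HALF_OPEN -> CLOSED on a successful call
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // a failed probe in HALF_OPEN, or too many failures in CLOSED, opens the breaker
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            throw e;
        }
    }
}
```

Passing the clock in as a `LongSupplier` (for example `System::currentTimeMillis`) keeps the Open → Half Open transition testable without sleeping.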
CIRCUIT BREAKER CONFIG
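import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import java.time.Duration;

CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
    .slowCallDurationThreshold(Duration.ofSeconds(20))   // calls longer than 20 s count as slow
    .slowCallRateThreshold(50)                           // open when >= 50% of calls are slow
    .failureRateThreshold(50)                            // open when >= 50% of calls fail
    .waitDurationInOpenState(Duration.ofSeconds(30))     // stay open for 30 s before half-open
    .slidingWindow(10, 5, SlidingWindowType.COUNT_BASED) // last 10 calls; at least 5 before rates apply
    .build();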
CIRCUIT BREAKER METRICS
• Failure rate
• Slow call rate
• Failed calls
• Successful calls
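To show how a failure rate can be derived from failed and successful calls, here is a minimal count-based sliding window in Java. It is a sketch, not resilience4j's internal data structure: the class `CountBasedWindow` and its method names are hypothetical, and returning -1 while too few calls have been recorded is a convention chosen for this example.

```java
// Hypothetical sketch of a count-based sliding window; not resilience4j code.
final class CountBasedWindow {
    private final boolean[] failed;          // ring buffer: true = call failed
    private final int minimumNumberOfCalls;  // calls required before a rate is reported
    private int recorded = 0;                // number of outcomes held, capped at window size
    private int next = 0;                    // index of the slot to overwrite next
    private int failures = 0;                // failed calls currently inside the window

    CountBasedWindow(int windowSize, int minimumNumberOfCalls) {
        if (windowSize < 1 || minimumNumberOfCalls < 1 || minimumNumberOfCalls > windowSize) {
            throw new IllegalArgumentException("need 1 <= minimumNumberOfCalls <= windowSize");
        }
        this.failed = new boolean[windowSize];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    void record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) {
                failures--; // the oldest outcome leaves the window
            }
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    /** Failure rate in percent, or -1 while fewer than minimumNumberOfCalls are recorded. */
    double failureRate() {
        if (recorded < minimumNumberOfCalls) {
            return -1;
        }
        return 100.0 * failures / recorded;
    }
}
```

With a window of 10 and a minimum of 5 calls, four failures followed by one success give a failure rate of 80%, which would exceed a 50% threshold.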
CLOUDGUARD USAGE
• Uses resilience4j 1.3.1
• Outbound calls to OCI services
• Use default/cached values for failed service calls
• Capturing retry metrics using Micrometer
DEFAULT VALUES
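import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.vavr.control.Try;
import java.util.function.Supplier;

// Wrap the remote lookup so the circuit breaker records its outcome.
Supplier<Map<String, Set<MetadataSignalKey>>> supplier =
    CircuitBreaker.decorateSupplier(circuitBreaker, this::getTenantCompartmentMap);
tenantMetadataMap = Try.ofSupplier(supplier)
    .recover(throwable -> {
        // On a failed call or an open breaker, keep the previously cached map.
        log.warn("caught exception, using default value", throwable);
        return tenantMetadataMap;
    })
    .get();
return tenantMetadataMap;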
METRICS
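import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.micrometer.tagged.TaggedCircuitBreakerMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(circuitBreakerConfig);
circuitBreaker = registry.circuitBreaker("identityService");
meterRegistry = new SimpleMeterRegistry();
// Publish the metrics of every breaker in the registry to Micrometer.
TaggedCircuitBreakerMetrics
    .ofCircuitBreakerRegistry(registry)
    .bindTo(meterRegistry);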
FUTURE WORK
• Fault simulation using http-callgraph-analyzer
• http-callgraph-analyzer is a Java agent that injects custom code into OCI-JAVA-SDK
• Instrumenting the HTTP client to simulate network latencies and partitions
• For more info, see callgraph-analyzer-readme
PEGASUS INTEGRATION
• OOTB Circuit Breaker integration with OCI-JAVA-SDK from version 1.15.4
• Uses resilience4j V1.2.0
• Collaborating with Pegasus team to integrate resilience4j/micrometer metrics
USEFUL LINKS
• Callgraph Analyzer - https://bitbucket.oci.oraclecorp.com/projects/SECCEN/repos/streaming-apps/browse/http-callgraph-analyzer/README.md
• Pegasus library - https://bitbucket.oci.oraclecorp.com/projects/PEG/repos/service-framework-library/browse/sfw-circuitbreaker
• Resilience4j github - https://github.com/resilience4j/resilience4j
• Micrometer - https://micrometer.io/
• Scala - https://doc.akka.io/docs/akka/current/common/circuitbreaker.html
• Python - https://pypi.org/project/circuitbreaker/
