DESIGN PATTERN
MADHAV KELKAR
04/09/2020@12:05pm
OBJECTIVE
• Handle service slowdowns gracefully
• Understand transient and non-transient failures
• How the Circuit Breaker pattern handles non-transient failures
• How a circuit breaker works
AGENDA
• Transient and Non-Transient Failures in Distributed Systems
• What is a Circuit Breaker?
• Circuit breaker for non-transient failures
• How a circuit breaker works
• CloudGuard usage of circuit breaker & future work
TRANSIENT FAILURE SCENARIO
RETRY & TRANSIENT FAILURES
[Diagram, built up over three slides. Labels: Thread 1, Retry, Thread 2. Legend: healthy, slow, down.]
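The retry behaviour in the diagram above can be sketched in plain Java. This is a minimal illustration, not code from the deck: the `RetrySketch` class, the `callWithRetry` helper, and the attempt limit of 3 are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of any library.
final class RetrySketch {

    // Invokes the call up to maxAttempts times and returns the first successful result.
    // If every attempt fails, the last exception is rethrown to the caller.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated transient failure: the first call fails, the second succeeds.
        String result = callWithRetry(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

A retry like this only pays off when the failure clears quickly. If the downstream service stays slow or down, every attempt holds a caller thread until it times out, which is the situation the following slides walk through.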
NON-TRANSIENT FAILURE SCENARIO
RETRY & NON-TRANSIENT FAILURES
[Diagram, built up over several slides. Labels: Order Service, Inventory Service, Timeout!, Retry, Thread 1, Thread 2, Thread 3. Legend: healthy, slow, down.]
RETRY & CASCADING FAILURES
[Diagram, built up over two slides. Labels: Order Service, Inventory Service. Legend: healthy, slow, down.]
LEARNINGS
• Retry helps in case of transient failures
• Non-transient failures need a different strategy; this is where a circuit breaker helps
WHAT IS A CIRCUIT BREAKER?
CIRCUIT BREAKER & NON-TRANSIENT FAILURES
[Diagram, built up over five slides. Labels: Order Service, Circuit Breaker (with its state), Inventory Service, Thread 1, Thread 2, Thread 3. State shown on successive slides: Closed; Closed (Exception, Timeout! on Thread 1); Open; Half-open; Closed. Legend: healthy, slow, down.]
CIRCUIT BREAKER INTERNALS
• State Machine
• State Transition Configuration
• Thresholds
• Metrics
CIRCUIT BREAKER STATES
[State diagram, built up over several slides. Recoverable labels: Closed (All Calls Permitted); transitions "Failure rate > threshold" and "Successful calls".]
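The Closed, Open, and Half-open cycle named in the slides above can be sketched as a small hand-rolled state machine. This is an illustration under stated assumptions, not resilience4j's implementation: the `SimpleCircuitBreaker` class and its method names are hypothetical, and it trips on a count of consecutive failures rather than on a failure rate, purely to keep the example short.

```java
import java.util.function.LongSupplier;

// Hand-rolled illustration of the Closed / Open / Half-open cycle.
// This is NOT resilience4j's implementation; names and policy are assumptions.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long waitMillis;      // how long to stay OPEN before a trial call
    private final LongSupplier clock;   // injectable time source (milliseconds)
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long waitMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    // True if a call may proceed. Moves OPEN -> HALF_OPEN once the wait has elapsed.
    // Simplification: HALF_OPEN does not cap the number of trial calls.
    synchronized boolean tryAcquirePermission() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    // A success in CLOSED or HALF_OPEN resets the count and closes the circuit.
    synchronized void onSuccess() {
        if (state == State.OPEN) {
            return; // late result from a call that started before the trip
        }
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    // A failed trial call, or too many consecutive failures, opens the circuit.
    synchronized void onFailure() {
        if (state == State.OPEN) {
            return; // already open; keep the original open timestamp
        }
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the example runs instantly
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(2, 30_000L, () -> now[0]);
        cb.onFailure();
        cb.onFailure();
        System.out.println(cb.state());                // OPEN
        System.out.println(cb.tryAcquirePermission()); // false: call rejected immediately
        now[0] = 30_000L;
        System.out.println(cb.tryAcquirePermission()); // true: trial call allowed
        System.out.println(cb.state());                // HALF_OPEN
        cb.onSuccess();
        System.out.println(cb.state());                // CLOSED
    }
}
```

The key property is that a rejected call returns immediately while the breaker is open, so caller threads are not tied up waiting on a service that is known to be failing.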
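// State transition configuration (resilience4j CircuitBreakerConfig builder)
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
    .slowCallDurationThreshold(Duration.ofSeconds(20)) // calls slower than 20 s count as slow
    .slowCallRateThreshold(50)                         // open when >= 50% of calls are slow
    .failureRateThreshold(50)                          // open when >= 50% of calls fail
    .waitDurationInOpenState(Duration.ofSeconds(30))   // stay open 30 s before half-open
    // window of the last 10 calls; rates are evaluated once at least 5 calls are recorded
    .slidingWindow(10, 5, SlidingWindowType.COUNT_BASED)
    .build();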
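To make the sliding-window settings above concrete, the following plain-Java sketch computes a failure rate over the last N call outcomes and withholds a verdict until a minimum number of calls has been recorded. It is a simplified illustration, not resilience4j's internal data structure; `CountBasedWindow` and its methods are hypothetical names, and returning -1 while the minimum is not yet reached is modelled on how resilience4j reports an unavailable rate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative count-based window; a simplification with assumed names.
final class CountBasedWindow {
    private final int windowSize;
    private final int minimumNumberOfCalls;
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures;

    CountBasedWindow(int windowSize, int minimumNumberOfCalls) {
        this.windowSize = windowSize;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    // Records one call outcome, evicting the oldest outcome once the window is full.
    void record(boolean failed) {
        if (outcomes.size() == windowSize && outcomes.removeFirst()) {
            failures--; // the evicted outcome was a failure
        }
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
    }

    // Failure rate in percent, or -1 while fewer than minimumNumberOfCalls are recorded.
    float failureRate() {
        if (outcomes.size() < minimumNumberOfCalls) {
            return -1f;
        }
        return 100f * failures / outcomes.size();
    }

    // True when the rate is available and has reached the threshold.
    boolean shouldOpen(float thresholdPercent) {
        float rate = failureRate();
        return rate >= 0 && rate >= thresholdPercent;
    }

    public static void main(String[] args) {
        CountBasedWindow window = new CountBasedWindow(10, 5);
        boolean[] calls = {true, true, false, false, true}; // 3 failures in 5 calls
        for (boolean failed : calls) {
            window.record(failed);
        }
        System.out.println(window.failureRate());
        System.out.println(window.shouldOpen(50f));
    }
}
```

With a window of 10 and a minimum of 5, four failures in a row on a freshly started service do not open the breaker; the fifth recorded call is the first point at which the rate is evaluated.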
CIRCUIT BREAKER METRICS
• Failure rate
• Slow call rate
• Failed calls
• Successful calls
CLOUDGUARD USAGE
• Uses resilience4j 1.3.1
• Outbound calls to OCI services
• Uses default/cached values for failed service calls
• Captures retry metrics using Micrometer
DEFAULT VALUES
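// Wrap the remote lookup so every call is guarded and measured by the circuit breaker.
Supplier<Map<String, Set<MetadataSignalKey>>> supplier =
    CircuitBreaker.decorateSupplier(circuitBreaker, this::getTenantCompartmentMap);
// Vavr Try: on any failure (including a call rejected while the breaker is open),
// keep the value already held in tenantMetadataMap.
tenantMetadataMap = Try.ofSupplier(supplier)
    .recover(throwable -> {
        log.warn("Caught exception, using default value", throwable);
        return tenantMetadataMap;
    })
    .get();
return tenantMetadataMap;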
METRICS
CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(circuitBreakerConfig);
circuitBreaker = registry.circuitBreaker("identityService");
TaggedCircuitBreakerMetrics
    .ofCircuitBreakerRegistry(registry)
    .bindTo(meterRegistry);
FUTURE WORK
• Fault simulation using http-callgraph-analyzer
• http-callgraph-analyzer is a Java agent that injects custom code into OCI-JAVA-SDK
• It instruments the HTTP client to simulate network latencies and partitions
• For more information, see callgraph-analyzer-readme
PEGASUS INTEGRATION
• OOTB Circuit Breaker integration with OCI-JAVA-SDK from version 1.15.4
• Uses resilience4j 1.2.0
• Collaborating with Pegasus team to integrate resilience4j/micrometer metrics
USEFUL LINKS
• Callgraph Analyzer - https://bitbucket.oci.oraclecorp.com/projects/SECCEN/repos/streaming-apps/browse/http-callgraph-analyzer/README.md
• Pegasus library - https://bitbucket.oci.oraclecorp.com/projects/PEG/repos/service-framework-library/browse/sfw-circuitbreaker
• Resilience4j github - https://github.com/resilience4j/resilience4j
• Micrometer - https://micrometer.io/
• Scala - https://doc.akka.io/docs/akka/current/common/circuitbreaker.html
• Python - https://pypi.org/project/circuitbreaker/