Envoy Circuit Breaking Non Deterministic Behaviour

Problem Description

Our experiments with Envoy's circuit breaking revealed that the results were not deterministic. We demonstrated this by intentionally trying to trip circuits using a setup like the following:

[Setup diagram: client requests pass through the Envoy sidecar, which applies circuit breaking, before reaching the upstream service]

The service is a simple web server that returns a 200 after a 2-second delay (the delay ensures the server remains busy between asynchronous requests). A snapshot of our Envoy sidecar's config shows that we enable circuit breaking (over HTTP/1.1) with a maximum of 1 connection and 1 pending request:

circuit_breakers:
   thresholds:
     - priority: "DEFAULT"
       max_connections: 1
       max_pending_requests: 1
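
For reference, the upstream service described above could look roughly like the following minimal sketch. This is a hypothetical stand-in for the server in the linked repo (which may be implemented differently); it honours the ?status and ?sleep query parameters used in the curl commands further down:

import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. ?status=200&sleep=2 -> reply with a 200 after holding the request for 2 s
        params = parse_qs(urlparse(self.path).query)
        status = int(params.get("status", ["200"])[0])
        delay = float(params.get("sleep", ["2"])[0])
        time.sleep(delay)  # keeps the upstream busy between asynchronous requests
        body = str(status).encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # the Envoy sidecar (listening on :3000 in the logs) would proxy to this port
    ThreadingHTTPServer(("0.0.0.0", 8080), SlowHandler).serve_forever()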

Next, we verified that this worked by sending single requests to the service, to which it reliably responded with 200s, as expected.

However, if we now send 2 asynchronous requests to the service, we see unexpected results. Sometimes both requests return 200, which should not be possible, since the second request should trip the circuit breaker. On other occasions one request returns a 200 and the other returns a 503 Service Unavailable, which is what we expect to happen. Despite our best efforts, we were unable to achieve any kind of repeatability, leading us to suspect it has to do with Envoy's underlying concurrency.

When we raised max_connections and max_pending_requests to larger numbers (>100) and again sent too many requests in an attempt to trip the circuit, the inconsistency remained: the number of permitted requests was approximately correct but was sometimes off by a few.
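
One way to quantify what the breaker actually did, independent of the interleaved curl output, is to read the circuit-breaker counters from Envoy's admin endpoint after each run. The sketch below is hedged: it assumes the admin interface is enabled on localhost:9901 and that the upstream cluster is named local_service, both of which need to be adjusted to match the actual envoy.yaml:

import urllib.request

ADMIN = "http://localhost:9901"   # assumed admin address; change to your admin port
CLUSTER = "local_service"         # assumed cluster name; change to match envoy.yaml

def breaker_stats():
    # /stats returns plain "name: value" lines; keep only the breaker-related ones
    with urllib.request.urlopen(f"{ADMIN}/stats") as resp:
        lines = resp.read().decode().splitlines()
    wanted = (
        f"cluster.{CLUSTER}.upstream_rq_pending_overflow",  # requests rejected by max_pending_requests
        f"cluster.{CLUSTER}.upstream_cx_overflow",          # connections not opened due to max_connections
        f"cluster.{CLUSTER}.circuit_breakers.default.rq_pending_open",
        f"cluster.{CLUSTER}.circuit_breakers.default.cx_open",
    )
    return {name.strip(): value.strip()
            for name, _, value in (line.partition(":") for line in lines)
            if name.strip() in wanted}

if __name__ == "__main__":
    for name, value in breaker_stats().items():
        print(name, "=", value)

The upstream_rq_pending_overflow counter increments for each request rejected with a 503 and x-envoy-overloaded: true, so comparing it before and after a burst makes the off-by-a-few behaviour easier to measure than scraping curl output.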

We are hoping to understand the reason for this lack of absolute determinism. Any help is much appreciated! See the repo for the code.

EDIT: There is an issue detailing similar unexpected behavior, but I am no closer to finding a solution.

I have included the logs of two runs to demonstrate the output:

  1. Sending 3 simultaneous requests, 1 makes it through.
❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
**    Trying ::1...
  Trying ::1...
**  TCP_NODELAY set
TCP_NODELAY set
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
>>  GET /?status=200&sleep=2 HTTP/1.1
Host: localhost:3000
>>  Host: localhost:3000
User-Agent: curl/7.64.1
>>  User-Agent: curl/7.64.1
Accept: */*
>>  Accept: */*

>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:36:31 GMT
< x-envoy-upstream-service-time: 2007
<
* Connection #0 to host localhost left intact
200* Closing connection 0
  2. Sending 3 simultaneous requests, all of them return 200.
❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
**    Trying ::1...
  Trying ::1...
**  TCP_NODELAY set
TCP_NODELAY set
* *  Trying ::1...
 *Connected to localhost (::1) port 3000 (#0)
*  TCP_NODELAY set
Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> >Host: localhost:3000
 >GET /?status=200&sleep=2 HTTP/1.1
 User-Agent: curl/7.64.1
>>  Accept: */*
Host: localhost:3000
> >
 User-Agent: curl/7.64.1
> Accept: */*
>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:50 GMT
< x-envoy-upstream-service-time: 2006
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:52 GMT
< x-envoy-upstream-service-time: 4011
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:54 GMT
< x-envoy-upstream-service-time: 6015
<
* Connection #0 to host localhost left intact
200* Closing connection 0

Tags: networking, concurrency, envoyproxy, circuit-breaker

Solution


From one of the contributors:

The circuit breakers are intended to prevent too much load from propagating through the system, not enforce a strict limit. The system is implemented in a way that is simpler and more performant, but can slightly exceed the limits in some cases. Here's a comment from the implementation of the circuit breaker limit tracking.
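
In other words, the admission check and the counter bookkeeping are kept cheap rather than perfectly synchronized across Envoy's worker threads. The following toy sketch (plain Python, not Envoy's actual code) illustrates how a check-then-update on a shared counter can admit a few extra requests when requests arrive concurrently:

import random
import threading
import time

MAX_PENDING = 5
pending = 0          # shared counter, intentionally not locked
admitted = []

def handle_request(i):
    global pending
    time.sleep(random.uniform(0, 0.05))   # requests arrive at slightly different moments
    if pending < MAX_PENDING:             # step 1: check the limit ...
        time.sleep(0.001)                 # ... other requests can pass the same check in this window
        pending += 1                      # step 2: record this request
        admitted.append(i)
        time.sleep(2)                     # simulate the 2-second upstream call
        pending -= 1

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Typically prints a number slightly above MAX_PENDING, and it varies from run to run.
print(f"admitted {len(admitted)} of 20 requests with a limit of {MAX_PENDING}")

Run this a few times and the admitted count usually lands slightly above the limit and differs between runs, which mirrors the approximately-correct, non-repeatable counts observed in the question.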

