networking - Envoy Circuit Breaking Non Deterministic Behaviour
Problem description
Our experiments with Envoy's circuit breaking revealed that the results were not deterministic. We saw this when we tried to trip circuits intentionally using a setup like so:
The service is a simple web server that returns a 200 after a 2-second delay (the delay ensures the server is still busy when the next asynchronous request arrives). A snapshot of our Envoy sidecar's config shows that we enable circuit breaking (over HTTP/1.1) with a maximum of 1 connection and 1 pending request:
circuit_breakers:
  thresholds:
    - priority: "DEFAULT"
      max_connections: 1
      max_pending_requests: 1
Next, we tested that this worked by sending single requests to the service, to which it reliably responds with 200s as expected.
However, if we now send 2 asynchronous requests to the service, we see unexpected results. Sometimes both requests return 200, which should not be possible because the second request should trip the circuit breaker. On other occasions, one request returns 200 and the other returns 503 Service Unavailable, which is what we expect. Despite our best efforts, we could not get repeatable results, which leads us to suspect Envoy's underlying concurrency.
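To put numbers on how often each outcome occurs, the concurrent requests can be sent from a small script that tallies status codes. The following is a minimal sketch in Python using only the standard library; the URL is the one from the curl command in this question, and the helper names fetch_status and tally_statuses are hypothetical, chosen for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request

# URL taken from the curl command used in this question.
URL = "http://localhost:3000?status=200&sleep=2"


def fetch_status(url):
    """Send one GET request and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we count.
        return err.code


def tally_statuses(n, url=URL, fetch=fetch_status):
    """Send n requests concurrently and count the status codes returned."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return Counter(pool.map(fetch, [url] * n))
```

Calling tally_statuses(2) several times in a row and comparing the counts shows how often both requests return 200 versus one 200 and one 503. The fetch parameter exists only so the tally logic can be exercised without a running server.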
When we changed max_connections and max_pending_requests to larger numbers (>100) and again sent too many requests in an attempt to trip the circuit, the inconsistency remained. The number of permitted requests was approximately correct but was sometimes off by a few.
We are hoping to understand the reason for this lack of absolute determinism. Any help is much appreciated! See repo for code.
EDIT: There is an issue detailing similar unexpected behavior, but I am no closer to finding a solution.
I have included the logs of two runs to demonstrate the output:
- Sending 3 simultaneous requests, 1 makes it through.
❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
** Trying ::1...
Trying ::1...
** TCP_NODELAY set
TCP_NODELAY set
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
>> GET /?status=200&sleep=2 HTTP/1.1
Host: localhost:3000
>> Host: localhost:3000
User-Agent: curl/7.64.1
>> User-Agent: curl/7.64.1
Accept: */*
>> Accept: */*
>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:36:31 GMT
< x-envoy-upstream-service-time: 2007
<
* Connection #0 to host localhost left intact
200* Closing connection 0
- Sending 3 simultaneous requests, all of them return 200.
❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
** Trying ::1...
Trying ::1...
** TCP_NODELAY set
TCP_NODELAY set
* * Trying ::1...
*Connected to localhost (::1) port 3000 (#0)
* TCP_NODELAY set
Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> >Host: localhost:3000
>GET /?status=200&sleep=2 HTTP/1.1
User-Agent: curl/7.64.1
>> Accept: */*
Host: localhost:3000
> >
User-Agent: curl/7.64.1
> Accept: */*
>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:50 GMT
< x-envoy-upstream-service-time: 2006
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:52 GMT
< x-envoy-upstream-service-time: 4011
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:54 GMT
< x-envoy-upstream-service-time: 6015
<
* Connection #0 to host localhost left intact
200* Closing connection 0
Solution
From one of the contributors:
"The circuit breakers are intended to prevent too much load from propagating through the system, not enforce a strict limit. The system is implemented in a way that is simpler and more performant, but can slightly exceed the limits in some cases. Here's a comment from the implementation of the circuit breaker limit tracking."
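As a rough illustration of how a limit can be slightly exceeded in general, consider a counter where checking the limit and incrementing the count are two separate steps. The sketch below is a toy model written for this answer, not Envoy's implementation and not the source comment referred to above; the function name admitted and the schedule format are hypothetical. It replays a fixed ordering of steps so the result is reproducible.

```python
from typing import Dict, List, Tuple


def admitted(limit: int, schedule: List[Tuple[int, str]]) -> int:
    """Replay a schedule of (worker, step) events against a shared counter.

    Each worker performs two separate steps: "check" compares the shared
    counter to the limit, and "commit" increments the counter if that
    worker's earlier check passed. Returns how many workers were admitted.
    """
    active = 0                      # shared count of in-flight requests
    passed: Dict[int, bool] = {}    # result of each worker's check step
    count = 0
    for worker, step in schedule:
        if step == "check":
            passed[worker] = active < limit
        elif step == "commit" and passed.get(worker, False):
            active += 1
            count += 1
    return count


# Worker 0 finishes before worker 1 starts: the limit holds.
serial = [(0, "check"), (0, "commit"), (1, "check"), (1, "commit")]
# Both workers check before either commits: both see active == 0.
interleaved = [(0, "check"), (1, "check"), (0, "commit"), (1, "commit")]

print(admitted(1, serial))       # 1
print(admitted(1, interleaved))  # 2
```

With a limit of 1, the serial ordering admits one request while the interleaved ordering admits two. Which ordering occurs in a real concurrent system depends on timing, which is consistent with results that vary from run to run and counts that are "off by a few" at larger limits.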