
Problem description

We are building a microservice platform based on dotnet core and docker. We host it in AWS using ECS, running linux containers on linux hosts.

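I am running into an issue where, after a load test finishes, the dotnet processes on the host are stuck at 100% CPU even though no traffic is being received. I have been working through some related performance issues and have already done the following:

  1. Made HttpClient a singleton so that it can reuse connections
  2. Increased the memory size of my containers from 128 MB to 256 MB (the dotnet containers wanted to grow beyond 128 MB)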

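These changes helped, but I am still seeing odd behavior from the dotnet processes running on the host. The problem does not happen locally: I can run the load test, the CPU is high while the test runs, and it recovers as soon as the test finishes. On the EC2 hosts, the processes still show 100% several minutes later.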

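Has anyone experienced something like this, or have any ideas on how to troubleshoot it? I have tried looking at the process information on the host, but I can't see much.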
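One way to get more detail than a process-level view is to break the CPU time down per thread. The following is a minimal sketch in Python, not something from the original post: it assumes a Linux host where the container's dotnet process is visible under `/proc`, and the function name `thread_cpu_seconds` and its parameters are made up for illustration.

```python
import os
from pathlib import Path


def thread_cpu_seconds(pid, proc_root="/proc", ticks_per_second=None):
    """Return [(thread id, thread name, CPU seconds)] for one process, busiest first."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    rows = []
    for stat_file in Path(proc_root, str(pid), "task").glob("*/stat"):
        try:
            text = stat_file.read_text()
        except OSError:
            continue  # the thread exited while we were scanning
        # The thread name sits in parentheses and may contain spaces or ')',
        # so split on the last ')' instead of on whitespace.
        head, _, tail = text.rpartition(")")
        name = head.split("(", 1)[1]
        fields = tail.split()
        # tail starts at stat field 3 (state); utime and stime are fields 14 and 15.
        utime, stime = int(fields[11]), int(fields[12])
        rows.append((int(stat_file.parent.name), name, (utime + stime) / ticks_per_second))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Demonstration on the current process; pass the dotnet PID instead.
    for tid, name, seconds in thread_cpu_seconds(os.getpid())[:5]:
        print(tid, name, round(seconds, 2))
```

The values are cumulative, so taking two samples a few seconds apart and comparing them shows which threads are burning CPU right now rather than which ones were busy during the load test.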

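Here is an example of what my machine looks like after the load test has finished but the server is still at 100% CPU:

[Image: top and docker stats output on the host]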

------------ EDIT 2018-10-01 ------------

I ran the load test with the dotnet logging level set to Debug. Here are the results:

    TIME 18:45:45 - Last requests goes through

dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFR" completed keep alive response.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 25.9946ms 200 application/json; charset=utf-8
Date=2018-10-01T18:45:45&Service=user&RequestTime=157&PortalId=56&Path=/user/56/v1/user&Method=POST&Action=POST user/{portalid}/v1/user&IPAddress=_IP_&ApiKey=__Key&ResponseCode=200&RequestBody=_BodyData_&Response=_responseData_&ContainerId=f7e4bf541a31&RequestId=
dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFP" completed keep alive response.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 160.5087ms 200 application/json; charset=utf-8

    TIME 18:46:01 - 18:47:45  See some HealthCheck requests come in

dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Connection id "0HLH7PMRMMAFV" started.
dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Connection id "0HLH7PMRMMAG0" started.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
Request starting HTTP/1.1 GET http://10.0.1.73:32800/apigateway/0/v1/info
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 0.0899ms 200 application/json
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAG0" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAG0" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAG0" stopped.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
Request starting HTTP/1.1 GET http://10.0.1.73:32800/apigateway/0/v1/info
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 0.056ms 200 application/json
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFV" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFV" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFV" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[1]

    TIME: 18:47:45 - 18:47:47 - Connections finally are closed?

dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAGA" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFN" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFM" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFM" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFN" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFM" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFN" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFK" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFL" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFU" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFK" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFU" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFL" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFK" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFU" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFL" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFS" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFS" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFS" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFO" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFP" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFR" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFQ" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFT" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFP" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFO" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFR" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFQ" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFT" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFP" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFO" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFR" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFQ" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFT" stopped.

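At 18:47:47 the CPU finally dropped back down. It looks like the problem is that Kestrel keeps the connections alive for two minutes, and while it does so the CPU is maxed out.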
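The two-minute gap matches Kestrel's default keep-alive timeout, and the reading of the log can be cross-checked mechanically. Below is a rough sketch in Python (not from the original post) that groups the `Connection id "..."` debug lines by connection and lists the connections that completed a keep-alive response and were only stopped later. The function names are illustrative, and the parser assumes the exact log wording shown above.

```python
import re
from collections import defaultdict

# Matches Kestrel debug lines such as:
#   Connection id "0HLH7PMRMMAFR" completed keep alive response.
CONNECTION_LINE = re.compile(r'Connection id "(?P<id>[^"]+)" (?P<event>.+?)\.?\s*$')


def connection_events(log_text):
    """Return {connection id: [event, ...]} in the order the events appear."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = CONNECTION_LINE.search(line)
        if match:
            events[match.group("id")].append(match.group("event"))
    return dict(events)


def idle_then_closed(events):
    """Connections that finished a keep-alive response and were stopped afterwards."""
    keep_alive, stopped = "completed keep alive response", "stopped"
    result = []
    for conn_id, seen in events.items():
        if keep_alive in seen and stopped in seen:
            if seen.index(keep_alive) < seen.index(stopped):
                result.append(conn_id)
    return sorted(result)


if __name__ == "__main__":
    sample = '''
dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFR" completed keep alive response.
dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFP" completed keep alive response.
dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Connection id "0HLH7PMRMMAG0" started.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAG0" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFP" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFR" stopped.
'''
    # prints ['0HLH7PMRMMAFP', '0HLH7PMRMMAFR']
    print(idle_then_closed(connection_events(sample)))
```

Run against the full debug log, this shows whether the connections closed in the final window are the same ones left idle by the load test, as opposed to the short-lived health-check connections.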

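How should I fix this? I could look into not honoring the Keep-Alive header on the response: Connection id "0HLH7PMRMMAFR" completed keep alive response. But shouldn't Kestrel keep reusing this connection rather than creating a new one?

I guess I can't reproduce this locally because it must be the AWS ALB that inserts a keep-alive header?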

Tags: docker, .net-core, amazon-ecs, asp.net-core-2.1

Solution


I think I found the issue!

https://github.com/aspnet/KestrelHttpServer/issues/2694

I updated to 2.1.4 and the problem went away. Remember, if you are on the latest version of a framework, always check for new updates and bug fixes :)

