How to manage Java Spring applications autoscaling in Kubernetes PROPERLY?

Question

I'm trying to set up autoscaling in Kubernetes (hosted in Google Kubernetes Engine) for my Java Spring application. I have faced two problems:

  1. The Spring application uses a lot of CPU at startup (around 250 mCPU*, sometimes even 500 mCPU), which really breaks autoscaling, because after about a minute (Spring context startup etc.) some instances of the application use only 50 mCPU. Because the application uses very little CPU in some environments (and at night in almost every environment), I would like to set the requested CPU to at most 200 mCPU (= 80% of the CPU limit), or even less, so that autoscaling would make much more sense. But I can't really do that because of Spring's heavy startup, which won't finish if I give it too little CPU.

  2. When the application starts receiving traffic (when a new pod is created by an autoscaling event), its CPU usage can initially jump to something like 200% of normal usage and then fall back to 100%. It doesn't look like too many requests are being pushed to the new pod; it looks more like the JVM is simply slower at the start and receives too much traffic at the beginning. It seems the JVM needs some kind of warm-up (so don't suddenly push 1/n of the traffic to the new pod, but shift traffic to it gradually). Because of this behaviour, autoscaling sometimes goes crazy: when it really needs just one more pod, it can scale up a lot of them and then scale down...

* in GKE 1000mCPU = 1 core

In the uploaded images we can see CPU charts. In the first, CPU usage after startup is much lower than at the beginning. In the second, we can spot both problems: high CPU usage at startup, then a grace period (the readiness probe's initial* delay hasn't elapsed yet), and then a high peak when traffic starts arriving.

* I have set readiness probe initial delay to be longer than context loading.

Chart 1 Chart 2

The only thing that I've found on the internet is to add a container to the pod that does nothing but "sleep x" and then dies, and to set that container's requested mCPU to the amount the Spring app uses at startup (I would then have to increase the CPU limit for the Spring app container, but that shouldn't do any harm, because autoscaling should prevent the Spring app from starving other apps on the node).

I would really appreciate any advice.

Tags: java, spring, kubernetes, jvm, autoscaling

Answer


Indeed, Spring applications are not the most container-friendly thing, but there are a few things you can try:

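  1. At startup, Spring autowires beans, performs dependency injection, creates objects in memory, and so on. All of this is CPU-intensive. If you allocate less CPU to your pod, startup will logically take longer. Things you can do here:
  • Use a startupProbe and give your application time to start. How to calculate the delay and thresholds is covered in the Kubernetes probe documentation.

  • Tune maxSurge and maxUnavailable in the Deployment strategy to best fit your case (for example, maybe you have 10 replicas and a max surge / max unavailable of 10%, so your pods roll out slowly, one at a time). This helps reduce traffic spikes across your application's replicas (see the Kubernetes Deployment documentation).

  • If your use case allows it, consider lazy initialization for the Spring application, meaning it will not create all objects at startup but will wait until they are used. This can be somewhat dangerous, because some problems may then go undetected at startup.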

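  2. If you have HPA enabled and also define a value for replicas in the Deployment, you may run into problems at deploy time. I can't find the related GitHub issue at the moment, but you may want to run some tests to see how it behaves (scaling out further than it should, etc.). Things you can do here:
  • Tune the autoscaling thresholds and timing (the default is 3 minutes, AFAIK) so that your Deployment can roll out smoothly without triggering autoscaling.

  • Write a custom autoscaling metric instead of scaling on CPU. This takes some work, but it may solve your scaling problems for good (see the Kubernetes documentation on autoscaling with custom metrics).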
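
To make the numbers behind the points above a bit more concrete, here is a minimal Java sketch of the arithmetic involved. It is only an illustration: the class and method names are made up for this example, and the inputs (probe period, thresholds, utilization figures) are assumed values, not measurements from the question. It leans on three documented Kubernetes behaviours: a startup probe allows roughly failureThreshold * periodSeconds seconds for startup, percentage values of maxSurge round up while maxUnavailable rounds down, and the HPA computes desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), leaving the count alone when the ratio is within a tolerance (0.1 by default).

```java
/** Back-of-the-envelope helpers; all names are hypothetical, not a Kubernetes client API. */
final class ScalingMath {

    /** Seconds a container may take to start before the startup probe gives up. */
    static int startupBudgetSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    /** Extra pods allowed during a rolling update; a percentage rounds up. */
    static int maxSurgePods(int replicas, int percent) {
        return (replicas * percent + 99) / 100;
    }

    /** Pods that may be unavailable during a rolling update; a percentage rounds down. */
    static int maxUnavailablePods(int replicas, int percent) {
        return (replicas * percent) / 100;
    }

    /** HPA rule: ceil(currentReplicas * current / target), unchanged within the tolerance. */
    static int desiredReplicas(int currentReplicas, double current, double target, double tolerance) {
        double ratio = current / target;
        if (Math.abs(ratio - 1.0) <= tolerance) {
            return currentReplicas;
        }
        return (int) Math.ceil(currentReplicas * ratio);
    }

    public static void main(String[] args) {
        // Assumed probe settings: periodSeconds=10, failureThreshold=30.
        System.out.println("startup budget (s): " + startupBudgetSeconds(10, 30));
        // The answer's example: 10 replicas with maxSurge/maxUnavailable of 10%.
        System.out.println("maxSurge pods: " + maxSurgePods(10, 10));
        System.out.println("maxUnavailable pods: " + maxUnavailablePods(10, 10));
        // Assumed warm-up spike: 3 pods averaging 120% CPU utilization against a 60% target.
        System.out.println("desired replicas: " + desiredReplicas(3, 120.0, 60.0, 0.1));
    }
}
```

With these assumed inputs, a pod gets 300 seconds to start, a 10-replica Deployment rolls one pod at a time, and a warm-up spike to twice the target makes the HPA ask for 6 replicas instead of 3. That last case is the overshoot described in the question, and it is why the probe timing, the autoscaler timing, or the metric itself is worth tuning.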

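Finally, your sidecar idea looks like a hack :) I haven't tried it, though, so I can't really speak to the pros and cons.

Unfortunately, there is no silver bullet for Spring Boot (or Java) + K8s, but things are better than they were a few years ago. If I find any useful resources, I will come back and link them here.

Hope the above helps.

Cheers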

