Google App Engine restarts randomly on its own

Problem description

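We run a Node.js server on GAE, and for some reason it goes offline several times a day (sometimes it takes a few minutes to come back online).

The request load is the same throughout the day, and no exception is thrown that would cause a restart. There is no spike in requests, nor any particular request that could trigger it.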

Logs from when it happens:

2020-04-18T23:48:51.881806Z GET /v1/util/example 304 35.262 ms - -
2020-04-18T23:50:17.119906Z [start] 2020/04/18 23:50:17.119185 Quitting on terminated signal
2020-04-18T23:50:17.175632Z [start] 2020/04/18 23:50:17.175267 Start program failed: user application failed with exit code -1 (refer to stdout/stderr logs for more detail): signal: terminated 
2020-04-18T23:51:38.772388Z GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 /v1/util/example GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 5e9b928a00ff0bc9244f94194c0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:38.786760Z GET 404 324 B 2.4 s Unknown /_ah/start GET 404 324 B 2.4 s Unknown 5e9b928a00ff0c014898f5c27f0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:39.529080Z [start] 2020/04/18 23:51:39.511828 No entrypoint specified, using default entrypoint: /serve 
2020-04-18T23:51:39.529642Z [start] 2020/04/18 23:51:39.528742 Starting app 
2020-04-18T23:51:39.529968Z [start] 2020/04/18 23:51:39.529100 Executing: /bin/sh -c exec /serve 
2020-04-18T23:51:39.590085Z [start] 2020/04/18 23:51:39.589751 Waiting for network connection open. Subject:"app/invalid" Address:127.0.0.1:8080 
2020-04-18T23:51:39.590571Z [start] 2020/04/18 23:51:39.590347 Waiting for network connection open. Subject:"app/valid" Address:127.0.0.1:8081 
2020-04-18T23:51:39.764383Z [serve] 2020/04/18 23:51:39.763656 Serve started. 
2020-04-18T23:51:39.764935Z [serve] 2020/04/18 23:51:39.764544 Args: {runtimeName:nodejs10 memoryMB:1024 positional:[]} 
2020-04-18T23:51:39.766562Z [serve] 2020/04/18 23:51:39.765904 Running /bin/sh -c exec node server.js 
2020-04-18T23:51:41.072621Z [start] 2020/04/18 23:51:41.071895 Wait successful. Subject:"app/valid" Address:127.0.0.1:8081 Attempts:296 Elapsed:1.481194491s 
2020-04-18T23:51:41.072978Z Express server started on port: 8081 
2020-04-18T23:51:41.073008Z [start] 2020/04/18 23:51:41.072411 Starting nginx 
2020-04-18T23:51:41.085901Z [start] 2020/04/18 23:51:41.085451 Waiting for network connection open. Subject:"nginx" Address:127.0.0.1:8080 
2020-04-18T23:51:41.132064Z [start] 2020/04/18 23:51:41.131572 Wait successful. Subject:"nginx" Address:127.0.0.1:8080 Attempts:9 Elapsed:45.911234ms 
2020-04-18T23:51:41.170786Z GET /_ah/start 404 11.865 ms - 61

More than 70% of memory is always free, so that is not the problem. Very high CPU utilization (10x higher than normal) shows up only during the restarts.

In the chart below you can clearly see when the restarts happen: (image: CPU utilization)

This is my app.yaml:

runtime: nodejs10
instance_class: B4
service: example-api

basic_scaling:
  max_instances: 1
  idle_timeout: 30m

handlers:
  - url: .*
    secure: always
    script: auto

This is happening on our production server, so any help is very welcome.

Thanks!

Tags: node.js, google-app-engine

Solution


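According to this documentation, App Engine tries to keep basic and manual scaling instances running indefinitely, but they are sometimes restarted for maintenance, and they can also fail for other reasons. That is why capping the service at a maximum of 1 instance is not considered best practice: a single instance is exposed to every one of these failures. As mentioned in another answer, I would also recommend increasing the number of instances, to lower the chance that a failure or restart takes down all of them at the same time.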
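As a rough sketch of that change, the app.yaml from the question could raise the instance cap. The value 2 is only an assumption for illustration; pick whatever fits your traffic and budget. Note that under basic scaling max_instances is an upper limit, and instances are still created on demand, so this does not by itself guarantee that two instances are running at any given moment.

```yaml
runtime: nodejs10
instance_class: B4
service: example-api

basic_scaling:
  # Assumed value for illustration: allow more than one instance so that
  # a single restart does not leave the service with no capacity.
  max_instances: 2
  idle_timeout: 30m

handlers:
  - url: .*
    secure: always
    script: auto
```

If you need a fixed number of instances kept up rather than a ceiling, manual scaling (manual_scaling with an instances value) may be a closer fit, at the cost of paying for those instances while they are idle.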

