java - 为什么流量较高时vmop(Java GC选项)会增加
问题描述
我有一个使用 Java 的在线服务。如果qps在2000左右,延迟是正常的,但是如果qps增加到3000,延迟会高很多。
有没有关于改进的建议,尤其是关于 STW 时间的建议?
代码逻辑很简单(作为访问服务):
- 解码 Tcp 请求和 Protobuf(每个请求大约 200KB)
- 拆分请求并重新包装 protobuf
- 路由到后向服务并异步等待响应
- 处理来自后向服务的响应,用另一个 protobuf 包装并响应客户端。
金佛:
-Djava.net.preferIPv4Stack=true -Xmx6g -Xms6g -Xmn4g
-Djava.ext.dirs=./lib:/data/home/soft/jdk1.8.0_65/jre/lib/ext
-Dlog4j.configuration=./conf/log4j.properties -Dio.netty.noPreferDirect=false
-XX:MaxDirectMemorySize=4g -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=8
-XX:+CMSScavengeBeforeRemark
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:OnOutOfMemoryError=./bin/stop.sh -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=. -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-Xloggc:logs/gc.log -XX:+CMSParallelRemarkEnabled
-XX:ParallelGCThreads=32
我发现较高的 STW 是由 vmop 引起的。并且在测试过程中没有发生 FULL GC。
998.860: GenCollectForAllocation [ 253 2 2 ] [ 0 0 0 0 6 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
999.578: GenCollectForAllocation [ 253 2 2 ] [ 0 0 0 0 16 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1000.273: GenCollectForAllocation [ 253 3 3 ] [ 0 0 0 0 11 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1000.970: GenCollectForAllocation [ 253 2 2 ] [ 0 0 0 0 8 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1001.666: GenCollectForAllocation [ 253 2 2 ] [ 0 0 0 0 7 ] 0
- 当qps在2000左右时,vmop一般低于10ms。
Total time for which application threads were stopped
小于 0.010 秒。 - Qps约为3000,vmop高于10ms,
我的问题是:
- 有没有关于性能改进的建议,尤其是关于 STW(vmop) 时间的建议?
- 什么导致高vmop?到达安全点还是复制记忆?
更多测试数据:后向服务保持qps=3000,我启动一/两个Java服务进行比较,得到性能差距。后向服务的延迟约为 6.8ms。一个Java进程的GC日志(每个进程qps=3000):
2019-05-16T14:54:56.742+0800: 67083.398: Total time for which application threads were stopped: 0.0104372 seconds, Stopping threads took: 0.0001939 seconds
2019-05-16T14:54:57.788+0800: 67084.444: [GC (Allocation Failure) 2019-05-16T14:54:57.788+0800: 67084.444: [ParNew: 5054250K->22742K(5662336K), 0.0109608 secs] 5776050K->744839K(7759488K), 0.0111268 secs] [Times: user=0.33 sys=0.01, real=0.02 secs]
2019-05-16T14:54:57.799+0800: 67084.455: Total time for which application threads were stopped: 0.0122376 seconds, Stopping threads took: 0.0001943 seconds
2019-05-16T14:54:58.848+0800: 67085.504: [GC (Allocation Failure) 2019-05-16T14:54:58.848+0800: 67085.504: [ParNew: 5055958K->26545K(5662336K), 0.0108330 secs] 5778055K->748938K(7759488K), 0.0109914 secs] [Times: user=0.34 sys=0.00, real=0.02 secs]
2019-05-16T14:54:58.859+0800: 67085.515: Total time for which application threads were stopped: 0.0121166 seconds, Stopping threads took: 0.0002086 seconds
2019-05-16T14:54:59.905+0800: 67086.561: [GC (Allocation Failure) 2019-05-16T14:54:59.905+0800: 67086.561: [ParNew: 5059761K->25093K(5662336K), 0.0115089 secs] 5782154K->747784K(7759488K), 0.0116708 secs] [Times: user=0.36 sys=0.00, real=0.01 secs]
2019-05-16T14:54:59.917+0800: 67086.573: Total time for which application threads were stopped: 0.0127589 seconds, Stopping threads took: 0.0001810 seconds
2019-05-16T14:55:00.957+0800: 67087.613: [GC (Allocation Failure) 2019-05-16T14:55:00.957+0800: 67087.613: [ParNew: 5058309K->27559K(5662336K), 0.0043041 secs] 5781000K->750551K(7759488K), 0.0044697 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
2019-05-16T14:55:00.962+0800: 67087.618: Total time for which application threads were stopped: 0.0055974 seconds, Stopping threads took: 0.0001974 seconds
2019-05-16T14:55:02.004+0800: 67088.660: [GC (Allocation Failure) 2019-05-16T14:55:02.004+0800: 67088.660: [ParNew: 5060775K->18771K(5662336K), 0.0032344 secs] 5783767K->742062K(7759488K), 0.0033973 secs] [Times: user=0.08 sys=0.00, real=0.00 secs]
两个Java进程的GC日志(每个进程qps=1500):
2019-05-16T14:49:19.774+0800: 66746.430: Total time for which application threads were stopped: 0.0075090 seconds, Stopping threads took: 0.0001816 seconds
2019-05-16T14:49:21.879+0800: 66748.535: [GC (Allocation Failure) 2019-05-16T14:49:21.879+0800: 66748.535: [ParNew: 5047304K->15661K(5662336K), 0.0070949 secs] 5601138K->569792K(7759488K), 0.0072536 secs] [Times: user=0.19 sys=0.00, real=0.00 secs]
2019-05-16T14:49:21.886+0800: 66748.542: Total time for which application threads were stopped: 0.0084102 seconds, Stopping threads took: 0.0001858 seconds
2019-05-16T14:49:23.977+0800: 66750.633: [GC (Allocation Failure) 2019-05-16T14:49:23.977+0800: 66750.633: [ParNew: 5048877K->10672K(5662336K), 0.0033657 secs] 5603008K->565100K(7759488K), 0.0035116 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
2019-05-16T14:49:23.981+0800: 66750.637: Total time for which application threads were stopped: 0.0045580 seconds, Stopping threads took: 0.0001763 seconds
2019-05-16T14:49:26.090+0800: 66752.746: [GC (Allocation Failure) 2019-05-16T14:49:26.090+0800: 66752.746: [ParNew2019-05-16T14:49:23.981+0800: 66750.637: Total time for which application threads were stopped: 0.0045580 seconds, Stopping threads took: 0.0001763 seconds
2019-05-16T14:49:26.090+0800: 66752.746: [GC (Allocation Failure) 2019-05-16T14:49:26.090+0800: 66752.746: [ParNew: 5043888K->9061K(5662336K), 0.0036848 secs] 5598316K->563786K(7759488K), 0.0038302 secs] [Times: user=0.08 sys=0.00, real=0.00 secs]
解决方案
你有你的因果关系,它不是由vmop 引起的。vmop 是垃圾收集器请求的安全点,在该安全点下它可以执行停止世界任务。
如果要分析 GC 行为,则需要先查看 GC 日志,然后再查看安全点。
推荐阅读
- swiftui - 如何使用 JSON 在 SwiftUI 中对列表进行排序
- python - 使用python在mysql查询结果上添加字符串的问题
- google-cloud-data-fusion - 通过覆盖设置 BigQuery 到 Google Cloud Storage 管道
- cmd - 是否可以在指向文件的文本块中转义尖括号?
- grafana - 有没有办法按时间以外的字段值对 influxDb 进行排序?
- python - ValueError:缓冲区大小必须是元素大小的倍数
- c - if 条件下发生了什么?
- php - 我正在尝试使用图像制作自定义进度条,但我不确定从哪里开始
- python - Python - 无法解析我拥有的列表数据,如果它在该列表数据中找到特定字母,则删除变量
- php - Symfony - 能够加载配置的扩展