ignite - Apache ignite 2.7.5 的间歇性性能问题
问题描述
我们面临着 ignite 的间歇性性能问题,其中响应时间变得非常高,并且我们在日志中看到以下错误。我们有 10 个索引列,我没有看到索引有任何问题,因为“where”子句中的所有列都被索引了。联接发生在具有亲和共置的字段上,这意味着联接仅发生在特定节点中的数据上,而不是跨节点发生。
[21:48:30,765][WARNING][jvm-pause-detector-worker][IgniteKernal%PincodeGrid] Possible too long JVM pause: 4939 milliseconds.
[21:48:30,783][WARNING][query-#120%PincodeGrid%][IgniteH2Indexing] Query execution is too long [time=5052 ms, sql='SELECT
如果您能提供任何帮助,请告诉我。
Apache Ignite 版本:2.7.5
点燃持久性已启用 (true)
分区模式下的 2 节点集群
RAM - 每个节点 150 GB
JVM xms 和 xmx 20G
记录数 - 1.6 亿
JVM 选项 -
/usr/java/jdk1.8.0_144/bin/java -XX:+AggressiveOpts -server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError - XX:HeapDumpPath=/etappdata/ignite/logs/PROD/etail-prod-ignite76-163/logs -XX:+ExitOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX: NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/etappdata/ignite/logs/PROD/etail-prod-ignite76-163/gc.log -XX:+PrintAdaptiveSizePolicy -XX:+UseTLAB -verbose:gc -XX:+ ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true -Djava.net.preferIPv6Stack=false -Djava.net.preferIPv6Addresses=false -Dcom.sun.management。 jmxremote -Dcom.sun.management.jmxremote.port=8996 -Dcom.sun.management。jmxremote.rmi.port=8996 -Dcom.sun.management.jmxremote.ssl=假-Dcom.sun.management.jmxremote.authenticate=假-Dcom.sun.management.jmxremote.local.only=假-Djava.rmi。 server.hostname=etail-prod-ignite76-163 -XX:MaxDirectMemorySize=4g -javaagent:/tmp/apminsight-javaagent-prod/apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=错误 -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d-9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom .sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /点燃/apache-ignite-2.7.5-bin/libs/authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=etail-prod-ignite76-163 -XX:MaxDirectMemorySize=4g -javaagent:/tmp/apminsight-javaagent-prod /apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=false -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d-9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DIGNITE_HOME=/ignite/ apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs/authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=etail-prod-ignite76-163 -XX:MaxDirectMemorySize=4g -javaagent:/tmp/apminsight-javaagent-prod /apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=false -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d-9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DIGNITE_HOME=/ignite/ apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs//tmp/apminsight-javaagent-prod/apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=false -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d -9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl =false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs//tmp/apminsight-javaagent-prod/apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=false -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d -9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl =false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs/ssl=false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs/ssl=false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs/:/ignite/apache-ignite-2.7.5-bin/libs/ignite-indexing/ :/ignite/apache-ignite-2.7.5-bin/libs/ignite-spring/ :/ignite/apache-ignite-2.7。 5-bin/libs/licenses/ org.apache.ignite.startup.cmdline.CommandLineStartup config/config-cache.xml
添加了其他详细信息
[04:03:48,251][INFO][main][IgniteKernal%PincodeGrid] IgniteConfiguration [igniteInstanceName=PincodeGrid, pubPoolSize=30, svcPoolSize=30, callbackPoolSize=30, stripedPoolSize=30, sysPoolSize=30, mgmtPoolSize=4, igfsPoolSize= 30,dataStreamerPoolSize=30,utilityCachePoolSize=30,utilityCacheKeepAliveTime=60000,p2pPoolSize=2,qryPoolSize=30,igniteHome=/ignite/apache-ignite-2.7.5-bin,igniteWorkDir=/ignite/apache-ignite-2.7.5- bin/work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@13221655, nodeId=6aee7bb4-2804-4396-a9ec-65abdc9483e3, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout= 5000,sndRetryDelay=1000,sndRetryCnt=3,metricsHistSize=10000,metricsUpdateFreq=2000,metricsExpTime=9223372036854775807,discoSpi=TcpDiscoverySpi [addrRslvr=null,sockTimeout=0,ackTimeout=0,marsh=null,reconCnt=10,reconDelay=2000,maxAckTimeout=600000,forceSrvMode=false,clientReconnectDisabled=false,internalLsnr=null],segPlc=STOP,segResolveAttempts=2,waitForSegOnStart=true,allResolversPassReq=true,segChkFreq=10000,commSpi=TcpCommunicationSpi [connectGate=null,connPlc =org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@34123d65, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf= true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=65536, msgQueueLimit=2048, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=true, connectionsPerNode=10,tcpNoDelay=true,filterReachableAddresses=false,ackSndThreshold=32,unackedMsgsBufSize=0,sockWriteTimeout=10000,boundTcpPort=-1,boundTcpShmemPort=-1,selectorsCnt=15,selectorSpins=0,addrRslvr=null,ctxInitLatch=java.util.concurrent.CountDownLatch@59474f18[Count = 1],stopping=false ], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@65fb9ffc, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@3590fc5b, addrRslvr=空,encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@397fbdb,clientMode=false,rebalanceThreadPoolSize=16,txCfg=TransactionConfiguration [txSerEnabled=false,dfltIsolation=REPEATABLE_READ,dfltConcurrency=PESSIMISTIC,dfltTxTimeout=0,txTimeoutOnPartitionMapExchange= 0, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false],cacheSanityCheckEnabled=true,discoStartupDelay=60000,deployMode=SHARED,p2pMissedCacheSize=100,locHost=null,timeSrvPortBase=31100,timeSrvPortRange=100,failureDetectionTimeout=80000,sysWorkerBlockedTimeout=30000,clientFailureDetectionTimeout=120000,metricsLogFreq=60000fg=Cf=C,连接器ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false,sslClientAuth=false,sslCtxFactory=null,sslFactory=null,portRange=100,threadPoolSize=30,msgInterceptor=null],odbcCfg=null,warmupClos=null,atomicCfg=AtomicConfiguration [seqReserveSize=1000,cacheMode=PARTITIONED,备份=1,aff=null,grpName=null],classLdr=null,sslCtxFactory=null,platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=4096, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize=12884901880, in 112742891520,swapPath=null,pageEvictionMode=DISABLED,evictionThreshold=0.9,emptyPagesPoolSize=100,metricsEnabled=false,metricsSubIntervalCount=5,metricsRateTimeInterval=60000,persistenceEnabled=true,checkpointPageBufSize=4294967296],dataRegions=null,storagePath=/ignite/persistence, checkpointFreq=180000,lockWaitTime=10000,checkpointThreads=8,checkpointWriteOrder=SEQUENTIAL,walHistSize=20,maxWalArchiveSize=1073741824,walSegments=10,walSegmentSize=1073741824,walPath=/wal/pincode,walArchivePath=/wal/pincode=archive,metrics假,walMode=背景,walTlbSize=131072,walBuffSize=0,walFlushFreq=2000,walFsyncDelay=1000,walRecordIterBuffSize=67108864,alwaysWriteFullPages=false,fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@2d3379b4,metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null,端口=10800,portRange=100,sockSndBufSize=0,sockRcvBufSize=0,tcpNoDelay=true,maxOpenCursorsPerConn=128,threadPoolSize=30,idleTimeout=0,jdbcEnabled=true,odbcEnabled=true,thinCliEnabled=true,sslEnabled=false,useIgniteSslCtxFactory=真,sslClientAuth=假,sslCtxFactory=null],mvccVacuumThreadCnt=2,mvccVacuumFreq=5000,authEnabled=false,failureHnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],commFailureRslvr=null]
编辑 -2 - GC 日志
2020-12-01T22:49:31.729+0530: 15.630: [GC pause (Metadata GC Threshold) (young) (initial-mark) 15.630: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 38.43 ms, remaining time: 161.57 ms, target pause time: 200.00 ms]
15.630: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 24 regions, survivors: 2 regions, predicted young region time: 356.58 ms]
15.630: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 24 regions, survivors: 2 regions, old: 0 regions, predicted pause time: 395.01 ms, target pause time: 200.00 ms]
15.657: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: concurrent cycle is about to start], 0.0274990 secs]
[Parallel Time: 15.8 ms, GC Workers: 21]
[GC Worker Start (ms): Min: 15630.2, Avg: 15630.5, Max: 15630.8, Diff: 0.7]
[Ext Root Scanning (ms): Min: 1.6, Avg: 3.4, Max: 11.4, Diff: 9.8, Sum: 71.8]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 1.2, Max: 12.6, Diff: 12.6, Sum: 24.2]
[Object Copy (ms): Min: 0.0, Avg: 9.5, Max: 12.0, Diff: 11.9, Sum: 199.9]
[Termination (ms): Min: 0.0, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 14.8]
[Termination Attempts: Min: 1, Avg: 2.2, Max: 4, Diff: 3, Sum: 47]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
[GC Worker Total (ms): Min: 14.5, Avg: 14.8, Max: 15.2, Diff: 0.7, Sum: 311.8]
[GC Worker End (ms): Min: 15645.3, Avg: 15645.3, Max: 15645.4, Diff: 0.1]
[Code Root Fixup: 0.6 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.5 ms]
[Other: 10.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.3 ms]
[Ref Enq: 0.5 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 192.0M(1008.0M)->0.0B(984.0M) Survivors: 16.0M->40.0M Heap: 198.0M(20.0G)->33.7M(20.0G)]
[Times: user=0.31 sys=0.00, real=0.03 secs]
2020-12-01T22:49:31.757+0530: 15.657: [GC concurrent-root-region-scan-start]
2020-12-01T22:49:31.764+0530: 15.664: [GC concurrent-root-region-scan-end, 0.0067826 secs]
2020-12-01T22:49:31.764+0530: 15.664: [GC concurrent-mark-start]
2020-12-01T22:49:31.765+0530: 15.666: [GC concurrent-mark-end, 0.0015043 secs]
2020-12-01T22:49:31.766+0530: 15.666: [GC remark 2020-12-01T22:49:31.766+0530: 15.666: [Finalize Marking, 0.0010641 secs] 2020-12-01T22:49:31.767+0530: 15.667: [GC ref-proc, 0.0100232 secs] 2020-12-01T22:49:31.777+0530: 15.677: [Unloading, 0.0072592 secs], 0.0191010 secs]
[Times: user=0.20 sys=0.00, real=0.02 secs]
2020-12-01T22:49:31.785+0530: 15.685: [GC cleanup 37M->37M(20G), 0.0085803 secs]
[Times: user=0.04 sys=0.00, real=0.01 secs]
2020-12-01T22:53:45.090+0530: 268.990: [GC pause (G1 Evacuation Pause) (young) 268.990: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 30.72 ms, remaining time: 169.28 ms, target pause time: 200.00 ms]
268.990: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 123 regions, survivors: 5 regions, predicted young region time: 1342.47 ms]
268.990: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 123 regions, survivors: 5 regions, old: 0 regions, predicted pause time: 1373.18 ms, target pause time: 200.00 ms]
269.040: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: candidate old regions not available]
, 0.0494933 secs]
[Parallel Time: 31.8 ms, GC Workers: 21]
[GC Worker Start (ms): Min: 268991.8, Avg: 268992.1, Max: 268992.5, Diff: 0.7]
[Ext Root Scanning (ms): Min: 0.9, Avg: 1.9, Max: 5.7, Diff: 4.7, Sum: 39.6]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 1.0, Max: 7.0, Diff: 7.0, Sum: 20.2]
[Object Copy (ms): Min: 21.7, Avg: 28.1, Max: 29.1, Diff: 7.4, Sum: 591.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 21]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.8]
解决方案
如果你有 20G 堆,你可能期望最终的完整 GC 需要 4 秒。我可以相信。为什么 Apache Ignite 需要这么多堆?在正常操作过程中使用了多少堆?您还可以进行堆转储并搜索内存泄漏。
Apache Ignite 默认不将数据存储在堆上的节点,因此不能仅用数据量来解释。
我通过 gceasy.io 运行了你的 GC 日志,它发现了几个 GC 跨越大约 2 秒。我不确定它是否解释了您观察到的 4 秒停顿,但您可能会预期 2 秒的停顿显然来自 GC,这在同一个球场上。
所以你需要弄清楚为什么你的 JVM 有时会变慢。也许是 IO、虚拟化暂停等。此外,如果您的堆从不大于 2G,也许您应该使用类似的东西运行-Xmx4G
?
推荐阅读
- list - Ansible 从两个列表中生成列表(归因于两个列表中的第一个列表)
- javascript - Axios POST 请求仅在页面刷新且不起作用后记录
- react-native - 使用动态参数调用函数
- ssl - `openssl genrsa`和`openssl genpkey -algorithm rsa`之间的区别?
- python - 有没有办法根据列表的长度动态更改变量
- vb.net - sqldataadapter在目标表中插入重复行
- amazon-web-services - 安装 kube-prometehus-stack 卡在钩子并发症上
- javascript - JavaScript IF 语句和 While 循环无法正常工作
- python - (Python)如何将 3 个递减列表合并为一个递减列表?
- output - uniq -u 什么都不返回