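ignite - Ignite FileAlreadyExistsException during WAL archiving
Problem description
We are using GridGain version 8.8.10 and JDK version 1.8.
We have a 3-node Ignite cluster in Azure Kubernetes with native persistence enabled. Some of our Ignite pods go into CrashLoopBackOff with the following exception: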
[07:45:45,477][WARNING][main][FileWriteAheadLogManager] Content of WAL working directory needs rearrangement, some WAL segments will be moved to archive: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1. Segments from 0000000000000001.wal to 0000000000000008.wal will be moved, total number of files: 8. This operation may take some time.
[07:45:45,480][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
    at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1159)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1711)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1141)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1059)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:945)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:844)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:714)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:683)
    at org.apache.ignite.Ignition.start(Ignition.java:344)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:290)
Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to move WAL segment [src=/gridgain/wal/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal, dst=/gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3326)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.prepareAndCheckWalFiles(FileWriteAheadLogManager.java:1542)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:494)
    at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:60)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:605)
    at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1935)
    ... 11 more
Caused by: java.nio.file.FileAlreadyExistsException: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal
    at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:450)
    at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
    at java.base/java.nio.file.Files.move(Files.java:1422)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3307)
    ... 16 more
[07:45:45,482][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
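The innermost cause in the trace is `java.nio.file.Files.move`, which by default refuses to move a file onto a target that already exists. The following is a minimal sketch of that JDK behavior only, using temporary directories and a hypothetical class name `MoveDemo`. It is not Ignite code and not a suggested change to Ignite; whether an already archived segment may safely be overwritten is a separate question.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveDemo {
    /** Returns true if a plain move is rejected because dst already exists. */
    static boolean plainMoveFails(Path src, Path dst) throws IOException {
        try {
            // No copy options: the move fails if the target exists.
            Files.move(src, dst);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-ins for the WAL work and archive directories.
        Path work = Files.createTempDirectory("wal");
        Path archive = Files.createTempDirectory("walarchive");

        // The same segment name exists on both sides, as in the log.
        Path src = Files.write(work.resolve("0000000000000001.wal"), new byte[] {1});
        Path dst = Files.write(archive.resolve("0000000000000001.wal"), new byte[] {2});

        System.out.println("plain move rejected: " + plainMoveFails(src, dst));

        // With REPLACE_EXISTING the same move overwrites the target.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        System.out.println("source still exists: " + Files.exists(src));
    }
}
```

So the exception by itself only says that `0000000000000001.wal` was already present in the archive directory when the node tried to move it there on startup.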
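It looks like a file with the same name had already been created during WAL archiving and cannot be overwritten. Are we missing any specific configuration for WAL archiving?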
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Set the size of WAL segments to 128 MB. -->
        <property name="walSegmentSize" value="#{128 * 1024 * 1024}"/>
        <property name="writeThrottlingEnabled" value="true"/>
        <!-- Set the page size to 8 KB. -->
        <property name="pageSize" value="#{8 * 1024}"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="name" value="Default_Region"/>
                <!-- Memory region of 20 MB initial size. -->
                <property name="initialSize" value="#{20 * 1024 * 1024}"/>
                <!-- Memory region of 8 GB max size. -->
                <property name="maxSize" value="#{8L * 1024 * 1024 * 1024}"/>
                <!-- Enabling eviction for this memory region. -->
                <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                <property name="persistenceEnabled" value="true"/>
                <!-- Increasing the buffer size to 1 GB. -->
                <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
            </bean>
        </property>
        <property name="walPath" value="/gridgain/wal"/>
        <property name="walArchivePath" value="/gridgain/walarchive"/>
    </bean>
</property>
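To see which segments are affected, one option is to compare the two directories directly. The sketch below is a read-only diagnostic, not a fix: it assumes the `walPath` and `walArchivePath` values from the configuration and the node folder name from the log, and the class and method names (`WalOverlapCheck`, `overlappingSegments`) are hypothetical. It lists the `*.wal` file names present in both the working and the archive directory and changes nothing on disk.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

public class WalOverlapCheck {
    /** Names of *.wal files that exist in both directories, sorted. */
    static Set<String> overlappingSegments(Path workDir, Path archiveDir) throws IOException {
        Set<String> overlap = new TreeSet<>();
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(workDir, "*.wal")) {
            for (Path segment : segments) {
                String name = segment.getFileName().toString();
                if (Files.exists(archiveDir.resolve(name))) {
                    overlap.add(name);
                }
            }
        }
        return overlap;
    }

    public static void main(String[] args) throws IOException {
        // Node folder name copied from the log; it differs per pod.
        String node = "node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1";
        Path workDir = Paths.get(args.length > 0 ? args[0] : "/gridgain/wal/" + node);
        Path archiveDir = Paths.get(args.length > 1 ? args[1] : "/gridgain/walarchive/" + node);

        if (!Files.isDirectory(workDir) || !Files.isDirectory(archiveDir)) {
            System.out.println("WAL directories not found: " + workDir + ", " + archiveDir);
            return;
        }
        for (String name : overlappingSegments(workDir, archiveDir)) {
            System.out.println("present in both: " + name);
        }
    }
}
```

Run inside the affected pod (or against a copy of its volume), it takes the two directories as optional arguments. Any name it prints is a segment the startup move would trip over.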
Has anyone run into a similar problem when using an Ignite Kubernetes cluster?
We observed this in GKE. In AKS it works fine. We are using the Apache Ignite Operator.
https://ignite.apache.org/docs/latest/installation/kubernetes/gke-deployment