首页 > 解决方案 > UserCodeException:java.lang.OutOfMemoryError:流式自动缩放时的 Java 堆空间

问题描述

我有两个流式传输管道在到目前为止没有问题的产品上运行(都使用 n1-standard-4)。但是,当我决定尝试自动缩放时,它给了我所说的错误。我已经尝试过使用普通的自动缩放和流引擎(使用 n2-highmem1、n1-highmen1),但没有任何效果。

这是我的管道的结构。它是 BigQuery 的 Pubsub。 在此处输入图像描述

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Java heap space
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:194)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165)
org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:125)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1203)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Java heap space
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
drivemode.com.dataflow.reader.PubsubToAnalyticsTableRowFN$DoFnInvoker.invokeSetup(Unknown Source)
org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:80)
org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:62)
org.apache.beam.runners.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:95)
org.apache.beam.runners.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:75)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.createParDoOperation(IntrinsicMapTaskExecutorFactory.java:264)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.access$000(IntrinsicMapTaskExecutorFactory.java:86)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:183)\
 11 more\nCaused by: java.lang.OutOfMemoryError: Java heap space
org.tukaani.xz.lz.LZDecoder.<init>(Unknown Source)
org.tukaani.xz.LZMA2InputStream.<init>(Unknown Source)
org.tukaani.xz.LZMA2InputStream.<init>(Unknown Source)
org.apache.commons.compress.archivers.sevenz.LZMA2Decoder.decode(LZMA2Decoder.java:39)
org.apache.commons.compress.archivers.sevenz.Coders.addDecoder(Coders.java:76)
org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:933)
org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:909)
org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:222)
net.iakovlev.timeshape.TimeZoneEngine.lambda$initialize$0(TimeZoneEngine.java:111)
net.iakovlev.timeshape.TimeZoneEngine$$Lambda$76/1740730188.apply(Unknown Source)
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151 java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) net.iakovlev.timeshape.Index.build(Index.java:73) net.iakovlev.timeshape.TimeZoneEngine.initialize(TimeZoneEngine.java:126) net.iakovlev.timeshape.TimeZoneEngine.initialize(TimeZoneEngine.java:94) drivemode.com.dataflow.reader.AnalyticsTableRowFN.setUp(AnalyticsTableRowFN.java:60) drivemode.com.dataflow.reader.PubsubToAnalyticsTableRowFN$DoFnInvoker.invokeSetup(Unknown Source)
org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:80) org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:62) org.apache.beam.runners.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:95) org.apache.beam.runners.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:75)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.createParDoOperation(IntrinsicMapTaskExecutorFactory.java:264) org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.access$000(IntrinsicMapTaskExecutorFactory.java:86) org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:183)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165)
org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)

或者有时我得到

org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: Could not initialize class drivemode.com.dataflow.reader.AnalyticsTableRowFN
 org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2212)
 org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache.get(LocalCache.java:4053)
 org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4899)
 org.apache.beam.runners.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:91)
 org.apache.beam.runners.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:75)
 org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.createParDoOperation(IntrinsicMapTaskExecutorFactory.java:264)
 org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.access$000(IntrinsicMapTaskExecutorFactory.java:86)
 org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:183)
 org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165)
 org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
 org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
 org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
 org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:125)
 org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1203)
 org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
 org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745
Caused by: java.lang.NoClassDefFoundError: Could not initialize class drivemode.com.dataflow.reader.AnalyticsTableRowFN java.io.ObjectStreamClass.hasStaticInitializer(Native Method)
java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1787) java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72) java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:253)
java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:251)
 java.security.AccessController.doPrivileged(Native Method)
java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:250)
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:611)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:71) org.apache.beam.runners.dataflow.worker.UserParDoFnFactory$UserDoFnExtractor.getDoFnInfo(UserParDoFnFactory.java:62) org.apache.beam.runners.dataflow.worker.UserParDoFnFactory.lambda$create$0(UserParDoFnFactory.java:93)
org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4904) org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628) org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
org.apache.beam.vendor.guava.v20_0.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
... 18 more

标签: google-cloud-dataflowapache-beam

解决方案


根据粘贴的日志,您的某些 ParDo 函数或 MapElements 消耗的内存可能超出其应有的内存,请查看 PubsubToAnalyticsTableRowFN。此梁文档链接可能会帮助您调整管道。


推荐阅读