NiFi ConvertAvroToParquet IllegalArgumentException: You cannot call toBytes() more than once without calling reset()

Problem description

I cannot convert a 3.7 GB Avro file to Parquet format using ConvertAvroToParquet.

My setup: ExecuteSQL 1.10.0 > ConvertAvroToParquet 1.10.0 > PutS3Object 1.10.0.

ConvertAvroToParquet is left at its default settings.

2020-09-24 20:54:40,534 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6c8e0773 checkpointed with 645 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 9971
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 2 records in 0 milliseconds
2020-09-24 20:55:03,820 INFO [Timer-Driven Process Thread-7] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 100899470
2020-09-24 20:55:03,953 ERROR [Timer-Driven Process Thread-7] o.a.n.p.parquet.ConvertAvroToParquet ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] failed to process session due to java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset(); Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
    at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
    at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
    at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
    at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
    at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:03,954 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] due to uncaught Exception: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
    at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
    at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
    at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
    at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
    at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:52,897 INFO [Timer-Driven Process Thread-4] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 101841856

What could be causing this?

Tags: java, apache-nifi, avro, parquet

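Solution


I did as you suggested, Pdeuxa. It works well for a small table, but not for a large one. So I increased the JVM heap memory in the file nifi-1.10.0/conf/bootstrap.conf, and that worked for me.

JVM memory settings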

#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms2048m
java.arg.3=-Xmx2048m
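
To double-check which heap values are actually active after an edit like this (the old 512m lines are only commented out, not removed), a small sketch along the following lines might help. It is not part of NiFi: the class name BootstrapHeap and the method heapSettingsMb are hypothetical, and it assumes each heap flag sits on its own java.arg.N line with a k, m, or g suffix, as in the settings above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a NiFi API): extracts the active heap flags
// from the lines of a bootstrap.conf file.
class BootstrapHeap {
    // Matches uncommented lines such as "java.arg.3=-Xmx2048m".
    // Assumption: the size always carries a k, m, or g suffix.
    private static final Pattern HEAP_ARG =
        Pattern.compile("^java\\.arg\\.\\d+=-(Xms|Xmx)(\\d+)([kKmMgG])$");

    // Returns a map from flag name ("Xms" or "Xmx") to its size in MiB.
    static Map<String, Long> heapSettingsMb(List<String> lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("#")) {
                continue; // commented-out setting, e.g. the old 512m default
            }
            Matcher m = HEAP_ARG.matcher(line);
            if (!m.matches()) {
                continue; // not a heap flag
            }
            long value = Long.parseLong(m.group(2));
            switch (Character.toLowerCase(m.group(3).charAt(0))) {
                case 'k': value /= 1024; break; // KiB to MiB (rounds down)
                case 'g': value *= 1024; break; // GiB to MiB
                default: break;                 // already MiB
            }
            result.put(m.group(1), value); // a later line overrides an earlier one
        }
        return result;
    }

    public static void main(String[] args) {
        // The four lines from the settings above.
        List<String> conf = List.of(
            "#java.arg.2=-Xms512m",
            "#java.arg.3=-Xmx512m",
            "java.arg.2=-Xms2048m",
            "java.arg.3=-Xmx2048m");
        System.out.println(heapSettingsMb(conf)); // prints {Xms=2048, Xmx=2048}
    }
}
```

To run it against a real installation, the list could be read with Files.readAllLines(Path.of("nifi-1.10.0/conf/bootstrap.conf")) instead of being hard-coded. The sketch only reports what the file says; it does not confirm what the running NiFi JVM was started with.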

Thank you for your time and attention, Pdeuxa.

