java - NiFi ConvertAvroToParquet IllegalArgumentException 您不能多次调用 toBytes() 而不调用 reset()
问题描述
我无法使用 ConvertAvroToParquet 将 3.7GB avro 文件转换为 parquet 格式。
我的设置:ExecuteSQL 1.10.0 > ConvertAvroToParquet 1.10.0 > PutS3Object 1.10.0。
默认情况下, ConvertAvroToParquet设置。
2020-09-24 20:54:40,534 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6c8e0773 checkpointed with 645 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 9971
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 2 records in 0 milliseconds
2020-09-24 20:55:03,820 INFO [Timer-Driven Process Thread-7] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 100899470
2020-09-24 20:55:03,953 ERROR [Timer-Driven Process Thread-7] o.a.n.p.parquet.ConvertAvroToParquet ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] failed to process session due to java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset(); Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:03,954 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] due to uncaught Exception: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:52,897 INFO [Timer-Driven Process Thread-4] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 101841856
这会是什么?
解决方案
我按照你的建议做了,Pdeuxa,它非常适合一张小桌子,但不适用于大桌子。所以,我在文件中增加了 JVM 的堆内存nifi-1.10.0/conf/bootstrap.conf
,它对我有用。
JVM内存设置
#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms2048m
java.arg.3=-Xmx2048m
感谢您的时间和关注,Pdeuxa。
推荐阅读
- java - 302重定向加载空白页面
- algorithm - Ratcliff-Obershelp 字符串相似度
- android - 如何查找 Android 应用程序中是否正在使用 Java 包?
- ios - Safari bug - selenium Actions 点击方法
- sql - 在没有 FK 的情况下保持参照完整性
- json - 如何从 React Native 中的服务器响应中更新状态
- php - 为什么我在我的 PHP 中两次获得相同的列?PostgreSQL 查询?
- r - 使用主成分构建分数图
- nginx - Kubernetes(然后是牧场主)超时
- r - 对列名以特定字符串 (R) 结尾的列中的行求和