java - TensorFlowException:不成功的 TensorSliceReader 构造函数:未能找到 /mnt/yarn/usercache 的任何匹配文件
问题描述
我正在尝试对存储在 hive 表中的一些数据运行“onto_electra_base_uncased”模型,我在将数据保存到 hive 表之前在 df 上运行了 count() 并得到了这个异常。
Spark Shell 启动配置:
spark-shell spark.ui.port="4052" --driver-memory 20g --executor-memory 45g --conf spark.driver.memoryOverhead=4g --conf spark.executor.memoryOverhead=4g spark.driver.extraClassPath="spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar" spark.executor.extraClassPath="spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar" --jars "spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar"
例外:
scala> nonNullPerson.count()
[Stage 9:> (0 + 8) / 200]21/08/23 10:07:23 WARN TaskSetManager: Lost task 4.0 in stage 9.0 (TID 309, ip-10-237-133-245.ec2.internal, executor 2): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$dfAnnotate$1: (array<array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>>) => array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.serializefromobject_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:212)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:107)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.tensorflow.TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache/hadoop/appcache/application_1629712606577_0001/container_1629712606577_0001_01_000003/tmp/7e8b5bbda7f3_ner3970436258867963111/variables
[[{{node save/RestoreV2}}]]
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.getModelIfNotSet(BertEmbeddings.scala:172)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.annotate(BertEmbeddings.scala:223)
at com.johnsnowlabs.nlp.AnnotatorModel$$anonfun$dfAnnotate$1.apply(AnnotatorModel.scala:35)
at com.johnsnowlabs.nlp.AnnotatorModel$$anonfun$dfAnnotate$1.apply(AnnotatorModel.scala:34)
... 34 more
Caused by: org.tensorflow.TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache/hadoop/appcache/application_1629712606577_0001/container_1629712606577_0001_01_000003/tmp/7e8b5bbda7f3_ner3970436258867963111/variables
[[{{node save/RestoreV2}}]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:326)
at org.tensorflow.Session$Runner.run(Session.java:276)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:325)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper.readObject(TensorflowWrapper.scala:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2295)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
... 43 more
我在 EMR 上运行它,它的配置是:
- 大师(1)-> r5.2xlarge
- 核心(2)-> r5.4xlarge
- 任务(1)-> r5.4xlarge
解决方案
这个问题的解决方案是使用 kryo 序列化,默认 spark-shell 或 spark-submit 调用使用 java 序列化,在 spark-nlp 中的 Annotate 类被实现为使用 Kryo 序列化,因此同样应该用于运行任何 spark-nlp 作业
推荐阅读
- sql - 查询执行时间过长
- java - 为什么 hashCode() 函数会产生错误
- java - 将 awssdk s3 v2 与 GCS 和 Minio 网关一起使用时出现 Etag 解码错误
- oracle - 为什么我得到一个 ORA-24247 代码在函数中但不在匿名块中?
- react-native - 如何使用 redux saga 实现 redux 持久化?
- python-3.x - 使用 Machin 方法估计 pi 的准确性问题
- javascript - 如何使已转换为 JSON 的 Excel 文件自动加载到表(DataTables)
- django - 在 Django Forms 中,如何向 SelectedMultiple 控件添加过滤器?
- java - 为什么 spring boot 要求我定义一个 'entityManagerFactory' bean?
- python - 如何将 XML 文件中的某些行转换为 csv