mongodb - How do I read data from MongoDB in Zeppelin using Spark?
Question
I am using Zeppelin on HDP 2.6, and I want to read a collection from MongoDB with the spark2 interpreter.
util.Properties.versionString
spark.version
res22: String = version 2.11.8
res23: String = 2.2.0.2.6.4.0-91
I am using MongoDB 3.4.14, mongo-spark-connector 2.2.2, and mongo-java-driver 3.5.0. When I run this:
import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._
val customReadConfig = ReadConfig(Map("readPreference.name" -> "secondaryPreferred", "uri" -> "mongodb://127.0.0.1:27017/test.collections"))
val df5 = spark.sparkSession.read.mongo(customReadConfig)
I get this error:
customReadConfig: com.mongodb.spark.config.ReadConfig.Self =ReadConfig(test,collections,Some(mongodb://127.0.0.1:27017/test.collections),1000,DefaultMongoPartitioner,Map(),15,ReadPreferenceConfig(secondaryPreferred,None),ReadConcernConfig(None),false)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at com.mongodb.spark.rdd.MongoRDD$MongoCursorIterator.<init>(MongoRDD.scala:174)
at com.mongodb.spark.rdd.MongoRDD.compute(MongoRDD.scala:152)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
Solution
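No answer is recorded here, so the following is only a diagnostic sketch, not a confirmed fix. A NoClassDefFoundError on a class such as scala/collection/GenTraversableOnce$class is commonly reported when a jar built for one Scala binary version runs on a different one. Whether that is the cause in this case is an assumption. The helper below compares the Scala suffix in an artifact file name (for example the `_2.11` in `mongo-spark-connector_2.11-2.2.2.jar`) with the Scala version of the running interpreter. The object name, its methods, and the sample jar names are hypothetical and chosen for illustration.

```scala
import scala.util.Properties

// Hypothetical helper: compares a jar's Scala binary suffix with the running Scala version.
object ScalaSuffixCheck {
  // Matches the "_2.11-" part of names like "mongo-spark-connector_2.11-2.2.2.jar".
  private val Suffix = """_(\d+\.\d+)-""".r

  // Scala binary version encoded in the jar name, if the name follows that convention.
  def artifactScalaVersion(jarName: String): Option[String] =
    Suffix.findFirstMatchIn(jarName).map(_.group(1))

  // Binary version of the running Scala library, e.g. "2.11" for 2.11.8.
  def runtimeScalaVersion: String =
    Properties.versionNumberString.split('.').take(2).mkString(".")

  // Some(true/false) when the jar name carries a suffix, None when it does not.
  def matchesRuntime(jarName: String): Option[Boolean] =
    artifactScalaVersion(jarName).map(_ == runtimeScalaVersion)

  def main(args: Array[String]): Unit = {
    // Sample names only; substitute the jars actually on the interpreter classpath.
    val jars = Seq(
      "mongo-spark-connector_2.11-2.2.2.jar",
      "mongo-spark-connector_2.10-2.2.2.jar",
      "mongo-java-driver-3.5.0.jar"
    )
    jars.foreach { jar =>
      println(s"$jar -> suffix=${artifactScalaVersion(jar)}, matches runtime $runtimeScalaVersion: ${matchesRuntime(jar)}")
    }
  }
}
```

In a Zeppelin paragraph, the jar that actually supplied the connector can usually be found with `classOf[com.mongodb.spark.MongoSpark].getProtectionDomain.getCodeSource.getLocation`, assuming the class was loaded from a jar file. If the suffix in that path differs from the 2.11 reported by util.Properties.versionString above, loading the connector build that matches the interpreter's Scala version would be the first thing to try.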