scala - 退出状态:-100。诊断:容器在 *lost* 节点上释放
问题描述
我有 2 个输入文件(一个在 JSON 中,另一个在镶木地板中),我正在尝试对这 2 个大数据帧进行连接并将连接的数据帧写入 s3(作为 JSON)。这项工作永远卡住了(在将连接的 JSON 写入 s3 时)。我正在使用 70 个 r3.4xlarge(从站)。
df1.rdd.partitions.size = 34234(大小~4 TB)
df2.rdd.partitions.size = 1200(大小~58GB)
我尝试过但仍然没有改善的事情:
最大资源设置为 true 的动态分配静态分配:spark.executor.cores = 5
spark.executor.memory = 40G
spark.executor.instances = 209
更改分区,我通过将 spark.default.parallelism 和 spark.sql.shuffle.partitions 设置为 2000、4000、8000、10000、20000、35000 但没有用。
中间持久化 - 持久化(memory_disk 和 disk_only 类型)加入的 df 持久化两个输入(加入之前),对两个 dfs 执行一些操作,然后加入并写入 s3
调整“mapreduce.input.fileinputformat.split.minsize”和“mapreduce.input.fileinputformat.split.maxsize”(到 750000000)。
我也尝试过使用 30 r3.8xlarge。没有改善☹</p>
我不断收到这 2 个错误之一 -</p>
zeppelin-interpreter-spark-zeppelin-ip-10-0-1-213.log: WARN [2019-02-12 04:54:43,437] ({dispatcher-event-loop-8} Logging.scala[logWarning]:66) - Lost task 24117.0 in stage 3.0 (TID 32666, ip-10-0-1-242.ec2.internal, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_1549914591854_0018_01_000010 on host: ip-10-0-1-242.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:213)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:166)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:166)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:166)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:435)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
at org.apache.spark.sql.DataFrameWriter.json(DataFrameWriter.scala:487)
... 48 elided
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2234 in stage 15.0 failed 4 times, most recent failure: Lost task 2234.3 in stage 15.0 (TID 136390, ip-10-0-1-56.ec2.internal, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1708)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1696)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1695)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1695)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:855)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1923)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1878)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1867)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:671)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:186)
... 82 more
有人可以告诉我我在这里做错了什么吗?
解决方案
由于内存问题,执行程序似乎丢失了。请尝试在 spark-default.cfg 文件中配置 spark 设置或尝试增加计算资源
推荐阅读
- r - 拆分一列字符向量并返回一个列表
- javascript - 从 RestEnd 点动态获取数据
- encryption - 使用.net core DataProtection 用密码解除数据保护?
- bash - 用于根据天数计算目录大小和删除的 Bash 脚本
- android - 内视图寻呼机的实现
- android - Android : XMPP MUCLight Smack 监听群聊消息
- scala - 使用 sparkSession,当我读取 parquet 文件时,我得到 type not yet supported 错误消息
- blueprism - 环境问题。将 URL 传递给 BluePrism 中的 Obj 和 Process Studio 时的变量
- ios - 在 iOS Objective C 中处理位置权限
- facebook - 获取在 Facebook 上标记用户的所有评论