python - socket.timeout: timed out when running Python with PySpark
Problem description
The exception below is thrown during execution. We do some computation and then write the DataFrame to a Parquet file; the save fails with a socket timeout. I also tried tuning heartbeatInterval during execution, but the problem is still not resolved.
20/11/17 14:50:28 ERROR Utils: Aborting task
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\NMR_PySpark\ARCHimedes\PySpark.Package\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 585, in main
  File "C:\NMR_PySpark\ARCHimedes\PySpark.Package\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 593, in read_int
    length = stream.read(4)
  File "D:\obj\Windows-Release\37amd64_Release\msi_python\zip_amd64\socket.py", line 589, in readinto
socket.timeout: timed out
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:99)
at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:49)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:272)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/11/17 14:50:28 ERROR FileFormatWriter: Job job_20201117145009_0212 aborted.
20/11/17 14:50:28 ERROR Executor: Exception in task 0.0 in stage 212.0 (TID 452)
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:291)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
at java.util.concurrent.ThreadPoolExecutor...
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:99)
at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:49)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:272)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
... 9 more
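For reference, the heartbeat tuning mentioned in the question is normally done through Spark configuration properties set on the session builder. Below is a minimal sketch of settings commonly raised for this kind of Python-worker timeout: the keys are real Spark configuration properties, but the values and the `with_timeout_conf` helper are illustrative, not a confirmed fix for this case.

```python
# Timeout-related Spark settings one might raise when the Python worker
# times out during a long Parquet write. Keys are real Spark properties;
# the values here are illustrative starting points.
TIMEOUT_CONF = {
    "spark.executor.heartbeatInterval": "60s",   # default is 10s
    "spark.network.timeout": "600s",             # must stay larger than the heartbeat interval
    "spark.sql.execution.arrow.pyspark.enabled": "false",  # sidestep the Arrow path seen in the trace
}

def with_timeout_conf(builder, conf=TIMEOUT_CONF):
    """Apply each key/value to a SparkSession.builder-like object.

    `builder` only needs a .config(key, value) method that returns the
    builder, which is how SparkSession.builder behaves.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With a real PySpark installation this would be used as `spark = with_timeout_conf(SparkSession.builder).getOrCreate()`; note that Spark requires `spark.network.timeout` to be larger than `spark.executor.heartbeatInterval`.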
Solution