Why doesn't Argo finish when the spark-submit of an Apache Spark app on k8s ends?

Problem description

I have an issue with the Argo-Spark integration. When a spark-submit application takes longer to process on k8s, the Spark application finishes and its driver pod completes, but on the Argo side the DAG is still running.

The image shows the Spark app already finished while the DAG is still running.


The log shows the trace; it repeats indefinitely until it is stopped manually.

21/10/27 04:09:41 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Pending)
21/10/27 04:09:41 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-compaction-driver-379250e309d84131960098d6d65bb355
     namespace: argo-workflows
     labels: spark-app-selector -> spark-5a47913d3a574b329c5a575397c0c5fe, spark-role -> driver
     pod uid: ed147197-4f89-41bb-9fb5-5d6dbb0351d7
     creation time: 2021-10-27T04:09:39Z
     service account name: argo-workflow
     volumes: spark-local-dir-1, spark-conf-volume-driver, argo-workflow-token-lrlv9
     node name: aks-intensive-11775020-vmss000009
     start time: 2021-10-27T04:09:39Z
     phase: Running
     container status: 
         container name: spark-kubernetes-driver
         container image: 744752950324.dkr.ecr.us-east-1.amazonaws.com/spark-compaction:0.0.9.9
         container state: running
         container started at: 2021-10-27T04:09:41Z
21/10/27 04:09:42 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:43 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:44 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:45 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:46 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:47 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:48 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:49 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:50 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:51 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:52 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:53 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:54 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:55 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:56 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:57 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:58 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:09:59 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:00 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:01 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:02 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:03 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:04 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:05 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:06 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:07 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:08 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:09 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:10 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:11 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:12 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:15 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:16 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:17 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:18 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:19 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:20 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:21 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:22 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:23 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:24 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:25 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:26 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:27 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:28 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
21/10/27 04:10:29 INFO LoggingPodStatusWatcherImpl: Application status for spark-5a47913d3a574b329c5a575397c0c5fe (phase: Running)
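In the log above the periodic status lines only ever report `Pending` and then `Running`, one line per second, which is consistent with Spark's default `spark.kubernetes.report.interval` of 1s. To check whether spark-submit's watcher ever reported a terminal pod phase (`Succeeded` or `Failed`), the phase transitions can be pulled out of a saved log with standard tools. This is a minimal sketch that assumes the log lines have exactly the format shown above; the function name `spark_watcher_phases` is hypothetical and not part of Spark.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): read a spark-submit log on stdin and
# print the pod phases from the "Application status for <app-id> (phase: <Phase>)"
# lines written by LoggingPodStatusWatcherImpl, one per line.
# grep -o keeps only the matching status fragment, sed strips everything but
# the phase name, and uniq collapses consecutive repeats (e.g. hundreds of
# "Running" lines) so only the transitions remain.
spark_watcher_phases() {
  grep -o 'Application status for [^ ]* (phase: [A-Za-z]*)' |
    sed 's/.*(phase: \([A-Za-z]*\))/\1/' |
    uniq
}
```

Usage: `spark_watcher_phases < spark-submit.log` (the file name is only an example). For the log above this prints `Pending` followed by `Running`: none of the periodic status lines report a terminal phase. The indented `phase: Running` line inside the "State changed" dump is deliberately not matched.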

The spark-submit command:

spark-submit:
    ''
    ./bin/spark-submit \
    --master k8s://https://xxxxxxxxxxxxxxxx \
    --deploy-mode cluster \
    --name spark-compaction-exec-{{workflow.outputs.parameters.golbal-generate-uuid-output}} \
    --class xxxxxx.Compaction \
    --conf spark.kubernetes.container.image=${imageSparkCompaction} \
    --conf spark.kubernetes.driver.pod.name=spark-compaction-driver-{{workflow.outputs.parameters.golbal-generate-uuid-output}} \
    --conf spark.kubernetes.namespace=argo \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=argo \
    --conf spark.executor.instances=5 \
    --verbose \
    local:///opt/spark/jars/spark-compaction-0.9.0.jar {{inputs.parameters.database-comp}} {{inputs.parameters.table-name-comp}} {{inputs.parameters.raw-zone-container-comp}} {{inputs.parameters.clean-zone-container-comp}} {{inputs.parameters.back-zone-container-comp}} {{inputs.parameters.data-source-comp}} {{inputs.parameters.file-format-comp}} {{inputs.parameters.erp-env-comp}} {{inputs.parameters.erp-database-comp}} {{inputs.parameters.object-format-comp}} {{inputs.parameters.date-col-ref-comp}} {{inputs.parameters.compaction-key-comp}} {{inputs.parameters.stg-zone-container-comp}}
    '' 
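The description says the driver pod has already completed while spark-submit keeps logging `phase: Running`. One way to compare the two views is to ask the Kubernetes API server for the pod's phase directly. This is a minimal sketch, assuming `kubectl` is configured for the same cluster; the helper names `driver_phase` and `driver_is_finished` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the phase Kubernetes itself reports for a pod
# (one of Pending, Running, Succeeded, Failed, Unknown).
# Usage: driver_phase <pod-name> <namespace>
driver_phase() {
  kubectl get pod "$1" --namespace "$2" -o jsonpath='{.status.phase}'
}

# Hypothetical helper: return 0 only if the pod has reached a terminal phase,
# 1 otherwise (including while it is still Pending or Running).
# Usage: driver_is_finished <pod-name> <namespace>
driver_is_finished() {
  case "$(driver_phase "$1" "$2")" in
    Succeeded | Failed) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, with the pod name and namespace taken from the log above: `driver_phase spark-compaction-driver-379250e309d84131960098d6d65bb355 argo-workflows`. If this prints `Succeeded` while spark-submit is still logging `(phase: Running)`, that would point at the spark-submit side losing track of the pod rather than at the Spark job itself.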

Tags: apache-spark, kubernetes, argo
