How to connect to Spark master in Swarm mode from host

Problem Description

I want to run a Python application on a Spark cluster, but the application does not necessarily run inside the master container. The following is my compose YAML file:

version: '3'
services:
  spark-master:
    image: bde2020/spark-master
    deploy:
      placement:
        constraints:
          - node.hostname == master   # pin the master service to the Swarm node named "master"
    environment:
      - INIT_DAEMON_STEP=setup_spark
    ports:
      - '6080:8080'   # Spark master web UI
      - '6077:7077'   # Spark master RPC port (spark:// URL)
      - '6040:4040'   # application UI
  spark-worker-1:
    image: bde2020/spark-worker
    depends_on:
      - 'spark-master'
    environment:
      - 'SPARK_MASTER=spark://spark-master:7077'
    ports:
      - '6081:8081'   # worker web UI
      - '6041:4040'   # application UI

When I create a stack that runs these two services on my Swarm cluster and then run my Python application with the following SparkSession configuration, I receive a connection refused error.

from pyspark.sql import SparkSession

# PRIVATE_HOST_IP is the Docker host's private IP; host port 6077 is mapped to the master's 7077
spark = SparkSession.builder \
    .master("spark://PRIVATE_HOST_IP:6077") \
    .appName("Spark Swarm") \
    .getOrCreate()

On the other hand, when I run the same containers in normal mode with docker-compose up, the same Python application with the same SparkSession configuration works like a charm. That setup is not desirable, though, since I want to be able to scale the workers up and down, so I am looking for a way to make my application work in Swarm mode.

The strange thing about this issue is that I am fairly sure the port mapping is correct: after deploying the stack, I can reach the Spark master UI on port 6080, which is the host port I mapped to Spark's 8080.
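For reference, this is a minimal sketch of that check from the host, assuming the requests library is available (the timeout value is arbitrary):

import requests

# Query the Spark master web UI through the published host port (6080 -> container 8080).
resp = requests.get("http://PRIVATE_HOST_IP:6080", timeout=5)
print(resp.status_code)  # 200 means the UI port mapping works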

Another point is that I have successfully connected to other containers such as Cassandra and Kafka with the same approach (mapping the serving ports of those containers to host ports and connecting to the mapped ports on the host), but that approach does not work for the Spark container. A sketch of what works for the other services follows below.
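For comparison, the approach that works for those services looks roughly like this. The client libraries (kafka-python and cassandra-driver) and the host ports 6092 and 6042 are illustrative assumptions, not values taken from the compose file above.

from kafka import KafkaProducer
from cassandra.cluster import Cluster

# Kafka: connect through a host port assumed to be mapped to the broker's 9092.
producer = KafkaProducer(bootstrap_servers="PRIVATE_HOST_IP:6092")
producer.send("test-topic", b"hello")
producer.flush()

# Cassandra: connect through a host port assumed to be mapped to the node's 9042.
cluster = Cluster(["PRIVATE_HOST_IP"], port=6042)
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())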

Tags: python, docker, apache-spark, pyspark, docker-swarm

Solution

