Can Apache Spark use TCP listener as input?

Problem description

Can Apache Spark use a TCP listener as input? If yes, perhaps someone has an example of Java code that does this.

I have tried to find examples of this, but every tutorial shows how to open an input connection to a data server over TCP, not how to use a TCP listener that waits for incoming data.

Tags: java, apache-spark, spark-streaming

Solution


Yes, Spark can listen on a TCP port and process any incoming data. What you are looking for is Spark Streaming.

There is a short guide in the documentation on GitHub that covers reading from a TCP source. For convenience, here it is:

import java.util.Arrays;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import scala.Tuple2;

// Create a local StreamingContext with two working threads and a batch interval of 1 second
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

// Create a DStream that will connect to hostname:port, like localhost:9999
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

// Split each line into words
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator());

// Count each word in each batch
JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey((i1, i2) -> i1 + i2);

// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print();

jssc.start();              // Start the computation
jssc.awaitTermination();   // Wait for the computation to terminate
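To try the example above, the Spark guide feeds it from a simple data server started with `nc -lk 9999`: anything you type becomes records in the stream. Note, though, that `socketTextStream` connects *out* to that server as a client. If you really need Spark itself to accept incoming connections, the documented route is a custom receiver (`org.apache.spark.streaming.receiver.Receiver`), whose `onStart()` spawns a thread that runs an accept-and-read loop and calls `store(...)` for each record. The class below is a plain-Java sketch of that loop only (the class name and the `Consumer` sink are illustrative, not part of any Spark API); inside a real `Receiver` subclass, `sink.accept(line)` would be `store(line)` and the catch block would call `restart(...)`.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Accept-and-read loop for a TCP *listener*. In a Spark custom Receiver,
// onStart() would run this loop on its own thread so onStart() returns quickly.
public class TcpLineListener implements Runnable {
    private final ServerSocket server;
    private final Consumer<String> sink;
    private volatile boolean stopped = false;

    public TcpLineListener(ServerSocket server, Consumer<String> sink) {
        this.server = server;
        this.sink = sink;
    }

    public void stop() { stopped = true; }

    @Override
    public void run() {
        try {
            while (!stopped) {
                // Wait for a client to connect, then read it line by line.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(),
                                                   StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        sink.accept(line); // in a Receiver: store(line)
                    }
                }
            }
        } catch (Exception e) {
            // in a Receiver: restart("TCP listener failed", e)
        }
    }
}
```

A `Receiver` subclass built around this loop would then be registered with `jssc.receiverStream(new YourReceiver(...))`, and the resulting `JavaDStream<String>` can replace `lines` in the word-count example above.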
