首页 > 技术文章 > output-operations-on-dstreams

rocky-AGE-24 2017-08-29 09:28 原文

Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects than can be reused as RDDs of multiple batches are pushed to the external system, thus further reducing the overheads.

Scala
Python
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  }
}

http://spark.apache.org/docs/1.6.1/streaming-programming-guide.html#output-operations-on-dstreams

推荐阅读