scala - Spark: reporting the cluster's total and available memory

Problem description

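I am running Spark jobs on Amazon EMR, and I would like to keep reporting the cluster's total and available memory from within the program itself. Is there any method in the Spark API that provides information about the cluster's memory?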

Tags: scala, apache-spark, cluster-computing, amazon-emr, elastic-map-reduce

You can use spark.metrics.conf.

How to use it: set spark.metrics.conf in your Spark configuration file:

spark.metrics.conf = /path/to/metrics.properties

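Create the metrics.properties file at the path above. In that file, list the metrics settings you want for your Spark application; you can also specify the output format and the reporting interval.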

For example, here I collect the driver's metrics in CSV format once every minute:

driver.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Polling period for the CsvSink
#*.sink.csv.period=1
# Unit of the polling period for the CsvSink
#*.sink.csv.unit=minutes

# Polling directory for CsvSink
driver.sink.csv.directory=/Path/at/which/data/will/be/dumped

# Polling period for the CsvSink, specific to the driver instance
driver.sink.csv.period=1
# Unit of the polling period for the CsvSink, specific to the driver instance
driver.sink.csv.unit=minutes
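
To report the numbers from inside the program, the files written by the CsvSink can be read back. The following is only a minimal sketch under some assumptions: that each gauge is written to its own file ending in `<gauge name>.csv` with a `t,value` header, and that the driver exposes the `BlockManager.memory.maxMem_MB` and `BlockManager.memory.remainingMem_MB` gauges. These gauges describe storage memory as tracked by the block manager, not the physical RAM of the nodes. The object and method names (`DriverMemoryReport`, `latestValue`, `findGauge`, `report`) are hypothetical helpers, not part of Spark.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

object DriverMemoryReport {
  // Assumed gauge names from Spark's BlockManager metrics source (values in MB).
  val MaxMem = "BlockManager.memory.maxMem_MB"
  val RemainingMem = "BlockManager.memory.remainingMem_MB"

  /** Returns the last recorded value of a gauge file (assumed header: "t,value"). */
  def latestValue(csv: File): Option[Long] = {
    val src = Source.fromFile(csv, "UTF-8")
    try {
      src.getLines()
        .drop(1)                  // skip the header line
        .filter(_.trim.nonEmpty)
        .toList
        .lastOption
        .flatMap(line => Try(line.split(",")(1).trim.toDouble.toLong).toOption)
    } finally src.close()
  }

  /** Finds the file in the sink directory whose name ends with "<gauge>.csv". */
  def findGauge(dir: File, gauge: String): Option[File] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
      .find(_.getName.endsWith(s"$gauge.csv"))

  /** Returns (max MB, remaining MB) if both gauge files exist and parse. */
  def report(dir: File): Option[(Long, Long)] =
    for {
      maxFile <- findGauge(dir, MaxMem)
      remFile <- findGauge(dir, RemainingMem)
      max     <- latestValue(maxFile)
      rem     <- latestValue(remFile)
    } yield (max, rem)
}
```

Calling `DriverMemoryReport.report(new File(dir))` with the directory configured in `driver.sink.csv.directory` returns `None` until the sink has written at least one sample, so the caller should treat a missing value as "not yet available".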

You can find the full documentation at: https://spark.apache.org/docs/latest/monitoring.html#metrics
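
The same documentation page also describes passing metrics settings as Spark properties with the `spark.metrics.conf.` prefix instead of a separate file. The following is a sketch of building those keys for the driver's CsvSink; `MetricsProps` and `driverCsvSink` are hypothetical names, and the directory in the usage comment is only a placeholder.

```scala
object MetricsProps {
  /** Builds the driver CsvSink settings as "spark.metrics.conf."-prefixed keys. */
  def driverCsvSink(directory: String, periodMinutes: Int): Map[String, String] = {
    val prefix = "spark.metrics.conf.driver.sink.csv"
    Map(
      s"$prefix.class"     -> "org.apache.spark.metrics.sink.CsvSink",
      s"$prefix.directory" -> directory,
      s"$prefix.period"    -> periodMinutes.toString,
      s"$prefix.unit"      -> "minutes"
    )
  }
}

// Usage with Spark on the classpath (not executed here; the path is a placeholder):
//   val conf = new org.apache.spark.SparkConf()
//     .setAll(MetricsProps.driverCsvSink("/tmp/metrics", 1))
```

Keeping the keys in a plain `Map` makes them easy to inspect or log before they are applied to the `SparkConf`.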
