apache-spark - System requirements for Spark in production

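Problem description

Can someone help me with the system requirements for running Spark in a production environment?

I am trying to set up an environment for batch processing of data that originates from a Kafka producer.

The volume of data processed per day is on the order of terabytes. The data is read from HDFS, and the persistence layer is also HDFS.

The information I have so far is: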

4-8 disks per node, configured without RAID (just as separate mount points).
Allocating only at most 75% of the memory for Spark.
The rest for the operating system and buffer cache.
10 Gigabit or higher network is the best way to make these applications faster.
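
As a rough illustration of the memory and disk guidelines above, here is a minimal sketch that splits a node's RAM using the 75% rule and joins the separate mount points into a value for Spark's `spark.local.dir` property (which accepts a comma-separated list of local directories). The helper name `node_budget`, the 128 GB node size, and the `/mnt/disk*` paths are hypothetical and only for illustration; actual sizing depends on the workload.

```python
from typing import Dict, List, Union


def node_budget(
    total_memory_gb: float,
    mount_points: List[str],
    spark_fraction: float = 0.75,
) -> Dict[str, Union[float, str]]:
    """Split node RAM between Spark and the OS, and build a spark.local.dir value.

    spark_fraction defaults to 0.75, following the "at most 75% of the
    memory for Spark" guideline quoted above.
    """
    if not 0 < spark_fraction <= 1:
        raise ValueError("spark_fraction must be in (0, 1]")
    spark_memory_gb = total_memory_gb * spark_fraction
    return {
        # Upper bound on memory handed to Spark on this node.
        "spark_memory_gb": spark_memory_gb,
        # Remainder left for the operating system and buffer cache.
        "os_and_cache_gb": total_memory_gb - spark_memory_gb,
        # One entry per non-RAID mount point, comma-separated.
        "spark.local.dir": ",".join(mount_points),
    }


# Hypothetical node: 128 GB RAM, four disks mounted separately.
budget = node_budget(
    128,
    ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"],
)
print(budget)
```

The resulting `spark.local.dir` string could be placed in `spark-defaults.conf`; note that some cluster managers override this setting with their own local-directory configuration.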

If anyone is running Spark in production, please share your knowledge.

Thanks. At least 8-16 cores per machine.

Can someone help me with this?

Tags: apache-spark, apache-spark-sql
