apache-spark - Optimizing an EMR job with high load and CPU utilization
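Problem description
I want to optimize an EMR job. I checked the Ganglia report (attached), and it shows high CPU utilization. Can anyone recommend ways to optimize it?
Spark parameters: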
conf.set("spark.pyspark.python","python3")
conf.set("spark.executor.memory","18G")
conf.set("spark.driver.memory","18G")
conf.set("spark.executor.cores","5")
conf.set("spark.num.executors","209")
conf.set("spark.driver.maxResultSize","2G")
conf.set("spark.yarn.executor.memoryOverhead","2G")
conf.set("spark.yarn.driver.memoryOverhead","2G")
conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer")
conf.set("spark.memory.storageFraction","0.30")
conf.set("spark.yarn.scheduler.reporterThread.maxFailures","5")
conf.set("spark.storage.level","MEMORY_AND_DISK_SER")
conf.set("spark.rdd.compress","true")
conf.set("spark.shuffle.compress","true")
conf.set("spark.shuffle.spill.compress","true")
conf.set("spark.default.parallelism","2100")
conf.set("spark.sql.shuffle.partitions","2100")
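Before tuning anything, it may help to sanity-check the settings themselves. The sketch below is plain Python (no Spark required) and rests on a few assumptions: the `SUGGESTED_KEYS` table and the helper names `review_keys`, `gib`, and `footprint` are hypothetical and written for this post. As far as I know, the documented property for the executor count is `spark.executor.instances` rather than `spark.num.executors`, the `spark.yarn.*.memoryOverhead` keys have been deprecated since Spark 2.3 in favor of `spark.executor.memoryOverhead` and `spark.driver.memoryOverhead`, and there is no `spark.storage.level` property (a storage level is normally passed to `persist()` on an RDD or DataFrame). The second half only does arithmetic on the numbers given in the question.

```python
# Settings copied from the question (a subset), as the strings passed to conf.set.
settings = {
    "spark.executor.memory": "18G",
    "spark.executor.cores": "5",
    "spark.num.executors": "209",
    "spark.yarn.executor.memoryOverhead": "2G",
    "spark.storage.level": "MEMORY_AND_DISK_SER",
    "spark.default.parallelism": "2100",
}

# Assumption: hand-written lookup table, not taken from Spark itself.
# A value of None means "no documented replacement that I am aware of".
SUGGESTED_KEYS = {
    "spark.num.executors": "spark.executor.instances",
    "spark.yarn.executor.memoryOverhead": "spark.executor.memoryOverhead",
    "spark.yarn.driver.memoryOverhead": "spark.driver.memoryOverhead",
    "spark.storage.level": None,
}


def review_keys(settings):
    """Return (key, suggestion) pairs for keys that look misnamed or deprecated."""
    return [(key, SUGGESTED_KEYS[key]) for key in settings if key in SUGGESTED_KEYS]


def gib(value):
    """Parse a size such as '18G' into a whole number of gigabytes."""
    return int(value.rstrip("Gg"))


def footprint(settings):
    """Estimate what the job requests, assuming the executor count is honored."""
    executors = int(settings["spark.num.executors"])
    total_cores = executors * int(settings["spark.executor.cores"])
    container_gb = gib(settings["spark.executor.memory"]) + gib(
        settings["spark.yarn.executor.memoryOverhead"]
    )
    return {
        "total_cores": total_cores,
        "total_memory_gb": executors * container_gb,
        "tasks_per_core": int(settings["spark.default.parallelism"]) / total_cores,
    }


for key, suggestion in review_keys(settings):
    print(key, "->", suggestion or "no documented equivalent known")
print(footprint(settings))
```

With the values from the question this comes to 1045 executor cores, 4180 GB of executor containers, and roughly two tasks per core at 2100 partitions. Whether the cluster actually provides that much capacity is not stated in the question, so it is worth comparing these totals against the instance types and counts shown in Ganglia.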
Solution