Too many parameters

Problem description

I have an application with a single entry point; it is a library used to automate some of the work data engineers do.

import org.apache.spark.sql.{DataFrame, SparkSession}

case class DeltaContextConfig(
  primaryKey: List[String],
  columnToOrder: String,
  filesCountFirstBatch: Int,
  destinationPath: String,
  sparkDf: DataFrame,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  tableName: String,
  databaseName: String,
  autoCompaction: Option[Boolean] = Option(true),
  idealFileSize: Option[Int] = Option(128),
  deduplicationColumn: Option[String] = None,
  compactionIntervalTime: Option[Int] = Option(180),
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)

This is my case class, my single entry point.

After that, all of these parameters are passed on to other objects: I have objects that write to the Data Lake, compact files, and so on, and each of them uses only some of the parameters. For example, I have a DeltaWriterConfig object:

DeltaWriterConfig(
  sparkDf = deltaContextConfig.sparkDf,
  columnToOrder = deltaContextConfig.columnToOrder,
  destinationPath = deltaContextConfig.destinationPath,
  primaryKey = deltaContextConfig.primaryKey,
  filesCountFirstBatch = deltaContextConfig.filesCountFirstBatch,
  sparkContext = deltaContextConfig.sparkContext,
  operationType = deltaContextConfig.operationType,
  partitionColumn = deltaContextConfig.partitionColumn,
  updateCondition = deltaContextConfig.updateCondition,
  setExpression = deltaContextConfig.setExpression
)

I use DeltaWriterConfig to pass these parameters on to my DeltaWriter class. I create all of these config objects in my main, but I don't think that is good, because I have 3 config objects to fill in, which leaves me with 3 big constructor calls in the application's main.
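
For context, DeltaWriter consumes that config roughly like the sketch below (a simplified illustration, not my actual class; it also assumes that operationType maps onto a Spark save mode such as "append"):

// Simplified illustration of how DeltaWriter uses DeltaWriterConfig.
// Assumption: operationType is a valid Spark save mode ("append", "overwrite", ...).
class DeltaWriter(config: DeltaWriterConfig) {
  def write(): Unit = {
    val writer = config.sparkDf
      .orderBy(config.columnToOrder)
      .write
      .format("delta")
      .mode(config.operationType)

    // Partition the output only when a partition column is configured.
    config.partitionColumn
      .map(writer.partitionBy(_))
      .getOrElse(writer)
      .save(config.destinationPath)
  }
}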

Is there a pattern that solves this problem?

Tags: scala, design-patterns

Solution


I think it would be better, at the very least, to move the creation of one config from the other into a companion object of DeltaWriterConfig:

case class DeltaWriterConfig(
                              sparkDf: DataFrame,
                              columnToOrder: String,
                              destinationPath: String,
                              primaryKey: List[String],
                              filesCountFirstBatch: Int,
                              sparkContext: SparkSession,
                              operationType: String,
                              partitionColumn: Option[String] = None,
                              updateCondition: Option[String] = None,
                              setExpression: Option[String] = None
                            )
object DeltaWriterConfig {
  def from(deltaContextConfig: DeltaContextConfig): DeltaWriterConfig =
    DeltaWriterConfig(
      sparkDf = deltaContextConfig.sparkDf,
      columnToOrder = deltaContextConfig.columnToOrder,
      destinationPath = deltaContextConfig.destinationPath,
      primaryKey = deltaContextConfig.primaryKey,
      filesCountFirstBatch = deltaContextConfig.filesCountFirstBatch,
      sparkContext = deltaContextConfig.sparkContext,
      operationType = deltaContextConfig.operationType,
      partitionColumn = deltaContextConfig.partitionColumn,
      updateCondition = deltaContextConfig.updateCondition,
      setExpression = deltaContextConfig.setExpression
    )
}

This lets us create the new config in a single line:

val deltaContextConfig: DeltaContextConfig = ???
val deltaWriterConfig = DeltaWriterConfig.from(deltaContextConfig)

But an even better solution is to have only one place where each piece of configuration lives. For example, if DeltaContextConfig and DeltaWriterConfig share duplicated fields, why not compose the configs instead of repeating those fields:

// instead of this DeltaContextConfig declaration
case class DeltaContextConfig(
                               tableName: String,
                               databaseName: String,
                               autoCompaction: Option[Boolean] = Option(true),
                               idealFileSize: Option[Int] = Option(128),
                               deduplicationColumn: Option[String] = None,
                               compactionIntervalTime: Option[Int] = Option(180),

                               sparkDf: DataFrame,
                               columnToOrder: String,
                               destinationPath: String,
                               primaryKey: List[String],
                               filesCountFirstBatch: Int,
                               sparkContext: SparkSession,
                               operationType: String,
                               partitionColumn: Option[String] = None,
                               updateCondition: Option[String] = None,
                               setExpression: Option[String] = None
                             )

case class DeltaWriterConfig(
                              sparkDf: DataFrame,
                              columnToOrder: String,
                              destinationPath: String,
                              primaryKey: List[String],
                              filesCountFirstBatch: Int,
                              sparkContext: SparkSession,
                              operationType: String,
                              partitionColumn: Option[String] = None,
                              updateCondition: Option[String] = None,
                              setExpression: Option[String] = None
                            )

we use a config structure like this:

case class DeltaContextConfig(
                               tableName: String,
                               databaseName: String,
                               autoCompaction: Option[Boolean] = Option(true),
                               idealFileSize: Option[Int] = Option(128),
                               deduplicationColumn: Option[String] = None,
                               compactionIntervalTime: Option[Int] = Option(180),
                               deltaWriterConfig: DeltaWriterConfig
                             )

case class DeltaWriterConfig(
                              sparkDf: DataFrame,
                              columnToOrder: String,
                              destinationPath: String,
                              primaryKey: List[String],
                              filesCountFirstBatch: Int,
                              sparkContext: SparkSession,
                              operationType: String,
                              partitionColumn: Option[String] = None,
                              updateCondition: Option[String] = None,
                              setExpression: Option[String] = None
                            )
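
With this composition the wiring in your main becomes a single nested constructor call instead of several overlapping ones. A rough sketch (every literal value below is a placeholder, not taken from the question):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Rough sketch of the wiring in main; all literal values are placeholders.
def buildConfig(spark: SparkSession, df: DataFrame): DeltaContextConfig =
  DeltaContextConfig(
    tableName = "events",
    databaseName = "analytics",
    deltaWriterConfig = DeltaWriterConfig(
      sparkDf = df,
      columnToOrder = "created_at",
      destinationPath = "s3://bucket/events",
      primaryKey = List("id"),
      filesCountFirstBatch = 10,
      sparkContext = spark,
      operationType = "append"
    )
  )

// DeltaWriter then receives only the part it needs:
// new DeltaWriter(buildConfig(spark, df).deltaWriterConfig)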

But keep in mind that you should use the same nested structure in your configuration file.
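
For example, with Typesafe Config / HOCON (just one possible choice; nothing above says which config library you use), the file mirrors the nesting, while runtime-only fields such as sparkDf and sparkContext stay out of the file and are injected in code:

import com.typesafe.config.ConfigFactory

// Sketch only: keys and values are illustrative; only plain fields belong in the file.
val raw = ConfigFactory.parseString(
  """
    |delta-context {
    |  table-name    = "events"
    |  database-name = "analytics"
    |  delta-writer {
    |    column-to-order  = "created_at"
    |    destination-path = "s3://bucket/events"
    |    operation-type   = "append"
    |  }
    |}
    |""".stripMargin)

// Read the nested section that corresponds to DeltaWriterConfig.
val writerSection   = raw.getConfig("delta-context.delta-writer")
val destinationPath = writerSection.getString("destination-path")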

