scala sbt libraryDependencies provided - avoid downloading 3rd-party library

Problem Description

I have the following Spark Scala code that references a 3rd-party library:

package com.protegrity.spark

import org.apache.spark.sql.api.java.UDF2
import com.protegrity.spark.udf.ptyProtectStr
import com.protegrity.spark.udf.ptyProtectInt
import com.protegrity.spark.udf.ptyUnprotectStr
import com.protegrity.spark.udf.ptyUnprotectInt

class ptyProtectStr extends UDF2[String, String, String] {
  
  def call(input: String, dataElement: String): String = {
    return ptyProtectStr(input, dataElement);
  }
}

class ptyUnprotectStr extends UDF2[String, String, String] {
  
  def call(input: String, dataElement: String): String = {
    return ptyUnprotectStr(input, dataElement);
  }
}

class ptyProtectInt extends UDF2[Integer, String, Integer] {
  
  def call(input: Integer, dataElement: String): Integer = {
    return ptyProtectInt(input, dataElement);
  }
}

class ptyUnprotectInt extends UDF2[Integer, String, Integer] {

  def call(input: Integer, dataElement: String): Integer = {
    return ptyUnprotectInt(input, dataElement);
  }
}

I want to create a JAR file using SBT. My build.sbt looks like this:

name := "Protegrity UDF"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
    "com.protegrity.spark" % "udf" % "2.3.2" % "provided",
    "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided"
)

As you can see, I am trying to create a slim JAR using the "provided" option, since my Spark environment already contains these libraries.

Despite using "provided", sbt still tries to download the dependency from Maven and throws an error:

[warn]  Note: Unresolved dependencies path:
[error] sbt.librarymanagement.ResolveException: Error downloading com.protegrity.spark:udf:2.3.2
[error]   Not found
[error]   Not found
[error]   not found: C:\Users\user1\.ivy2\local\com.protegrity.spark\udf\2.3.2\ivys\ivy.xml
[error]   not found: https://repo1.maven.org/maven2/com/protegrity/spark/udf/2.3.2/udf-2.3.2.pom
[error]         at lmcoursier.CoursierDependencyResolution.unresolvedWarningOrThrow(CoursierDependencyResolution.scala:249)
[error]         at lmcoursier.CoursierDependencyResolution.$anonfun$update$35(CoursierDependencyResolution.scala:218)
[error]         at scala.util.Either$LeftProjection.map(Either.scala:573)
[error]         at lmcoursier.CoursierDependencyResolution.update(CoursierDependencyResolution.scala:218)
[error]         at sbt.librarymanagement.DependencyResolution.update(DependencyResolution.scala:60)
[error]         at sbt.internal.LibraryManagement$.resolve$1(LibraryManagement.scala:52)
[error]         at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$12(LibraryManagement.scala:102)
[error]         at sbt.util.Tracked$.$anonfun$lastOutput$1(Tracked.scala:69)
[error]         at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$20(LibraryManagement.scala:115)
[error]         at scala.util.control.Exception$Catch.apply(Exception.scala:228)
[error]         at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11(LibraryManagement.scala:115)
[error]         at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11$adapted(LibraryManagement.scala:96)
[error]         at sbt.util.Tracked$.$anonfun$inputChanged$1(Tracked.scala:150)
[error]         at sbt.internal.LibraryManagement$.cachedUpdate(LibraryManagement.scala:129)
[error]         at sbt.Classpaths$.$anonfun$updateTask0$5(Defaults.scala:2950)
[error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error]         at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error]         at sbt.std.Transform$$anon$4.work(Transform.scala:67)
[error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:281)
[error]         at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:19)
[error]         at sbt.Execute.work(Execute.scala:290)
[error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:281)
[error]         at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:178)
[error]         at sbt.CompletionService$$anon$2.call(CompletionService.scala:37)
[error]         at java.util.concurrent.FutureTask.run(Unknown Source)
[error]         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
[error]         at java.util.concurrent.FutureTask.run(Unknown Source)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[error]         at java.lang.Thread.run(Unknown Source)
[error] (update) sbt.librarymanagement.ResolveException: Error downloading com.protegrity.spark:udf:2.3.2
[error]   Not found
[error]   Not found
[error]   not found: C:\Users\user1\.ivy2\local\com.protegrity.spark\udf\2.3.2\ivys\ivy.xml
[error]   not found: https://repo1.maven.org/maven2/com/protegrity/spark/udf/2.3.2/udf-2.3.2.pom

What change should I make to build.sbt to skip the Maven download for "com.protegrity.spark"? Interestingly, I don't face this problem with "org.apache.spark" in the same build.

Tags: scala, apache-spark, sbt

Solution


Assuming you have the JAR file available wherever you compile the code (just not via Maven or another artifact repository), simply put the JAR in your project's lib directory (that is the default; the path can be changed with the unmanagedBase setting in build.sbt if you need to for some reason).
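For example, assuming the vendor JAR is named protegrity-udf-2.3.2.jar (a hypothetical file name), the build.sbt would look roughly like this; the com.protegrity.spark entry disappears from libraryDependencies, because unmanaged JARs in lib are picked up automatically:

// build.sbt - a minimal sketch; lib/protegrity-udf-2.3.2.jar is the
// unmanaged Protegrity JAR, so it no longer appears in the list below.
name := "Protegrity UDF"

version := "1.0"

scalaVersion := "2.11.8"

// Spark is still provided by the cluster at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided"
)

// Only needed if you want unmanaged JARs somewhere other than lib/:
// unmanagedBase := baseDirectory.value / "custom_lib"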

Note that this will result in the unmanaged JAR being included in the assembly JAR. If you want to build a "slightly less fat" JAR that excludes the unmanaged JAR, you will have to filter it out. One way to accomplish this is:

// Exclude the unmanaged JAR from the assembly output by file name.
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp.filter(_.data.getName == "name-of-unmanaged.jar")
}
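The snippet above uses the older "in" scoping syntax; on sbt 1.x with the sbt-assembly plugin, the equivalent in slash syntax would be (keeping the same placeholder file name):

// sbt 1.x slash syntax for the same exclusion (sbt-assembly plugin assumed)
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp.filter(_.data.getName == "name-of-unmanaged.jar")
}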

After all, if you don't have the JAR (or something very close to it) at hand, how would you expect the compiler to typecheck your calls into it?
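In other words, "provided" only controls what ends up in the packaged artifact; sbt still has to resolve the artifact to put it on the compile classpath, which is why the download is attempted at all. A quick way to see the difference from the sbt shell (a sketch, assuming sbt 1.x key syntax):

sbt> show Compile/dependencyClasspath    (provided dependencies appear here)
sbt> show Runtime/dependencyClasspath    (provided dependencies are omitted here)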

