How to connect to IBM COS (Cloud Object Storage) using Spark, and how to resolve "No FileSystem for scheme: cos"

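Question

I am trying to create a connection to IBM COS (Cloud Object Storage) using Spark. Spark version = 2.4.4, Scala version = 2.11.12.

I am running it locally with the correct credentials, but I get the following error: "No FileSystem for scheme: cos"

I am sharing the code snippet along with the error log. Can someone help me resolve this?

Thanks in advance!

Code snippet: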

import com.ibm.ibmos2spark.CloudObjectStorage
import org.apache.spark.sql.SparkSession

object CosConnection extends App{
  var credentials = scala.collection.mutable.HashMap[String, String](
      "endPoint"->"ENDPOINT",
      "accessKey"->"ACCESSKEY",
      "secretKey"->"SECRETKEY"
  )
  var bucketName = "FOO"
  var objectname = "xyz.csv"

  var configurationName = "softlayer_cos" 

  val spark = SparkSession
    .builder()
    .appName("Connect IBM COS")
    .master("local")
    .getOrCreate()


  spark.sparkContext.hadoopConfiguration.set("fs.stocator.scheme.list", "cos")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.scheme", "cos")

  var cos = new CloudObjectStorage(spark.sparkContext, credentials, configurationName=configurationName)

  var dfData1 = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load(cos.url(bucketName, objectname))

  dfData1.printSchema()
  dfData1.show(5,0)
}

Error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: cos
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)

Tags: apache-spark, apache-spark-sql, ibm-cloud, persistent-object-store, cloud-object-storage

Answer

This issue was resolved by adding the following stocator dependency, with Spark version 2.4.4 and Scala version 2.11.12:

// https://mvnrepository.com/artifact/com.ibm.stocator/stocator
libraryDependencies += "com.ibm.stocator" % "stocator" % "1.0.24"
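
For context, here is a minimal sketch of how that line might sit in a complete build.sbt. The project name and version are placeholders, and the spark-sql line is an assumption based on the Spark and Scala versions stated in the question. The ibmos2spark dependency imported by the question's code is not shown because its version is not given here.

```scala
// build.sbt (sketch; "cos-connection" and "0.1.0" are placeholder values)
name := "cos-connection"
version := "0.1.0"

// Matches the Scala version stated in the question
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Assumed: Spark SQL at the version stated in the question
  "org.apache.spark" %% "spark-sql" % "2.4.4",
  // The stocator dependency from the answer
  "com.ibm.stocator" % "stocator" % "1.0.24"
)
```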

Make sure stocator-1.0.24-jar-with-dependencies.jar is among the external libraries when you build the package.
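
One quick way to check whether the stocator classes are actually visible at runtime is to look one of them up before creating the SparkSession. The sketch below uses the class name from the question's configuration. The object and method names (`ClasspathCheck`, `isOnClasspath`) are hypothetical helpers, not part of stocator or Spark.

```scala
object ClasspathCheck {
  // Returns true if the named class can be located by this class loader.
  // The class is looked up without being initialized.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Class name taken from the fs.stocator.cos.impl setting in the question
    val stocatorClass = "com.ibm.stocator.fs.cos.COSAPIClient"
    println(s"$stocatorClass on classpath: ${isOnClasspath(stocatorClass)}")
  }
}
```

If this reports that the class is missing, the jar is probably not on the classpath, which would be consistent with the "No FileSystem for scheme: cos" error.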

Also make sure you pass the endpoint as s3.us.cloud-object-storage.appdomain.cloud rather than https://s3.us.cloud-object-storage.appdomain.cloud
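
If the endpoint comes from configuration and may or may not carry a scheme, a small helper can normalize it before it goes into the credentials map. This is only a sketch: `EndpointUtil` and `stripScheme` are hypothetical names, and whether the scheme must be removed may depend on the library version in use.

```scala
object EndpointUtil {
  // Removes a leading "http://" or "https://" (case-insensitive)
  // and a single trailing slash, leaving only the host part.
  def stripScheme(endpoint: String): String =
    endpoint.trim.replaceFirst("(?i)^https?://", "").stripSuffix("/")
}
```

In the question's code, the value would then be supplied as `"endPoint" -> EndpointUtil.stripScheme("ENDPOINT")`.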

If needed, you can build the stocator jar manually and add the resulting target/stocator-1.0.24-SNAPSHOT-IBM-SDK.jar to the classpath:

git clone https://github.com/SparkTC/stocator
cd stocator
git fetch
git checkout -b 1.0.24-ibm-sdk origin/1.0.24-ibm-sdk
mvn clean install -DskipTests
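
For an sbt project, one possible way to put the locally built jar on the classpath is to register it as an unmanaged jar. The sketch below assumes sbt 1.x syntax, and the directory prefix is a placeholder for wherever the stocator repository was cloned. Alternatively, sbt picks up any jar copied into the project's lib/ directory by default.

```scala
// build.sbt (sketch, assuming sbt 1.x)
// "/path/to/stocator" is a placeholder for the local clone location.
Compile / unmanagedJars += Attributed.blank(
  file("/path/to/stocator/target/stocator-1.0.24-SNAPSHOT-IBM-SDK.jar")
)
```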
