Why does loading 4000 images into Redis with spark-submit take longer (9 minutes) than loading the same images into HBase (2.5 minutes)?

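Problem description

Loading images into Redis should be much faster than doing the same with HBase, since Redis works in RAM while HBase stores its data on HDFS. So I was surprised that loading 4000 images into Redis took 9 minutes, while the same process with HBase took only 2.5 minutes. Is there an explanation for this? Any suggestions for improving my code? Here is my code: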

// The code for loading the images into HBase (adopted from NIST)
val conf = new SparkConf().setAppName("Fingerprint.LoadData")
val sc = new SparkContext(conf)
Image.dropHBaseTable()
Image.createHBaseTable()
val checksum_path = args(0)
println("Reading paths from: %s".format(checksum_path.toString))
val imagepaths = loadImageList(checksum_path)
println("Got %s images".format(imagepaths.length))
imagepaths.foreach(println)
println("Reading files into RDD")
val images = sc.parallelize(imagepaths).map(paths => Image.fromFiles(paths._1, paths._2))
println(s"Saving ${images.count} images to HBase")
Image.toHBase(images)
println("Done")

def toHBase(rdd: RDD[T]): Unit = {
  val cfg = HBaseConfiguration.create()
  cfg.set(TableOutputFormat.OUTPUT_TABLE, tableName)
  val job = Job.getInstance(cfg)
  job.setOutputFormatClass(classOf[TableOutputFormat[String]])
  rdd.map(Put).saveAsNewAPIHadoopDataset(job.getConfiguration)
}

// The code for loading the images into Redis
val images = sc.parallelize(imagepaths).map(paths => Image.fromFiles(paths._1, paths._2)).collect
for (i <- images) {
  val stringRdd = sc.parallelize(Seq((i.uuid, new String(i.Png, StandardCharsets.UTF_8))))
  sc.toRedisKV(stringRdd)(redisConfig)
  stringRdd.collect
}
println("Done")

Tags: redis, hbase, spark-submit

Solution
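
One plausible reading of the Redis snippet, offered as a sketch rather than a confirmed diagnosis: the loop first collects every image to the driver, then builds a one-element RDD and submits at least one Spark job per image (`toRedisKV`, plus the trailing `collect`). For 4000 images that is thousands of tiny jobs, so job scheduling overhead may dominate the run time rather than Redis itself. The HBase path writes the whole RDD in a single `saveAsNewAPIHadoopDataset` call. The sketch below keeps the images distributed and calls `toRedisKV` once. It assumes the spark-redis import `com.redislabs.provider.redis._`; `BatchedRedisLoad`, `toKV` and `saveAll` are hypothetical names introduced only for illustration, and no timing result is claimed.

```scala
import java.nio.charset.StandardCharsets

import com.redislabs.provider.redis._ // spark-redis: RedisConfig and the implicit sc.toRedisKV
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of the question's code base.
object BatchedRedisLoad {

  // Same byte-to-string conversion as in the question.
  // Caveat: decoding raw PNG bytes as UTF-8 is lossy for arbitrary binary data.
  def toKV(uuid: String, png: Array[Byte]): (String, String) =
    (uuid, new String(png, StandardCharsets.UTF_8))

  // Writes every pair in a single Spark job: each partition sends its own
  // records to Redis instead of the driver submitting one job per image.
  def saveAll(sc: SparkContext, kvs: RDD[(String, String)], redisConfig: RedisConfig): Unit =
    sc.toRedisKV(kvs)(redisConfig)
}
```

With the names from the question, a call might look like `BatchedRedisLoad.saveAll(sc, sc.parallelize(imagepaths).map(p => Image.fromFiles(p._1, p._2)).map(i => BatchedRedisLoad.toKV(i.uuid, i.Png)), redisConfig)`, with no `collect` on the driver and no per-image RDD.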

