Reading empty files with Scala

Problem description

I wrote the following function in Scala, which reads a file into a list of strings. My aim is to make sure that if the input file is empty, the returned list is empty too. Any idea how to do this in an elegant way?

import java.io.{BufferedReader, FileReader}

def linesFromFile(file: String): List[String] = {

  def materialize(buffer: BufferedReader): List[String] = materializeReverse(buffer, Nil).reverse

  def materializeReverse(buffer: BufferedReader, accumulator: List[String]): List[String] = {
    buffer.readLine match {
      case null => accumulator
      case line => materializeReverse(buffer, line :: accumulator)
    }
  }

  val buffer = new BufferedReader(new FileReader(file))
  materialize(buffer)
}

Tags: scala, readline

Solution


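Your code should work, but it is inefficient in its use of memory: you read the entire file into memory, then spend more memory and processing reversing the list to put the lines in the correct order.

Using the Source.fromFile method in the standard library is your best bet (it also supports various file encodings), as mentioned in other comments/answers.

However, if you must roll your own, I think using a Stream (a lazy form of list) makes more sense than a List. You can return each line one at a time, and you can terminate the stream when the end of the file is reached. This can be done as follows: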
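For reference, a minimal sketch of the Source.fromFile approach might look like the following. The helper name linesFromFileViaSource and the choice of UTF-8 are assumptions made for illustration; the try/finally is there so the file handle is released even if reading fails. For an empty file, getLines() yields nothing, so the result is an empty list.

```scala
import scala.io.Source

// Hypothetical helper name; reads every line eagerly, then closes the file.
def linesFromFileViaSource(file: String): List[String] = {
  // The encoding is an assumption; pass whichever encoding your files use.
  val source = Source.fromFile(file, "UTF-8")
  try source.getLines().toList
  finally source.close()
}
```

Because the lines are copied into a List before the source is closed, the result stays usable after the function returns.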

import java.io.{BufferedReader, FileReader}

def linesFromFile(file: String): Stream[String] = {

  // The value of buffer is available to the following helper function. No need to pass as
  // an argument.
  val buffer = new BufferedReader(new FileReader(file))

  // Helper: retrieve next line from file. Called only when next value requested.
  def materialize: Stream[String] = {

    // Uncomment to demonstrate non-recursive nature of this method.
    //println("Materialize called!")

    // Read the next line and wrap in an option. This avoids the hated null.
    Option(buffer.readLine) match {

      // If we've seen the end of the file, return an empty stream. We're done reading.
      case None => {
        buffer.close()
        Stream.empty
      }

      // Otherwise, prepend the line read to another call to this helper.
      case Some(line) => line #:: materialize
    }
  }

  // Start the process.
  materialize
}

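Although materialize looks recursive, it is actually called only when another value needs to be retrieved, so you need not worry about stack overflow or recursion. You can verify this by uncommenting the println call.

For example (in a Scala REPL session):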

$ scala
Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.

scala> import java.io.{BufferedReader, FileReader}
import java.io.{BufferedReader, FileReader}

scala> def linesFromFile(file: String): Stream[String] = {
     |
     |   // The value of buffer is available to the following helper function. No need to pass as
     |   // an argument.
     |   val buffer = new BufferedReader(new FileReader(file))
     |
     |   // Helper: retrieve next line from file. Called only when next value requested.
     |   def materialize: Stream[String] = {
     |
     |     // Uncomment to demonstrate non-recursive nature of this method.
     |     println("Materialize called!")
     |
     |     // Read the next line and wrap in an option. This avoids the hated null.
     |     Option(buffer.readLine) match {
     |
     |       // If we've seen the end of the file, return an empty stream. We're done reading.
     |       case None => {
     |         buffer.close()
     |         Stream.empty
     |       }
     |
     |       // Otherwise, prepend the line read to another call to this helper.
     |       case Some(line) => line #:: materialize
     |     }
     |   }
     |
     |   // Start the process.
     |   materialize
     | }
linesFromFile: (file: String)Stream[String]

scala> val stream = linesFromFile("TestFile.txt")
Materialize called!
stream: Stream[String] = Stream(Line 1, ?)

scala> stream.head
res0: String = Line 1

scala> stream.tail.head
Materialize called!
res1: String = Line 2

scala> stream.tail.head
res2: String = Line 2

scala> stream.foreach(println)
Line 1
Line 2
Materialize called!
Line 3
Materialize called!
Line 4
Materialize called!

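Note how materialize is called only when we attempt to read another line from the file. Furthermore, it is not called again if we have already retrieved a line (for example, Line 1 and Line 2 in the output are preceded by Materialize called! only when they are first referenced).

Regarding empty files, an empty stream is returned in that case: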

scala> val empty = linesFromFile("EmptyFile.txt")
Materialize called!
empty: Stream[String] = Stream()

scala> empty.isEmpty
res3: Boolean = true
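As a side note, Stream is deprecated from Scala 2.13 onward in favor of LazyList. A rough sketch of the same helper for newer Scala versions might look like this; the name linesFromFileLazy is hypothetical, and the code assumes Scala 2.13 or later (it will not compile on the 2.12.5 REPL used above). As with the Stream version, the reader is closed only once the end of the file is reached.

```scala
import java.io.{BufferedReader, FileReader}

// Hypothetical LazyList variant of linesFromFile; assumes Scala 2.13 or later.
def linesFromFileLazy(file: String): LazyList[String] = {
  val buffer = new BufferedReader(new FileReader(file))

  // Each call reads one line; the tail is evaluated only when requested.
  def materialize: LazyList[String] =
    Option(buffer.readLine) match {
      case None =>
        buffer.close() // End of file: release the reader.
        LazyList.empty
      case Some(line) => line #:: materialize
    }

  materialize
}
```

Calling .toList on the result gives an ordinary List, which is empty when the file is empty.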
