How do I continue processing a file when I reach an empty string?

Question

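I am trying to read a file containing DNA sequences. In my program, I want to read every subsequence of length 4 of the DNA and store it in my hash map to count the occurrences of each subsequence. For example, if I have the sequence CCACACCACACCCACACACCCAC and I want every subsequence of length 4, the first 3 subsequences would be:
CCAC, CACA, ACAC, and so on.
To do this I have to iterate over the string several times. Here is my implementation: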

try
    {
        String file = sc.nextLine();
        BufferedReader reader = new BufferedReader(new FileReader(file + ".fasta")); 

        Map<String, Integer> frequency = new HashMap<>(); 

        String line = reader.readLine();

        while(line != null)
        {
            System.out.println("Processing Line: " + line);
            String [] kmer = line.split("");

            for(String nucleotide : kmer)
            {
                System.out.print(nucleotide);
                int sequence = nucleotide.length(); 
                for(int i = 0; i < sequence; i++)
                {
                    String subsequence = nucleotide.substring(i, i+5); 
                    if(frequency.containsKey(subsequence))
                    {
                        frequency.put(subsequence, frequency.get(subsequence) +1);
                    }
                    else
                    {
                        frequency.put(subsequence, 1);
                    }
                }
            }
            System.out.println();
            line = reader.readLine();
        }
        System.out.println(frequency);            
    }
    catch(StringIndexOutOfBoundsException e)
    {
        System.out.println();
    }

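The problem occurs when the end of the string is reached: because of the error, processing does not continue. How would I go about fixing this?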

Tags: java, string, hashmap

Answer


Going by the title of your post... try changing the condition of your while loop. Instead of the current:

String line = reader.readLine();
while(line != null) {
    // ...... your code .....
}

Use this code:

String line;
while((line = reader.readLine()) != null) {
    // If file line is blank then skip to next file line.
    if (line.trim().equals("")) {
        continue;
    }
    // ...... your code .....
}

This takes care of blank lines in the file.

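Now, regarding the StringIndexOutOfBoundsException you are getting. I believe by now you basically know why you receive this exception, so you need to decide what you want to do about it. If you want to split a string into chunks of a specific length, and that length does not divide evenly into the number of characters on a particular file line, then there are obviously a few options available: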

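  • Ignore the remaining characters at the end of the file line. Although this is a simple solution, it is not very practical because it produces incomplete data. I know nothing about DNA, but I'm fairly sure this is not the way to go.
  • Add the remaining DNA sequence (even if it is short) to the Map. Again, I know nothing about DNA, so I am not sure whether this is a viable solution. Maybe it is; I simply don't know.
  • Prepend the remaining short DNA sequence to the next incoming file line and continue splitting that line into 4-character chunks. Keep doing this until the end of the file is reached; if the final DNA sequence turns out to be short, add it to the Map (or don't).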
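
As background for these options, here is a minimal sketch of where the exception comes from. `String.substring(begin, end)` throws `StringIndexOutOfBoundsException` whenever `end` is greater than the string's length, so a loop that always asks for `i + 4` will overrun on the last few characters unless its bound accounts for the chunk size. The class name `SubstringBounds` and the helper `safeWindows` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringBounds {

    // Hypothetical helper: returns each complete chunk of the given size,
    // stopping before a chunk would run past the end of the string.
    static List<String> safeWindows(String s, int size) {
        List<String> windows = new ArrayList<>();
        for (int i = 0; i + size <= s.length(); i += size) {
            windows.add(s.substring(i, i + size));
        }
        return windows;
    }

    public static void main(String[] args) {
        String line = "CCACAC"; // 6 characters, not a multiple of 4

        try {
            // End index 8 is past the end of the 6-character string, so this throws.
            line.substring(4, 8);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Out of bounds: " + e.getMessage());
        }

        // Bounding the loop by i + size <= length avoids the exception.
        System.out.println(safeWindows(line, 4));
    }
}
```

Note also that in the question's code the try/catch wraps the whole while loop, so the first exception ends the loop; that would explain why processing does not continue.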

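Of course there may be other options, and whatever they are, you need to decide on one. To help you along, here is code covering the three options I mentioned: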

Ignore the remaining characters:

Map<String, Integer> frequency = new HashMap<>();
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().equals("")) {
            continue;
        }

        for (int i = 0; i < line.length(); i += 4) {
            // Stop when fewer than 4 characters remain; the leftover is ignored.
            if (i + 4 > line.length()) {
                break;
            }

            subsequence = line.substring(i, i + 4);
            if (frequency.containsKey(subsequence)) {
                frequency.put(subsequence, frequency.get(subsequence) + 1);
            }
            else {
                frequency.put(subsequence, 1);
            }
        }
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}

Add the remaining DNA sequence (even if it is short) to the Map:

Map<String, Integer> frequency = new HashMap<>();
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().equals("")) {
            continue;
        }

        String lineRemaining = "";

        for (int i = 0; i < line.length(); i += 4) {
            // Fewer than 4 characters remain: keep them so they can be
            // added to the Map after the loop.
            if (i + 4 > line.length()) {
                lineRemaining = line.substring(i);
                break;
            }

            subsequence = line.substring(i, i + 4);
            if (frequency.containsKey(subsequence)) {
                frequency.put(subsequence, frequency.get(subsequence) + 1);
            }
            else {
                frequency.put(subsequence, 1);
            }
        }
        if (lineRemaining.length() > 0) {
            subsequence = lineRemaining;
            if (frequency.containsKey(subsequence)) {
                frequency.put(subsequence, frequency.get(subsequence) + 1);
            }
            else {
                frequency.put(subsequence, 1);
            }
        }
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}

Prepend the remaining short DNA sequence to the next incoming file line:

Map<String, Integer> frequency = new HashMap<>();
String lineRemaining = "";
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().equals("")) {
            continue;
        }
        // Add remaining portion of last line to new line.
        if (lineRemaining.length() > 0) {
            line = lineRemaining + line;
            lineRemaining = "";
        }

        for (int i = 0; i < line.length(); i += 4) {
            // Fewer than 4 characters remain: carry them over to the
            // start of the next file line.
            if (i + 4 > line.length()) {
                lineRemaining = line.substring(i);
                break;
            }

            subsequence = line.substring(i, i + 4);
            if (frequency.containsKey(subsequence)) {
                frequency.put(subsequence, frequency.get(subsequence) + 1);
            }
            else {
                frequency.put(subsequence, 1);
            }
        }
    }
    // If any characters remain at the end of the file then add them
    // to the Map, incrementing the count if the key already exists.
    if (lineRemaining.length() > 0) {
        frequency.merge(lineRemaining, 1, Integer::sum);
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}
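
The question's example (CCAC, CACA, ACAC) suggests overlapping windows that advance one character at a time, whereas the three snippets above split each line into non-overlapping blocks of 4. If overlapping subsequences are what is wanted, a minimal sketch might look like the following. It assumes each line is processed independently (windows do not span line breaks); the class name `KmerCounter` and the method `countKmers` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class KmerCounter {

    // Hypothetical helper: counts every overlapping subsequence of length k
    // on each non-blank line. Lines shorter than k contribute nothing.
    static Map<String, Integer> countKmers(BufferedReader reader, int k) throws IOException {
        Map<String, Integer> frequency = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // The last valid start index is line.length() - k,
            // so substring never runs past the end of the line.
            for (int i = 0; i + k <= line.length(); i++) {
                frequency.merge(line.substring(i, i + k), 1, Integer::sum);
            }
        }
        return frequency;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FASTA file so the sketch is self-contained.
        String data = "CCACACCACACCCACACACCCAC\n\nCCAC\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            System.out.println(countKmers(reader, 4));
        }
    }
}
```

`Map.merge` replaces the `containsKey`/`put` pair used above. If windows should span line breaks, the carry-over idea from the third snippet could probably be adapted by keeping the last k - 1 characters of each line and prepending them to the next one.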
