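java - How to continue processing a file when I reach an empty string
Question
I am trying to read a file containing DNA sequences. In my program I want to read every subsequence of length 4 of the DNA and store it in my HashMap to count the occurrences of each subsequence. For example, if I have the sequence CCACACCACACCCACACACCCAC
and I want every subsequence of length 4,
the first 3 subsequences would be:
CCAC, CACA, ACAC
and so on.
To do this I have to iterate over the string multiple times. Here is my implementation: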
try
{
    String file = sc.nextLine();
    BufferedReader reader = new BufferedReader(new FileReader(file + ".fasta"));
    Map<String, Integer> frequency = new HashMap<>();
    String line = reader.readLine();
    while(line != null)
    {
        System.out.println("Processing Line: " + line);
        String [] kmer = line.split("");
        for(String nucleotide : kmer)
        {
            System.out.print(nucleotide);
            int sequence = nucleotide.length();
            for(int i = 0; i < sequence; i++)
            {
                String subsequence = nucleotide.substring(i, i+5);
                if(frequency.containsKey(subsequence))
                {
                    frequency.put(subsequence, frequency.get(subsequence) +1);
                }
                else
                {
                    frequency.put(subsequence, 1);
                }
            }
        }
        System.out.println();
        line = reader.readLine();
    }
    System.out.println(frequency);
}
catch(StringIndexOutOfBoundsException e)
{
    System.out.println();
}
The problem occurs when the end of the string is reached: because of the error, processing does not continue. How would I fix this?
Solution
Going by the title of your post, try changing the condition of your while loop. Instead of the current:
String line = reader.readLine();
while(line != null) {
    // ...... your code .....
}
use this code:
String line;
while((line = reader.readLine()) != null) {
    // If file line is blank then skip to next file line.
    if (line.trim().equals("")) {
        continue;
    }
    // ...... your code .....
}
This takes care of blank lines in the file.
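To try the loop without a file on disk, here is a minimal sketch that feeds the reader from a StringReader. The class name BlankLineSkipDemo, the helper nonBlankLines and the sample input are made up for illustration; only the loop condition and the blank-line check come from the snippet above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BlankLineSkipDemo {

    // Returns the non-blank lines from the reader, in the order they were read.
    public static List<String> nonBlankLines(BufferedReader reader) throws IOException {
        List<String> kept = new ArrayList<>();
        String line;
        // readLine() returns null only at end of input, so a blank line
        // (an empty string) does not end the loop.
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // blank line: move on to the next one
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: two sequence lines separated by blank lines.
        String sample = "CCAC\n\n   \nACAC\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(nonBlankLines(reader)); // prints [CCAC, ACAC]
        }
    }
}
```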
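Now, about the StringIndexOutOfBoundsException you are getting. It is thrown when substring() is given an end index past the end of the string, which is what happens for the last few values of i in your loop. I believe by now you basically know why you get this exception, so you need to decide what you want to do about it. If you want to split a string into chunks of a specific length, and that length does not divide evenly into the number of characters on a given file line, then there are a few obvious options:
- Ignore the remaining characters at the end of the file line. This is the simplest solution, but it is not very viable because it produces incomplete data. I know nothing about DNA, but I'm fairly sure this is not the way to go.
- Add the remaining DNA sequence (even though it is short) to the Map. Again, I know nothing about DNA, so I can't say whether this is a viable solution. Maybe it is, I just don't know.
- Prepend the remaining short DNA sequence to the next incoming file line and carry on splitting that line into 4-character chunks. Keep doing this until the end of the file is reached, and if the final DNA sequence turns out to be short, add it to the Map (or not).
There may of course be other options; whatever they are, the decision is yours. To help you along, here is code covering the three options I mentioned: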
Ignore the remaining characters:
// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.HashMap and java.util.Map.
Map<String, Integer> frequency = new HashMap<>();
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().isEmpty()) {
            continue;
        }
        for (int i = 0; i < line.length(); i += 4) {
            // Fewer than 4 chars left: leave the loop and drop them.
            // (Comparing against line.length() - 1 would also drop a
            // complete final chunk.)
            if ((i + 4) > line.length()) {
                break;
            }
            subsequence = line.substring(i, i + 4);
            // Store 1 for a new key, otherwise add 1 to the existing count.
            frequency.merge(subsequence, 1, Integer::sum);
        }
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}
Add the remaining DNA sequence (even though it is short) to the Map:
// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.HashMap and java.util.Map.
Map<String, Integer> frequency = new HashMap<>();
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().isEmpty()) {
            continue;
        }
        String lineRemaining = "";
        for (int i = 0; i < line.length(); i += 4) {
            // Fewer than 4 chars left: remember them and leave the loop.
            if ((i + 4) > line.length()) {
                lineRemaining = line.substring(i);
                break;
            }
            subsequence = line.substring(i, i + 4);
            // Store 1 for a new key, otherwise add 1 to the existing count.
            frequency.merge(subsequence, 1, Integer::sum);
        }
        // Count the short leftover of this line as its own entry.
        if (lineRemaining.length() > 0) {
            frequency.merge(lineRemaining, 1, Integer::sum);
        }
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}
Prepend the remaining short DNA sequence to the next incoming file line:
// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.HashMap and java.util.Map.
Map<String, Integer> frequency = new HashMap<>();
String lineRemaining = "";
String subsequence;
String line;
try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
    while ((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().isEmpty()) {
            continue;
        }
        // Add remaining portion of last line to new line.
        if (lineRemaining.length() > 0) {
            line = lineRemaining + line;
            lineRemaining = "";
        }
        for (int i = 0; i < line.length(); i += 4) {
            // Fewer than 4 chars left: carry them over to the next line.
            if ((i + 4) > line.length()) {
                lineRemaining = line.substring(i);
                break;
            }
            subsequence = line.substring(i, i + 4);
            // Store 1 for a new key, otherwise add 1 to the existing count.
            frequency.merge(subsequence, 1, Integer::sum);
        }
    }
    // If any chars remain at end of file then add them to the Map
    // without overwriting an existing count.
    if (lineRemaining.length() > 0) {
        frequency.merge(lineRemaining, 1, Integer::sum);
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}
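One more thing worth noting: the three variants above split each line into non-overlapping 4-character chunks, whereas the example in the question (CCAC, CACA, ACAC) lists overlapping subsequences. If a sliding window is what is actually wanted, a minimal sketch might look like the following. KmerCounter and countKmers are names made up here, the input is an in-memory sample rather than a real file, and skipping lines that start with '>' is an assumption based on the usual FASTA header convention. Windows are not carried across line breaks in this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class KmerCounter {

    // Counts every overlapping substring of length k, one line at a time.
    public static Map<String, Integer> countKmers(BufferedReader reader, int k) throws IOException {
        Map<String, Integer> frequency = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            // Skip blank lines and (assumed) FASTA header lines.
            if (line.isEmpty() || line.startsWith(">")) {
                continue;
            }
            // The bound i + k <= line.length() keeps substring() inside the
            // string, so no StringIndexOutOfBoundsException can occur; lines
            // shorter than k simply contribute nothing.
            for (int i = 0; i + k <= line.length(); i++) {
                frequency.merge(line.substring(i, i + k), 1, Integer::sum);
            }
        }
        return frequency;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a header, the sequence from the question,
        // a blank line, and a line that is too short to hold a 4-mer.
        String sample = ">example\nCCACACCACACCCACACACCCAC\n\nACG\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(countKmers(reader, 4));
        }
    }
}
```

Advancing i by 1 instead of 4 is what makes the windows overlap. The printed order of the entries is not fixed, because HashMap does not guarantee iteration order.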