java - Java: How to read the last chars of a file
Problem description
How can I read the last few chars of a file as disk-efficiently as possible?
Solution
Here is a method that reads the last N characters from a UTF-8 encoded text file.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/**
 * Reads last {@code length} characters from UTF-8 encoded text file.
 * <p>
 * The returned string may be shorter than requested if file is too
 * short, if the leading character is a half surrogate-pair, or if
 * file has invalid UTF-8 byte sequences.
 *
 * @param fileName Name of text file to read.
 * @param length Length of string to return.
 * @return String with up to {@code length} characters.
 * @throws IOException if an I/O error occurs.
 */
public static String readLastChars(String fileName, int length) throws IOException {
    // A char can only store characters in the Basic Multilingual Plane, which are
    // encoded using up to 3 bytes each. A character from a Supplemental Plane is
    // encoded using 4 bytes, and is stored in Java as a surrogate pair, i.e. 2 chars.
    // Worst case (assuming valid UTF-8) is that file ends with a 4-byte sequence
    // followed by length-1 3-byte sequences, so we need to read that many bytes.
    byte[] buf;
    try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
        int bytesToRead = length * 3 + 1;
        buf = new byte[bytesToRead <= file.length() ? bytesToRead : (int) file.length()];
        // Seek straight to the tail so only the needed bytes are read from disk
        file.seek(file.length() - buf.length);
        file.readFully(buf);
    }
    // Scan bytes backwards past 'length' characters
    int start = buf.length;
    for (int i = 0; i < length && start > 0; i++) {
        if (buf[--start] < 0) { // not ASCII
            // Locate start of UTF-8 byte sequence (at most 4 bytes)
            int minStart = (start > 3 ? start - 3 : 0);
            while (start > minStart && (buf[start] & 0xC0) == 0x80)
                start--; // Skip UTF-8 continuation byte
            if (start == minStart)
                i++; // 4-byte UTF-8 -> 2 surrogate chars
        }
    }
    // Create string from bytes, and skip first character if too long
    // (text starts with surrogate pair, assuming valid UTF-8)
    String text = new String(buf, start, buf.length - start, StandardCharsets.UTF_8);
    while (text.length() > length)
        text = text.substring(text.offsetByCodePoints(0, 1));
    return text;
}
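
The comments in the method justify reading length * 3 + 1 bytes. The following is a minimal sketch of that reasoning, not part of the original answer; the class Utf8TailBudget and its helper methods are hypothetical names chosen for illustration. It checks how many UTF-8 bytes a BMP character and a surrogate pair occupy, and shows a tail that hits the worst case exactly.

```java
import java.nio.charset.StandardCharsets;

public class Utf8TailBudget {

    /** Number of bytes the given text occupies when encoded as UTF-8. */
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
    public static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * Upper bound on the bytes needed to decode the last {@code length} chars
     * of valid UTF-8: one 4-byte sequence plus (length - 1) 3-byte sequences.
     * Computed as a long here (an assumption of this sketch) to avoid int overflow.
     */
    public static long worstCaseBytes(int length) {
        return (long) length * 3 + 1;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";         // U+20AC: 1 char, 3 bytes
        String emoji = "\uD83D\uDE00";  // U+1F600: 2 chars (surrogate pair), 4 bytes
        String tail = emoji + euro;     // 3 chars, 7 bytes

        System.out.println(utf8Length(euro));   // 3
        System.out.println(utf8Length(emoji));  // 4

        // The last 2 chars of 'tail' are the low surrogate and the euro sign.
        // Reaching the start of the 4-byte sequence takes all 7 bytes,
        // which is exactly 2 * 3 + 1.
        System.out.println(utf8Length(tail) + " <= " + worstCaseBytes(2));
    }
}
```

In that worst case the decoded text is one char too long (the full surrogate pair plus the euro sign), which is why readLastChars drops the leading code point at the end.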