Incremental string decoding in Java

Problem description

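Suppose I receive bytes in chunks and want to decode them efficiently into a string (which will obviously be Unicode). I also want to know as early as possible whether that string starts with a particular sequence.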

One approach might be:

public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
    byte[] buff = new byte[1024];
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    int len;
    while ((len = inputStream.read(buff)) > 0) {
        byteArrayOutputStream.write(buff, 0, len);
        String decoded = new String(byteArrayOutputStream.toByteArray(), Charset.defaultCharset());
        if (decoded.startsWith(match)) {
            return true;
        }
    }
    return false;
}

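But this allocates a new array from byteArrayOutputStream every time a new chunk arrives, and the String constructor then makes yet another copy. That all seems very inefficient to me. On top of that, the constructor decodes the bytes from the very beginning each time.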

How can I make this process faster?

Tags: java

Solution


You actually don't need a ByteArrayOutputStream at all.

First, convert your String match into a byte[] using the encoding you want. Then simply compare each incoming chunk against the next part of that array:

public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
    byte[] compare = match.getBytes(Charset.defaultCharset());
    int n = compare.length;

    int compareAt = 0;
    byte[] buff = new byte[n];

    int len;
    while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
        for (int i=0; i < len && compareAt < n; i++, compareAt++) {
            if (compare[compareAt] != buff[i]) {
                // found contradicting byte
                return false;
            }
        }
    }

    // No byte was found which contradicts that the streamed data begins with compare.
    // Did we actually read enough bytes?
    return compareAt >= n;
}

You may find this version more readable:

public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
    byte[] compare = match.getBytes(Charset.defaultCharset());
    int n = compare.length;

    int compareAt = 0;
    byte[] buff = new byte[n];

    int len;
    while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
        if (!isSubArray(compare, compareAt, buff, len)) {
            return false;
        }
        compareAt += len;
    }

    return compareAt >= n;
}

private boolean isSubArray(byte[] searchIn, int searchInOffset, byte[] searchFor, int searchForLength)
{
    if (searchInOffset + searchForLength > searchIn.length) {
        // the chunk would extend past the end of searchIn, so it cannot match
        return false;
    }

    for (int i=0; i < searchForLength; i++) {
        if (searchIn[searchInOffset+i] != searchFor[i]) {
            return false;
        }
    }

    return true;
}
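
The byte comparison above never decodes anything, so it only answers the prefix question. If the decoded text is needed as well, one possible approach is java.nio.charset.CharsetDecoder, which can be fed chunk by chunk and leaves an incomplete multi-byte character undecoded until the rest arrives. The following is a minimal sketch under that assumption, not part of the original answer: the class name IncrementalPrefixMatcher, its State enum, and the feed and decodedSoFar methods are hypothetical names chosen for illustration, and malformed input is simply replaced rather than reported.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decodes chunks incrementally and tracks a prefix match.
class IncrementalPrefixMatcher {
    enum State { UNDECIDED, MATCH, MISMATCH }

    private final String match;
    private final CharsetDecoder decoder;
    private final StringBuilder decoded = new StringBuilder();
    private final CharBuffer out = CharBuffer.allocate(256);
    // Bytes of a character that was split across two chunks.
    private ByteBuffer pending = ByteBuffer.allocate(0);
    // Number of chars already compared against match.
    private int checked = 0;
    private State state;

    IncrementalPrefixMatcher(String match, Charset charset) {
        this.match = match;
        // Assumption: bad input is replaced instead of raising an error.
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        this.state = match.isEmpty() ? State.MATCH : State.UNDECIDED;
    }

    State feed(byte[] chunk, int offset, int length) {
        // Prepend whatever could not be decoded from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + length);
        in.put(pending).put(chunk, offset, length);
        in.flip();

        CoderResult result;
        do {
            // endOfInput = false: more bytes may follow.
            result = decoder.decode(in, out, false);
            out.flip();
            decoded.append(out);
            out.clear();
        } while (result.isOverflow());
        pending = in; // remaining bytes belong to an incomplete character

        // Compare only the chars that have not been checked yet.
        int limit = Math.min(decoded.length(), match.length());
        while (state == State.UNDECIDED && checked < limit) {
            if (decoded.charAt(checked) != match.charAt(checked)) {
                state = State.MISMATCH;
            }
            checked++;
        }
        if (state == State.UNDECIDED && checked == match.length()) {
            state = State.MATCH;
        }
        return state;
    }

    String decodedSoFar() {
        return decoded.toString();
    }
}
```

In a read loop, feed(buff, 0, len) would be called after each inputStream.read(buff), stopping as soon as the returned state is no longer UNDECIDED. Each byte is decoded once and each decoded char is compared once, which avoids the repeated copying and re-decoding described in the question.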
