Changing the InputStream charset after it has been set

Question

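If a run of data contains characters in different encodings, is there a way to change the charset after the input stream has been created, or a recommended way to achieve the same result?

An example to help explain: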

// The data needs its first 4 characters read as UTF-8 and the next 4 as ISO-8859-2
String data = "testўёѧẅ";
// Uses the platform default charset; a charset could be passed in instead
try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
    // An InputStreamReader working on chars rather than bytes would probably be clearer, but hopefully the idea comes across
    byte[] bytes = new byte[4];
    while (in.read(bytes) != -1) {
        // TODO: change the charset here to UTF-8, then read values

        // TODO: change the charset here to ISO-8859-2, then read values
    }
}

I have been looking at decoders, which might be the way to go.

An attempt that reuses the same input stream:

String data = "testўёѧẅ";
InputStream inputStream = new ByteArrayInputStream(data.getBytes());
Reader r = new InputStreamReader(inputStream, "UTF-8");
int intch;
int count = 0;
while ((intch = r.read()) != -1) {
    // Print the character that was just read
    System.out.println((char) intch);
    if ((++count) == 4) {
        // Switch to a second reader on the same stream after 4 characters
        r = new InputStreamReader(inputStream, Charset.forName("ISO-8859-2"));
    }
}

// Outputs "test" but not the second part
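
A likely reason the second part never appears: the `InputStreamReader` documentation notes that a reader may read ahead more bytes from the underlying stream than the current read needs. The sketch below checks how many bytes are still left in the stream after the first reader has returned four characters. The class and method names are illustrative and not part of the original post, and the sample string is plain ASCII to keep the byte count obvious.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is illustrative.
class ReadAheadDemo {

    // Reads four chars through a UTF-8 reader, then reports how many
    // bytes the underlying stream still has available.
    static int bytesLeftAfterFourChars() {
        byte[] bytes = "test1234".getBytes(StandardCharsets.UTF_8); // 8 ASCII bytes
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            for (int i = 0; i < 4; i++) {
                reader.read();
            }
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesLeftAfterFourChars());
    }
}
```

On OpenJDK-based runtimes this is expected to report 0 rather than 4: the first reader has already pulled the remaining bytes into its own buffer, so a second reader created on the same stream finds nothing left to decode.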

Tags: java, character-encoding

Answer


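Assuming you know that your stream will contain n UTF-8 characters followed by m ISO 8859-2 characters (n=4, m=4 in your example), you can achieve this by using two different InputStreamReaders on the same InputStream: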

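// Assumes `data` is the String from the question.
// Needs java.io.*, java.nio.charset.Charset and java.nio.charset.StandardCharsets.
try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
    InputStreamReader inUtf8 = new InputStreamReader(in, StandardCharsets.UTF_8);
    InputStreamReader inIso88592 = new InputStreamReader(in, Charset.forName("ISO-8859-2"));

    // Read `n` characters using inUtf8, then read `m` characters using inIso88592.
    // Caveat: InputStreamReader may read ahead and buffer more bytes than it returns,
    // so check that inUtf8 has not already consumed the bytes meant for inIso88592.
}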

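Note that you need to count characters rather than bytes (that is, track how many characters have been read so far), because in UTF-8 a single character may be encoded as 1 to 4 bytes.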
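
One way to count characters without letting a reader buffer past the boundary is to read the raw bytes yourself and decode each segment separately. The following is a minimal sketch under stated assumptions, not the answerer's method: the class `MixedCharsetReader` and its methods are hypothetical names, the first segment is assumed to be well-formed UTF-8, and the second is assumed to use a single-byte charset such as ISO-8859-2. The sample text uses characters that ISO-8859-2 can actually represent (written as Unicode escapes), which the string in the question cannot guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not from the original answer.
class MixedCharsetReader {

    // Number of bytes in a UTF-8 sequence, derived from its lead byte (0..255).
    static int utf8SequenceLength(int lead) {
        if (lead < 0x80) return 1;            // 0xxxxxxx
        if ((lead >> 5) == 0b110) return 2;   // 110xxxxx
        if ((lead >> 4) == 0b1110) return 3;  // 1110xxxx
        if ((lead >> 3) == 0b11110) return 4; // 11110xxx
        throw new IllegalArgumentException("Not a UTF-8 lead byte: " + lead);
    }

    // Reads exactly `count` UTF-8 encoded code points, consuming no extra bytes.
    static String readUtf8(InputStream in, int count) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            int lead = in.read();
            if (lead == -1) throw new EOFException("Stream ended inside the UTF-8 part");
            buf.write(lead);
            for (int extra = utf8SequenceLength(lead) - 1; extra > 0; extra--) {
                int next = in.read();
                if (next == -1) throw new EOFException("Truncated UTF-8 sequence");
                buf.write(next);
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads up to `count` characters of a single-byte charset such as ISO-8859-2.
    static String readSingleByte(InputStream in, int count, Charset charset) throws IOException {
        byte[] bytes = new byte[count];
        int n = in.readNBytes(bytes, 0, count); // readNBytes requires Java 9+
        return new String(bytes, 0, n, charset);
    }

    // Builds sample bytes in two encodings and decodes them back.
    static String decodeSample() {
        Charset latin2 = Charset.forName("ISO-8859-2");
        ByteArrayOutputStream mixed = new ByteArrayOutputStream();
        byte[] head = "t\u00EBst".getBytes(StandardCharsets.UTF_8); // 5 bytes for 4 chars
        byte[] tail = "\u0105\u0119\u0142\u017C".getBytes(latin2);  // 4 bytes for 4 chars
        mixed.write(head, 0, head.length);
        mixed.write(tail, 0, tail.length);
        try (InputStream in = new ByteArrayInputStream(mixed.toByteArray())) {
            return readUtf8(in, 4) + readSingleByte(in, 4, latin2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `readUtf8` inspects each lead byte to learn how long the sequence is, it stops exactly at the boundary between the two encodings, and the second segment can then be decoded byte-for-character. For a multi-byte second encoding, a `CharsetDecoder` over the remaining bytes would be the more general tool.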

