File to be parsed has the wrong encoding

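Problem description

I am having trouble parsing a file. The input file is encoded in Windows-1250 (Eastern European). When I try to parse it, I get this error: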


    Exception in thread "main" java.lang.IllegalStateException: MalformedInputException reading next record: java.nio.charset.MalformedInputException: Input length = 1
        at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:145)
        at org.apache.commons.csv.CSVParser$CSVRecordIterator.hasNext(CSVParser.java:155)
        at com.test.converter.CsvConverter.processInputCSV(CsvConverter.java:148)
        at com.test.converter.CsvConverter.main(CsvConverter.java:249)
    Caused by: java.nio.charset.MalformedInputException: Input length = 1
        at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    Caused by: java.nio.charset.MalformedInputException: Input length =

My method

    public List<CSVRecord> collectAllEntries(Path path) throws IOException {
        List<CSVRecord> store = new ArrayList<>();
        try (
                Reader reader = Files.newBufferedReader(path);
                CSVParser csvParser = new CSVParser(reader, CSVFormat.EXCEL)
        ) {
            for (CSVRecord csvRecord : csvParser) {
                store.add(csvRecord);
            }
        } catch (IOException e) {
            e.printStackTrace();
            throw e;
        }
        return store;
    }

How can I fix this?

Tags: java

Solution


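The problem is that you are reading a windows-1250 file as UTF-8. Files.newBufferedReader(path) decodes with UTF-8 by default, and it throws MalformedInputException when it meets a byte sequence that is not valid UTF-8.

When you open the file, pass the charset the file is actually encoded in (windows-1250 in this case) so that the buffered reader decodes with it, like this: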

Files.newBufferedReader(path, Charset.forName("windows-1250"));
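To see the difference in isolation, here is a minimal sketch that does not use Commons CSV at all. It assumes the JDK in use ships the windows-1250 charset (standard JDK builds do). The class name Windows1250Demo, the helper readLines, and the sample text are made up for illustration and are not part of the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Windows1250Demo {
    // Assumption: the running JDK provides this charset; forName throws otherwise.
    static final Charset WINDOWS_1250 = Charset.forName("windows-1250");

    // Hypothetical helper: reads every line of the file, decoding with the given charset.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".csv");
        try {
            // "Łódź;1" written as windows-1250 bytes; these bytes are not valid UTF-8.
            Files.write(sample, "\u0141\u00f3d\u017a;1\n".getBytes(WINDOWS_1250));

            try {
                readLines(sample, StandardCharsets.UTF_8);
                System.out.println("UTF-8 read succeeded unexpectedly");
            } catch (MalformedInputException e) {
                // Same exception type as in the stack trace from the question.
                System.out.println("UTF-8 read failed: " + e.getMessage());
            }

            // Decoding with the file's real charset works.
            System.out.println(readLines(sample, WINDOWS_1250));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

In the method from the question, the same change applies to the reader that is handed to new CSVParser(reader, CSVFormat.EXCEL); nothing else in the method needs to change.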

Here is a good introduction to character encoding in Java: https://www.baeldung.com/java-char-encoding

