java.lang.OutOfMemoryError: GC overhead limit exceeded when trying to read a large Excel file (.xlsx)

Question

I am simply trying to read a large Excel file (.xlsx) with the Apache POI API, and I get this exception.

This is the code I am using:

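import java.io.File;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class Main {
    public static void main(String[] args) {
        File f = new File("E:\\Downloads\\6038_Open_AR.XLSX");
        if (f.exists()) {
            System.out.println("Attempting");
            try {
                // Loads the whole workbook into memory; this is the call that fails.
                Workbook workBoo = WorkbookFactory.create(f);
                System.out.println("done");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}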

This is the exception I get:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at org.apache.xerces.xni.XMLString.toString(Unknown Source)
    at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:137)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:115)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:184)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:176)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:428)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:393)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:260)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:263)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:201)
    at Main.main(Main.java:180)

I know this is a very common topic, but I could not find any direct solution that works.

Some of the things I have tried:

File f = new File("E:\\Downloads\\6038_Open_AR.XLSX");
FileInputStream fileInputStream = new FileInputStream(f);
Workbook workBoo = new HSSFWorkbook(fileInputStream); // HSSF handles only the older binary .xls format

This gives the following exception:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

Tags: java, excel, apache-poi, hssfworkbook

Answer


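Without going into the proprietary details of how we solved this at my workplace: we use the Apache POI XSSF library to read the .XLSX file one sheet at a time. Our source file is 25 MB and has 4 separate worksheets.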

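Apache POI essentially unzips the XLSX, which expands it to roughly 10 times its size (over 200 MB in our case). Once it is unzipped, you have access to the Styles and SharedStrings. We then use XSSFSheetXMLHandler.SheetContentsHandler to process each sheet individually.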
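To see how much a given workbook expands, you can compare its size on disk with the total uncompressed size of the parts inside it. The following is a minimal sketch that uses only the JDK's `java.util.zip` classes, not POI; the class name `XlsxSizeReport` and its helper method are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class XlsxSizeReport {

    /** Sums the uncompressed sizes of all parts in the .xlsx package. */
    static long uncompressedBytes(File xlsx) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                long size = entries.nextElement().getSize();
                // getSize() returns -1 when the size is unknown; skip those entries.
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File xlsx = new File(args[0]);
        System.out.printf("on disk: %d bytes, unpacked XML: %d bytes%n",
                xlsx.length(), uncompressedBytes(xlsx));
    }
}
```

The unpacked figure is only the raw XML. An in-memory object model of that XML, which is what the stack trace in the question shows being built, typically needs considerably more heap than that.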

We created an AbstractSheetParserService (which reads the file through an XMLReader), set up a ContentHandler using the XSSFSheetXMLHandler.SheetContentsHandler mentioned above, and finally parse each sheet and its values through an AbstractParserHandlerService.
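The services named above are proprietary and not shown, so the following is only a rough sketch of the underlying pattern: an XMLReader pushes SAX events for one worksheet at a time into a ContentHandler, and no DOM is ever built. It deliberately uses only JDK classes in place of POI. In a POI-based version, XSSFReader would supply the sheet streams and XSSFSheetXMLHandler would be the ContentHandler. The class `SheetRowCounter`, its handler, and the assumption that worksheets live under `xl/worksheets/` are illustrative choices, not part of the original answer.

```java
import java.io.File;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SheetRowCounter {

    /** Counts <row> and <c> elements as SAX events arrive; no cell data is retained. */
    static final class CountingHandler extends DefaultHandler {
        long rows;
        long cells;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("row".equals(localName)) {
                rows++;
            } else if ("c".equals(localName)) {
                cells++;
            }
        }
    }

    /** Streams every worksheet part in turn and returns {rowCount, cellCount}. */
    static long[] count(File xlsx) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        CountingHandler handler = new CountingHandler();
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: worksheet parts are stored as xl/worksheets/*.xml.
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                XMLReader reader = factory.newSAXParser().getXMLReader();
                reader.setContentHandler(handler);
                try (InputStream in = zip.getInputStream(entry)) {
                    reader.parse(new InputSource(in)); // one sheet at a time
                }
            }
        }
        return new long[] {handler.rows, handler.cells};
    }

    public static void main(String[] args) throws Exception {
        long[] totals = count(new File(args[0]));
        System.out.println("rows=" + totals[0] + ", cells=" + totals[1]);
    }
}
```

Because the handler only increments counters, memory use should stay roughly flat regardless of sheet size. A real handler would do its per-row work in the callbacks and discard each row once it has been processed. This sketch does not resolve shared strings or styles, which is the part POI's XSSFSheetXMLHandler takes care of.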

