Parsing only the first column of a large Excel file with a SAX parser

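Problem description

I want to parse only the first column of a large Excel file and store the values in a single string, joined with commas (,). I am using the Apache POI library with a SAX parser, which reads the Excel sheet as XML. Because the Excel file has two columns, the XML contains two identical elements in every row, i.e. two "cell" (c) elements inside each "row" element, so the handler below picks up both columns instead of just the first. If anyone has an idea, please share.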

    import java.io.InputStream;
    import javax.xml.parsers.ParserConfigurationException;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    // Package assumed for POI 3.x; in POI 4.x it is org.apache.poi.ooxml.util.SAXHelper
    import org.apache.poi.util.SAXHelper;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.apache.poi.xssf.model.SharedStringsTable;
    import org.apache.poi.xssf.usermodel.XSSFRichTextString;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;

    public void processFirstSheet(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);

        // Close the sheet stream even if parsing fails
        try (InputStream sheet1 = r.getSheet("rId1")) {
            parser.parse(new InputSource(sheet1));
        }
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst)
            throws SAXException, ParserConfigurationException {
        XMLReader parser = SAXHelper.newXMLReader();
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;
        private static int count = 1;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        @Override
        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // c => cell
            if (name.equals("c")) {
                // Print the cell reference
                System.out.print(attributes.getValue("r") + " - ");
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                nextIsString = "s".equals(cellType);
            }
            // Clear contents cache
            lastContents = "";
        }

        @Override
        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }
            // v => contents of a cell
            // Output after we've seen the string contents
            if (name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            lastContents += new String(ch, start, length);
        }
    }

Tags: java, excel, xml-parsing, apache-poi, saxparser

Solution
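
One possible approach, offered as a sketch rather than a confirmed answer: the handler in the question already reads each cell's reference from the "r" attribute (for example "A1" or "B7"), so it can strip the row digits from that reference and keep a value only when the remaining column letters are exactly "A". The example below uses only the JDK so that it runs without POI. The names FirstColumnHandler and firstColumn are hypothetical, and the plain List<String> is an assumed stand-in for POI's SharedStringsTable. It also assumes every cell carries an "r" attribute, which is typical of files written by Excel.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects only the values of column A.
class FirstColumnHandler extends DefaultHandler {
    // Assumption: stands in for POI's SharedStringsTable lookup.
    private final List<String> sharedStrings;
    private final List<String> values = new ArrayList<>();
    private final StringBuilder contents = new StringBuilder();
    private boolean inFirstColumn;
    private boolean isSharedString;

    FirstColumnHandler(List<String> sharedStrings) {
        this.sharedStrings = sharedStrings;
    }

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) {
        if (name.equals("c")) {
            // "A12" -> "A", "AB3" -> "AB"; only a bare "A" is the first column
            String ref = attributes.getValue("r");
            inFirstColumn = ref != null && ref.replaceAll("[0-9]", "").equals("A");
            isSharedString = "s".equals(attributes.getValue("t"));
        }
        // Clear the text buffer at the start of every element
        contents.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        // v => contents of a cell; ignore cells outside column A
        if (name.equals("v") && inFirstColumn) {
            String text = contents.toString();
            if (isSharedString) {
                text = sharedStrings.get(Integer.parseInt(text));
            }
            values.add(text);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called more than once per element
        contents.append(ch, start, length);
    }

    String joined() {
        return String.join(",", values);
    }

    // Parses sheet XML held in a string and returns column A joined by commas.
    static String firstColumn(String sheetXml, List<String> sharedStrings) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            FirstColumnHandler handler = new FirstColumnHandler(sharedStrings);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(sheetXml)));
            return handler.joined();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }
}
```

In the question's SheetHandler, the same column check would presumably go in startElement next to the existing "t" attribute test, with the shared string lookup still done through sst, and the collected values joined once parsing has finished.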

