Apache POI Excel XLSX Streaming XML Does Not Correctly Read Multiple Inline Strings

Problem description

I have an XLSX Excel file that contains a single cell.


When I load it with POI's WorkbookFactory, it is correctly read as a single cell.

When I read it with POI's XSSFSheetXMLHandler, it is read as if it were two separate cells.

Sheet XML:

<?xml version="1.0" encoding="UTF-8"?>
<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
   <x:sheetData>
      <x:row>
         <x:c t="inlineStr">
            <x:is>
               <x:r>
                  <x:rPr>
                     <x:rFont val="Segoe UI Emoji"/>
                  </x:rPr>
                  <x:t xml:space="preserve">&#x1f61c;</x:t>
               </x:r>
               <x:r>
                  <x:t xml:space="preserve">more text</x:t>
               </x:r>
            </x:is>
         </x:c>
      </x:row>
   </x:sheetData>
   <x:pageSetup paperSize="9" orientation="portrait" />
</x:worksheet>

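Normally you would expect one text item per cell. Here, however, the text is split into two runs (<x:r>), and the first run is formatted with a different font from the second.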
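In SpreadsheetML the value of a rich-text inline string is the concatenation of the <t> elements of its runs, so the expected result here is one string for one cell. The following is a minimal sketch that uses only the JDK's DOM parser, not POI, to join the runs of that cell. The class name InlineStringRuns and the trimmed SAMPLE copy of the XML above are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class InlineStringRuns {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Trimmed copy of the sheet XML from the question (pageSetup omitted).
    static final String SAMPLE = "<x:worksheet xmlns:x=\"" + NS + "\"><x:sheetData><x:row>"
            + "<x:c t=\"inlineStr\"><x:is><x:r><x:rPr><x:rFont val=\"Segoe UI Emoji\"/></x:rPr>"
            + "<x:t xml:space=\"preserve\">&#x1f61c;</x:t></x:r>"
            + "<x:r><x:t xml:space=\"preserve\">more text</x:t></x:r>"
            + "</x:is></x:c></x:row></x:sheetData></x:worksheet>";

    // Text of the first inline-string cell: every run's <t>, joined in document order.
    static String concatRuns(String sheetXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // match the namespace URI, not the "x:" prefix
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(sheetXml)));

        Element is = (Element) doc.getElementsByTagNameNS(NS, "is").item(0);
        StringBuilder sb = new StringBuilder();
        for (Node n = is.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!NS.equals(n.getNamespaceURI())) {
                continue; // skips text nodes such as whitespace
            }
            if ("t".equals(n.getLocalName())) {
                sb.append(n.getTextContent()); // plain form: <is><t>...</t></is>
            } else if ("r".equals(n.getLocalName())) {
                // rich-text form: <is><r><rPr/><t>...</t></r>...</is>
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (NS.equals(c.getNamespaceURI()) && "t".equals(c.getLocalName())) {
                        sb.append(c.getTextContent());
                    }
                }
            }
            // other children (e.g. phonetic <rPh>) are not part of the cell text
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(concatRuns(SAMPLE)); // one value for the one cell
    }
}
```

The sketch reads only the direct <t> children of <is> and the <t> inside each <r>. Phonetic text in <rPh>, if any is present, is therefore left out.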

Code:

import java.io.File;
import java.io.InputStream;
import java.text.MessageFormat;
import java.util.Iterator;

import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class MultiTagTest {
    public static void main(final String[] args) throws Exception {
        final File file = new File("Minimised.xlsx");

        // 1) Streaming read: parse the first sheet's XML with XSSFSheetXMLHandler.
        try (OPCPackage xlsxPackage = OPCPackage.open(file, PackageAccess.READ)) {
            final XSSFReader reader = new XSSFReader(xlsxPackage);

            final Iterator<InputStream> iter = reader.getSheetsData();

            try (InputStream stream = iter.next()) {
                final SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
                saxParserFactory.setNamespaceAware(true);

                final XMLReader sheetParser = saxParserFactory.newSAXParser().getXMLReader();

                sheetParser.setContentHandler(new XSSFSheetXMLHandler(reader.getStylesTable(),
                        new ReadOnlySharedStringsTable(xlsxPackage), new SheetContentsHandler() {
                            @Override
                            public void startRow(final int rowNum) {
                            }

                            @Override
                            public void endRow(final int rowNum) {
                            }

                            @Override
                            public void cell(final String cellReference, final String formattedValue,
                                    final XSSFComment comment) {
                                // Print every callback so the split into two "cells" is visible.
                                System.out.println(MessageFormat.format(
                                        "XSSFSheetXMLHandler Cell - cellReference={0}, formattedValue={1}, comment={2}",
                                        cellReference, formattedValue, comment));
                            }
                        }, true)); // formulasNotResults

                sheetParser.parse(new InputSource(stream));
            }
        }

        // 2) Full usermodel read of the same file (read-only, no password) for comparison.
        try (Workbook workbook = WorkbookFactory.create(file, null, true)) {
            final Row row = workbook.getSheetAt(0).getRow(0);

            for (int col = row.getFirstCellNum(); col < row.getLastCellNum(); col++) {
                System.out.println(MessageFormat.format("WorkbookFactory Cell - {0}", row.getCell(col)));
            }
        }
    }
}

Output:

XSSFSheetXMLHandler Cell - cellReference=null, formattedValue=, comment=null
XSSFSheetXMLHandler Cell - cellReference=null, formattedValue=more text, comment=null
WorkbookFactory Cell - more text

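Inside SheetContentsHandler's cell method there is no way to tell that the two calls belong to the same cell. Both calls report cellReference=null, and the sheet XML gives the <x:c> element no r attribute.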
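One possible workaround is a small SAX DefaultHandler that buffers every <t> inside <is> and fires a single callback on </c>. This is a sketch under stated assumptions, not POI's own fix. The names InlineCellHandler, CellCallback and parse are hypothetical. Unlike XSSFSheetXMLHandler, it does not resolve shared strings or number formats, so <v> content is passed through raw. When r is missing, it falls back to an R1C1-style position and assumes that cells without r are consecutive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: joins all runs of an inline string and reports each <c> once, on its end tag.
final class InlineCellHandler extends DefaultHandler {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Same cell as in the question (no r attributes, pageSetup omitted).
    static final String SAMPLE = "<x:worksheet xmlns:x=\"" + NS + "\"><x:sheetData><x:row>"
            + "<x:c t=\"inlineStr\"><x:is><x:r><x:rPr><x:rFont val=\"Segoe UI Emoji\"/></x:rPr>"
            + "<x:t xml:space=\"preserve\">&#x1f61c;</x:t></x:r>"
            + "<x:r><x:t xml:space=\"preserve\">more text</x:t></x:r>"
            + "</x:is></x:c></x:row></x:sheetData></x:worksheet>";

    interface CellCallback {
        void cell(String ref, String text);
    }

    private final CellCallback callback;
    private final StringBuilder text = new StringBuilder();
    private String ref;
    private int row, col; // 1-based position, used when "r" is absent
    private boolean inIs, inPhonetic, collect;

    InlineCellHandler(CellCallback callback) { this.callback = callback; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!NS.equals(uri)) {
            return;
        }
        if ("row".equals(local)) {
            String r = atts.getValue("r");
            row = (r != null) ? Integer.parseInt(r) : row + 1;
            col = 0;
        } else if ("c".equals(local)) {
            col++;
            text.setLength(0);
            String r = atts.getValue("r");
            // Assumes references within a row are either all present or all absent.
            ref = (r != null) ? r : "R" + row + "C" + col;
        } else if ("is".equals(local)) {
            inIs = true;
        } else if ("rPh".equals(local)) {
            inPhonetic = true;
        } else if ("t".equals(local)) {
            collect = inIs && !inPhonetic; // <is><t> or <is><r><t>, but not phonetic text
        } else if ("v".equals(local)) {
            collect = true; // raw value (number, shared-string index, ...), not resolved here
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!NS.equals(uri)) {
            return;
        }
        if ("t".equals(local) || "v".equals(local)) {
            collect = false;
        } else if ("rPh".equals(local)) {
            inPhonetic = false;
        } else if ("is".equals(local)) {
            inIs = false;
        } else if ("c".equals(local)) {
            callback.cell(ref, text.toString()); // exactly one callback per cell
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (collect) {
            text.append(ch, start, length);
        }
    }

    static void parse(String sheetXml, CellCallback callback) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new InlineCellHandler(callback));
        reader.parse(new InputSource(new StringReader(sheetXml)));
    }

    public static void main(String[] args) throws Exception {
        List<String> cells = new ArrayList<>();
        parse(SAMPLE, (ref, value) -> cells.add(ref + "=" + value));
        cells.forEach(System.out::println);
    }
}
```

To use this on a real file, parse() would take an InputSource built from the InputStream returned by reader.getSheetsData() in the question's code, in place of the StringReader.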

