Byte-Order-Mark in file name when writing to xml

Problem description

I have a method in which a .txt file is parsed with Scanner, reassembled with DocumentBuilder, and transformed into an .xml file with TransformerFactory.

Everything works fine, with one small inconvenience: the name of the file created this way starts with what I assume to be a BOM. I'm encoding in UTF-8.

It's saved under %EF%BB%BFexample.xml instead of example.xml.

How can I avoid that?

EDIT: As the comments point out, the first line that Scanner reads from userText (the line fileTitle is taken from) probably contains the UTF-8 BOM, which turned out to be true (again, see the comments).

private void writeXML() {
    try {
        File userText = new File(passedPath);

        Scanner scn = new Scanner(new FileInputStream(userText), "UTF-8");

        String separate = ";";
        String fileTitle = scn.nextLine();
        int indSepTitle = fileTitle.indexOf(separate);
        fileTitle = fileTitle.substring(0,indSepTitle);

        String fileOutputName = fileTitle+".xml";
        File mOutFile = new File(getFilesDir(), fileOutputName);

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

        //root element
        Document doc = docBuilder.newDocument();
        Element rootElement = doc.createElement("Collection");
        doc.appendChild(rootElement);

        //List element
        Element listElement = doc.createElement("List");
        rootElement.appendChild(listElement);

        //set Attributes to listElement
        Attr attr = doc.createAttribute("name");
        attr.setValue(fileTitle);
        listElement.setAttributeNode(attr);

        while(scn.hasNextLine()) {
            String line = scn.nextLine();
            String[] parts = line.split(separate);

            //vocabulary element
            Element ringElement = doc.createElement("element_ring");
            listElement.appendChild(ringElement);

            //add 1st Element
            Element n1Element = doc.createElement("element1");
            n1Element.appendChild(doc.createTextNode(parts[0]));
            ringElement.appendChild(n1Element);

            //add 2ndElement
            Element n2Element = doc.createElement("element2");
            n2Element.appendChild(doc.createTextNode(parts[1]));
            ringElement.appendChild(n2Element);

            ...
            //add other Elements accordingly
            ...
            }

        //write the content into xml file
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(mOutFile);

        transformer.transform(source, result);


    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (TransformerConfigurationException e) {
        e.printStackTrace();
    } catch (TransformerException e) {
        e.printStackTrace();
    }

}

Tags: android, xml, byte-order-mark

Solution


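For completeness:

I'm including the following short snippet, which removes the BOM from the string that is extracted for use as the title of the .xml file being created.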

// Drop the leading BOM (U+FEFF) only if it is actually there,
// so a file saved without a BOM does not lose its first character
String cutTitle = fileTitle;
if (cutTitle.startsWith("\uFEFF")) {
    cutTitle = cutTitle.substring(1);
}

String fileOutputName = cutTitle + ".xml";
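
To make the cause easier to see, here is a minimal, self-contained sketch. It assumes the input starts with a UTF-8 BOM and uses `;` as the separator, as in the question. The class name `BomDemo` and the helpers `stripBom` and `readTitle` are made up for illustration and are not part of the original code. Java's UTF-8 decoder does not discard a leading BOM, so it arrives in the first line as the character U+FEFF, and percent-encoding that character gives the `%EF%BB%BF` prefix seen in the file name.

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class BomDemo {

    // Hypothetical helper: removes one leading U+FEFF if present,
    // otherwise returns the string unchanged.
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    // Hypothetical helper: mimics the title extraction from the question,
    // reading the first line and cutting it at the first ';'.
    static String readTitle(byte[] content) {
        try (Scanner scn = new Scanner(new ByteArrayInputStream(content), "UTF-8")) {
            String firstLine = scn.nextLine();
            return firstLine.substring(0, firstLine.indexOf(';'));
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Simulated file content: encoding U+FEFF as UTF-8 yields the bytes EF BB BF
        byte[] content = "\uFEFFexample;rest\n".getBytes(StandardCharsets.UTF_8);

        String title = readTitle(content);

        // The BOM survives decoding and ends up in the file name
        System.out.println(URLEncoder.encode(title + ".xml", "UTF-8")); // %EF%BB%BFexample.xml

        // After stripping, the name is clean
        System.out.println(stripBom(title) + ".xml"); // example.xml
    }
}
```

Checking for U+FEFF before cutting keeps the method safe for input files saved without a BOM. In `writeXML()` the same check would go right after `scn.nextLine()`, so that both the file name and the `name` attribute get the cleaned title.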
