Answers
 
How do I handle errors?
 

You should register an error handler with the parser by supplying a class that implements the org.xml.sax.ErrorHandler interface. This applies whether you are using a DOM-based or a SAX-based parser.

You can register an error handler on a DocumentBuilder created using JAXP like this:

import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

ErrorHandler handler = new ErrorHandler() {
    public void warning(SAXParseException e) throws SAXException {
        System.err.println("[warning] "+e.getMessage());
    }
    public void error(SAXParseException e) throws SAXException {
        System.err.println("[error] "+e.getMessage());
    }
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println("[fatal error] "+e.getMessage());
        throw e;
    }
};

DocumentBuilder builder = /* builder instance */;
builder.setErrorHandler(handler);
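
As a minimal end-to-end sketch of the registration above, the following uses only standard JAXP and SAX classes. The class name DomErrorHandlerDemo and the helper parseAndCollect are hypothetical names introduced for illustration, and the handler collects messages in a list (instead of only printing them) so the result can be inspected afterwards.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; only the JAXP and SAX calls are standard API.
class DomErrorHandlerDemo {

    // Parses the given XML text and returns every message the handler received.
    static List<String> parseAndCollect(String xml) throws Exception {
        final List<String> reports = new ArrayList<>();
        ErrorHandler handler = new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                reports.add("[warning] " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                reports.add("[error] " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                reports.add("[fatal error] " + e.getMessage());
                throw e; // stop parsing: the document is not well-formed
            }
        };

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Expected for malformed input, because fatalError() rethrows.
        }
        return reports;
    }

    public static void main(String[] args) throws Exception {
        // The <b> element is never closed, so one fatal error is reported.
        for (String report : parseAndCollect("<a><b></a>")) {
            System.err.println(report);
        }
    }
}
```

Rethrowing from fatalError is a design choice: it makes parse() fail fast instead of letting the caller continue with a document that could not be read.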

If you are using DOM Level 3, you can register an error handler with the DOMBuilder (named LSParser in the final DOM Level 3 Load and Save Recommendation) by supplying a class that implements the org.w3c.dom.DOMErrorHandler interface. For more information, refer to FAQ
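
The DOM Level 3 route might look like the sketch below. It assumes the final DOM Level 3 Load and Save API as shipped with the JDK, where the parser interface is called LSParser and the handler is set through the "error-handler" configuration parameter. The class name Dom3ErrorHandlerDemo and the helper parseAndCollectSeverities are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical demo class; the org.w3c.dom calls are standard DOM Level 3 API.
class Dom3ErrorHandlerDemo {

    // Parses the XML text and returns the severity of each reported DOMError.
    static List<Short> parseAndCollectSeverities(String xml) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        final List<Short> severities = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            severities.add(error.getSeverity());
            System.err.println("[severity " + error.getSeverity() + "] " + error.getMessage());
            return true; // ask the parser to continue where that is possible
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // Thrown when the document cannot be loaded, e.g. after a fatal error.
        }
        return severities;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndCollectSeverities("<a><b></a>"));
    }
}
```

Severity values are the constants DOMError.SEVERITY_WARNING, DOMError.SEVERITY_ERROR and DOMError.SEVERITY_FATAL_ERROR.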

You can also register an error handler on a SAXParser using JAXP like this:

import javax.xml.parsers.SAXParser;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

ErrorHandler handler = new ErrorHandler() {
    public void warning(SAXParseException e) throws SAXException {
        System.err.println("[warning] "+e.getMessage());
    }
    public void error(SAXParseException e) throws SAXException {
        System.err.println("[error] "+e.getMessage());
    }
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println("[fatal error] "+e.getMessage());
        throw e;
    }
};

SAXParser parser = /* parser instance */;
parser.getXMLReader().setErrorHandler(handler);
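
The difference between error and fatalError becomes visible once validation is switched on: validity violations arrive through error, while well-formedness violations arrive through fatalError. The sketch below illustrates this with standard JAXP only. It extends org.xml.sax.helpers.DefaultHandler, which already implements ErrorHandler, and passes the handler directly to SAXParser.parse. The class name SaxErrorCollector and its parse helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical collector; DefaultHandler supplies no-op defaults for all callbacks.
class SaxErrorCollector extends DefaultHandler {
    final List<String> errors = new ArrayList<>();
    final List<String> fatalErrors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        errors.add(e.getMessage()); // recoverable, e.g. a validity violation
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        fatalErrors.add(e.getMessage()); // not well-formed: stop parsing
        throw e;
    }

    // Parses the XML text with or without validation and returns the collector.
    static SaxErrorCollector parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        SaxErrorCollector collector = new SaxErrorCollector();
        try {
            // parse(InputSource, DefaultHandler) also registers the error handler.
            parser.parse(new InputSource(new StringReader(xml)), collector);
        } catch (SAXException e) {
            // Expected after a fatal error, because fatalError() rethrows.
        }
        return collector;
    }

    public static void main(String[] args) throws Exception {
        // <a> must contain <b> according to the DTD, but it is empty.
        String invalid = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>";
        System.out.println("validating:     " + parse(invalid, true).errors);
        System.out.println("non-validating: " + parse(invalid, false).errors);
    }
}
```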

Why does "non-validating" not mean "well-formedness checking" only?
 

Using a "non-validating" parser does not mean that only well-formedness checking is done! There are still many things that the XML specification requires of the parser, including entity substitution, defaulting of attribute values, and attribute normalization.

This table describes what "non-validating" really means for Xerces-J parsers. In this table, "no DTD" means no internal or external DTD subset is present.

                            non-validating parsers      validating parsers
                            DTD present    no DTD       DTD present    no DTD
DTD is read                 Yes            No           Yes            Error
entity substitution         Yes            No           Yes            Error
defaulting of attributes    Yes            No           Yes            Error
attribute normalization     Yes            No           Yes            Error
checking against model      No             No           Yes            Error
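
The "non-validating, DTD present" column can be observed with a short sketch like the one below. It uses standard JAXP and assumes a JAXP implementation that behaves like Xerces-J (the parser built into the JDK is derived from Xerces). The class name NonValidatingDemo, the helper parseNonValidating and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the JAXP and DOM calls are standard API.
class NonValidatingDemo {

    // The content model says EMPTY, yet <doc> has text content. A non-validating
    // parser does not check the model, so this document is still accepted.
    static final String XML =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "  <!ATTLIST doc lang CDATA \"en\">\n"
          + "  <!ENTITY greeting \"hello\">\n"
          + "]>\n"
          + "<doc>&greeting; world</doc>";

    // Parses with validation off and returns the document element.
    static Element parseNonValidating(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the JAXP default, stated for clarity
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element doc = parseNonValidating(XML);
        // The lang attribute comes from the DTD default, not from the instance.
        System.out.println("lang = " + doc.getAttribute("lang"));
        // The entity reference has been replaced by its declared value.
        System.out.println("text = " + doc.getTextContent());
    }
}
```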

How do I more efficiently parse several documents sharing a common DTD?
 

By default, the parser does not cache DTDs. Because the common DTD is specified in each XML document, it is re-parsed once for each document.

However, there are several things you can do to make reading DTDs more efficient:

  • First, have a look at the grammar caching/preparsing FAQ.
  • Keep your DTD and DTD references local.
  • Use internal DTD subsets, if possible.
  • Load files from the server to the local client before parsing.
  • Cache document files in a local client cache. Before fetching a document over the network, issue an HTTP header request to check whether it has changed.
  • Do not reference an external DTD or internal DTD subset at all. In this case, no DTD will be read.
  • Use a custom EntityResolver and keep common DTDs in a memory buffer.
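
The last suggestion in the list could be sketched as follows, using the standard org.xml.sax.EntityResolver interface. The class name InMemoryDtdResolver, its register and parse methods, and the system identifier http://example.com/doc.dtd are hypothetical. Note that this only removes the file or network access: the DTD text is still parsed for every document unless grammar caching is used as well.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that serves registered DTDs from memory.
class InMemoryDtdResolver implements EntityResolver {
    private final Map<String, byte[]> dtdsBySystemId = new HashMap<>();
    int hits = 0; // number of lookups answered from the buffer

    // Stores the DTD text under the system identifier used in DOCTYPE declarations.
    void register(String systemId, String dtdText) {
        dtdsBySystemId.put(systemId, dtdText.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] dtd = dtdsBySystemId.get(systemId);
        if (dtd == null) {
            return null; // unknown entity: let the parser resolve it as usual
        }
        hits++;
        InputSource source = new InputSource(new ByteArrayInputStream(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parses the XML text with a DocumentBuilder that uses this resolver.
    Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(this);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        InMemoryDtdResolver resolver = new InMemoryDtdResolver();
        resolver.register("http://example.com/doc.dtd",
                "<!ELEMENT doc (#PCDATA)><!ATTLIST doc lang CDATA \"en\">");
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.com/doc.dtd\"><doc>text</doc>";
        resolver.parse(xml);
        resolver.parse(xml);
        System.out.println("lookups served from memory: " + resolver.hits);
    }
}
```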

How can I parse documents in a pull-parsing fashion?
 

Since the pull-parsing API is specific to Xerces, you have to use Xerces-specific methods to create parsers and parse documents.

First, create a parser configuration that implements the XMLPullParserConfiguration interface. Then create a parser from this configuration. To parse a document, call the configuration's parse(boolean) method. Passing false parses only the next portion of the document, and the method returns true while more content remains.

import org.apache.xerces.parsers.XIncludeAwareParserConfiguration;
import org.apache.xerces.parsers.SAXParser;
import org.apache.xerces.xni.parser.XMLInputSource;

  ...

boolean continueParse = true;
void pullParse(XMLInputSource input) throws Exception {
    XIncludeAwareParserConfiguration config = new XIncludeAwareParserConfiguration();
    SAXParser parser = new SAXParser(config);
    config.setInputSource(input);
    parser.reset();
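    while (continueParse) {
        // Call parse(false) first so that a callback clearing continueParse
        // during this step is not overwritten by the assignment.
        continueParse = config.parse(false) && continueParse;
    }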
}

In the above example, a SAXParser is used to pull-parse an XMLInputSource. DOMParser can be used in a similar way. A flag continueParse is used to indicate whether to continue parsing the document. The application can stop the parsing by setting this flag to false.


I would like to know more about the kind of entity my XMLEntityResolver's been asked to resolve. How can I go about convincing Xerces to tell me more?
 

XNI only guarantees that you'll receive an XMLResourceIdentifier object during an XMLEntityResolver#resolveEntity callback. Nonetheless, the xni.grammars package has a number of interfaces which extend XMLResourceIdentifier that can provide considerably more information.

To take advantage of this, you'll first need to check whether the object you've been passed is an instance of the org.apache.xerces.xni.grammars.XMLGrammarDescription interface. This interface contains a method called getGrammarType which tells you what kind of grammar is involved (at present, only XML Schemas and DTDs are supported). Once you know the type of grammar, you can cast once again to either org.apache.xerces.xni.grammars.XMLDTDDescription or org.apache.xerces.xni.grammars.XMLSchemaDescription, each of which contains a wealth of information specific to its type of grammar. The javadocs for these interfaces should provide sufficient information for you to know what their various methods return.




Copyright © 1999-2022 The Apache Software Foundation. All Rights Reserved.