XML Document Validation with an XML Schema
by Deepak Vohra, 09/15/2004
An XML schema defines the structure of the elements and attributes in an XML document. For an XML document to be considered valid with respect to an XML schema, it has to be validated against that schema. This tutorial explains how to validate an XML document against an XML schema.
In this article, the Xerces2-j and JAXP parsers
are used to validate an XML document with an XML schema. In
Xerces2-j, schema validation is integrated with the
SAXParser and DOMParser parsers. In
JAXP, the DocumentBuilder classes are used to
validate an XML document. XML schema validation is illustrated with an
XML document containing a catalog. This article is structured into
the following sections:
- Preliminary Setup
- Overview
- Validation of an XML Document with the Xerces2-j Parser
- Validation of an XML Document with the JAXP Parser
Preliminary Setup
To validate an XML document with the Xerces2-j parser,
the Xerces2-j classes need to be in the classpath.
The Xerces2-j parser may be obtained from the Xerces2-j page.
Extract the Xerces-J-bin.2.5.0.zip (for Windows) or
Xerces-J-bin.2.5.0.tar.gz (for Unix) files to the
installation directory of your choice.
Add <XERCES>/xerces-2_5_0/xercesImpl.jar
and <XERCES>/xerces-2_5_0/xml-apis.jar to the
classpath variable, where <XERCES> is the directory in
which Xerces2-j is installed.
To validate an XML document with the JAXP parser, its
DocumentBuilder classes need to be in the classpath. These
are provided by the Java Web Services Developer Pack, which may be
obtained from the JWSDP web site.
Extract the Java Web Services Developer Pack 1.2 (jwsdp-1.2) application
file to an installation directory. Add <JAXP>/jaxp/lib/jaxp-api.jar and
<JAXP>/jaxp/lib/endorsed/xercesImpl.jar to the classpath variable, where <JAXP> is the directory in which you installed jwsdp-1.2.
Overview
In this tutorial, an example XML document named catalog.xml, consisting of an ONJava journal catalog, is used. The xmlns:xsi attribute, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance",
declares the XML namespace prefix, xsi. The xsi:noNamespaceSchemaLocation attribute,
xsi:noNamespaceSchemaLocation="file://c:/Schemas/catalog.xsd", gives the location of the schema for the elements in the XML document that are not in any namespace. The example XML document is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!--A OnJava Journal Catalog-->
<catalog
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation=
        "file://c:/Schemas/catalog.xsd"
    title="OnJava.com" publisher="O'Reilly">
  <journal date="April 2004">
    <article>
      <title>Declarative Programming in Java</title>
      <author>Narayanan Jayaratchagan</author>
    </article>
  </journal>
  <journal date="January 2004">
    <article>
      <title>Data Binding with XMLBeans</title>
      <author>Daniel Steinberg</author>
    </article>
  </journal>
</catalog>
The example XML document is validated with an example XML schema file,
catalog.xsd. The elements in this schema document are in
the XML schema namespace of http://www.w3.org/2001/XMLSchema. The catalog.xsd file
looks like this:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="journal" minOccurs="0"
            maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="title" type="xs:string"/>
      <xs:attribute name="publisher" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="journal">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="article" minOccurs="0"
            maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="date" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="article">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element ref="author" minOccurs="0"
            maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author" type="xs:string"/>
</xs:schema>
In the following sections, we'll discuss validation of the example XML document, catalog.xml, with the example schema document, catalog.xsd.
Validation of an XML Document with the Xerces2-j Parser
Xerces2-j provides the DOMParser and the
SAXParser for parsing an XML document. To use SAX parsing,
import the SAXParser.
import org.apache.xerces.parsers.SAXParser;
A DefaultHandler is used as the
ErrorHandler with the parser. An ErrorHandler
receives the warnings and errors reported by the parser. Import the
DefaultHandler class.
import org.xml.sax.helpers.DefaultHandler;
To validate with a SAXParser, create a
SAXParser. The SAXParser class is a subclass
of the XMLParser class.
SAXParser parser = new SAXParser();
Set the validation feature to true to report validation
errors. If the validation feature is set to true, the XML
document should specify an XML schema or a DTD.
parser.setFeature("http://xml.org/sax/features/validation",
true);
Set the validation/schema feature to true to report
validation errors against a schema.
parser.setFeature("http://apache.org/xml/features/validation/schema",
true);
Set the validation/schema-full-checking feature to true to
enable full schema grammar-constraint checking.
parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true);
Specify a validation schema for the parser with the
schema/external-noNamespaceSchemaLocation or the
schema/external-schemaLocation property. The
schema/external-schemaLocation property is used to specify
schemas that have a target namespace. Its value is a list of namespace
and schema location pairs, in the same syntax as the xsi:schemaLocation
attribute. The schema/external-noNamespaceSchemaLocation property is
used to specify a schema that does not have a target namespace. The
schema locations given in the XML document itself are only hints that a
parser is not required to follow; a schema specified with these
properties is used in preference to them. For our purposes,
a schema without a namespace is used to validate an XML document.
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
SchemaUrl);
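The feature and property settings above can be gathered into one helper. The following is a minimal sketch, not the article's SchemaValidator.java. It assumes the parser is obtained through JAXP's SAXParserFactory, so that it runs without xercesImpl.jar on the classpath, rather than by instantiating org.apache.xerces.parsers.SAXParser directly. It also assumes that the parser behind the factory recognizes the Xerces feature and property URIs, as Xerces-derived parsers do. The class name SaxFeatureSketch, its method names, and the schema URL in main are hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class SaxFeatureSketch {
    static final String VALIDATION =
        "http://xml.org/sax/features/validation";
    static final String SCHEMA =
        "http://apache.org/xml/features/validation/schema";
    static final String FULL_CHECKING =
        "http://apache.org/xml/features/validation/schema-full-checking";
    static final String NO_NAMESPACE_SCHEMA =
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation";

    /** Returns a reader configured to validate against the schema at schemaUrl. */
    static XMLReader newValidatingReader(String schemaUrl) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Factories are not namespace aware by default; schema validation needs it.
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, true);
            reader.setFeature(SCHEMA, true);
            reader.setFeature(FULL_CHECKING, true);
            reader.setProperty(NO_NAMESPACE_SCHEMA, schemaUrl);
            return reader;
        } catch (ParserConfigurationException | SAXException e) {
            // Thrown when the parser does not recognize or support a setting.
            throw new IllegalStateException("Schema validation is not available", e);
        }
    }

    /** Reports whether a feature is switched on; false if it is not recognized. */
    static boolean isEnabled(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema location; nothing is loaded until a document is parsed.
        XMLReader reader = newValidatingReader("file:///c:/Schemas/catalog.xsd");
        System.out.println("validation: " + isEnabled(reader, VALIDATION));
        System.out.println("schema: " + isEnabled(reader, SCHEMA));
    }
}
```

Setting the property only records the schema location; an unreachable schema is reported when a document is parsed, not when the reader is configured.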
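Create a class that extends the DefaultHandler class. Because it is declared private, it has to be nested inside another class, and it needs the org.xml.sax.SAXException and org.xml.sax.SAXParseException imports.
private class Validator extends DefaultHandler {
    // Set to true once an error or fatal error has been reported.
    public boolean validationError = false;
    // The most recent exception reported by the parser.
    public SAXParseException saxParseException = null;

    public void error(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void fatalError(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    // Warnings are ignored.
    public void warning(SAXParseException exception)
            throws SAXException { }
}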
The DefaultHandler class implements the
ErrorHandler interface, and is used to specify an
ErrorHandler for the Xerces parser. Instantiating the
above Validator class allows us to register it as an
ErrorHandler with the parser.
Validator handler = new Validator();
parser.setErrorHandler(handler);
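A handler written this way remembers only the last exception it was given. As a hedged variation, the sketch below keeps every reported problem together with its line and column. CollectingHandler and its method names are hypothetical and are not part of the article's sample code; the exception passed in main is constructed by hand purely to show the output format.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CollectingHandler extends DefaultHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        problems.add(describe("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) {
        problems.add(describe("fatal", e));
    }

    @Override
    public void warning(SAXParseException e) {
        // Warnings are ignored, as in the Validator class.
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
            + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    /** All problems reported so far, in the order they arrived. */
    List<String> problems() {
        return problems;
    }

    boolean hasProblems() {
        return !problems.isEmpty();
    }

    public static void main(String[] args) {
        CollectingHandler handler = new CollectingHandler();
        // Hand-made exception: message, publicId, systemId, line, column.
        handler.error(new SAXParseException("sample message", null, "catalog.xml", 7, 12));
        handler.problems().forEach(System.out::println);
        // prints "error at line 7, column 12: sample message"
    }
}
```

Since it extends DefaultHandler, an instance can be passed to setErrorHandler() in the same way as the Validator instance.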
With the ErrorHandler registered, parse the example XML document.
Either of the parse methods
parse(java.lang.String systemId) and
parse(org.xml.sax.InputSource inputSource) may be used for
parsing an XML document.
parser.parse(XmlDocumentUrl);
Errors reported during parsing are recorded by the
ErrorHandler and can be read from it once parsing
has finished. The example program, SchemaValidator.java
(see Resources below), is used to validate the example XML document, catalog.xml, with the example XML schema, catalog.xsd.
String variables such as SchemaUrl and XmlDocumentUrl
are specified as file URLs. For example:
SchemaUrl: file://c:/schema/catalog.xsd
XmlDocumentUrl: file://c:/catalog/catalog.xml
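The steps of this section can be put together in a short end-to-end sketch. It is not the article's SchemaValidator.java, and it rests on several assumptions: the parser is obtained through JAXP's SAXParserFactory instead of org.apache.xerces.parsers.SAXParser, the parser recognizes the Xerces feature and property URIs, and a cut-down, hypothetical schema is used in place of catalog.xsd. Both files are written to temporary locations, and their file URLs are derived with Path.toUri() rather than typed by hand. The class name SaxSchemaCheck and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxSchemaCheck {
    // Hypothetical cut-down schema: a catalog element with one optional attribute.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType>"
        + "<xs:attribute name='title' type='xs:string'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** Validates xml against XSD and returns the reported error messages. */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        Path schemaFile = null;
        Path xmlFile = null;
        try {
            schemaFile = Files.createTempFile("catalog", ".xsd");
            xmlFile = Files.createTempFile("catalog", ".xml");
            Files.write(schemaFile, XSD.getBytes(StandardCharsets.UTF_8));
            Files.write(xmlFile, xml.getBytes(StandardCharsets.UTF_8));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setFeature("http://apache.org/xml/features/validation/schema", true);
            // Path.toUri() builds the file URL for the local path.
            reader.setProperty(
                "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                schemaFile.toUri().toString());
            reader.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
            });
            reader.parse(xmlFile.toUri().toString());
        } catch (SAXParseException e) {
            // Parsing stopped on a fatal error; record it if the handler did not.
            if (errors.isEmpty()) errors.add(e.getMessage());
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Validation could not be run", e);
        } finally {
            deleteQuietly(schemaFile);
            deleteQuietly(xmlFile);
        }
        return errors;
    }

    private static void deleteQuietly(Path file) {
        try {
            if (file != null) Files.deleteIfExists(file);
        } catch (IOException ignored) {
            // Best-effort cleanup of a temporary file.
        }
    }

    public static void main(String[] args) {
        // The first document matches the schema; the second has an undeclared attribute.
        System.out.println(validate("<catalog title='OnJava.com'/>"));
        System.out.println(validate("<catalog title='OnJava.com' edition='2'/>"));
    }
}
```

An empty list means no validation errors were reported. The second call in main is expected to return at least one message, because the edition attribute is not declared in the sketch's schema.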
Validation of an XML Document with the JAXP Parser
Another way to validate an XML document is with a JAXP
DocumentBuilder. To begin, import the
DocumentBuilderFactory and DocumentBuilder classes.
The DocumentBuilder class is used to obtain an
org.w3c.dom.Document object from an XML document, while the
DocumentBuilderFactory class is used to obtain a
DocumentBuilder parser.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
To validate with a DocumentBuilder parser backed by the Xerces implementation, set the System property
javax.xml.parsers.DocumentBuilderFactory to the Xerces factory class:
System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
"org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
Next, you need to create a DocumentBuilderFactory.
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
An instance of DocumentBuilderFactory is found by applying
the following rules and taking the first one that succeeds:
- Use the javax.xml.parsers.DocumentBuilderFactory system property.
- Use the properties file lib/jaxp.properties in the JRE directory.
- Use the META-INF/services/javax.xml.parsers.DocumentBuilderFactory file with the Services API.
- Use the platform default DocumentBuilderFactory instance.
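To see which implementation these rules select in a given environment, the class of the returned factory can be printed. This is a small diagnostic sketch; the class name FactoryLookup is hypothetical, and the printed class name depends on the classpath and on the system property, so no particular output is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryLookup {
    /** Returns the class name of the factory chosen by the lookup rules. */
    static String implementationName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means the first rule does not apply and a later rule decides.
        System.out.println("system property: "
            + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("selected factory: " + implementationName());
    }
}
```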
To parse an XML document that uses namespaces, call
setNamespaceAware(true) on the factory. By
default, namespace awareness is set to
false.
factory.setNamespaceAware(true);
Call setValidating(true) on the
DocumentBuilderFactory to make the parser a validating parser.
By default, validation is set
to false.
factory.setValidating(true);
Set the schemaLanguage and schemaSource
attributes of the DocumentBuilderFactory. The
schemaLanguage attribute specifies the schema language for
validation. The schemaSource attribute specifies the XML
schema document to be used for validation.
factory.setAttribute(
"http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
factory.setAttribute(
"http://java.sun.com/xml/jaxp/properties/schemaSource",
SchemaUrl);
Create a DocumentBuilder parser.
DocumentBuilder builder = factory.newDocumentBuilder();
This returns a new DocumentBuilder, with the parameters configured in
the DocumentBuilderFactory. Create
and register an ErrorHandler with the parser.
Validator handler=new Validator();
builder.setErrorHandler(handler);
Validator is a class that extends the
DefaultHandler class. The DefaultHandler class
implements the ErrorHandler interface. The
Validator class is listed in the previous section. Parse
the XML document with the DocumentBuilder parser. The
available parse methods are parse(InputStream is),
parse(File f), parse(InputSource is),
parse(InputStream is, String systemId),
and parse(String uri).
builder.parse(XmlDocumentUrl);
Validator, a DefaultHandler subclass acting as the
ErrorHandler, records the errors reported during
validation. The example program, JAXPValidator.java
(see Resources below), is used to validate
the example XML document, catalog.xml, with the example XML schema,
catalog.xsd, using the JAXP parser.
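The JAXP steps of this section can be combined as follows. This is a hedged sketch rather than the article's JAXPValidator.java: it relies on whichever DocumentBuilderFactory the lookup rules select and assumes that implementation supports the schemaLanguage and schemaSource attributes. It uses a cut-down, hypothetical schema written to a temporary file, and it parses the document from a string through parse(InputSource). The class name JaxpSchemaCheck and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class JaxpSchemaCheck {
    // Hypothetical cut-down schema: a catalog element with one optional attribute.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType>"
        + "<xs:attribute name='title' type='xs:string'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** Parses xml with a validating DocumentBuilder and counts reported errors. */
    static int countErrors(String xml) {
        AtomicInteger errors = new AtomicInteger();
        Path schemaFile = null;
        try {
            schemaFile = Files.createTempFile("catalog", ".xsd");
            Files.write(schemaFile, XSD.getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            // The schema language is set before the schema source.
            factory.setAttribute(
                "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            factory.setAttribute(
                "http://java.sun.com/xml/jaxp/properties/schemaSource",
                schemaFile.toUri().toString());

            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) { errors.incrementAndGet(); }
                @Override public void fatalError(SAXParseException e) { errors.incrementAndGet(); }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Parsing stopped on a fatal error; count it if the handler did not.
            if (errors.get() == 0) errors.incrementAndGet();
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Validation could not be run", e);
        } finally {
            try {
                if (schemaFile != null) Files.deleteIfExists(schemaFile);
            } catch (IOException ignored) {
                // Best-effort cleanup of the temporary schema file.
            }
        }
        return errors.get();
    }

    public static void main(String[] args) {
        // The first document matches the schema; the second has an undeclared attribute.
        System.out.println(countErrors("<catalog title='OnJava.com'/>"));
        System.out.println(countErrors("<catalog title='OnJava.com' edition='2'/>"));
    }
}
```

A count of zero means the document passed validation against the sketch's schema.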
Conclusion
For an XML document to be considered valid with respect to an XML schema,
the XML document has to be validated against that schema. This
tutorial explained the validation of an example XML document against an
example XML schema with the Xerces2-j parser and with the
JAXP DocumentBuilder parser.
Resources
- Sample code for this article.
- Xerces2-j
- Java Web Services Developer Pack
Deepak Vohra is a NuBean consultant and a web developer.