W3C

XML processor profiles

Editor's Draft 26 September 2013

This version:
http://www.w3.org/XML/XProc/docs/xml-proc-profiles.html
Latest version:
http://www.w3.org/TR/xml-proc-profiles/
Previous versions:
http://www.w3.org/TR/2011/WD-xml-proc-profiles-20120124/ http://www.w3.org/TR/2011/WD-xml-proc-profiles-20110412/
Editors:
Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
Norman Walsh, MarkLogic Corporation <norman.walsh@marklogic.com>
James Fuller, Webcomposite S.R.O <jim.fuller@webcomposite.com>

This document is also available in these non-normative formats: XML and Version with differences from January WD highlighted.


Abstract

This specification defines several XML processor profiles, each of which defines how any given XML document should be processed, both operationally and in terms of what information must be made available to applications. It is intended as a resource for other specifications, which can by a single normative reference establish precisely what input processing they require as well as what information they require.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a public Working Draft for review by W3C members and other interested parties. This document is a product of the XML Processing Model Working Group which is part of the W3C XML Activity. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xproc.

This is a Last Call Working Draft for review by W3C members and other interested parties. It contains one significant addition to previous drafts, a discussion of validation, as well as extensive editorial changes made in response to reviewers' comments on our previous draft. Because many existing XML processors already implement one or more of the profiles defined below, this specification does not require new implementations. Once again, it is therefore the Working Group's intention that no Candidate Recommendation version will be published, and that the next step for this specification will be Proposed Recommendation; interested parties please take note and comment accordingly.

The effective deadline for comments is 29 February 2012. Please send comments on this draft to the public mailing list public-xml-processing-model-comments@w3.org (public archives are available).

As this specification is intended for use by other specifications which themselves define one or more XML languages, the Working Group particularly welcomes input from other Working Groups who are responsible for such specifications.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Background
    1.2 Terminology
2 XML processor profiles
    2.1 The basic XML processor profile
    2.2 The id XML processor profile
    2.3 The external declarations XML processor profile
    2.4 The full XML processor profile
3 Classes of Information
4 Relations and Invariants
    4.1 Information invariants within a given profile
    4.2 Information variation between profiles
        4.2.1 Between basic and richer profiles
        4.2.2 Between id and richer profiles
        4.2.3 Between external declarations and full profiles
5 Other profiles (non-normative)
6 Conformance
7 Validation (Non-normative)
    7.1 Specifying validation

Appendix

A References
    A.1 Normative References
    A.2 Non-normative References


1 Introduction

Few specifications are implemented in their entirety, in exactly the same way, by every implementor. Many specifications contain optional features or areas of acknowledged variation, and some implementors choose to ignore required features that aren't needed by the community they serve, choosing to trade conformance for other benefits.

In the case of XML, there are not only optional features in the XML Recommendation itself, but there is also a whole family of additional specifications which an implementor may choose to support or ignore. In principle, there are an enormous number of possible variations. In practice, there are dependencies between the specifications that limit the number of possible variations, and implementors aren't motivated to implement completely arbitrary selections.

The [XML Information Set] gave the community a vocabulary for discussing the information items passed by an XML processor to an application. This specification gives the community a vocabulary for describing common sets of higher level features by describing profiles, collecting specific sets of features drawn from the family of specifications, and providing names for them.

One goal of this work is to help establish a lower bound on the number and nature of features supported. The ability to communicate by sending XML documents back and forth is predicated on the notion that we have the same understanding of those documents. While we might wish for the richest possible understanding, that's not likely to be supported by the widest range of implementations. Establishing a few basic profiles, we hope, provides a foundation on which other specifications can build.

1.1 Background

The XML specification [Extensible Markup Language (XML) 1.0 (Fifth Edition)] defines an XML processor as "a software module…used to read XML documents and provide access to their content and structure…on behalf of another module, called the application." XML applications are often defined in terms of operations on instances of XML data models such as [XML Path Language (XPath) Version 1.0] or [XQuery 1.0 and XPath 2.0 Data Model (XDM)], or on information identified by terms in the [XML Information Set] vocabulary. Such definitions have suffered to some extent from an uncertainty inherent in using that kind of foundation, in that the kind of processing which XML processors carry out on XML documents, as well as the amount of information they provide to applications as a result, is flexible to a certain extent. Some of this flexibility stems from the XML specification itself, which is not always explicit about what information must be passed from processor to application, and which also leaves open the possibility of reading and interpreting external entities, or not. Another kind of flexibility has arisen from the growth of the XML family of specifications: if the input document includes uses of XInclude, for instance, the XML processor may or may not perform the indicated inclusions.

This specification addresses this issue by defining several XML processor profiles, each of which defines how any given XML document should be processed, both operationally and in terms of what information must be made available to applications. It is intended as a resource for other specifications, which can by a single normative reference establish precisely what input processing they require as well as what information they require.

The profiles presented here are designed for use with respect to static outcomes, that is, to the result of XML processing as (if) produced by a batch process. They do not attempt to address the question of the preservation or lack thereof of information itself, or of information invariants, in the course of incremental construction or in the face of piecemeal modification.

The profiles defined here are appropriate for processing both XML 1.0 [Extensible Markup Language (XML) 1.0 (Fifth Edition)] and XML 1.1 [Extensible Markup Language (XML) 1.1 (Second Edition)] documents. References to XML or XML Namespaces below should be understood as references to 1.0 or 1.1 as required by the relevant document or application.

1.2 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

[Definition: A base URI is an absolute URI against which relative URIs are resolved; this specification assumes that base URIs are established and used as specified in [RFC 3986].]

[Definition: The term implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]

[Definition: The term implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]

2 XML processor profiles

The profile definitions which follow all assume that the starting point is a well-formed and namespace well-formed XML document. This specification does not consider documents that are not namespace well-formed. Documents which are not well-formed are not XML.

Each profile is defined in terms of conformance requirements on processors with respect to various XML-family specifications, and in terms of requirements on the information they provide to applications. Information provision requirements are specified by reference to classes of information items and properties, as further defined in 3 Classes of Information.

It is the information itself which is required, not the particular packaging of it implied by the items and properties used to define those information classes. Processors typically package information in terms of more-or-less standardized data models or application program interfaces (APIs). How the information required for conformance to a particular profile defined below is conveyed by a data model or API need not correspond point-for-point to the Infoset terminology. For example, a data model may define element content as an array of strings and not as an array of characters. That does not prevent it from conforming to the requirements expressed below in terms of the [XML Information Set]'s Character Information Items, for example requirement (3) of 2.1 The basic XML processor profile.

The four profiles defined here are increasingly rich in terms of the kinds of processing performed and the amount of information provided to applications, starting from a profile very close to what many XML processors already do in their minimal configuration:

  • 2.1 The basic XML processor profile

  • 2.2 The id XML processor profile

  • 2.3 The external declarations XML processor profile

  • 2.4 The full XML processor profile

The precise nature of each of these profiles is described in the sections which follow.

2.1 The basic XML processor profile

To conform to the basic profile an XML processor must

  1. Process the document as required of conformant non-validating XML processors while not reading any external markup declarations;

  2. Maintain the base URI of each element in conformance with [XML Base];

  3. Accurately provide to the application the information in the document corresponding to information items and properties in classes Core, Signal, Decl and ImplDef.

Note:

Since the [Extensible Markup Language (XML) 1.0 (Fifth Edition)] specification requires validating processors to read the external subset, it follows that a processor which validates cannot be conforming to this profile, nor to the 2.2 The id XML processor profile defined below.

Note:

If an XML document which specifies standalone="no" in its XML Declaration is processed with either this profile or the 2.2 The id XML processor profile, defined below, the resulting infoset may be lacking items that the author deemed significant. This is not an error, because checking the standalone declaration is a validity constraint.
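As a non-normative illustration of requirement 2 above, the following Python sketch computes the base URI of each element by resolving any xml:base attribute against the base URI in scope. It assumes the standard-library ElementTree parser and urllib.parse.urljoin for reference resolution; the function name element_base_uris and the example URIs are hypothetical, and the sketch does not cover the other requirements of the profile.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Attribute name ElementTree uses for xml:base (namespace in Clark notation).
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def element_base_uris(root, document_uri):
    """Return (tag, base URI) pairs for root and its descendants, in document order."""
    pairs = []

    def walk(elem, inherited):
        # An xml:base value is resolved against the base URI in scope;
        # without one, the element inherits that base URI unchanged.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        pairs.append((elem.tag, base))
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return pairs


doc = ET.fromstring(
    '<doc xml:base="http://example.org/a/">'
    '<sec xml:base="b/"><p/></sec><q/>'
    '</doc>'
)
for tag, base in element_base_uris(doc, "http://example.org/doc.xml"):
    print(tag, base)
```

A data model need not store a base URI on every element; computing it on demand, as here, conveys the same information.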

2.2 The id XML processor profile

To conform to the id profile an XML processor must

  1. Process the document as required of conformant non-validating XML processors while not reading any external markup declarations;

  2. Maintain the base URI of each element in conformance with [XML Base];

  3. Perform ID type assignment for all xml:id attributes as required by [xml:id Version 1.0] by reporting their attribute type Infoset property as ID to the application;

  4. Accurately provide to the application the information in the document corresponding to information items and properties in classes Core, Signal, Decl and ImplDef.

Note:

This profile, like the 2.1 The basic XML processor profile, reads only declarations in the internal subset. This means that types, such as ID, that appear in declarations in the internal subset will be processed, while such declarations in the external subset will not.
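As a non-normative illustration of requirement 3 above, the following Python sketch treats every xml:id attribute as being of type ID and builds a table from ID values to elements. It assumes the standard-library ElementTree parser; the function name assign_id_types is hypothetical, as is the choice to keep the first element when a value is duplicated. Error reporting as described in [xml:id Version 1.0] is not attempted.

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xml:id (namespace in Clark notation).
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def assign_id_types(root):
    """Map each normalized xml:id value to the element that carries it."""
    ids = {}
    for elem in root.iter():
        raw = elem.get(XML_ID)
        if raw is None:
            continue
        # Normalize as for an ID-typed attribute: trim the value and
        # collapse internal runs of whitespace to a single space.
        value = " ".join(raw.split())
        # Assumption: on a duplicate value, the first element in
        # document order is kept and later ones are ignored.
        ids.setdefault(value, elem)
    return ids


doc = ET.fromstring(
    '<doc><sec xml:id=" intro "/><sec id="plain"/><sec xml:id="refs"/></doc>'
)
ids = assign_id_types(doc)
print(sorted(ids))
```

Note that the unprefixed id attribute on the second element is not affected: without a declaration in the internal subset, only xml:id attributes receive the ID type under this profile.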

2.3 The external declarations XML processor profile

To conform to the external declarations profile an XML processor must

  1. Process the document as required of conformant non-validating XML processors while reading and processing all external markup declarations (as specified in the discussion of non-validating processors in the XML specification);

  2. Maintain the base URI of each element in conformance with [XML Base];

  3. Perform ID type assignment for all xml:id attributes as required by [xml:id Version 1.0] by reporting their attribute type Infoset property as ID to the application;

  4. Accurately provide to the application the information in the document corresponding to information items and properties in classes Core, Extended and ImplDef.

Note:

Conformance to this profile, or to the 2.4 The full XML processor profile defined below, neither requires nor excludes validation. They leave it open to specifications which cite them to forbid, allow or require validation.

A non-validating processor (see 5.1 Validating and Non-Validating Processors in [Extensible Markup Language (XML) 1.0 (Fifth Edition)]) conformant to this profile gives the complete infoset of a well-formed XML document. In the absence of well-formedness and validity errors, a validating processor using this profile gives the complete infoset of a valid XML document.

2.4 The full XML processor profile

To conform to the full profile an XML processor must

  1. Process the document as required of conformant non-validating XML processors while reading and processing all external markup declarations (as specified in the discussion of non-validating processors in the XML specification);

  2. Maintain the base URI of each element in conformance with [XML Base];

  3. Perform ID type assignment for all xml:id attributes as required by [xml:id Version 1.0] by reporting their attribute type Infoset property as ID to the application;

  4. Recursively replace all include elements in the XInclude namespace, and carry out namespace, xml:base and xml:lang fixup of the result, as required for conformance to [XML Inclusions (XInclude) Version 1.0 (Second Edition)];

  5. Accurately provide to the application the information in the document corresponding to information items and properties in classes Core, Extended and ImplDef.

The following [XProc: An XML Pipeline Language] pipeline implements the 2.4 The full XML processor profile when executed by a conformant XProc processor which

  • Processes its input as required by point 1 above;

  • Recognizes and reports the ID type of all xml:id attributes in conformance with [xml:id Version 1.0].
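Separately from any XProc pipeline, and purely as a non-normative illustration, the replacement of include elements described in requirement 4 can be approximated in Python with the standard-library xml.etree.ElementInclude module. The in-memory RESOURCES table and the loader function below are hypothetical stand-ins for resource retrieval. This sketch makes no attempt at the namespace, xml:base and xml:lang fixup that requirement 4 calls for, so it should not be read as a conformant implementation of the full profile.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory resources standing in for documents fetched by URI.
RESOURCES = {
    "chapter.xml": "<chapter><title>One</title></chapter>",
}


def loader(href, parse, encoding=None):
    """Resolve an include target from RESOURCES instead of the file system."""
    data = RESOURCES[href]
    # parse="xml" expects an element; parse="text" expects a string.
    return ET.fromstring(data) if parse == "xml" else data


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '</book>'
)

# Replace each xi:include child with the element returned by the loader.
ElementInclude.include(book, loader=loader)
print(ET.tostring(book, encoding="unicode"))
```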

3 Classes of Information

For the profile definitions above and the invariants below, we categorize the information expressed in XML documents, which may be made available to applications, into a number of (overlapping) classes. What follows is a complete tabulation of all the information items and their properties from [XML Information Set], annotated with one or more class labels.

Note:

The glosses which follow immediately below here are explanatory: the actual class definitions are given in the subsequent table.

Class Core

Items and properties which are fundamental for all XML applications and so must be provided by all profiles.

Class Extended

Items and properties which depend on declarations and so must be provided by 2.3 The external declarations XML processor profile and 2.4 The full XML processor profile only. These items and properties may be absent if the 2.1 The basic XML processor profile or 2.2 The id XML processor profile is used.

Class Signal

Items and properties which are relevant only when entity declarations are not available and so must be provided by 2.1 The basic XML processor profile and 2.2 The id XML processor profile only.

Class Decl

Items and properties which depend on declarations. For 2.1 The basic XML processor profile and 2.2 The id XML processor profile, they will not be provided if the relevant declaration is in an unprocessed external entity, or is after the first reference to an external entity which is not processed.

Class Validated

Items and properties which will be present for validating processors, but for which support by non-validating processors is implementation-defined. Non-validating processors must document whether they provide this information to applications or not.

Class ImplDef

Items and properties for which support is implementation-defined. Processors must document whether they provide this information to applications or not.

The tabulation which follows defines the information classes by enumerating their membership in terms of information items and their properties—each class contains all and only those items and properties against which its name appears below.

Document Information Item
  the item itself: Core
  [children]: ImplDef
  [document element]: Core
  [notations]: Extended, Decl
  [unparsed entities]: Extended, Decl
  [base URI]: Core
  [character encoding scheme]: Core
  [standalone]: Core
  [version]: Core
  [all declarations processed]: Core

Element Information Item
  the item itself: Core
  [namespace name]: Core
  [local name]: Core
  [prefix]: Core
  [children]: Core
  [attributes]: Core
  [namespace attributes]: Core
  [in-scope namespaces]: Core
  [base URI]: Core
  [parent]: Core

Attribute Information Item
  the item itself: Core
  [namespace name]: Core
  [local name]: Core
  [prefix]: Core
  [normalized value]: Extended, Decl
  [specified]: Core
  [attribute type]: Extended, Decl
  [references] to Element Information Items, i.e. for attributes of types IDREF and IDREFS: Extended, Decl
  [references] to Notation and Unparsed Entity Information Items, i.e. for attributes of types ENTITY, ENTITIES and NOTATION: ImplDef
  [owner element]: Core

Processing Instruction Information Item
  the item itself: Core
  [target]: Core
  [content]: Core
  [base URI]: Core
  [notation]: ImplDef
  [parent]: Core

Unexpanded Entity Reference Information Item
  the item itself: Signal
  all properties: Signal

Character Information Item
  the item itself: Core
  [character code]: Core
  [element content whitespace]: Validated
  [parent]: Core

Comment Information Item
  the item itself: Core
  [content]: Core
  [parent]: Core

Document Type Declaration Information Item
  the item itself: ImplDef
  all properties: ImplDef

Unparsed Entity Information Item
  the item itself: Extended, Decl
  all properties: Extended, Decl

Notation Information Item
  the item itself: Extended, Decl
  all properties: Extended, Decl

Namespace Information Item
  the item itself: Core
  [prefix]: Core
  [namespace name]: Core
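
As a non-normative illustration of how the profile definitions and the tabulation above fit together, the following Python sketch records which classes each profile names and a handful of rows from the tabulation, then checks whether a given property falls in a class named by a profile. The names PROFILE_CLASSES, PROPERTY_CLASSES and covered_by are hypothetical. "Covered" here means only that the profile names one of the property's classes; for class Decl provision still depends on where the relevant declaration appears, and for class ImplDef support remains implementation-defined.

```python
# Classes named in the final requirement of each profile definition.
PROFILE_CLASSES = {
    "basic": frozenset({"Core", "Signal", "Decl", "ImplDef"}),
    "id": frozenset({"Core", "Signal", "Decl", "ImplDef"}),
    "external declarations": frozenset({"Core", "Extended", "ImplDef"}),
    "full": frozenset({"Core", "Extended", "ImplDef"}),
}

# A few rows of the tabulation: (information item, property) -> class labels.
PROPERTY_CLASSES = {
    ("Document", "[document element]"): {"Core"},
    ("Document", "[notations]"): {"Extended", "Decl"},
    ("Attribute", "[attribute type]"): {"Extended", "Decl"},
    ("Unexpanded Entity Reference", "the item itself"): {"Signal"},
    ("Character", "[element content whitespace]"): {"Validated"},
}


def covered_by(profile, item, prop):
    """True if the profile names at least one class the property belongs to."""
    return bool(PROFILE_CLASSES[profile] & PROPERTY_CLASSES[(item, prop)])


for profile in PROFILE_CLASSES:
    print(profile, covered_by(profile, "Unexpanded Entity Reference", "the item itself"))
```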

4 Relations and Invariants

Whenever a document is processed in conformance with one of the profiles defined above, the information made available to applications is guaranteed to have certain properties. The relation between the profiles and information classes defined above is summarized in the illustration below (PNG, SVG). The sub-sections which follow describe this relation in terms of invariants with respect to the information made available.

Venn diagram of profiles and classes

Note: in an effort to maintain consistent relationships in the diagram, the label for the inner-most circle, around “Full Profile”, has been omitted. It should be read as if it were labeled “Perform XInclude processing”.

4.2 Information variation between profiles

In comparing two cases when a given namespace-well-formed XML document is processed in conformance with two different profiles, the information made available will in some cases (depending on the specifics of the document in question) differ with respect to the following information items and properties (leaving aside the items and properties classified as implementation-defined above):

5 Other profiles (non-normative)

The profiles defined here can be used as a starting point for the definition of further profiles. For example, the media type registrations for stylesheet languages applicable to XML such as application/xslt+xml or text/css might define a profile specifying appropriate <?xml-stylesheet type="[their media type]"…?> processing in addition to the processing required by 2.2 The id XML processor profile.

6 Conformance

Conformance to this specification means conformance by XML processors to profiles, as specified in 2 XML processor profiles.

Which profile or profiles an XML processor conforms to may depend on how it is configured. The conformance conditions for any specific processor configuration with respect to each profile are specified in the corresponding sub-section of 2 XML processor profiles.

Accordingly, any specification which references this one normatively is recommended to do so in terms such as "Conforming implementations must process XML documents and make information available as required by the id XML processor profile."

7 Validation (Non-normative)

Specifying desired information outcomes is not sufficient to completely determine XML processor behaviour. In particular, if validation is performed and errors detected, the result may be no outcome at all.

A range of schema languages and approaches to validation exist. Some may provide for additional information items and/or properties which are not addressed by this specification. Also, the validation-dependent [element content whitespace] property of Character Information Items may only be reliably provided in conjunction with some approaches to validation, specifically DTD validation.

Furthermore, not all of the profiles defined above can be combined with all forms of validation: in particular, DTD validation requires that all external markup declarations be read and processed, and so cannot be required in conjunction with 2.1 The basic XML processor profile or 2.2 The id XML processor profile.

Accordingly, specifications referencing this one should also specify whether validation is forbidden, optional or required, with respect to which schema language(s) with what validation control settings, if any. If the 2.4 The full XML processor profile is involved, careful consideration is required as to whether validation is to happen before XInclude processing, or after, or both.

7.1 Specifying validation

To enable a degree of consistency in this area, specifications are recommended to use language matching the following ABNF when stating conformance requirements in terms of this specification (with whitespace added wherever necessary):

Invoking a profile
[1]   ProfileRef   ::=   "Conforming implementations must process XML documents and make information available as required by the" ( noValidProf | extProf | fullProf )
[2]   noValidProf   ::=   "basic XML processor profile" | "id XML processor profile"
[3]   extProf   ::=   "external declarations XML processor profile" ( "validating with" vClauses | "with no validation of any kind" )?
[4]   fullProf   ::=   "full XML processor profile" ( "validating with" ( vClausesAfter | vClausesBefore ( ", and then" vClausesAfter )? ) | "with no validation of any kind" )?
[5]   vClausesBefore   ::=   vClausesX "before XInclude"
[6]   vClausesAfter   ::=   vClausesX "after XInclude"
[7]   vClausesX   ::=    vClause ( "and then" vClauses ", all" )?
[8]   vClauses   ::=   vClause ( "and then" vClauses )*
[9]   vClause   ::=   ( "DTD" | "XML Schema" | "Relax NG" | "Schematron" | /* other validators ad lib. */ ) ( /* validator-appropriate qualification, parameters, etc. */ )?
Editorial note: HST2013-09-25
This doesn't provide any place for banning only some validation. There are (at least) two ways to address this: 1) add "all and only" in some place(s); 2) make the BNF much more complex to allow a combination of "with" and "without" sub-clauses.
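
As a non-normative illustration, the following Python sketch assembles sentences matching the ProfileRef production. It assumes that, in production [4], the alternatives following "validating with" are grouped together, so that "with no validation of any kind" stands as an alternative to the whole "validating with" clause, as it does in production [3]. The function and parameter names are hypothetical, and validator names are passed through as plain strings without any validator-specific qualification.

```python
PREFIX = (
    "Conforming implementations must process XML documents and make "
    "information available as required by the"
)
NO_VALIDATION = "with no validation of any kind"


def v_clauses(validators):
    """vClauses: one or more validator names joined by 'and then'."""
    if not validators:
        raise ValueError("at least one validator is required")
    return " and then ".join(validators)


def v_clauses_x(validators, position):
    """vClausesBefore / vClausesAfter; ', all' is added for two or more validators."""
    text = v_clauses(validators)
    if len(validators) > 1:
        text += ", all"
    return f"{text} {position} XInclude"


def profile_ref(profile, validate=(), before=(), after=(), no_validation=False):
    """Build a ProfileRef sentence (hypothetical helper, not part of the grammar)."""
    parts = [PREFIX, f"{profile} XML processor profile"]
    wants_validation = bool(validate or before or after)
    if no_validation and wants_validation:
        raise ValueError("validation was both requested and excluded")
    if profile in ("basic", "id"):
        # noValidProf: no validation wording of any kind is permitted.
        if wants_validation or no_validation:
            raise ValueError(f"the {profile} profile takes no validation clause")
    elif profile == "external declarations":
        if before or after:
            raise ValueError("XInclude ordering applies to the full profile only")
        if validate:
            parts += ["validating with", v_clauses(validate)]
    elif profile == "full":
        if validate:
            raise ValueError("use before= and/or after= with the full profile")
        if before and after:
            parts += ["validating with",
                      v_clauses_x(before, "before") + ", and then",
                      v_clauses_x(after, "after")]
        elif before:
            parts += ["validating with", v_clauses_x(before, "before")]
        elif after:
            parts += ["validating with", v_clauses_x(after, "after")]
    else:
        raise ValueError(f"unknown profile: {profile}")
    if no_validation:
        parts.append(NO_VALIDATION)
    return " ".join(parts)


print(profile_ref("id"))
print(profile_ref("external declarations", validate=["DTD"]))
print(profile_ref("full", before=["DTD"], after=["XML Schema", "Schematron"]))
```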

Some examples of the kind of language that results:

Conforming implementations must process XML documents and make information available as required by the

A References

A.1 Normative References

XML Information Set
XML Information Set, World Wide Web Consortium. Most recent edition (the second) is dated 04 Feb 2004, John Cowan and Richard Tobin, Editors. The latest version is available at http://www.w3.org/TR/xml-infoset/.
RFC 2119
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997.
RFC 3986
RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005.
XProc: An XML Pipeline Language
XProc: An XML Pipeline Language, Norman Walsh, Alex Milowski, and Henry S. Thompson, Editors. World Wide Web Consortium, 11 May 2010. This version is http://www.w3.org/TR/2010/REC-xproc-20100511/. The latest version is available at http://www.w3.org/TR/xproc/.
XPath 2.0
XML Path Language (XPath) 2.0 (Second Edition), Anders Berglund et al. Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath20-20101214/. The latest version is available at http://www.w3.org/TR/xpath20/.
xml:id Version 1.0
xml:id Version 1.0, Norman Walsh, Daniel Veillard, and Jonathan Marsh, Editors. World Wide Web Consortium, 09 Sep 2005. This version is http://www.w3.org/TR/2005/REC-xml-id-20050909/. The latest version is available at http://www.w3.org/TR/xml-id/.
XML Inclusions (XInclude) Version 1.0 (Second Edition)
XML Inclusions (XInclude) Version 1.0 (Second Edition), David Orchard, Jonathan Marsh, and Daniel Veillard, Editors. World Wide Web Consortium, 15 Nov 2006. This version is http://www.w3.org/TR/2006/REC-xinclude-20061115/. The latest version is available at http://www.w3.org/TR/xinclude/.
Extensible Markup Language (XML) 1.0 (Fifth Edition)
Extensible Markup Language (XML) 1.0 (Fifth Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., Editors. World Wide Web Consortium, 26 Nov 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version is available at http://www.w3.org/TR/xml/.
Extensible Markup Language (XML) 1.1 (Second Edition)
Extensible Markup Language (XML) 1.1 (Second Edition), Tim Bray, John Cowan, Jean Paoli, et al., Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml11-20060816/. The latest version is available at http://www.w3.org/TR/xml11/.
Namespaces in XML 1.0 (Third Edition)
Namespaces in XML 1.0 (Third Edition), Tim Bray, Dave Hollander, Richard Tobin, and Andrew Layman, Editors. World Wide Web Consortium, 8 Dec 2009. This version is http://www.w3.org/TR/2009/REC-xml-names-20091208/. The latest version is available at http://www.w3.org/TR/xml-names/.
Namespaces in XML 1.1 (Second Edition)
Namespaces in XML 1.1 (Second Edition), Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin, Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml-names11-20060816/. The latest version is available at http://www.w3.org/TR/xml-names11/.
XML Base
XML Base (Second Edition), Jonathan Marsh, Editor. World Wide Web Consortium, 28 January 2009. This version is http://www.w3.org/TR/2009/REC-xmlbase-20090128/. The latest version is available at http://www.w3.org/TR/xmlbase/.

A.2 Non-normative References

XML Path Language (XPath) Version 1.0
XML Path Language (XPath) Version 1.0, James Clark and Steven DeRose, Editors. World Wide Web Consortium, 16 Nov 1999. This version is http://www.w3.org/TR/1999/REC-xpath-19991116/. The latest version is available at http://www.w3.org/TR/xpath/.
XQuery 1.0 and XPath 2.0 Data Model (XDM)
XQuery 1.0 and XPath 2.0 Data Model (XDM), Ashok Malhotra, Jonathan Marsh, Norman Walsh, et al., Editors. World Wide Web Consortium, 14 Dec 2010. This version is http://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/. The latest version is available at http://www.w3.org/TR/xpath-datamodel/.