Bugzilla – Bug 6466
[SER] quotes in doctype-system parameter
Last modified: 2009-06-16 15:26:55 UTC
The serialization spec says that the value of the serialization parameter doctype-system can be any sequence of Unicode characters. However, if the string contains both a single-quote character and a double-quote character, then it is not possible to construct a doctype declaration that satisfies the XML syntax. The corresponding restriction for doctype-public was introduced as a side-effect of erratum SE.E1.
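To illustrate the problem, here is a minimal Python sketch of how a serializer might choose the delimiter for the system identifier. The function name `quote_system_literal` and the use of `ValueError` are assumptions made for this illustration; they are not taken from the serialization spec or from any implementation.

```python
def quote_system_literal(value: str) -> str:
    """Wrap a doctype-system value in delimiters to form an XML SystemLiteral.

    Hypothetical helper: XML allows a SystemLiteral to be delimited by
    either quotation marks or apostrophes, but the delimiter must not
    occur inside the literal itself.
    """
    if '"' not in value:
        # No quotation mark inside, so quotation marks can delimit it.
        return f'"{value}"'
    if "'" not in value:
        # Contains a quotation mark but no apostrophe.
        return f"'{value}'"
    # Both characters occur: neither delimiter is usable.
    raise ValueError(
        "doctype-system contains both an apostrophe and a quotation mark"
    )
```

A value such as `it's "here".dtd` reaches the final branch, which is the case this report is about.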
I'll put forward two alternatives. In section 3 "Serialization Parameters," in the row that describes "doctype-system," after "A string of Unicode characters. This parameter may be absent," add either
1) "It is an error if doctype-system does not conform to the syntax of SystemLiteral[XML]." or
2) "It is an error if the value of doctype-system contains both an apostrophe and a quotation mark."
The advantage of 1) is that it follows the model of SE.E1.[1] The problem is that it's not strictly correct - the syntax of SystemLiteral (and of PubidLiteral, in the case of SE.E1) includes the enclosing apostrophes or quotation marks, while the values of doctype-system and doctype-public do not include those delimiters. We could just ignore that issue, and go with 1) - it's not very likely to cause confusion - or we go with 2) and also alter the text added by SE.E1 to say "It is an error if the value of doctype-public contains a character that is not PubidChar[XML]." I'm inclined to take the latter route.
[1] http://www.w3.org/XML/2007/qt-errata/xslt-xquery-serialization-errata.html#E1
> we go with 2) and also alter the text added by SE.E1 to say "It is an error if the value of doctype-public contains a character that is not PubidChar[XML]." I'm inclined to take the latter route.
I agree.
At its teleconference of 2009-02-05, the XSL WG considered this bug report. The WG approved the substantive changes proposed by the second alternative of the last paragraph of comment 1. To reiterate: In section 3 "Serialization Parameters," in the row that describes "doctype-system," after "A string of Unicode characters. This parameter may be absent," add "It is an error if the value of doctype-system contains both an apostrophe and a quotation mark." And alter the text added by SE.E1 to say "It is an error if the value of doctype-public contains a character that is not a PubidChar[XML]." XQuery WG consideration of the bug is still pending.
At the joint teleconference of 2009-02-10, the XQuery WG concurred with the decision of the XSL WG. This will be erratum SE.E10.
After the changes to the Serialization 1.0 recommendation were accepted, I noted that the second paragraph of section 3 already states, "It is a serialization error [err:SEPM0016] if a parameter value is invalid for the given parameter," so I decided to make an editorial change to restate the descriptions of the doctype-system and doctype-public parameters in the positive, saying instead what values are permitted: For doctype-public, "A string of PubidChar[XML] characters. This parameter may be absent." For doctype-system, "A string of Unicode characters that does not include both an apostrophe (#x27) and a quotation mark (#x22) character. This parameter may be absent." I trust this change will be acceptable.
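As a rough sketch of what the two restated descriptions permit, the checks could be written as below. The function names are hypothetical, and the character class is my transcription of the PubidChar production from the XML 1.0 grammar; a serializer would presumably raise err:SEPM0016 where these checks return False, but that wiring is not shown.

```python
import re

# PubidChar per XML 1.0: #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
# (transcribed here as an assumption; check against the XML recommendation)
_PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_valid_doctype_public(value: str) -> bool:
    """True if every character of the value is a PubidChar."""
    return _PUBID_CHARS.fullmatch(value) is not None


def is_valid_doctype_system(value: str) -> bool:
    """True unless the value contains both an apostrophe and a quotation mark."""
    return not ("'" in value and '"' in value)
```

Note that PubidChar admits the apostrophe but not the quotation mark, so a valid doctype-public value can always be delimited with quotation marks.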
Published in "Errata for XSLT 2.0 and XQuery 1.0 Serialization"[1] and PER draft of "XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)."[2]
[1] http://www.w3.org/XML/2007/qt-errata/xslt-xquery-serialization-errata.html
[2] http://www.w3.org/TR/2009/PER-xslt-xquery-serialization-20090421/