Bug 5223 - [XPath] Casting rules in 3.5.2 General Comparisons (editorial)
Status: CLOSED FIXED
Product: XPath / XQuery / XSLT
Component: XPath 2.0
Version: Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Assigned To: Don Chamberlin
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2007-10-23 20:59 UTC by Hans-Juergen Rennau
Modified: 2008-02-26 20:24 UTC (History)
Description Hans-Juergen Rennau 2007-10-23 20:59:34 UTC
The casting rules for xs:untypedAtomic described in items a-c depend on
whether the condition "the other operand is an instance of T" is considered to
hold when the other operand's dynamic type is a subtype of T.

Consider these examples:
ex 1:    <a>1</a> = xs:Name("A1")
ex 2:    <a>A1 </a> = xs:Name("A1")

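If the right operand counts as "an instance of xs:string", rule b) applies, the
left operand is cast to xs:string, and ex1 and ex2 both yield false. Otherwise
rule c) applies, the left operand is cast to xs:Name, ex1 yields a cast error,
and ex2 yields true.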

Testing 3 major processors, I found both behaviours!
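
The two behaviours can be modelled with a minimal Python sketch. It is only an illustration under stated assumptions: the names cast_to_string, cast_to_name and general_eq are hypothetical, the Name check is a simplified ASCII-only stand-in for the XML Name production, and a cast failure is modelled as a Python exception.

```python
import re

# Simplified, ASCII-only stand-in for the XML Name production (assumption);
# the real production admits many more characters.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


class CastError(ValueError):
    """Models the dynamic error raised when a cast fails."""


def cast_to_string(text: str) -> str:
    # xs:string preserves whitespace, so the text is left unchanged.
    return text


def cast_to_name(text: str) -> str:
    # xs:Name derives from xs:token, so whitespace is collapsed before the
    # lexical form is validated.
    collapsed = " ".join(re.split(r"[ \t\r\n]+", text.strip(" \t\r\n")))
    if not _NAME_RE.fullmatch(collapsed):
        raise CastError(f"{text!r} is not a valid xs:Name")
    return collapsed


def general_eq(untyped: str, name_value: str, cast) -> bool:
    # Cast the untyped operand with the chosen rule, then compare.
    return cast(untyped) == name_value
```

With cast_to_string, both general_eq("1", "A1", ...) and general_eq("A1 ", "A1", ...) are false; with cast_to_name, the first raises CastError and the second is true, which matches the two outcomes described for ex 1 and ex 2.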

I believe "is an instance of T" is meant in accordance with the instance
operator, that is, to include all subtypes of T. However, the term is nowhere
defined, and, interestingly, not used when explaining the semantics of the
instance operator. Therefore perhaps it would be worthwhile to add a note to
b), making it clear that the situation "other operand is an instance of a
subtype of xs:string" is included.

With kind regards,
Hans-Juergen Rennau
Comment 1 Michael Kay 2007-10-23 21:19:32 UTC
I think "A is an instance of T" definitely includes the case where A is an
instance of a subtype of T. I haven't looked to see whether we say that clearly
anywhere, but it is undoubtedly the intent.

The interesting thing about your example is that it sheds new light on the
phrase "is cast to the dynamic type of the other value". I had always assumed
that it was intended that one should cast to the primitive type of the other
value, that is, in your example, to cast to xs:string. In fact it never
occurred to me that casting to xs:Name could give a different result for the
comparison, but your example clearly shows that because of whitespace
normalization, it can.

I find it hard to believe that we really intended to require casting to the
derived type, because that would cause a large number of errors in places where
a false result is surely more reasonable. Also, instead of optimization using
indexes or hash tables being difficult, it becomes virtually impossible. It
would also defy expectations on substitutability: if a developer writes
//a[.=1000] in the knowledge and belief that a is typed as xs:int, it's
unreasonable that this should fail at run-time because someone has created a
subtype in which a is an xs:byte.

Michael Kay
Comment 2 Michael Kay 2007-10-24 08:22:27 UTC
Reassigned to XPath.

I realized that my rationale in comment #1 was OK in principle, but flawed in
the detail. Here is a better example.

Consider the following function

<xsl:function name="x" as="xs:boolean">
  <xsl:param name="y" as="xs:integer"/>
  <xsl:sequence select="exists($input//a[.=$y])"/>
</xsl:function>

where $input is untyped, and it is known that the <a> elements have values
whose lexical form makes them castable to integer.

Now it seems entirely unreasonable to me that this function should cause a
dynamic error when someone calls it supplying a value of type
xs:negativeInteger, merely because casting one of the <a> values to
xs:negativeInteger fails. The writer of the function should not have to defend
against that possibility.

Michael Kay
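
The scenario in comment 2 can be sketched in Python as follows. This is an illustrative model only: exists_match stands in for exists($input//a[.=$y]), the cast helpers are hypothetical, Python's int() and float() only approximate the XML Schema lexical rules, and the model raises as soon as a failing cast is encountered.

```python
class CastError(ValueError):
    """Models the dynamic error raised when a cast fails."""


def cast_to_integer(text: str) -> int:
    try:
        return int(text.strip())
    except ValueError as exc:
        raise CastError(f"{text!r} is not an integer") from exc


def cast_to_negative_integer(text: str) -> int:
    # xs:negativeInteger only admits values below zero.
    value = cast_to_integer(text)
    if value >= 0:
        raise CastError(f"{text!r} is not a negative integer")
    return value


def cast_to_double(text: str) -> float:
    try:
        return float(text.strip())
    except ValueError as exc:
        raise CastError(f"{text!r} is not a double") from exc


def exists_match(a_values, y, cast) -> bool:
    # Models exists($input//a[. = $y]) over untyped <a> values, casting each
    # untyped value with the supplied cast function before comparing.
    return any(cast(a) == y for a in a_values)
```

For a_values = ["5", "-3"] and y = -3, casting to the dynamic type of y succeeds when y is annotated as xs:integer but raises CastError on "5" when y is annotated as xs:negativeInteger. Casting to xs:double gives the same answer for both annotations.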
Comment 3 Hans-Juergen Rennau 2007-10-28 23:09:47 UTC
(In reply to comment #2)

It took me some time to fully understand the implications of your remarks! We
must pay attention to the particular situation in which one operand is a formal
parameter. In that case the query writer has no way of knowing *exactly* which
subtype of the formal parameter type has been provided by the function call
(unless he himself wrote the call, of course).

The "right" to provide any subtype of the formal parameter type is really a
vital aspect of function call semantics! This implies a general rule,
pertaining to the semantics of any expression (excepting sequence-type related
expressions like "typeswitch"): the semantics should warrant that changing any
subexpression's type annotation to a derived type does not affect the
expression's evaluation result. (In a P.S. I try to formulate this rule more
formally.)

So your remarks reveal that the present semantics of general comparisons should
indeed be changed, because they constitute a conceptual bug, nothing less.
(Practical consequences will be very rare, though, because present rules a)
and b) exclude any trouble as long as the operand compared with the
untypedAtomic operand is of any numeric type or a string-derived type.) Here
is a proposal for new rules, which should replace rules a) to c) of "3.5.2
General Comparisons":

<proposedNewText>
(a) If both atomic values are instances of xs:untypedAtomic, then the values
are cast to the type xs:string.
(b) If exactly one of the atomic values is an instance of xs:untypedAtomic, it
is cast to a type depending on the other value's dynamic type T according to
the following rules, in which V denotes the value to be cast:
(b1) If T is an instance of a numeric type, V is cast to xs:double
(b2) If T is an instance of xs:dayTimeDuration, V is cast to xs:dayTimeDuration
(b3) If T is an instance of xs:yearMonthDuration, V is cast to
xs:yearMonthDuration
(b4) In all other cases, V is cast to the primitive base type of T

Note:
The special treatment of the duration types is required to avoid errors that
may arise when comparing the primitive type xs:duration with any duration type.
</proposedNewText>
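
One way to read rules (a) to (b4) is as a function from the other operand's dynamic type to a cast target. The Python sketch below is an assumption-laden illustration, not part of the proposal: PARENT is a hypothetical, heavily abridged map of the XML Schema type hierarchy, and any type missing from it is treated as primitive.

```python
# Hypothetical, abridged parent map; types not listed are treated as primitive.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:normalizedString": "xs:string",
    "xs:token": "xs:normalizedString",
    "xs:Name": "xs:token",
    "xs:dayTimeDuration": "xs:duration",
    "xs:yearMonthDuration": "xs:duration",
}

# xs:integer is covered through its base type xs:decimal.
NUMERIC = {"xs:decimal", "xs:float", "xs:double"}


def type_chain(type_name: str) -> list:
    """Return the type followed by its base types, ending at the primitive."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def cast_target(other_type: str) -> str:
    """Pick the type an xs:untypedAtomic value is cast to."""
    if other_type == "xs:untypedAtomic":
        return "xs:string"                 # rule (a)
    chain = type_chain(other_type)
    if NUMERIC.intersection(chain):
        return "xs:double"                 # rule (b1)
    if "xs:dayTimeDuration" in chain:
        return "xs:dayTimeDuration"        # rule (b2)
    if "xs:yearMonthDuration" in chain:
        return "xs:yearMonthDuration"      # rule (b3)
    return chain[-1]                       # rule (b4): primitive base type
```

Under this reading cast_target("xs:Name") gives "xs:string" and cast_target("xs:negativeInteger") gives "xs:double", so the target no longer changes when the other operand is annotated with a more specific subtype.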

Finally, one question concerning the rule: 
<quote>
If a cast operation called for by these rules is not successful, a dynamic
error is raised.
</quote>

Might we not completely drop this rule? It constitutes a permanent threat to
queries' runtime safety, and what does it protect, which quality does it
assert? It seems quite natural to discard any value pair where the cast is not
possible as simply not having the required magnitude relationship.

With kind regards -
Hans-Juergen Rennau


P.S.
An attempt at a formal rule to be observed when defining expression semantics
in order to protect the "right" of a function caller to provide a formal
parameter's subtype.

<rule>
Consider an expression E containing a subexpression U which has the value V of
type T. Let E neither contain any type-related subexpression (like typeswitch)
nor explicitly refer to any type S that is a subtype of T (like "let $x as S :=
..."). For any value V1 from the value space of T let V2 be a value obtained by
replacing the type annotation of V1 by a subtype of T. Then XPath expression
semantics SHOULD guarantee the following rules, where E(V) denotes the value of
E, as dependent on the value V of subexpression U:
- if E(V1) raises an error, E(V2) raises an error
- if E(V1) evaluates to a value V3, E(V2) evaluates to a value V4 which can be
obtained from V3 by replacing the type annotation by a subtype.
</rule>

In particular, if a certain value V of the subexpression raises no error,
submitting the same value with a subtyped type annotation should also raise no
error. And this requirement is exactly what the present rules of 3.5.2 do not
meet.

P.P.S. Greetings from my wife and Marina.
Comment 4 Michael Kay 2007-10-29 09:03:32 UTC
Thanks for your comment - your proposed reformulation of the type conversion
rule seems very precise and (speaking personally of course) I favour it.

We had a lot of debates about the "fail vs. return false" question. On the
whole I was personally inclined to favour the "return false" approach. In fact
this semantic is the one that was eventually adopted for some analogous cases
including the functions distinct-values(), index-of(), and deep-equal() (and
also for key() in XSLT). It's also (more-or-less) the semantics we adopted for
pattern matching in XSLT: a failure during attempted matching is treated as
no-match.

Although I would have preferred the "return false" behaviour, I don't think
there is any rationale that would justify a change to the spec at this stage.
However, there is some latitude under the "errors and optimization" rules.

Arguably a conformant implementation could exploit the "errors and
optimization" rules to deliver false for the general comparison "a"=3. In
section 2.3.4 we discuss the example //product[id = 47]. We say "if an
implementation can find (for example, by using an index) the product
element-nodes that have an id child with the value 47, it is allowed to return
these nodes as the result of the path expression, without searching for another
product node that would raise an error because it has an id child whose value
is not an integer." I think this includes the case where the set of product
elements with id=47 is empty; in this case we can return an empty sequence
without testing that all id's are numeric. That's equivalent to returning false
rather than an error from the general comparison.
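
The difference between a full scan and an index lookup for //product[id = 47] can be illustrated with a small Python sketch. It rests on assumptions: products are modelled as (name, id text) pairs, the function names are hypothetical, and the index simply skips ids that do not cast, which is one possible implementation choice rather than anything the specification prescribes.

```python
class CastError(ValueError):
    """Models the dynamic error raised when a cast fails."""


def cast_to_double(text: str) -> float:
    try:
        return float(text.strip())
    except ValueError as exc:
        raise CastError(f"{text!r} is not a double") from exc


def select_by_scan(products, wanted):
    # Naive evaluation: every id is cast, so one bad id raises an error.
    return [name for name, id_text in products
            if cast_to_double(id_text) == wanted]


def build_index(products):
    # Index keyed by numeric id; ids that do not cast are left out.
    index = {}
    for name, id_text in products:
        try:
            key = cast_to_double(id_text)
        except CastError:
            continue
        index.setdefault(key, []).append(name)
    return index


def select_by_index(index, wanted):
    # The lookup never touches the non-numeric ids, so no error surfaces,
    # even when the result is empty.
    return index.get(float(wanted), [])
```

For products [("widget", "47"), ("gadget", "abc"), ("gizmo", "12")], the scan raises CastError on "abc", whereas the index lookup returns ["widget"] for 47 and an empty list for 99. The empty result corresponds to the "return false" outcome discussed above.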

There is of course one big disadvantage to returning false in such cases - it
makes it quite hard for the user who has made a genuine mistake (like writing
id=47 instead of id='47') to work out what has gone wrong. This I think is the
reason the spec is written as it is.

Michael Kay
Comment 5 Michael Kay 2007-10-30 16:03:37 UTC
(Discussed on 2007-10-30. No clear consensus on what the original intention of
the WG was - some thought we intended the untypedAtomic value to be cast to the
primitive type, some that we intended it to be cast to the specific type,
others that we never gave the question any thought... Will come back to it.)
Comment 6 Don Chamberlin 2008-02-26 20:10:46 UTC
This bug report was discussed by the working group on 26 Feb 2008. The group
decided to accept the changes labeled as <proposedNewText> in Comment #3 and to
make no other changes. Hans-Juergen, if this resolution is acceptable to you,
please change the status of this bug to "Closed".
Regards,
Don Chamberlin (for the Query and XSL working groups)

