Bug 5671 - [FO] Type promotion in fn:min and fn:max
Status: RESOLVED FIXED
Product: XPath / XQuery / XSLT
Component: Functions and Operators 1.0
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assigned To: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-04-30 13:59 UTC by Oliver Hallam
Modified: 2009-03-18 20:45 UTC (History)
Description Oliver Hallam 2008-04-30 13:59:26 UTC
The summary for fn:min/fn:max says:
Selects an item from the input sequence $arg whose value is [less/greater] than
or equal to the value of every other item in the input sequence

However further down in their summaries:
This function returns an item from the converted sequence rather than the input
sequence.

Could this be worded more clearly?


Reading the rules for promotion:
Numeric and xs:anyURI values are converted to the least common type that
supports the [le/ge] operator by a combination of type promotion and subtype
substitution.

From reading this I would say that if your input is a single value of type
xs:unsignedShort, then you would return a value of type xs:integer, as this is
"the least common type that supports the [le/ge] operator"; however the XQTS
test K2-SeqMINFunc-15 seems to disagree with me here.  What is the correct
behaviour?
Comment 1 Michael Kay 2008-05-20 15:56:15 UTC
This was considered by the WGs on 20 May 2008. It was noted that the resolution
of bug #3358 proposed that the text should read "converted to their least
common type by a combination of type promotion and subtype substitution. ". The
actual text published was "converted to the least common type that supports the
ge operator by a combination of type promotion and subtype substitution". It is
not clear why the phrase "that supports the ge operator" was added, or what it
was supposed to mean. The discussion of bug #3358 makes it clear that the
intention of the WG was that the min() or max() of a sequence of hatsizes
should be a hatsize.

I therefore propose to delete the phrase "that supports the ge operator".

Regarding the summary, I think it's a major piece of editorial work to ensure
that all the function summaries are indeed summaries of the detailed rules for
each function. I would like to tackle this for the 1.1 release, but I don't
think piecemeal improvements by means of errata are appropriate in cases like
this.
Comment 2 Michael Dyck 2008-05-20 20:10:58 UTC
So I believe the implication is that the XQTS test K2-SeqMINFunc-15 is correct.
That is,
    min(xs:unsignedShort(<e>1</e>)) instance of xs:unsignedShort
yields
    true

But what about something like
    min((xs:unsignedShort(<e>1</e>), xs:positiveInteger(<e>2</e>)))
    instance of xs:unsignedShort
?

The least common supertype of xs:unsignedShort and xs:positiveInteger is
xs:nonNegativeInteger, so the text suggests that the two items in the input
sequence are converted to xs:nonNegativeInteger by subtype substitution,
yielding a 'converted sequence' consisting of two xs:nonNegativeInteger values,
one of which is returned as the result of the fn:min call, which then fails the
"instance of" test (i.e. yielding false for the whole expr).

However, XQuery 2.5.4 (SequenceType Matching) says:
    Subtype substitution does not change the actual type of a value.
    For example, if an xs:integer value is used where an xs:decimal
    value is expected, the value retains its type as xs:integer.
In terms of the example, if an xs:unsignedShort value is used where an
xs:nonNegativeInteger is expected, the value retains its type as
xs:unsignedShort. This suggests that the result of the fn:min call is actually
an xs:unsignedShort value, and the "instance of" test yields true.
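
The two readings above can be made concrete with a small model. The following is a minimal sketch in Python (not XQuery), assuming a hand-written fragment of the XML Schema built-in integer hierarchy; the helper names ancestors, least_common_supertype and instance_of are hypothetical and do not come from any XQuery processor.

```python
# Parent links for a fragment of the XML Schema built-in type hierarchy.
# Only the types needed for this example are listed.
PARENT = {
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:positiveInteger": "xs:nonNegativeInteger",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
}


def ancestors(type_name):
    """Return type_name followed by its supertypes, nearest first."""
    chain = [type_name]
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain


def least_common_supertype(a, b):
    """Return the first type on a's chain that is also on b's chain."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    raise ValueError("no common supertype in this fragment")


def instance_of(dynamic_type, required_type):
    """Model 'instance of': true if required_type is the type or a supertype."""
    return required_type in ancestors(dynamic_type)


common = least_common_supertype("xs:unsignedShort", "xs:positiveInteger")
print(common)                                   # xs:nonNegativeInteger
# An item that keeps its dynamic type xs:unsignedShort satisfies both tests:
print(instance_of("xs:unsignedShort", common))              # True
print(instance_of("xs:unsignedShort", "xs:unsignedShort"))  # True
# An item actually relabelled as the common type fails the narrower test:
print(instance_of(common, "xs:unsignedShort"))              # False
```

In this model the outcome of the "instance of xs:unsignedShort" test depends only on whether the returned item keeps its original type label or is relabelled as the common type, which is exactly the point in dispute.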
Comment 3 Michael Kay 2008-05-20 20:57:15 UTC
> The least common supertype of xs:unsignedShort and xs:positiveInteger is
> xs:nonNegativeInteger, so the text suggests that the two items in the input
> sequence are converted to xs:nonNegativeInteger

Yes, precisely.

> Subtype substitution does not change the actual type of a value

The phrase about conversion to the least common type is used in a number of
places in the XPath/XQuery language specs. The formula used in some cases is
"converted to the least common type reachable by a combination of type
promotion and subtype substitution". Would you be more comfortable with that?

Incidentally, my interpretation of this rule is that it guarantees that the
result will be an instance of xs:nonNegativeInteger. It does not say that the
value might not also be an instance of some other type, such as
xs:unsignedShort. Functions (and expressions generally) are always free to
return a result that belongs to a subtype of the required type.
Comment 4 Tim Mills 2008-05-27 09:15:33 UTC
I believe that the upshot of this is that:

1. When no type promotion is required, we can always return an item (with its
type unchanged) from the input sequence.  The static type of the function call
will be the least common type of the input item types.

2. When type promotion is required, we can always return an item from the input
cast as the promoted type (which will be xs:decimal, xs:float, xs:double or
xs:string).  The static type of the function call will be the promoted type.
Comment 5 Michael Dyck 2008-07-08 00:21:59 UTC
[personal response:]

(In reply to comment #4)
> 
> 1. When no type promotion is required, we can always return an item (with its
> type unchanged) from the input sequence.

According to MKay's interpretation, you can. Perhaps some other interpretation
says you can't. Anyhow, the user isn't guaranteed that you will. (You might
return that item converted to a supertype.)

> The static type of the function call
> will be the least common type of the input item types.

Yes.

> 2. When type promotion is required, we can always return an item from the
> input cast as the promoted type (which will be xs:decimal, xs:float,
> xs:double or xs:string).

Any case in which the resultant type is xs:decimal wouldn't *require* type
promotion (you could do it all with subtyping), so you could leave that one off
the list.

> The static type of the function call will be the promoted type. 

Yes.
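
The two cases from comment #4, with the correction made in this comment, can be sketched as follows. This is an illustrative Python model restricted to numeric types, assuming a small hard-coded fragment of the type hierarchy; the names classify, primitive and PROMOTION_RANK are hypothetical.

```python
# Fragment of the XML Schema numeric hierarchy (derived type -> base type).
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:positiveInteger": "xs:nonNegativeInteger",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
}

# Numeric type promotion runs decimal -> float -> double.
PROMOTION_RANK = {"xs:decimal": 0, "xs:float": 1, "xs:double": 2}


def ancestors(type_name):
    """Return type_name followed by its supertypes, nearest first."""
    chain = [type_name]
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain


def primitive(type_name):
    """Return the primitive type at the top of the derivation chain."""
    return ancestors(type_name)[-1]


def least_common_supertype(type_names):
    """Nearest type shared by every derivation chain (same primitive assumed)."""
    common = ancestors(type_names[0])
    for name in type_names[1:]:
        other = set(ancestors(name))
        common = [c for c in common if c in other]
    return common[0]


def classify(type_names):
    """Return (promotion_required, static_type) for a non-empty numeric input."""
    primitives = {primitive(name) for name in type_names}
    if not primitives <= set(PROMOTION_RANK):
        raise TypeError("this sketch only models numeric types")
    if len(primitives) == 1:
        # Case 1: subtype substitution alone reaches a common type.
        return False, least_common_supertype(type_names)
    # Case 2: promote to the highest-ranked primitive type present.
    return True, max(primitives, key=PROMOTION_RANK.get)


print(classify(["xs:unsignedShort", "xs:positiveInteger"]))
print(classify(["xs:integer", "xs:float"]))
print(classify(["xs:decimal", "xs:float", "xs:double"]))
```

Because promotion is only needed when at least two primitive types are present, the promoted type in this model is always xs:float or xs:double, never xs:decimal, which matches the remark above about leaving xs:decimal off the list.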
Comment 6 Michael Kay 2008-07-08 22:09:39 UTC
On 27 May 2008 (recorded in the minutes but sadly not here), the joint WGs
decided to resolve this by changing the text (in fn:max() and fn:min()) from

Numeric and xs:anyURI values are converted to the least common type that
supports the ge operator by a combination of type promotion and subtype
substitution.

to

Numeric and xs:anyURI values are converted to the least common type reachable
by a combination of type promotion and subtype substitution.

This decision was confirmed at the joint WG meeting on 7 July 2008.

I am marking this as resolved/fixed. Oliver, if you are content with this
resolution, I would be grateful if you could mark the bug as closed.

(Incidentally, I think comments #4 and #5 are correct)
Comment 7 Michael Kay 2008-07-08 22:21:01 UTC
Will be the subject of erratum E27
Comment 8 Oliver Hallam 2008-07-09 14:31:56 UTC
I am marking this bug closed.

However this solution does have ramifications for formal semantics, and the
typing rules should be updated (which are broken anyway - see bug #5459)
Comment 9 Michael Dyck 2008-07-19 08:04:46 UTC
On further reflection (working on Bug #5459), I don't think the new wording
correctly captures our intent.

Consider a sequence containing both values matching xs:anyURI and values
matching xs:string. I believe the intent is that all of those values will
be converted to xs:string (when forming the "converted sequence"). But if
we say "numeric and xs:anyURI values are converted to the least common type
reachable by a combination of type promotion and subtype substitution",
then we'll look at just the xs:anyURI values, convert them to a common type
(some subtype of xs:anyURI, possibly xs:anyURI itself), and leave the
xs:string values untouched. Then, when we say "All items in $arg must be
numeric or derived from a single base type for which the ge/le operator is
defined", it fails, because the xs:anyURI values and xs:string values are
not derived from a single base type. [I'm assuming that where it says
"$arg", it actually means "converted sequence", otherwise other things
happen.]

...

Also, in that latter quoted sentence, the "numeric or" is unnecessary,
since all numeric values have already been converted to a common type,
which certainly qualifies as "derived from a single base type".

We say "... a single base type for which the ge operator is defined. In
addition, the values in the sequence must have a total order." But does the
second sentence actually add anything?

It's odd that we would require values of two subtypes of xs:integer to be
converted to a common type (because they're numeric values), but not
require values of two subtypes of (say) xs:date to be converted to a common
type. Wouldn't it be correct to say that *all* values are converted to a
common type, not just numerics and xs:anyURI? (If so, it's redundant to say
"all items ... must be ... derived from a single base type".)

And it's odd that we say "Duration values must either all be
xs:yearMonthDuration values or must all be xs:dayTimeDuration values",
since surely that's implied by the "derived from a single base type"
requirement.
Comment 10 Michael Kay 2008-07-19 08:56:03 UTC
> But if we say "numeric and xs:anyURI values are converted to the least common type
> reachable by a combination of type promotion and subtype substitution",
> then we'll look at just the xs:anyURI values, convert them to a common type
> (some subtype of xs:anyURI, possibly xs:anyURI itself), and leave the
> xs:string values untouched.

My reading of "least common type" was "least common type of all the items in
the input sequence", not "least common type among the numeric and xs:anyURI
values". As you say, that latter reading wouldn't make sense.

> "All items in $arg must be
> numeric or derived from a single base type for which the ge/le operator is
> defined", it fails, because the xs:anyURI values and xs:string values are
> not derived from a single base type. [I'm assuming that where it says
> "$arg", it actually means "converted sequence", otherwise other things
> happen.]

I think that where it says $arg, it means $arg, and that it fails to capture
the effective equivalence of xs:anyURI and xs:string.

> It's odd that we would require values of two subtypes of xs:integer to be
> converted to a common type (because they're numeric values), but not
> require values of two subtypes of (say) xs:date to be converted to a common
> type. Wouldn't it be correct to say that *all* values are converted to a
> common type, not just numerics and xs:anyURI?

Yes, it's a bit odd, but not odd enough to require a 1.0 change that will
impact existing implementations.

> And it's odd that we say "Duration values must either all be
> xs:yearMonthDuration values or must all be xs:dayTimeDuration values",
> since surely that's implied

It's not unusual, unfortunately, for the F+O spec to say things more than once
in different ways.
Comment 11 Michael Dyck 2008-07-19 09:56:20 UTC
(In reply to comment #10)
>
> My reading of "least common type" was "least common type of all the items in
> the input sequence",

Ah, I see. Well, I think that's a sufficiently non-obvious reading that it
should be made explicit.

> I think that where it says $arg, it means $arg, and that it fails to capture
> the effective equivalence of xs:anyURI and xs:string.

Okay, so that's a mistake, right? Also, it fails to capture the exception for
xs:untypedAtomic (i.e., you can have xs:untypedAtomic values in $arg even
though they're neither numeric nor derived from a type for which the ge
operator is defined).

Is it intended that $collation be ignored for comparison of xs:anyURI values?

> >It's odd that we would require values of two subtypes of xs:integer to be
> converted to a common type (because they're numeric values), but not
> require values of two subtypes of (say) xs:date to be converted to a common
> type. Wouldn't it be correct to say that *all* values are converted to a
> common type, not just numerics and xs:anyURI?
> 
> Yes, it's a bit odd, but not odd enough to require a 1.0 change that will
> impact existing implementations.

I'm not clear on how it would affect an existing implementation. If (for the
example above) an implementation returns a subtype-of-date value that hasn't
been converted to the common type, that would still be conformant, under the
interpretation you gave in Comment #3.
Comment 12 Michael Dyck 2009-01-09 20:07:13 UTC
Reopening, which I probably should have done at comment #9,
as the points I raised then and since haven't been resolved yet.
Comment 13 Michael Kay 2009-02-10 17:04:24 UTC
The WG agreed subject to detailed wording that we need to fix the sentence

"All items in $arg must be numeric or derived from a single base type for which
the ge operator is defined."

so that a sequence containing a mix of xs:string and xs:anyURI is acceptable.
Comment 14 Michael Kay 2009-02-14 21:21:45 UTC
Erratum E47 has been drafted to reflect this decision. It changes the wording
of the relevant paragraph from "All items in $arg must be numeric or ..." to
"All items in the converted sequence must be numeric or ...".
Comment 15 Michael Kay 2009-02-16 10:34:27 UTC
The problem noted in comment #9 also affects two other sentences.

The sentence "All items in $arg must be numeric or derived from a single base
type for which the ge operator is defined." should be changed to "All items in
the converted sequence must be derived from a single base type for which the ge
operator is defined." (There is no need to mention numerics as a special case
any more, since if they are numerics the condition will automatically be
satisfied).

The paragraph 

"If the items in the value of $arg are of type xs:string or types derived by
restriction from xs:string, then the determination of the item with the largest
value is made according to the collation that is used. If the type of the items
in $arg is not xs:string and $collation is specified, the collation is
ignored." 

should change to:

"If the items in the converted sequence are of type xs:string or types derived
by restriction from xs:string, then the determination of the item with the
largest value is made according to the collation that is used. If the type of
the items in the converted sequence is not xs:string and $collation is
specified, the collation is ignored."

I am revising the draft E47 accordingly.
Comment 16 Michael Dyck 2009-02-24 08:48:48 UTC
Those changes are improvements, but I think there's still the problem,
raised in comment #9, of the wording:
    Numeric and xs:anyURI values are converted to the least common type...
The question is: the least common type of what?  I claim that a plausible
answer is:
    the least common type of the numeric and/or xs:anyURI values
    in the input sequence
(i.e., the values identified in the sentence's subject), which leads to
unintended results. In comment #10, you say that your reading is:
    the least common type of all the items in the input sequence
which I say is a non-obvious reading.

Moreover, I believe the wording still indicates that, if you call the
function with a sequence of xs:anyURI values and a collation, the values
are compared using the default collation, not the supplied collation.
I'm still wondering if that's intended.
Comment 17 Michael Dyck 2009-03-15 01:59:56 UTC
Here is a specific proposal to resolve the remaining concerns expressed in
the previous comment.

In the second bullet, change
    * Numeric and xs:anyURI values ...
to just
    * Numeric values ...

and instead, handle xs:anyURI values in a new (second) bullet:
    * Values of type xs:anyURI are promoted to xs:string.

With that, I believe the question of "the least common type of what?"
becomes moot. (That is, you get the same result whether you think it means
"the least common type of the numeric values" or "the least common type
of all values".)

Also, it ensures that xs:anyURI values in the input sequence will
(due to their promotion to xs:string) be subject to the paragraph re
collation-aware comparisons. (I assume that's what we intended.)
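
The effect of the proposed bullet can be sketched as follows. This is a minimal Python model, assuming items are (type name, value) pairs that use primitive type names only; convert, single_base_type and fn_min are hypothetical names, and str.casefold merely stands in for a case-insensitive collation.

```python
def convert(items):
    """Build the converted sequence: xs:anyURI items are promoted to xs:string."""
    converted = []
    for type_name, value in items:
        if type_name == "xs:anyURI":
            converted.append(("xs:string", value))  # the proposed new bullet
        else:
            converted.append((type_name, value))
    return converted


def single_base_type(items):
    """True if every item in a non-empty sequence has the same type name."""
    return len({type_name for type_name, _ in items}) == 1


def fn_min(items, collation_key=None):
    """Model of fn:min over strings and URIs; None models the empty sequence."""
    if not items:
        return None
    converted = convert(items)
    if not single_base_type(converted):
        raise TypeError("items are not derived from a single base type")
    if converted[0][0] == "xs:string" and collation_key is not None:
        # Promoted URIs are now subject to the supplied collation.
        return min(converted, key=lambda item: collation_key(item[1]))
    return min(converted, key=lambda item: item[1])


seq = [("xs:anyURI", "urn:B"), ("xs:string", "urn:a")]
print(single_base_type(seq))           # False: the raw input mixes two types
print(single_base_type(convert(seq)))  # True: both items are now xs:string
print(fn_min(seq))                     # ('xs:string', 'urn:B') by codepoint order
print(fn_min(seq, str.casefold))       # ('xs:string', 'urn:a') case-insensitively
```

In this model the base-type check is applied to the converted sequence rather than to $arg, so a mix of xs:string and xs:anyURI passes, and a sequence consisting only of xs:anyURI values is compared under the supplied collation as well.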
Comment 18 Michael Kay 2009-03-18 20:45:10 UTC
The change in comment #17 has been added to the draft erratum E47, as decided
by the WG yesterday. Note that with the splitting of the "promotion" bullet
into two, it is no longer necessary to specify that promotion is to a type
having a le operator, since all numeric types have such an operator.

