W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Current status

This index is still a work in progress. It does not yet point to all resources on the site, and its content will continue to grow and change as resources are added.

W3C I18N Techniques: Developing specifications

This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you perform particular tasks. It is a subpage of the techniques index.

Characters

Choosing a definition of 'character'

Best practices
How to's

Defining a Reference Processing Model

Best practices
    • Specifications MUST define text in terms of Unicode characters, not bytes or glyphs.
    • For their textual data objects, specifications MAY allow use of any character encoding which can be transcoded to a Unicode encoding form.
    • Specifications MAY choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior MUST be the same as if the processing happened as follows:
      • The character encoding of any textual data object received by the application implementing the specification MUST be determined and the data object MUST be interpreted as a sequence of Unicode characters; this MUST be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form.
      • All processing MUST take place on this sequence of Unicode characters.
      • If text is output by the application, the sequence of Unicode characters MUST be encoded using a character encoding chosen among those allowed by the specification.
    • If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all textual data objects.
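
The three steps of the Reference Processing Model above (decode to Unicode characters, process those characters, encode on output) can be sketched roughly as follows. This is only an illustrative sketch, not part of any W3C specification: the function name apply_reference_model, the transform parameter, and the ALLOWED_OUTPUT_ENCODINGS set are all hypothetical, and the sketch assumes the input encoding has already been identified correctly.

```python
import codecs
from typing import Callable

# Assumption: a hypothetical specification that permits only these output encodings.
ALLOWED_OUTPUT_ENCODINGS = {"utf-8", "utf-16"}


def apply_reference_model(
    data: bytes,
    source_encoding: str,
    transform: Callable[[str], str],
    output_encoding: str = "utf-8",
) -> bytes:
    """Hypothetical helper: decode, process as Unicode characters, re-encode."""
    # Step 1: interpret the received bytes as a sequence of Unicode characters.
    text = data.decode(source_encoding)

    # Step 2: all processing happens on the Unicode string, never on raw bytes.
    result = transform(text)

    # Step 3: encode the output using an encoding the specification allows.
    # codecs.lookup() normalizes aliases such as "UTF8" to "utf-8".
    canonical = codecs.lookup(output_encoding).name
    if canonical not in ALLOWED_OUTPUT_ENCODINGS:
        raise ValueError(f"output encoding not allowed: {output_encoding}")
    return result.encode(canonical)
```

With this shape, the same text arriving as ISO-8859-1 bytes or as UTF-16 bytes yields identical output for the same transform (for example str.upper), which is the point of the model: behavior does not depend on the encoding in which the data was received.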
How to's

Including and excluding character ranges

Best practices
How to's

Using the Private Use Area

Best practices
How to's

Choosing character encodings

Best practices
How to's
Background reading
  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents? W3C article.

Identifying character encodings

Best practices
How to's

Designing character escapes

Best practices
How to's

Storing text

Best practices
How to's

Specifying sort and search functionality

Best practices
How to's

Defining 'string'

Best practices
How to's

Indexing strings

Best practices
How to's

Referencing the Unicode Standard

Best practices
How to's

Resource identifiers

Defining protocol or format elements to be interpreted as URIs

Best practices
How to's

Defining a new syntax for URIs

Best practices
How to's

Date & time

Choosing a date format

How to's

Working with time zones

Best practices
How to's

Contact: Richard Ishida (ishida@w3.org).

Content last changed 2010-02-02 13:30 GMT
