W3C I18N Techniques: Developing specifications
This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you perform particular tasks. It is a subpage of the techniques index.
Characters
In this section
- Choosing a definition of 'character'
- Defining a Reference Processing Model
- Including and excluding character ranges
- Using the Private Use Area
- Choosing character encodings
- Identifying character encodings
- Designing character escapes
- Storing text
- Specifying sort and search functionality
- Defining 'string'
- Indexing strings
- Referencing the Unicode Standard
Choosing a definition of 'character'
Best practices
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
See also
Defining a Reference Processing Model
Best practices
- Specifications MUST define text in terms of Unicode characters, not bytes or glyphs.
- For their textual data objects specifications MAY allow use of any character encoding which can be transcoded to a Unicode encoding form.
- Specifications MAY choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior MUST be the same as if the processing happened as follows:
  - The character encoding of any textual data object received by the application implementing the specification MUST be determined, and the data object MUST be interpreted as a sequence of Unicode characters; this MUST be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form.
  - All processing MUST take place on this sequence of Unicode characters.
  - If text is output by the application, the sequence of Unicode characters MUST be encoded using a character encoding chosen among those allowed by the specification.
- If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all textual data objects.
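The processing model above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Character Model: the function names and the set of allowed output encodings (`ALLOWED_OUTPUT_ENCODINGS`) are hypothetical, and the input encoding is assumed to be already known from a label. It shows that once bytes are decoded into Unicode characters, processing gives the same result whatever encoding the data arrived in.

```python
import codecs

# Hypothetical: the output encodings this imaginary specification allows.
ALLOWED_OUTPUT_ENCODINGS = {"utf-8", "utf-16"}

def receive(data: bytes, encoding_label: str) -> str:
    """Interpret a received data object as a sequence of Unicode characters.

    Assumes the encoding has already been determined (e.g. from a label).
    Decoding is equivalent to transcoding into a Unicode encoding form.
    """
    return data.decode(encoding_label)

def character_count(text: str) -> int:
    """Example processing step: it operates on characters, never on bytes."""
    return len(text)

def emit(text: str, encoding_label: str) -> bytes:
    """Encode output text using an encoding the specification allows."""
    name = codecs.lookup(encoding_label).name  # normalize aliases such as "UTF8"
    if name not in ALLOWED_OUTPUT_ENCODINGS:
        raise ValueError(f"output encoding not allowed: {encoding_label}")
    return text.encode(name)

# The same text received in two different encodings.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # 4 bytes
utf8_bytes = "caf\u00e9".encode("utf-8")         # 5 bytes
assert character_count(receive(latin1_bytes, "iso-8859-1")) == 4
assert character_count(receive(utf8_bytes, "utf-8")) == 4
print(emit(receive(latin1_bytes, "latin-1"), "UTF8"))  # b'caf\xc3\xa9'
```

Counting bytes instead of characters would give 4 for one input and 5 for the other, which is exactly the encoding-dependent behavior the model rules out.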
How to's
- Digital encoding of characters
In W3C Recommendation, Character Model for the World Wide Web.
Including and excluding character ranges
Best practices
How to's
- Digital encoding of characters
In W3C Recommendation, Character Model for the World Wide Web.
See also
Using the Private Use Area
Best practices
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
Choosing character encodings
Best practices
How to's
- Choice and identification of code points
In W3C Recommendation, Character Model for the World Wide Web.
Background reading
- What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents? W3C article.
Identifying character encodings
Best practices
How to's
- Choice and identification of code points
In W3C Recommendation, Character Model for the World Wide Web.
Designing character escapes
Best practices
- Specifications SHOULD NOT invent a new escaping mechanism if an appropriate one already exists.
- The number of different ways to escape a character SHOULD be minimized (ideally to one).
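As an illustration of reusing an existing mechanism rather than inventing a new one, the sketch below assumes a hypothetical format that adopts the XML/HTML hexadecimal numeric character reference syntax (`&#xHHHH;`) and accepts only that single form. The number is the character's Unicode code point, and the trailing `;` marks the end of each escape explicitly. The function names are illustrative only.

```python
import re

# Assumption: the format reuses the XML/HTML hexadecimal numeric character
# reference syntax and allows no other escape form.
_ESCAPE = re.compile(r"&#x([0-9A-Fa-f]{1,6});")

def escape(ch: str) -> str:
    """Escape one character as &#xHHHH; using its Unicode code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"&#x{ord(ch):X};"

def unescape(text: str) -> str:
    """Replace every &#xHHHH; escape with the character it names.

    chr() raises ValueError for values above U+10FFFF.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(escape("\u00e9"))                 # &#xE9;
print(unescape("caf&#xE9;"))            # café
print(unescape(escape("\U0001F600")))   # 😀
```

XML itself accepts both decimal (`&#233;`) and hexadecimal forms; restricting this hypothetical format to the hexadecimal form is one way to keep the number of ways to escape a character down to one.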
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
Storing text
Best practices
How to's
- Visual rendering and logical order
In W3C Recommendation, Character Model for the World Wide Web.
Specifying sort and search functionality
Best practices
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
Defining 'string'
Best practices
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
Indexing strings
How to's
- In W3C Recommendation, Character Model for the World Wide Web.
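A short Python sketch shows why a specification that indexes strings has to say what its indices count: the same string has different lengths, and its characters sit at different offsets, depending on whether code points or UTF-8, UTF-16, or UTF-32 code units are counted. The helper names are hypothetical and chosen only for this illustration.

```python
def lengths(s: str) -> dict[str, int]:
    """Length of s measured in different units."""
    return {
        "code points": len(s),  # Python str indexes by code point
        "UTF-8 code units": len(s.encode("utf-8")),
        "UTF-16 code units": len(s.encode("utf-16-le")) // 2,  # 2 bytes per unit
        "UTF-32 code units": len(s.encode("utf-32-le")) // 4,  # 4 bytes per unit
    }

def utf16_offset(s: str, code_point_index: int) -> int:
    """Convert a code point index into the equivalent UTF-16 code unit offset."""
    return len(s[:code_point_index].encode("utf-16-le")) // 2

text = "a\U0001F600\u00e9"  # 'a', U+1F600 (outside the BMP), U+00E9 'é'
print(lengths(text))
# {'code points': 3, 'UTF-8 code units': 7, 'UTF-16 code units': 4, 'UTF-32 code units': 3}
print(utf16_offset(text, 2))  # 3
```

Here 'é' is at code point index 2 but at UTF-16 offset 3, because U+1F600 is represented by a surrogate pair in UTF-16.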
See also
Referencing the Unicode Standard
Best practices
How to's
- Referencing the Unicode Standard and ISO/IEC 10646
In W3C Recommendation, Character Model for the World Wide Web.
Resource identifiers
In this section
- Defining protocol or format elements to be interpreted as URIs
- Defining a new syntax for URIs
Defining protocol or format elements to be interpreted as URIs
Best practices
How to's
- Character encoding in resource identifiers
In W3C Recommendation, Character Model for the World Wide Web: Resource Identifiers.
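One widely used way for a protocol or format element to accept non-ASCII characters while staying compatible with URI-only software is the IRI-to-URI mapping in RFC 3987 (section 3.1): each non-ASCII character is encoded as UTF-8, and each resulting byte is percent-encoded. The Python sketch below is a simplified version of that mapping; the function name is hypothetical, and it does not treat domain names specially, for which IDNA (Punycode) conversion is often used instead.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI as UTF-8 bytes.

    Simplified sketch of the RFC 3987 mapping: ASCII characters, including
    reserved ones such as '/', '?' and '%', are copied unchanged.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            # Iterating over bytes yields ints; format each as %HH.
            parts.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(parts)

print(iri_to_uri("http://example.com/r\u00e9sum\u00e9?q=\u00fc"))
# http://example.com/r%C3%A9sum%C3%A9?q=%C3%BC
```

Fixing UTF-8 as the only encoding for this step means every receiver can map the percent-encoded bytes back to the same characters.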
Defining a new syntax for URIs
Best practices
How to's
- Character encoding in resource identifiers
In W3C Recommendation, Character Model for the World Wide Web: Resource Identifiers.