
Test results: HTML character encoding basics

These tests check whether user agents recognise character encoding declarations for HTML and XHTML documents, and apply the expected prioritisation in case of mismatches between multiple declarations.

Summary & conclusions

See the results below for user agents tested. This section summarizes the results of those tests. In what follows, 'HTML' means HTML4.01, HTML5, or XHTML 1.0 served as text/html. XML means XHTML 1.0, XHTML5, or XHTML 1.1 served as application/xhtml+xml.

Basic declarations

All user agents detected character encodings declared in the HTTP header for HTML and XML.

All user agents used a UTF-8 BOM to set the page encoding in the absence of any other encoding information. The same was true for little- and big-endian UTF-16 BOMs (though only the HTML5 format was tested).
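As an illustration of what "using the BOM" involves, the following minimal Python sketch maps the three byte signatures covered by these tests to encoding names. The function name `sniff_bom` and the returned labels are hypothetical choices for this example; it is not a description of how any tested browser is implemented.

```python
import codecs

# Byte signatures covered by the tests: UTF-8, UTF-16LE and UTF-16BE.
# UTF-32 is deliberately left out, since it was not part of the tests
# (its little-endian BOM begins with the same two bytes as UTF-16LE).
_SIGNATURES = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
)


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

For example, `sniff_bom(b"\xef\xbb\xbf<!DOCTYPE html>")` returns `"utf-8"`, while a byte string that does not start with one of the three signatures returns `None`.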

The XML declaration was used to set the encoding for all XML pages, but every browser tested apart from IE also used it for HTML documents. This use for HTML is not specified. (Note that this test, like the others in this section, uses the XML declaration as the only source of information about encoding.)

The meta Content-Type element was used to set the encoding for HTML but not XML.

The HTML5 meta charset element was recognized by all browsers for HTML, but not for XML. (Note that this syntax is described in HTML5 but not in the HTML 4.01 specification, so HTML 4.01 documents containing this declaration do not validate.)
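To make the three in-document declarations concrete, here is a rough Python sketch that picks them out of the start of a document. The function name, the labels it returns, and the regular expressions are all assumptions made for this example. It is far simpler than the prescan a real browser performs; for instance, any meta element without an http-equiv attribute that contains "charset=" is treated as an HTML5 meta charset declaration.

```python
import re

# Simplified patterns; real user agents follow a much more detailed algorithm.
XML_DECL_RE = re.compile(r'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_TAG_RE = re.compile(r'<meta\s[^>]*>', re.IGNORECASE)
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)
HTTP_EQUIV_RE = re.compile(r'http-equiv', re.IGNORECASE)


def in_document_declarations(head: str):
    """Return (kind, encoding) pairs in document order.

    kind is one of the hypothetical labels "xml", "meta-content-type"
    or "meta-charset".
    """
    found = []
    # The XML declaration only counts if it is the very first thing in the text.
    xml_decl = XML_DECL_RE.match(head)
    if xml_decl:
        found.append(("xml", xml_decl.group(1)))
    for tag in META_TAG_RE.findall(head):
        charset = CHARSET_RE.search(tag)
        if charset is None:
            continue  # a meta element that declares no encoding
        kind = "meta-content-type" if HTTP_EQUIV_RE.search(tag) else "meta-charset"
        found.append((kind, charset.group(1)))
    return found


sample = (
    '<?xml version="1.0" encoding="ISO-8859-15"?>\n'
    '<html><head>\n'
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">\n'
    '<meta charset="utf-8">\n'
    '</head>'
)
declarations = in_document_declarations(sample)
```

With the sample above, `declarations` holds one entry for each of the three declaration types, in the order they appear in the text.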

If a charset attribute was added to an a element that pointed to an HTML file with no other encoding information, only Opera rendered the target page with the specified encoding.

Precedence

On the whole, all browsers tested follow the precedence rules described in the HTML5 specification, i.e. the HTTP header trumps all; a BOM overrides all in-document declarations; the XML declaration wins over meta declarations for XML, and vice versa for HTML.

The exceptions are as follows:

  1. A UTF-16 or UTF-8 BOM overrides the HTTP declaration for Internet Explorer, Safari and Chrome browsers.
  2. The meta Content-Type or meta charset declarations are stronger than the UTF-8 (but not the UTF-16) BOM in Firefox.

When both a meta Content-Type and HTML5 meta charset declaration appear in an HTML page, the first always trumps the second.
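The precedence order summarized in this section can be sketched as a small Python function. This is only a model of the rules as stated here (HTTP header, then BOM, then in-document declarations), not of any browser's code, and it does not reproduce the browser-specific differences listed above. The function and parameter names are hypothetical.

```python
def resolve_encoding(served_as_xml, http=None, bom=None, xml_decl=None, metas=()):
    """Pick an encoding following the precedence described in this section.

    metas holds the encodings declared in meta elements, in document order
    (meta Content-Type and HTML5 meta charset are treated alike).
    Returns None when no declaration applies.
    """
    if http:
        return http  # the HTTP header trumps everything else
    if bom:
        return bom  # a BOM overrides all in-document declarations
    if served_as_xml:
        # For XML, meta declarations are not used to set the encoding.
        return xml_decl
    if metas:
        return metas[0]  # the first meta declaration trumps a later one
    # Assumption: fall back to the XML declaration for text/html, as most
    # browsers tested did, although this use is not specified.
    return xml_decl
```

For example, `resolve_encoding(False, xml_decl="iso-8859-15", metas=["windows-1252", "utf-8"])` returns `"windows-1252"`, whereas the same declarations with `served_as_xml=True` give `"iso-8859-15"`.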

Latest results

These are results for the latest versions of each user agent tested. A green background (yes) means that the assertion associated with the test held true; red (no) means that it did not; orange (partially) means that it was only partially true. Each test is listed with the assertion it checks.

Results are given for six types of document format: H4 (HTML 4.01), H5 (HTML5), XH (XHTML 1.0 served as text/html), X (XHTML 1.0 served as XML), X5 (XHTML5), X11 (XHTML 1.1, served as XML). The UTF-16 tests are run only in the HTML5 format. Internet Explorer doesn't handle pages served as XML, so those tests are ignored in these results.

Basic declarations

UA                                    | IE       | Firefox            | Opera              | Safari             | Chrome
version                               | 8        | 3.6.3              | 10.51              | 4.05               | 4.1.249.1045
OS                                    | XP       | XP                 | XP                 | XP                 | XP
date                                  | 20100414 | 20100414           | 20100414           | 20100414           | 20100414
format                                | H4,H5,XH | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11
HTTP charset declaration              | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
UTF-8 BOM                             | no       | no        yes      | no        yes      | no        yes      | no        yes
UTF-16LE BOM                          | yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]
UTF-16BE BOM                          | yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]
XML declaration                       | yes      | no [2]    yes      | no [2]    yes      | no [2]    yes      | no [2]    yes
meta Content-Type charset declaration | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
HTML5 meta charset declaration        | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
charset on an a element               | no       | no        no       | yes       yes      | no        no       | no        no

Assertions:

HTTP charset declaration: Setting the HTTP header charset declaration will affect the encoding of a page.
UTF-8 BOM: A page with no encoding declarations, but with a UTF-8 signature will be recognized as UTF-8.
UTF-16LE BOM: A page with no encoding declarations, but with a UTF-16 little-endian BOM will be recognized as UTF-16.
UTF-16BE BOM: A page with no encoding declarations, but with a UTF-16 big-endian BOM will be recognized as UTF-16.
XML declaration: Setting the encoding in the XML declaration will affect the encoding of a page served as XML, but will not affect pages served as text/html.
meta Content-Type charset declaration: Setting the encoding in the meta Content-Type element will affect the encoding of a page served as text/html, but will not affect pages served as XML.
HTML5 meta charset declaration: Setting the encoding in the HTML5 meta charset element will affect the encoding of a page served as text/html, but will not affect pages served as XML.
charset on an a element: A link to a page using the a element with a charset attribute will cause a page with no other encoding information to be rendered using the encoding in the charset attribute.

Link to tests: Basic tests

Notes:

  1. The tests for the UTF-16 BOM were only carried out for the HTML5 and XHTML5 formats.
  2. The XML declaration is unexpectedly used to determine the encoding for content served as HTML.

Precedence

UA                                   | IE       | IE       | Firefox            | Opera              | Safari             | Chrome
version                              | 8        | 7        | 3.6.3              | 10.51              | 4.05               | 4.1.249.1045
OS                                   | XP       | XP       | XP                 | XP                 | XP                 | XP
date                                 | 20100414 | 20100414 | 20100414           | 20100414           | 20100414           | 20100414
format                               | H4,H5,XH | H4,H5,XH | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11 | H4,H5,XH  X,X5,X11
HTTP vs UTF-8 BOM                    | no       | no       | yes       yes      | yes       yes      | no        no       | no        no
HTTP vs UTF-16LE BOM                 | no [1]   | no [1]   | yes [1]   yes [1]  | yes [1]   yes [1]  | no [1]    no [1]   | no [1]    no [1]
HTTP vs UTF-16BE BOM                 | no [1]   | no [1]   | yes [1]   yes [1]  | yes [1]   yes [1]  | no [1]    no [1]   | no [1]    no [1]
HTTP vs XML declaration              | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
HTTP vs meta Content-Type            | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
HTTP vs HTML5 meta                   | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
UTF-8 BOM vs XML declaration         | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
UTF-8 BOM vs meta Content-Type       | no       | no       | no [2]    yes      | yes       yes      | yes       yes      | yes       yes
UTF-8 BOM vs HTML5 meta charset      | yes      | yes      | no [2]    yes      | yes       yes      | yes       yes      | yes       yes
UTF-16LE BOM vs XML declaration      | yes [1]  | yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]
UTF-16LE BOM vs meta Content-Type    | yes [1]  | yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]
UTF-16LE BOM vs HTML5 meta charset   | yes [1]  | yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]  | yes [1]   yes [1]
XML declaration vs meta Content-Type | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
XML declaration vs HTML5 meta        | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
meta Content-Type, then HTML5 meta   | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes
HTML5 meta, then meta Content-Type   | yes      | yes      | yes       yes      | yes       yes      | yes       yes      | yes       yes

Assertions:

HTTP vs UTF-8 BOM: If the HTTP header of a page is not set to UTF-8, a UTF-8 BOM will not cause a file to be treated as UTF-8.
HTTP vs UTF-16LE BOM: If the HTTP header of a page is not set to UTF-16, a UTF-16 little-endian BOM will not cause a file to be recognized as UTF-16.
HTTP vs UTF-16BE BOM: If the HTTP header of a page is not set to UTF-16, a UTF-16 big-endian BOM will not cause a file to be recognized as UTF-16.
HTTP vs XML declaration: The HTTP header has a higher precedence than an XML declaration.
HTTP vs meta Content-Type: The HTTP header has a higher precedence than a meta Content-Type encoding declaration.
HTTP vs HTML5 meta: The HTTP header has a higher precedence than an HTML5 meta encoding declaration.
UTF-8 BOM vs XML declaration: A page with a UTF-8 BOM will be recognized as UTF-8 even if the XML declaration declares a different encoding.
UTF-8 BOM vs meta Content-Type: A page with a UTF-8 BOM will be recognized as UTF-8 even if the meta Content-Type declares a different encoding.
UTF-8 BOM vs HTML5 meta charset: A page with a UTF-8 BOM will be recognized as UTF-8 even if the HTML5 meta charset declares a different encoding.
UTF-16LE BOM vs XML declaration: A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the XML declaration declares a different encoding.
UTF-16LE BOM vs meta Content-Type: A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the meta Content-Type element declares a different encoding.
UTF-16LE BOM vs HTML5 meta charset: A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the meta charset element declares a different encoding.
XML declaration vs meta Content-Type: The XML declaration has a higher precedence than a meta Content-Type encoding declaration for pages served as XML, but not for pages served as text/html.
XML declaration vs HTML5 meta: The XML declaration has a higher precedence than an HTML5 meta encoding declaration for pages served as XML, but not for pages served as text/html.
meta Content-Type, then HTML5 meta: A meta Content-Type encoding declaration has a higher precedence than a following HTML5 meta encoding declaration.
HTML5 meta, then meta Content-Type: An HTML5 meta encoding declaration has a higher precedence than a following meta Content-Type encoding declaration.

Link to tests: Precedence

Notes:

  1. The tests for the UTF-16 BOM were only carried out for the HTML5 and XHTML5 formats.
  2. The UTF-8 BOM is ignored (unlike the UTF-16 BOM).

Older results

This section provides additional information about older UA versions, where that information exists. It does not provide enough information to trace the history of feature support in a given UA. The conventions used are as described in the previous section.

Basic declarations

UA                                    | IE [1]                      | IE [1]                      | Firefox                     | Opera                       | Safari                      | Chrome
version                               | 8                           | 7                           | 3.5.1                       | 9.64                        | 4.0.2                       | 2.0.172.37
OS                                    | XP                          | XP                          | XP                          | XP                          | XP                          | XP
date                                  | 20090717                    | 20090717                    | 20090717                    | 20090717                    | 20090717                    | 20090717
format                                | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11
HTTP charset declaration              | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
XML declaration                       | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | no [2] no [2] yes    yes    | no [2] no [2] yes    yes    | no [2] no [2] yes    yes    | no [2] no [2] yes    yes
meta Content-Type charset declaration | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    no [3] no [3] | yes    yes    yes    yes    | yes    yes    yes    yes
HTML5 meta charset declaration        | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes

Notes:

  1. Internet Explorer doesn't handle pages served as XML, so those tests are ignored in these results.
  2. The XML declaration is used to determine the encoding in HTML4 and XHTML 1.0 served as text/html.
  3. XML pages use the meta Content-Type declaration to determine the encoding.

Precedence

UA                                   | IE [1]                      | IE [1]                      | Firefox                     | Opera                       | Safari                      | Chrome
version                              | 8                           | 7                           | 3.5.1                       | 9.64                        | 4.0.2                       | 2.0.172.37
OS                                   | XP                          | XP                          | XP                          | XP                          | XP                          | XP
date                                 | 20090717                    | 20090717                    | 20090717                    | 20090717                    | 20090717                    | 20090717
format                               | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11    | H4     XH     X      X11
HTTP vs meta Content-Type            | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
HTTP vs HTML5 meta                   | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
HTTP vs XML declaration              | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
XML declaration vs meta Content-Type | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
XML declaration vs HTML5 meta        | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes
meta Content-Type, then HTML5 meta   | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    no     yes    | yes    yes    yes    yes    | yes    yes    yes    yes
HTML5 meta, then meta Content-Type   | yes    yes    n/a    n/a    | yes    yes    n/a    n/a    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes    | yes    yes    yes    yes


Author: Richard Ishida, W3C.


Content first published 2009-04-16. Last substantive update 2009-07-18 7:00 GMT. This version 2010-04-12 12:02 GMT.

For the history of document changes, search for results-lang-and-cjk-font in the i18n blog.
