Internationalization Tips For Your Website

The following tips will help prepare your website for localization.

1. Encode and serve your pages as Unicode

If you plan to localize your site into several languages, the best encoding for your web pages is Unicode. Unicode assigns a unique number (a code point) to each character. It is not limited to two bytes or 65,536 characters: the standard has room for more than a million code points and covers virtually every writing system in current use, so by using a Unicode encoding you should not run into problems with characters that cannot be displayed. UTF-8, the most widely used Unicode encoding on the web, stores each character in one to four bytes. To declare your web page as Unicode, specify the character set as UTF-8 rather than relying on the default, by adding the following line to the head section of your page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Another benefit of authoring your pages in Unicode is that you can display text from several scripts on one page. This is most useful for pages that show more than one language, e.g. introductory splash pages, language selection lists, etc. You will also need to ensure that your web server is set up to send the page to the browser as Unicode. If you are using the Apache web server, you can do this by editing the .htaccess file and setting the AddDefaultCharset directive as follows:

AddDefaultCharset utf-8
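
As a minimal sketch of a page that mixes several scripts, the following hypothetical splash page declares UTF-8 and labels each entry with its language. The sample greetings and the page itself are illustrative assumptions; the file must also be saved as UTF-8 in your editor for this to work.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding early, before any non-ASCII text appears -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Welcome</title>
</head>
<body>
  <!-- Each entry names its own language; all share one encoding -->
  <ul>
    <li lang="en">Welcome</li>
    <li lang="de">Willkommen</li>
    <li lang="ja">ようこそ</li>
    <li lang="ar" dir="rtl">أهلاً وسهلاً</li>
  </ul>
</body>
</html>
```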

2. Use styles rather than <font> and use Unicode font faces in your CSS

Using styles rather than embedding <font> tags in your pages makes it easier to adjust the text size for particular languages site-wide at a later stage. When you create web pages in a language other than English, your choice of font faces is limited. A few font faces are installed automatically with Windows and can display multilingual characters. If you would like your visitors to view your pages correctly without needing to install fonts, you will need to use these multilingual fonts. The most common choices include Arial, Verdana, Times New Roman and Tahoma.
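
The idea can be sketched with a small style sheet. The font order and the 120% adjustment for Arabic are assumptions chosen for illustration, not recommended values.

```html
<style>
  /* Fonts are tried in order; the generic family is the last resort */
  body {
    font-family: Tahoma, Arial, Verdana, sans-serif;
    font-size: 100%;
  }
  /* Hypothetical adjustment: enlarge elements marked lang="ar" site-wide */
  [lang="ar"] {
    font-size: 120%;
  }
</style>
<p>English text at the default size.</p>
<p lang="ar">نص عربي</p>
```

Because the size lives in one rule rather than in <font> tags scattered across pages, changing it later means editing a single line.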

3. Consider authoring your documents in an XML language, for separation of style and content

XHTML and XML + XSLT allow your content source files to be translated efficiently. XML document types are widely supported by translation memory applications, so translators can work directly with your source files, reducing the time spent on formatting.

4. Consider a liquid flow of text to allow for differences in word length

German text uses on average about a third more characters per word than English, while text in Asian languages often takes up much less space than the English equivalent. Your interface design, layout and navigation must allow for these differences. Consider using CSS positioning for text elements, or use flexible, relative table widths so that cells expand or contract with the length of the text.
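
A minimal sketch of relative table widths follows. The class names and percentages are hypothetical; the point is only that nothing is fixed in pixels.

```html
<style>
  /* Relative width lets the table follow the window, not a pixel count */
  table.nav { width: 100%; }
  /* The menu column takes a share of the width; the cell can still grow
     if a long translated label needs more room */
  table.nav td.menu { width: 25%; }
  /* No fixed heights: rows expand when translated text wraps */
  table.nav td { vertical-align: top; padding: 0.5em; }
</style>
<table class="nav">
  <tr>
    <td class="menu">Datenschutzerklärung</td>
    <td>Main content goes here.</td>
  </tr>
</table>
```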

5. Navigation considerations

Some languages require changes to the design and navigation of your website. For example, if you have a vertical menu bar, you might place it on the left for English pages and on the right for Arabic pages, because Arabic is a right-to-left (RTL) language and its readers are used to reading pages from right to left. This is not a strict rule, but it is worth considering when designing your pages.
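
One way to sketch this is to let the dir attribute on the page drive the menu position, so the same style sheet serves both directions. The menu id and its width are assumptions made for the example.

```html
<!DOCTYPE html>
<html lang="ar" dir="rtl">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>مثال</title>
  <style>
    /* Default (left-to-right pages): menu on the left */
    #menu { float: left; width: 12em; }
    /* Right-to-left pages: move the same menu to the right */
    [dir="rtl"] #menu { float: right; }
  </style>
</head>
<body>
  <div id="menu">القائمة</div>
  <p>المحتوى الرئيسي</p>
</body>
</html>
```

On an English page the opening tag would simply read <html lang="en"> and the menu would stay on the left.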

6. Minimize text contained in graphics where possible

Graphics or images containing text take longer to localize and are not as easy to translate as HTML text. It is therefore recommended to replace text in a graphic with HTML text where possible. Formatting graphical text in Asian languages (double-byte character sets) may require special software tools, which can add to your cost and time. It is much easier to use text and encode your pages in Unicode.
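
As a rough sketch, a button can keep its artwork in a background image while the label stays as translatable HTML text. The image file name, colors and link target below are hypothetical.

```html
<style>
  /* The artwork carries no words; button-bg.png is a hypothetical file */
  a.button {
    display: inline-block;
    padding: 0.5em 1.5em;
    background: #336699 url("button-bg.png") repeat-x;
    color: #ffffff;
    text-decoration: none;
  }
</style>
<!-- Only this label needs translating; the image is shared by all languages -->
<a class="button" href="contact.html">Contact us</a>
```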

7. Don't embed data and avoid use of CDATA sections

All data that is not text (e.g. JavaScript code, SQL queries) should be kept outside the document where possible and linked in with an include mechanism instead. Most translation tools do not handle CDATA very well, and inline CDATA code segments are hard to keep track of and can be overlooked in the translation process.
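
For scripts and styles, the include mechanism can be as simple as linking external files from the head of the page. The file paths in this sketch are assumptions.

```html
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Order form</title>
  <!-- Code lives in its own files, so translators never see it -->
  <script type="text/javascript" src="scripts/validate.js"></script>
  <link rel="stylesheet" type="text/css" href="styles/site.css">
</head>
```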