SquirrelMail Developer's Manual: Internationalization

3. Internationalization

This chapter explains how SquirrelMail internationalization works and describes some aspects of its implementation.

3.1 Supported Languages

TODO: Remove these statistics and generate them dynamically on the i18n stats page using a script.

Valid language codes are listed below (the list depends on the SquirrelMail version):

  • ar - Arabic, windows-1256 charset
  • bg_BG - Bulgarian, windows-1251 charset
  • bn_BD - Bangladeshi Bengali, utf-8 charset
  • bn_IN - Indian Bengali, utf-8 charset
  • ca_ES - Catalan, iso-8859-1 charset
  • cs_CZ - Czech, utf-8 charset
  • cy_GB - Welsh, iso-8859-1 charset
  • da_DK - Danish, iso-8859-1 charset
  • de_DE - German, iso-8859-1 charset
  • el_GR - Greek, iso-8859-7 charset
  • en_GB - British English, iso-8859-15 charset
  • en_US - English, charset depends on $default_charset
  • es_ES - Spanish, iso-8859-1 charset
  • et_EE - Estonian, iso-8859-15 charset
  • eu_ES - Basque, iso-8859-1 charset
  • fa_IR - Persian (Farsi), utf-8 charset
  • fi_FI - Finnish, iso-8859-1 charset
  • fo_FO - Faroese, iso-8859-1 charset
  • fr_FR - French, iso-8859-1 charset
  • fy - Frisian, utf-8 charset
  • he_IL - Hebrew, windows-1255 charset
  • hr_HR - Croatian, iso-8859-2 charset
  • hu_HU - Hungarian, utf-8 charset
  • id_ID - Indonesian, iso-8859-1 charset
  • is_IS - Icelandic, iso-8859-1 charset
  • it_IT - Italian, utf-8 charset
  • ja_JP - Japanese, euc-jp charset (emails are created in iso-2022-jp)
  • ka - Georgian, utf-8 charset (since 1.5.1 and 1.4.6)
  • km - Khmer, utf-8 charset
  • ko_KR - Korean, euc-kr charset
  • lv_LV - Latvian, utf-8 charset
  • lt_LT - Lithuanian, utf-8 charset
  • mk - Macedonian, utf-8 charset
  • ms_MY - Malay, iso-8859-1 charset
  • nb_NO - Norwegian (Bokmal), iso-8859-1 charset
  • nl_NL - Dutch, iso-8859-1 charset
  • nn_NO - Norwegian (Nynorsk), iso-8859-1 charset
  • pl_PL - Polish, iso-8859-2 charset
  • pt_BR - Portuguese (Brazil), iso-8859-1 charset
  • pt_PT - Portuguese (Portugal), iso-8859-1 charset
  • ro_RO - Romanian, iso-8859-2 charset
  • ru_UA - Russian (Ukraine), koi8-r charset
  • ru_RU - Russian, utf-8 charset
  • si_LK - Sinhala, utf-8 charset
  • sk_SK - Slovak, iso-8859-2 charset
  • sl_SI - Slovenian, iso-8859-2 charset
  • sr_YU - Serbian, iso-8859-2 charset
  • sv_SE - Swedish, iso-8859-1 charset
  • ta_LK - Tamil (Sri Lanka), utf-8 charset
  • th_TH - Thai, tis-620 charset
  • tl_PH - Tagalog, iso-8859-1 charset (the main translation is missing; only some plugins are translated)
  • tr_TR - Turkish, iso-8859-9 charset
  • ug - Uighur, utf-8 charset (some systems do not support the Uighur system locale)
  • uk_UA - Ukrainian, koi8-u charset
  • vi_VN - Vietnamese, utf-8 charset
  • zh_CN - Chinese Simplified, gb2312 charset
  • zh_TW - Chinese Traditional, big5 charset

Charset totals:

  • iso-8859-1 = 20 (this total includes en_US)
  • iso-8859-2 = 6
  • utf-8 = 17
  • iso-8859-15 = 2
  • iso-8859-7 = 1
  • iso-8859-9 = 1
  • koi8-r = 1
  • koi8-u = 1
  • windows-1251 = 1
  • windows-1255 = 1
  • windows-1256 = 1
  • tis-620 = 1
  • gb2312 = 1
  • big5 = 1
  • euc-jp = 1
  • euc-kr = 1
  • TOTAL = 57
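The TODO at the start of this section asks for a script that generates these totals. A minimal sketch, assuming a $languages array in the format described in section 3.2; the function name sketch_charset_totals() and the sample entries are hypothetical. Alias-only entries are skipped so that each translation is counted once, and en_US is counted under whatever CHARSET its entry carries.

```php
<?php
// Hypothetical helper: tally translations per charset from a $languages array.
function sketch_charset_totals(array $languages)
{
    $totals = array();
    foreach ($languages as $info) {
        // Alias-only entries (no NAME/CHARSET) point at another translation.
        if (!isset($info['NAME'], $info['CHARSET'])) {
            continue;
        }
        $charset = strtolower($info['CHARSET']);
        $totals[$charset] = isset($totals[$charset]) ? $totals[$charset] + 1 : 1;
    }
    arsort($totals);                       // most common charset first
    $totals['TOTAL'] = array_sum($totals);
    return $totals;
}

// Sample data only; the real entries come from the locale packs.
$sample = array(
    'de_DE' => array('NAME' => 'German', 'CHARSET' => 'iso-8859-1'),
    'fr_FR' => array('NAME' => 'French', 'CHARSET' => 'ISO-8859-1'),
    'cs_CZ' => array('NAME' => 'Czech',  'CHARSET' => 'utf-8'),
    'de'    => array('ALIAS' => 'de_DE'),
);

foreach (sketch_charset_totals($sample) as $charset => $count) {
    echo $charset, ' = ', $count, "\n";
}
```

Lower-casing the CHARSET value keeps "ISO-8859-1" and "iso-8859-1" in one bucket.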

3.2 The $languages array

The $languages array is defined in functions/i18n.php (which has been moved to include/languages.php as of SquirrelMail 1.5.2) and defines translations that are enabled in SquirrelMail. Starting with SquirrelMail 1.5.1, only the English language entry is defined in the core; other languages are added automatically to the $languages array from a file in the locale pack: locale/<language_code>/setup.php.

Format of array:

    $languages['language_code']['key'] = 'value';

Possible array key names:

NAME

The name of the language in English. HTML encoding must be used for any 8-bit symbols.

ALTNAME

The native language name (the name of the language in that language itself). HTML encoding must be used for any 8-bit symbols. This name is shown when $show_alternative_names is enabled in config/config.php (or, in the configuration utility, choose "10. Language settings" --> "3. Show alternative language names"). Available in SquirrelMail 1.5.0 and up.

CHARSET

The character set used for the translation.

LOCALE

The full locale name (in "xx_XX.charset" format or another format required by the PHP gettext functions). Starting with SquirrelMail 1.4.4 and 1.5.1, 'value' can be an array. If the PHP version is older than 4.3.0, only the first locale name in the array is used. The first locale name must be compatible with FreeBSD system locale names. In all other setups, if the first locale is not supported, SquirrelMail tries the remaining locales in the order given in the array. Developers can find the locale names supported on their system by, for example, listing the contents of /usr/lib/locale/ (Red Hat/Fedora).

ALIAS

Any number of aliased language codes can be linked to a given translation by creating one ALIAS entry for each link. The 'language_code' for the alias should contain the aliased language code, and the 'value' should contain the language code SquirrelMail uses to store the actual translation. For example, the Norwegian translation is stored in SquirrelMail as "nb_NO", but it also has an alias for its ISO 639-1 code "nb". Note that while you may chain any number of aliases (for example, "xx" ==> "yy" ==> "no" ==> "nb_NO"), the final code must point to a language that has NAME and CHARSET defined in the $languages array. Aliases are generally used to unify two- and four-letter language codes for the same language (such as "sv" and "sv_SE", which should both select the same Swedish translation). See the ISO 639 list and the country code list.

DIR

The text direction of the language. This is used to indicate right-to-left languages and is not needed otherwise. Possible values are 'rtl' or 'ltr' (when undefined, it defaults to 'ltr').

XTRA_CODE

Indicates that the translation uses special functions (see section 3.3, XTRA_CODE functions).

Note that each 'language_code' definition requires at least the NAME and CHARSET keys or just the ALIAS key. All other keys are optional.
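A minimal sketch of entries in the style of locale/<language_code>/setup.php, together with two hypothetical helpers: sketch_resolve_language() follows ALIAS chains until it reaches an entry with NAME and CHARSET, and sketch_apply_locale() hands a LOCALE array to PHP's setlocale(), which tries each name in order and returns the first one that succeeds. The "xx" alias and the exact LOCALE strings are illustrative assumptions, not values taken from SquirrelMail's locale packs.

```php
<?php
$languages = array();

$languages['nb_NO']['NAME']    = 'Norwegian (Bokm&aring;l)';
$languages['nb_NO']['ALTNAME'] = 'Norsk (bokm&aring;l)';
$languages['nb_NO']['CHARSET'] = 'iso-8859-1';
// Illustrative locale names; setlocale() tries them in this order.
$languages['nb_NO']['LOCALE']  = array('nb_NO.ISO8859-1', 'nb_NO.ISO-8859-1', 'nb_NO');
$languages['nb']['ALIAS']      = 'nb_NO';  // ISO 639-1 code
$languages['xx']['ALIAS']      = 'nb';     // hypothetical chained alias

$languages['he_IL']['NAME']    = 'Hebrew';
$languages['he_IL']['CHARSET'] = 'windows-1255';
$languages['he_IL']['DIR']     = 'rtl';    // right-to-left language

// Hypothetical helper: follow ALIAS links until an entry defines both NAME
// and CHARSET. Returns false for unknown codes or alias loops.
function sketch_resolve_language(array $languages, $code)
{
    $seen = array();
    while (isset($languages[$code]['ALIAS'])) {
        if (isset($seen[$code])) {
            return false;                  // alias loop
        }
        $seen[$code] = true;
        $code = $languages[$code]['ALIAS'];
    }
    return isset($languages[$code]['NAME'], $languages[$code]['CHARSET']) ? $code : false;
}

// Hypothetical helper: pass LOCALE (string or array) to setlocale(), which
// returns the first locale it could set, or false if none worked.
function sketch_apply_locale(array $languages, $code)
{
    if (!isset($languages[$code]['LOCALE'])) {
        return false;
    }
    return setlocale(LC_ALL, (array) $languages[$code]['LOCALE']);
}

echo sketch_resolve_language($languages, 'xx'), "\n";
```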

3.3 XTRA_CODE functions

XTRA_CODE functions provide a way to change interface behavior when a translation requires special handling of some SquirrelMail functions. They are enabled by setting the XTRA_CODE option in the $languages array and by including the appropriate functions in locale/language_code/setup.php (SquirrelMail 1.5.x) or functions/i18n.php (SquirrelMail 1.4.x). The first part of a function name is the word listed in the $languages['language_code']['XTRA_CODE'] value; the second part is one of the special keywords. Possible keywords:

_decode

Used in src/compose.php, src/i18n.php, src/view_text.php, and functions/mime.php. Requires mbstring support.

_encode

Used in src/compose.php and src/read_body.php.

_encodeheader

Used in functions/mime.php. Should accept one string argument and return a correctly encoded MIME header string.

_decodeheader

Used in functions/mime.php. The function must return its result.

_downloadfilename

Used in functions/mime.php.

_utf7_imap_encode

Used in functions/imap_utf7_local.php. The function must return its result.

_utf7_imap_decode

Used in functions/imap_utf7_local.php. The function must return its result.

_strimwidth

Used in functions/mailbox_display.php. The function must return its result.

_wordwrap

Used in functions/strings.php (sqWordWrap).
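A minimal sketch of the naming convention: the XTRA_CODE value plus a keyword such as "_strimwidth" gives the name of the function to call. The translation xx_XX, its XTRA_CODE value 'example_xtra', the argument list of example_xtra_strimwidth(), and the helpers sketch_call_xtra() and sketch_default_strimwidth() are all hypothetical; SquirrelMail's own call sites may pass different arguments.

```php
<?php
$xtra_languages = array();
$xtra_languages['en_US']['NAME']      = 'English';
$xtra_languages['en_US']['CHARSET']   = 'iso-8859-1';
$xtra_languages['xx_XX']['NAME']      = 'Example';
$xtra_languages['xx_XX']['CHARSET']   = 'utf-8';
$xtra_languages['xx_XX']['XTRA_CODE'] = 'example_xtra';

// 'example_xtra' + '_strimwidth'. Illustrative body: cut to $width characters
// without splitting a UTF-8 sequence (assumes valid UTF-8 input).
function example_xtra_strimwidth($string, $width)
{
    preg_match_all('/./us', $string, $chars);
    if (count($chars[0]) <= $width) {
        return $string;
    }
    return implode('', array_slice($chars[0], 0, $width)) . '...';
}

// Hypothetical default used when a translation has no XTRA_CODE variant.
function sketch_default_strimwidth($string, $width)
{
    return substr($string, 0, $width);
}

// Hypothetical dispatcher: build "<XTRA_CODE><keyword>" and call it if it
// exists, otherwise fall back to $default.
function sketch_call_xtra(array $languages, $lang, $keyword, $default, array $args)
{
    if (isset($languages[$lang]['XTRA_CODE'])) {
        $func = $languages[$lang]['XTRA_CODE'] . $keyword;
        if (function_exists($func)) {
            return call_user_func_array($func, $args);
        }
    }
    return call_user_func_array($default, $args);
}

echo sketch_call_xtra($xtra_languages, 'xx_XX', '_strimwidth',
    'sketch_default_strimwidth', array("h\xC3\xA9llo w\xC3\xB6rld", 5)), "\n";
```

The function_exists() check lets a translation implement only the keywords it needs; every other keyword keeps the default behavior.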

3.4 Display of different charsets

When SquirrelMail generates HTML pages, it uses the charset defined by the translation the end user selected. The interface can display emails encoded in other charsets. To display characters that the user's charset might not support, SquirrelMail uses decoding functions that convert non-us-ascii symbols into HTML entities. All decoding functions are stored in the functions/decode/ directory.

By default, SquirrelMail includes decoding functions that support the iso-8859-x, windows-125x, utf-8, us-ascii, koi8-r, koi8-u, tis-620, ns-4551_1, iso-ir-111, cp855 and cp866 charsets. Other decoding functions are distributed in separate packages; this separate packaging is supported since SquirrelMail 1.4.4 and 1.5.0. The us-ascii decoding function replaces all 8-bit symbols with question marks. The UTF-8 decoding function does not decode five- and six-byte UTF-8 symbols by default (that code is commented out) and replaces all incorrectly formatted 8-bit symbols with question marks.

Some decoding functions might require the PHP recode extension or the PHP 4.3+ mbstring extension. If your PHP installation does not support them, slower and more CPU- and memory-intensive functions might be used instead.
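A minimal sketch of what the us-ascii and ISO-8859-1 decoding described above do; the function names are hypothetical, and the files in functions/decode/ may be implemented differently. In ISO-8859-1 each byte value equals its Unicode code point, so a numeric entity can be built directly from ord(). The charset argument to htmlspecialchars() is passed explicitly because PHP 5.4 and later default to UTF-8, which may reject raw 8-bit input.

```php
<?php
// Hypothetical us-ascii decoder: escape HTML specials, then replace every
// 8-bit byte with a question mark.
function sketch_decode_us_ascii($string)
{
    $string = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');
    return preg_replace('/[\x80-\xFF]/', '?', $string);
}

// Hypothetical ISO-8859-1 decoder: bytes 0x80-0xFF become numeric entities.
function sketch_decode_iso_8859_1($string)
{
    // Escape &, ", < and > first, so the '&' of the generated entities
    // is not escaped a second time.
    $string = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        return '&#' . ord($m[0]) . ';';
    }, $string);
}

echo sketch_decode_iso_8859_1("caf\xE9 <b>"), "\n";  // caf&#233; &lt;b&gt;
echo sketch_decode_us_ascii("caf\xE9"), "\n";         // caf?
```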

3.5 IMAP folder names

IMAP folder names use the UTF7-IMAP charset, so folder names that are stored in conf.pl must be encoded in UTF7-IMAP. SquirrelMail uses internal functions to convert folder names to and from UTF7-IMAP. By default, those functions work with the ISO-8859-1 charset; other charsets are supported only when the PHP mbstring extension supports them.

TODO: Write independent implementation of charset to UTF7-IMAP conversion.
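A minimal pure-PHP sketch of ISO-8859-1 to UTF7-IMAP encoding, following the modified UTF-7 rules of RFC 3501 (section 5.1.3): printable ASCII stays as is, "&" becomes "&-", and runs of other characters are written as UTF-16BE in modified base64 (',' instead of '/', no padding) between "&" and "-". The function names are hypothetical, and only ISO-8859-1 input is handled. Where mbstring is available, mb_convert_encoding($name, 'UTF7-IMAP', 'ISO-8859-1') performs the same conversion.

```php
<?php
// Modified base64: standard base64 with ',' instead of '/' and no '=' padding.
function sketch_imap_b64($utf16be)
{
    return str_replace('/', ',', rtrim(base64_encode($utf16be), '='));
}

// Encode an ISO-8859-1 folder name as UTF7-IMAP. Each Latin-1 byte equals its
// Unicode code point, so its UTF-16BE form is "\x00" followed by the byte.
function sketch_utf7_imap_encode_latin1($name)
{
    $out = '';
    $pending = '';  // UTF-16BE bytes waiting to be flushed as one &...- run
    $len = strlen($name);
    for ($i = 0; $i < $len; $i++) {
        $byte = $name[$i];
        $code = ord($byte);
        if ($code >= 0x20 && $code <= 0x7E) {
            if ($pending !== '') {
                $out .= '&' . sketch_imap_b64($pending) . '-';
                $pending = '';
            }
            $out .= ($byte === '&') ? '&-' : $byte;  // literal '&' is "&-"
        } else {
            $pending .= "\x00" . $byte;
        }
    }
    if ($pending !== '') {
        $out .= '&' . sketch_imap_b64($pending) . '-';
    }
    return $out;
}

echo sketch_utf7_imap_encode_latin1("Entw\xFCrfe"), "\n";  // Entw&APw-rfe
```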

3.6 Plural forms

Since version 1.5.1, SquirrelMail supports plural forms, which allow translations to use the correct word form with numbers. For example, "We have %s squirrel on the roof." and "We have %s squirrels on the roof." can be handled in one function call without checking the actual number of squirrels. The gettext functions also handle non-English languages that might use different word forms for two, five, ten or more units.

Plural forms support is provided by ngettext functions that exist in the PHP Gettext extension as of PHP 4.2.0 and by ngettext function replacements from the php-gettext classes (http://savannah.nongnu.org/projects/php-gettext). In order to provide identical functionality when the PHP Gettext extension does not have ngettext support, SquirrelMail uses bindtextdomain and textdomain wrappers that load the missing functions.

If plugin authors want to use ngettext functions without raising the PHP requirement to 4.2.0 with gettext support, they should require at least SquirrelMail 1.5.1 and use the sq_change_text_domain function instead of separate calls to bindtextdomain and textdomain. If sq_change_text_domain cannot be used, the sq_bindtextdomain function should be used instead of bindtextdomain, and the sq_textdomain function instead of textdomain. If these latter two SquirrelMail wrapper functions are used (but again, please use sq_change_text_domain), there is no need to call sq_bindtextdomain when a plugin reverts to the SquirrelMail domain.

More information about ngettext and plural forms can be found at: http://www.gnu.org/software/gettext/manual/html_chapter/gettext_10.html#SEC150
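A plural rule maps a number to the index of a translated form. The sketch below hard-codes the standard GNU gettext Plural-Forms expressions for English and Polish to show why some languages need more than two forms; in practice the rule comes from the translation catalog header and ngettext() applies it. All function names and the 'demo' plugin domain in the comments are hypothetical.

```php
<?php
// Real-world call (needs ngettext support), e.g. inside a plugin that has
// switched to its own domain with sq_change_text_domain('demo'):
//   printf(ngettext('We have %s squirrel on the roof.',
//                   'We have %s squirrels on the roof.', $n), $n);

// English: nplurals=2; plural=(n != 1)
function sketch_plural_index_en($n)
{
    return ($n != 1) ? 1 : 0;
}

// Polish: nplurals=3;
// plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)
function sketch_plural_index_pl($n)
{
    if ($n == 1) {
        return 0;
    }
    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1;
    }
    return 2;
}

// Pick the form selected by the rule and fill in the number.
function sketch_format_plural(array $forms, $rule, $n)
{
    return sprintf($forms[$rule($n)], $n);
}

$forms_en = array('We have %s squirrel on the roof.', 'We have %s squirrels on the roof.');
$forms_pl = array('%s wiewiórka', '%s wiewiórki', '%s wiewiórek');

foreach (array(1, 2, 5, 22) as $n) {
    echo sketch_format_plural($forms_en, 'sketch_plural_index_en', $n), ' / ',
         sketch_format_plural($forms_pl, 'sketch_plural_index_pl', $n), "\n";
}
```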

3.7 Language setup

SquirrelMail uses the set_up_language() function to set up the language environment. The environment is set up automatically when include/validate.php is loaded.

SquirrelMail gets the interface language from three places:

  a) The user preference, set in Options -> Display Preferences -> Language. The preference is stored under the language key. If user preferences are not available (the user is not logged in), the system tries to read the language from the 'squirrelmail_language' cookie.
  b) The default SquirrelMail language set in the configuration (the $squirrelmail_default_language variable).
  c) The preferred language provided by the browser. It is used only when the default SquirrelMail language is set to an empty string.

If language information is not available, SquirrelMail falls back to US English translation.
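A minimal sketch of the selection order described above, written as a hypothetical function; sketch_pick_language() is not SquirrelMail's set_up_language(), and its parameters are assumptions. It parses Accept-Language naively, ignoring q-values and trying entries in the order the browser sent them, and returns a code only if it exists in the given $languages array.

```php
<?php
$sample_languages = array(
    'en_US' => array('NAME' => 'English', 'CHARSET' => 'iso-8859-1'),
    'de_DE' => array('NAME' => 'German',  'CHARSET' => 'iso-8859-1'),
    'fr_FR' => array('NAME' => 'French',  'CHARSET' => 'iso-8859-1'),
    'de'    => array('ALIAS' => 'de_DE'),
);

function sketch_pick_language($userPref, $cookieLang, $defaultLang, $acceptLanguage, array $languages)
{
    // a) user preference; without preferences, the squirrelmail_language cookie
    $chosen = !empty($userPref) ? $userPref : $cookieLang;
    if (!empty($chosen) && isset($languages[$chosen])) {
        return $chosen;
    }
    // b) configured default ($squirrelmail_default_language)
    if ($defaultLang !== '') {
        return isset($languages[$defaultLang]) ? $defaultLang : 'en_US';
    }
    // c) browser preference, consulted only when the default is ''
    foreach (explode(',', $acceptLanguage) as $entry) {
        $parts = explode(';', $entry);               // drop ";q=..."
        $tag = explode('-', trim($parts[0]), 2);     // "fr-FR" -> fr, FR
        $code = strtolower($tag[0]);
        if (isset($tag[1]) && isset($languages[$code . '_' . strtoupper($tag[1])])) {
            return $code . '_' . strtoupper($tag[1]);
        }
        if ($code !== '' && isset($languages[$code])) {
            return $code;
        }
    }
    return 'en_US';                                  // fall back to US English
}

echo sketch_pick_language('', null, '', 'fr-fr;q=0.9, de', $sample_languages), "\n";
```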

3.8 Time zones

If the PHP installation allows modifying the TZ environment variable, SquirrelMail lets end users select a different time zone in their preferences, under Options -> Personal Information -> Your current timezone. The time zone is set up automatically when include/validate.php is loaded.

If the TZ variable can't be modified (PHP is running in safe mode and the variable is not listed in PHP's safe_mode_allowed_env_vars), the user's time zone options are not visible and the interface uses the web server's default time zone.

SquirrelMail 1.5.0 and older store the list of available time zones in locale/timezones.cfg. Since 1.5.1, standard time zones have moved to include/timezones/standard.php, and time zone handling differs from older SquirrelMail versions. Time zone handling is configured in the SquirrelMail configuration utility (conf.pl) under 4. General Options > 15. Time zone configuration. The administrator can select standard, strict, custom, or custom strict time zone handling.

Standard handling is the same as in previous SquirrelMail versions and uses GNU C time zone names based on geographical location. Strict handling uses time zone codes with an offset from GMT; strict time zones should work on systems that don't support GNU C time zone naming. Custom and custom strict handling use the config/timezones.php file instead of include/timezones/standard.php.

The config/timezones.php file should define an $aTimeZones array with a different set of time zones. See the default time zone set in include/timezones/standard.php. For example:

<?php
// World outside US border is a mirage

$aTimeZones=array();
$aTimeZones['America/New_York']['NAME']='US Eastern standard time';
$aTimeZones['America/New_York']['TZ']='EST5EDT';

$aTimeZones['America/Chicago']['NAME']='US Central standard time';
$aTimeZones['America/Chicago']['TZ']='CST6CDT';

// Oliver County, ND
$aTimeZones['America/North_Dakota/Center']['NAME']='US, Oliver County [ND]';
$aTimeZones['America/North_Dakota/Center']['TZ']='CST6CDT'; // CST since 1992

$aTimeZones['America/Denver']['NAME']='US Mountain standard time';
$aTimeZones['America/Denver']['TZ']='MST7MDT';

$aTimeZones['America/Los_Angeles']['NAME']='US Pacific standard time';
$aTimeZones['America/Los_Angeles']['TZ']='PST8PDT';

// Alaska
$aTimeZones['America/Juneau']['NAME']='Alaska, Juneau';
$aTimeZones['America/Juneau']['TZ']='NAST9NADT';
$aTimeZones['America/Yakutat']['NAME']='Alaska, Yakutat';
$aTimeZones['America/Yakutat']['TZ']='NAST9NADT';
$aTimeZones['America/Anchorage']['NAME']='Alaska, Anchorage';
$aTimeZones['America/Anchorage']['TZ']='NAST9NADT';
$aTimeZones['America/Nome']['NAME']='Alaska, Nome';
$aTimeZones['America/Nome']['TZ']='NAST9NADT';
$aTimeZones['America/Adak']['NAME']='US, Aleutian Islands';
$aTimeZones['America/Adak']['TZ']='AST10ADT';

$aTimeZones['Pacific/Honolulu']['NAME']='US, Hawaii';
$aTimeZones['Pacific/Honolulu']['TZ']='UCT10';
$aTimeZones['America/Phoenix']['NAME']='US, Arizona';
$aTimeZones['America/Phoenix']['TZ']='MST7'; // gmt-7
$aTimeZones['America/Shiprock']['LINK']='America/Denver';

$aTimeZones['America/Boise']['NAME']='US, South Idaho';
$aTimeZones['America/Boise']['TZ']='MST7MDT';
$aTimeZones['America/Indianapolis']['NAME']='US, Indiana';
$aTimeZones['America/Indianapolis']['TZ']='EST5';
$aTimeZones['America/Indiana/Indianapolis']['LINK']='America/Indianapolis';
// Crawford County, Indiana
$aTimeZones['America/Indiana/Marengo']['NAME']='US, Crawford County [IN]';
$aTimeZones['America/Indiana/Marengo']['TZ']='EST5';
// Starke County, Indiana
$aTimeZones['America/Indiana/Knox']['NAME']='US, Starke County [IN]';
$aTimeZones['America/Indiana/Knox']['TZ']='EST5';
// Switzerland County, Indiana
$aTimeZones['America/Indiana/Vevay']['NAME']='US, Switzerland County [IN]';
$aTimeZones['America/Indiana/Vevay']['TZ']='EST5';
$aTimeZones['America/Louisville']['NAME']='US, Louisville [KY]';
$aTimeZones['America/Louisville']['TZ']='EST5EDT';
$aTimeZones['America/Kentucky/Louisville']['LINK']='America/Louisville';
// Wayne, Clinton, and Russell Counties, Kentucky
$aTimeZones['America/Kentucky/Monticello']['NAME']='US, Wayne, Clinton, and Russell Counties [KY]';
$aTimeZones['America/Kentucky/Monticello']['TZ']='EST5EDT';
// Michigan
$aTimeZones['America/Detroit']['NAME']='US, Michigan';
$aTimeZones['America/Detroit']['TZ']='EST5EDT';
// The Michigan border with Wisconsin switched from EST to CST/CDT in 1973.
$aTimeZones['America/Menominee']['NAME']='US, Menominee [MI]';
$aTimeZones['America/Menominee']['TZ']='CST6CDT';
?>

GNU C time zone naming should be supported by many Unix operating systems. It is the recommended way of setting the time zone because it handles historical changes and daylight saving rules specific to the selected geographical location. Strict time zones might provide inaccurate or outdated time zone settings.

If changes to the TZ environment variable are visible in your web server's logs (the time offset changes), make sure that you can reproduce this behavior with the latest PHP version and report the bug to the PHP developers. The issue can be worked around by blocking time zone changes (PHP safe mode with TZ not listed in the safe_mode_allowed_env_vars setting, or the forced_prefs plugin) or by attaching a special PHP script with a putenv('TZ=some time zone') call through the PHP auto_append_file setting. This suggestion is untested, and you might have to fix all SquirrelMail exit calls.

Please note that auto_append_file provides only a temporary workaround and does not fix your PHP setup. A script that runs as an unprivileged user should not be able to affect the web server's logging system.
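A minimal sketch of how an $aTimeZones entry might be turned into a TZ value, assuming that standard handling uses the geographical key (for example America/Denver), strict handling uses the entry's TZ code, and LINK entries such as America/Shiprock are followed first. The helpers sketch_resolve_tz() and sketch_apply_tz() are hypothetical; SquirrelMail's own time zone code may differ in detail.

```php
<?php
// Subset of the config/timezones.php example above.
$aTimeZones = array();
$aTimeZones['America/Denver']['NAME']   = 'US Mountain standard time';
$aTimeZones['America/Denver']['TZ']     = 'MST7MDT';
$aTimeZones['America/Shiprock']['LINK'] = 'America/Denver';

// Follow LINK entries, then return the geographical key (standard handling)
// or the entry's TZ code (strict handling). False if nothing usable is found.
function sketch_resolve_tz(array $aTimeZones, $key, $strict)
{
    $seen = array();
    while (isset($aTimeZones[$key]['LINK'])) {
        if (isset($seen[$key])) {
            return false;                    // LINK loop
        }
        $seen[$key] = true;
        $key = $aTimeZones[$key]['LINK'];
    }
    if (!isset($aTimeZones[$key])) {
        return false;
    }
    if ($strict) {
        return isset($aTimeZones[$key]['TZ']) ? $aTimeZones[$key]['TZ'] : false;
    }
    return $key;
}

// putenv() returns false when the variable cannot be set.
function sketch_apply_tz($tz)
{
    return $tz !== false && putenv('TZ=' . $tz);
}

echo sketch_resolve_tz($aTimeZones, 'America/Shiprock', true), "\n";  // MST7MDT
```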

3.9 Sanitizing HTML strings

PHP provides the functions htmlspecialchars() and htmlentities() for sanitizing HTML strings. When SquirrelMail developers want to sanitize HTML formatting symbols, they should use htmlspecialchars() and avoid htmlentities().

Before PHP 5.4, htmlentities() uses the ISO-8859-1 charset by default and converts ISO-8859-1 eight-bit symbols into entities. Other charsets use the same eight-bit range to store different symbols, so this breaks all translations that do not use ISO-8859-1.

Depending on its flags, htmlspecialchars() converts only three, four or five seven-bit symbols (&, ", ', < and >). It breaks only strings encoded in the ISO-2022 charsets, which use the seven-bit range to store different symbols; the encoding table in use depends on the escape sequences present in the ISO-2022 text.

Since PHP 4.1.0, htmlentities() and htmlspecialchars() accept a charset argument, but the list of supported charsets is limited. Unsupported charsets fall back to the default charset, which before PHP 5.4 is the same old and dangerous ISO-8859-1.

Differences between the two functions can be examined with the get_html_translation_table() function.

If SquirrelMail charset decoding functions are used, they should apply htmlspecialchars() to the decoded string automatically. Don't apply htmlspecialchars() twice to the same string, since that might break the decoded string.
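A short demonstration of the points above using PHP's own functions, with the charset passed explicitly (PHP 5.4 and later default to UTF-8). Byte 0xA1 is "Ą" in ISO-8859-2 but "¡" in ISO-8859-1, so htmlentities() with ISO-8859-1 turns Polish text into the wrong character, while htmlspecialchars() leaves the byte alone. The variable names are illustrative.

```php
<?php
// Byte 0xA1: "Ą" in ISO-8859-2 (Polish), "¡" in ISO-8859-1.
$polish = "\xA1";

// htmlentities() maps the byte through the ISO-8859-1 table to "&iexcl;",
// which a browser shows as "¡" instead of "Ą".
$broken = htmlentities($polish, ENT_COMPAT, 'ISO-8859-1');

// htmlspecialchars() leaves 8-bit bytes untouched.
$kept = htmlspecialchars($polish, ENT_COMPAT, 'ISO-8859-1');

// With ENT_COMPAT it converts &, ", < and > (four symbols); ENT_QUOTES adds '.
$markup = htmlspecialchars('<a href="x">Tom & Jerry</a>', ENT_COMPAT, 'ISO-8859-1');

// Escaping twice corrupts the text: "&" -> "&amp;" -> "&amp;amp;".
$twice = htmlspecialchars(htmlspecialchars('&', ENT_COMPAT, 'ISO-8859-1'), ENT_COMPAT, 'ISO-8859-1');

echo $broken, "\n", $markup, "\n", $twice, "\n";
```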


© 1999-2016 by The SquirrelMail Project Team