Encoded nightmares!

Monday, December 24, 2007

Well, every web developer who doesn't use English as the default language for his/her applications may end up in a rough spot when dealing with data encoding. Let's look at the following scenario...

You have finally completed your project! Everything works fine until the client asks you to add a couple of languages to the application interface... it sounds simple, but it isn't. If you end up using the wrong encoding for your application, the results are going to be awful for one simple reason:

BAD ENCODING == DATA LOSS!

If you are into web design, and especially PHP, you'll have to take precautions soon enough. PHP 5 doesn't provide many (useful) encoding functions/extensions, something that is going to change in the future (PHP 6).

A couple of small tips to get you started (I'm saying started because you are not going to solve your encoding problems in one night!):

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

has nothing to do with PHP's:

header('Content-Type: text/html; charset=UTF-8');

In other words, the encoding your browser uses to parse the data may be different from the encoding PHP/Apache/MySQL uses to send it, so be careful when setting the encodings in the different configuration files of your system. When the two declarations disagree, browsers give the HTTP header precedence over the meta tag.
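
One way to keep the two declarations from drifting apart is to define the charset once and build both the HTTP header and the meta tag from it. This is only a rough sketch: APP_CHARSET, contentTypeHeader() and metaCharsetTag() are made-up names for illustration, and only header(), headers_sent() and htmlspecialchars() are PHP built-ins.

```php
<?php
// Single source of truth for the charset (hypothetical constant name).
define('APP_CHARSET', 'UTF-8');

// Builds the value passed to header(). Illustrative helper, not a PHP built-in.
function contentTypeHeader($charset = APP_CHARSET)
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Builds the matching <meta> tag so the markup agrees with the HTTP header.
function metaCharsetTag($charset = APP_CHARSET)
{
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES, 'UTF-8') . '">';
}

// The header has to go out before any output; the meta tag goes inside <head>.
if (!headers_sent()) {
    header(contentTypeHeader());
}
echo metaCharsetTag(), "\n";
```

The database connection needs the same treatment. With the mysqli extension, for example, the connection charset can be set with mysqli_set_charset(), but the right value depends on how your tables were created, so treat that as something to check rather than a recipe.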

Secondly, you have to be sure that the encoding you use to save your files is "UTF-8 Without BOM". If you aren't sure which encoding your text editor uses to save files, you could take a look at Notepad++.
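
If you'd rather check from PHP itself, the UTF-8 BOM is just the three bytes EF BB BF at the very start of the file. In a PHP script those bytes sit before the opening tag and get sent to the browser as output, which is a common cause of "headers already sent" warnings. A minimal sketch of a check follows; hasUtf8Bom() and stripUtf8Bom() are hypothetical helper names, not PHP built-ins.

```php
<?php
// The UTF-8 byte order mark as raw bytes.
define('UTF8_BOM', "\xEF\xBB\xBF");

// True when the given bytes start with the UTF-8 BOM.
function hasUtf8Bom($bytes)
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Returns the bytes without a leading BOM; input without a BOM is returned unchanged.
function stripUtf8Bom($bytes)
{
    return hasUtf8Bom($bytes) ? (string) substr($bytes, 3) : $bytes;
}

// Example with an in-memory string standing in for a file's contents.
$withBom = UTF8_BOM . 'Hello';
var_dump(hasUtf8Bom($withBom));               // bool(true)
var_dump(stripUtf8Bom($withBom) === 'Hello'); // bool(true)
var_dump(hasUtf8Bom('Hello'));                // bool(false)
```

To test a real file, pass it the result of file_get_contents() for that file's path.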

For further reading on encoding issues check these: