Monday, August 18, 2008

A Unicode Disaster

I was making an update to a PHP site today: the podcast feed had quit working.
Turns out there was broken markup and all kinds of special characters in the content.
So I created a bunch of filters to scrub the content before sticking it in the podcast.
One of these converted a bunch of diacriticals to plain ASCII. I had the actual characters in the code, so when I went to save the file, my editor warned me about encoding and offered to set it to UTF-8.
All good. Saved away and started debugging the new code.
Then I got "PHP Warning: Cannot modify header information - headers already sent by (..."
I was stumped for a bit, since my header statement was on the first line of the program.

After confirming nothing funky was going on with Apache or my php.ini, I remembered the encoding change.

My editor had set the encoding to "UTF-8 BOM."

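BOM in this case means Byte Order Mark. It adds a few bytes to the beginning of the file (EF BB BF for UTF-8) which act as a signature identifying the encoding. In UTF-16 and UTF-32 the mark also indicates byte order.
You generally won't see those extra bytes, because a "smart" editor won't display them.
But they are there, and PHP treats anything before the opening <?php tag as output. So the BOM went out to the browser before my header statement ran, which is exactly what the warning was complaining about.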
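
Since the editor hides the mark, the easiest way to confirm it is to look at the raw bytes. Here is a minimal sketch in Python (not part of the original site, just a quick check); the function name has_utf8_bom and the sample PHP line are made up for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical file contents: the same PHP line saved with and without a BOM.
without_bom = b"<?php header('Content-Type: text/xml'); ?>"
with_bom = codecs.BOM_UTF8 + without_bom

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False
print(with_bom[:3].hex())         # efbbbf
```

For a real file you would read it in binary mode (open(path, "rb")) and pass the first few bytes to the same check.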

The solution in my case was trivial: just change the encoding type in the pop-up menu to "UTF-8 (no BOM)".
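
If your editor doesn't offer that option, the mark can also be removed by rewriting the file without its first three bytes. This is a rough sketch of that idea in Python, not something I actually ran on the site; strip_utf8_bom is a hypothetical helper name, and you would want a backup of the file before trying it.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False if the file
    had no BOM and was left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling strip_utf8_bom("feed.php") (the filename is only an example) would rewrite the file in place if it starts with a BOM. Running it a second time does nothing and returns False.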

