Heckroth Industries


UTF-8 and CSVs

Recently I have had to produce some CSVs using UTF-8 character encoding. The UTF-8 encoding is easy to do you just need to remember to set the header charset to be utf-8 when printing the CGI header.

use CGI;
my $CGI=new CGI;
print $CGI->header(-type=>'text/csv', -charset=>'utf-8', -attachment =>$filename);

Then you have to print the Byte Order Mark (BOM) which in hex is FEFF as the very first thing so that Excel will recorgnise the CSV as being in UTF-8 and not in its default character set.

print "\x{FEFF}";

Interestingly from what I can tell this BOM is actually for UTF-16, the BOM for UTF-8 should be 0xEFBBBF, but this didn’t seem to work with Excel.

Note: Usually the BOM is not recommended for UTF-8 as it can cause problems, but in the case of CSV’s that you want to open in Excel it is required.

Jason — 2011-04-13