I’ve volunteered to help an international group update its web site. One of the first things I noticed about the site is that it has no character set specified ... and yes, there are problems in getting apostrophes to render properly on some pages, and likely other characters too.
The bulk of the text will be in English, but there will be non-Latin characters used from time to time. I know how to use the numerical codes to generate those, but I’m wondering: is there a character set I should choose that is optimal for rendering mixed-language text? Or am I overthinking this issue?
Thanks in advance for all contributions!
Anyone Got Some Charset Recommendations?

- Sunni's blog

Unicode
I'm no expert, but I believe that UTF-8 Unicode is the standard for international web sites.
Character Set?
I too endorse UTF-8. It finally seems to have supplanted ISO-8859-1 on the web. It can encode every Unicode character, far more than the 256 available under the old Western ISO set. Anyone whose browser cannot auto-detect, support, or be manually switched to your pages' encoding will see a block character or something strange in place of any character outside the ASCII range, but that happens now with other character sets too. The first 128 characters (the ASCII range) are encoded the same way in almost all of the commonly used character sets, so it should not matter much.
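A quick way to see both points in this comment, sketched in Python (the sample strings are made up for illustration): plain ASCII text produces identical bytes under ISO-8859-1 and UTF-8, while a curly apostrophe takes three bytes in UTF-8 and turns into junk if those bytes are read as windows-1252, a common browser fallback in Western locales when no charset is declared.

```python
# ASCII-only text: the bytes are the same in all three encodings.
plain = "Freedom News Daily"
same_bytes = (
    plain.encode("ascii") == plain.encode("iso-8859-1") == plain.encode("utf-8")
)
print(same_bytes)  # True

# A curly apostrophe (U+2019) is outside ASCII; UTF-8 stores it as E2 80 99.
curly = "Gilligan\u2019s Island"
utf8_bytes = curly.encode("utf-8")

# Reading those UTF-8 bytes with the wrong charset garbles the apostrophe.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Gilliganâ€™s Island

# Reading them as UTF-8 gives the original text back.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == curly)  # True
```

That garbled apostrophe is probably the same kind of problem the original post describes, though without seeing the site that is only a guess.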
Old advice
I picked this up a long time ago (and it may be so old as to be obsolete, but...): For greatest compatibility with a variety of browsers, use UTF-8 as the character set but encode all non-ASCII characters numerically.
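For what it's worth, that numeric-escaping approach can be sketched in Python with the standard `xmlcharrefreplace` error handler. The helper name `to_numeric_entities` and the sample text are made up for this example.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # Encoding to ASCII with this handler turns e.g. U+2019 into "&#8217;".
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "Gilligan\u2019s Island \u2014 ma\u00f1ana"
escaped = to_numeric_entities(sample)
print(escaped)  # Gilligan&#8217;s Island &#8212; ma&#241;ana
```

The result is pure ASCII, so it renders the same no matter which charset the browser ends up guessing.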
UTF-8
Sunni, it's all about UTF-8 (charset=utf-8). These days you don't even need to use the numerical escaping of non-ASCII characters (at least not for mostly-common characters -- there are no guarantees for wacky characters like Klingon).
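To make the "charset=utf-8" part concrete, here is a minimal sketch in Python, assuming the pages are generated by some script. The function name `build_page` and the sample text are hypothetical, and how the header actually gets sent depends on the server. The idea is to declare UTF-8 in both the HTTP `Content-Type` header and a `meta` tag, and to make sure the bytes really are UTF-8.

```python
from html import escape
from typing import Tuple


def build_page(title: str, body_text: str) -> Tuple[str, bytes]:
    """Return a Content-Type header value and the page encoded as UTF-8 bytes."""
    html_doc = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'  # in-page declaration
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    content_type = "text/html; charset=utf-8"  # HTTP header declaration
    # The declared charset must match the encoding actually used here.
    return content_type, html_doc.encode("utf-8")


content_type, page = build_page("Charset test", "Sunni\u2019s ma\u00f1ana")
print(content_type)  # text/html; charset=utf-8
```

Older pages often use the longer form `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`, which does the same job as the short `meta` tag.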
Sold, to the lovely octagonal charset over in the corner!
Thanks, all!
I know this will seem weird to most everyone, but after day after day of wrangling with Freedom News Daily, trying to ensure all characters were ASCII, it’s easier for me now to use the numeric codes. I use them for all my apostrophes, em and en dashes, quotation marks ... having the proper-looking punctuation or spelling (for Spanish and other words with non-ASCII characters) is sufficiently important to me to do it. And the more I use them, the more I absorb into long-term memory.
Now, if only I could get them to supplant the Gilligan’s Island theme song ...
Good Choice
You Rock®! If only I were ½ as cool as you. ;-)
Heh.
∪ ℜ more cool than I, ∞. I ♥ your mad skillz.