/* If you are uncertain of the correct encoding, you should use UTF-8, */ /* which is the encoding designated by RFC 2396 as the correct encoding */ /* for use in URLs.… */
— CFURL.h
This echos my experience, when in doubt, choose UTF8 for the web. UTF8 is backwards compatible with 7-bit ASCII (eg. ‘A’ is 0x41 in ASCII and UTF8).
But know that UTF8 is a variable-length encoding: non-ASCII characters maybe represented by > 1 byte. As a general rule with Unicode, I do not expect a char
or wchar_t
to always map to a character in a string. Encoding details can be messy, e.g. “É” might be represented as one character, or two composed characters “´E”. It never hurts to brush up on Unicode.