Vincent Gable’s Blog

September 5, 2008

ASCII is Dangerous

Never use NSASCIIStringEncoding

“Foreign” characters, like the ï in “naïve”, will break your code, if you use NSASCIIStringEncoding. Such characters are more common then you might expect, even if you do not have an internationalized application. “Smart quotes”, and most well-rendered punctuation marks, are not 7-bit ASCII. For example, that last sentence can’t be encoded into ASCII, because my blog uses smart-quotes. (Seriously, [thatSentence cStringUsingEncoding:NSASCIIStringEncoding] will return nil!)

Here are some simple alternatives:

C-String Paths
Use - (const char *)fileSystemRepresentation; to get a C-string that you can pass to POSIX functions. The C-string will be freed when the NSString it came from is freed.

An Alternate Encoding
NSUTF8StringEncoding is the closest safe alternative to NSASCIIStringEncoding. ASCII characters have the same representation in UTF-8 as in ASCII. UTF-8 strings will printf correctly, but will look wrong (‘fancy’ characters will be garbage) if you use NSLog(%s).

Native Foundation (NSLog) Encoding
Generally, Foundation uses UTF-16. It is my understanding that this is what NSStrings are by default under the hood. UTF-16 strings will look right if you print them with NSLog(%s), but will not print correctly using printf. In my experience printf truncates UTF-16 strings in an unpredictable way. Do not mix UTF-16 and printf.

Convenience C-Ctrings
[someNSString UTF8String] will give you a const char * to a NULL-terminated UTF8-string. ASCII characters have the same representation in UTF-8 as in ASCII.

Take a minute to search all your projects for NSASCIIStringEncoding, and replace it with a more robust option.

It never hurts to brush up on unicode.

Powered by WordPress